A Survey of Indexing Techniques for Sparse Matrices
UDO W. POOCH, AND AL NIEDER
Texas A & M University,* College Station, Texas

A sparse matrix is defined to be a matrix containing a high proportion of elements that
are zeros. Sparse matrices of large order are of great interest and application in science
and industry; for example, electrical networks, structural engineering, power
distribution, reactor diffusion, and solutions to differential equations.
While conclusions within this paper are primarily drawn considering orders of
greater than 1000, much is applicable to sparse matrices of smaller orders in the
hundreds.
Because of increasing use of large order sparse matrices and the tendency to
attempt to solve larger order problems, great attention must be focused on core
storage and execution time. Every effort should be made to optimize both computer
memory allocation and execution times, as these are the limiting factors that most
often dictate the practicality of solving a given problem.
Indexing algorithms are the subject of this paper, as they are generally recognized
as the most important factor in fast and efficient processing of large order sparse
matrices.
Indexing schemes of main interest are the bit map, address map, row-column, and
the threaded list. Major variations of the indexing techniques mentioned above are
noted, as well as the particular indexing scheme inherent in diagonal or band matrices.
The concluding section of the paper compares the types of methods, discusses their
suitability for different types of processing, and makes suggestions concerning the
adaptability and flexibility of the major existing methods of indexing algorithms for
application to user problems.
Key Words and Phrases: Matrix, sparse matrix, matrix manipulation, indexing.

CR Categories: 5.14, 5.19

I. INTRODUCTION

Computations involving sparse matrices
have been of widespread use since the 1950s,
becoming increasingly popular with the advent of faster cycle times and larger computer memories. One cycle time is the time
required for the central processing unit to
send and to receive a data signal from main
memory. Systems applications for sparse
matrices include electrical networks and
power distribution, structural engineering,
reactor diffusion, and solutions to differential equations.
A sparse matrix is a matrix having few
nonzero elements. Matrix density is defined
as the number of nonzero elements of the
matrix divided by the total number of elements in the full matrix. Most available references utilizing sparse matrices for calculations [1-8] consider matrices of order 50, or
more [9, 10], with densities ranging from 15%
to 25% and decreasing steadily as the order
increases. This paper will accept these
boundary conditions as a strict definition of
a sparse matrix. Brayton, Gustavson, and
Willoughby [8] say that a typical large (implied to be in the hundreds) order sparse
matrix has 2 to 10 nonzero entries per row.
Hays [5] says that an average of 20 nonzero
elements per row is not an unreasonably
small number in quite large (implied to be
around 100 and greater) order. Livesley [1]
indicates that an average of 3 or 4 elements
per row in a large (implied to be around
1000) order structural problem is a good
estimate.

* Department of Industrial Engineering.

Computing Surveys, Vol. 5, No. 2, June 1973

CONTENTS

I Introduction
II Bit Map Scheme
III Address Map Scheme
IV Row-Column Scheme
V Threaded List Scheme
VI Diagonal or Band Indexing Scheme
VII Conclusion
Appendix A
Algorithm 1. Bit Map Scheme
Algorithm 2. Address Map Scheme
Algorithm 3. Address Map Scheme
Bibliography

Copyright © 1973, Association for Computing Machinery, Inc. General permission to republish,
but not for profit, all or part of this material is granted, provided that ACM's copyright notice is
given and that reference is made to this publication, to its date of issue, and to the fact that
reprinting privileges were granted by permission of the Association for Computing Machinery.

If the order I of the matrix is reasonably
small, i.e., about order 50 or less, it would
make little difference if the full matrix were
kept in core. However, if the sparse matrix
is of larger order than about 50, it becomes
efficient in terms of execution time and core
allocation to store only the nonzero entries
of the matrix.
The efficiency of retaining only the nonzero elements becomes obvious in the example of a 500 × 500 matrix with 10% density.
With one word of storage allocated for each
element, the matrix requires 250,000 words,
which is very often more than is physically
available. Storing only the nonzero elements
requires 25,000 words. If the full matrix were
multiplied by a similar full matrix, a minimum of $500 \times 500 \times 500 = 125 \times 10^6$
arithmetic operations are required, compared
to a minimum of $(500 \times 10\%)^3 = 125 \times 10^3$
arithmetic operations when only the nonzero
elements are retained. If both 500 × 500
matrices were to be retained in core as full
matrices, core allocation and execution time
would be prohibitive on many computers,
and the problem would be abandoned as infeasible for computer solution.
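
The figures quoted for the 500 × 500 example can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the operation count is the simple model used in the text (the cube of the number of retained elements per row), not a measured cost.

```python
def storage_words(order, density_percent=100):
    """Words needed at one word per retained element."""
    return order * order * density_percent // 100


def min_operations(order, density_percent=100):
    """Rough lower bound used in the text: (order x density) cubed."""
    return (order * density_percent // 100) ** 3


full_words = storage_words(500)          # full matrix
sparse_words = storage_words(500, 10)    # nonzero elements only
full_ops = min_operations(500)           # product of two full matrices
sparse_ops = min_operations(500, 10)     # nonzero elements only

print(full_words, sparse_words)
print(full_ops, sparse_ops)
```

Under this model, storage falls by a factor of 10 and the arithmetic by a factor of 1000.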
By storing the nonzero elements in some
reasonable manner, and using logical operations to decide when arithmetic operations
are necessary, Brayton, et al. [8] relate that
both the storage requirements and the required amount of arithmetic can often, in
practice, be decreased by a factor of I over
the full matrix.
Sparse matrices are classified generally by
the arrangement of the nonzero elements.
When the matrix is in random form, nonzero
elements appear in no specific pattern. A
matrix is said to be a band matrix, or in band
form, if its elements $a_{ij} = 0$ for $|i - j| > m$
(where m is a small integer, and usually
$m \ll I$) and where the nonzero elements form
a band along the main diagonal. The band
width is the number of nonzero elements
that appear in one row of a band matrix
(i.e., $2m + 1$). A block-diagonal form occurs
when submatrices of nonzero elements appear along the matrix diagonal. In block
form, the matrix has submatrices of nonzero
elements that occur in no specific pattern
throughout the full matrix. The block dimension is the order of a submatrix in a block or
block-diagonal matrix.
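
The band definition can be made concrete with a small sketch. The helper name `half_bandwidth` and the tridiagonal test matrix are assumptions introduced for illustration; the function simply finds the smallest m for which every element with |i - j| > m is zero.

```python
def half_bandwidth(matrix):
    """Smallest m such that matrix[i][j] == 0 whenever |i - j| > m."""
    m = 0
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value != 0:
                m = max(m, abs(i - j))
    return m


# Hypothetical tridiagonal matrix: nonzeros only on the three central diagonals.
tridiagonal = [
    [2, 1, 0, 0],
    [1, 2, 1, 0],
    [0, 1, 2, 1],
    [0, 0, 1, 2],
]

m = half_bandwidth(tridiagonal)
band_width = 2 * m + 1
print(m, band_width)
```

For the tridiagonal case m is 1, so the band width 2m + 1 is 3.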
In electrical network and power distribution problems, the matrix is generally in
random, band, or block-diagonal form, with
the elements representing circuit voltages,
currents, impedances, power sources, or users
[9-10]; in structural engineering applications,
the sparse matrix is generally of band or
block form, with the band width or block
dimension representing the number of joints
per floor [3, 11]; in reactor diffusion problems
and differential equations, the band form of
matrix is most common, with the band width
being the number of points used in a point-difference formula [12-14].
This paper, while not concerned with the
actual mathematical manipulations of sparse
matrices, is primarily concerned with the indexing algorithms employed in such calculations. If the sparse matrix is stored in a haphazard manner, elements can only be retrieved by a search of all the data, which
takes much time. If the sparse matrix is
stored in some very convenient form, execution time will be much less. Conservation of
execution time is of major importance in
selecting an indexing algorithm.
Another major consideration in selecting
a particular indexing method is the amount
of fast core the method requires in addition
to that used for the storage of the nonzero
data elements. For most applications, a small
difference in core allocation between two
methods is not a critical factor. In this case,
the critical consideration is the execution
time difference between the two methods.
Since execution times vary greatly with the
methods of indexing, an exact comparison of
execution times must reflect the type of
mathematical manipulation that is to be
performed on the sparse matrix.
One last major aspect of indexing algorithm selection concerns the adaptability
and flexibility of programming the selected
scheme. This depends in great part on the
type of machine, business or scientific; machine configuration; operating system capabilities; number of bits per word; access
times for peripheral devices; average instruction times; availability of the required instructions; the maximum row or column size
to be used; the expected matrix density; and
the availability and size of buffers.
As with most applications, the use of a
high-level programming language may provide relative ease of implementation for a
selected indexing scheme, but such use is frequently accompanied by penalties in execution time and storage requirements. However, on the positive side, use of high-level
languages may well result in a minimum of
elapsed time for problem solution with a
given programming staff, as well as overall
minimum cost, considering both personnel
and computer usage. Problems involving
large order sparse matrices focus their attention on core storage utilization and execution
time minimization, and therefore all but
eliminate the employment of high-level languages for indexing schemes.
In subsequent sections of this paper, current indexing schemes will be examined in an
attempt to isolate a "fast" indexing algorithm, with "fast" being defined as producing an optimization of execution time and
core storage for sparse matrices of large order. Particular advantages and disadvantages of each major type of indexing discussed will be brought to the attention of
the reader. Parts II through VI discuss aspects of particular indexing schemes, while
Part VII compares the requirements and advantages of the various schemes. Part VII, in
conclusion, also makes recommendations
concerning the adaptability and flexibility of
the major existing indexing algorithms for
application to user problems.
The authors have attempted, as much as
possible, to make their discussions machine
independent. However, the authors made use
of an IBM System 360/65 Model I in their
research and certain basic aspects of this machine, such as the 32-bit word, are alluded to
in the succeeding pages. The interested
reader should have little difficulty in adapting the concepts presented to machines of
differing architecture.


II. BIT MAP SCHEME


In a bit map scheme, a Boolean form of the
matrix M is the basic indexing reference.
Whenever a nonzero entry occurs in the
sparse matrix, a 1 bit is placed in the bit map,
with null entries remaining as zeros in the
bit map. The position of each successive nonzero entry is found by counting over to the
next 1 bit in the map.
More rapid access to any element of a row
is achieved by providing an additional row
index vector, where each element of that
vector is the address of the first nonzero element of each row [16]. An additional column
index vector may also be applied for a more
rapid column access, but this will also necessitate storing each nonzero entry twice. It
should be noted, however, that any machine
based on word, rather than bit, addressing
techniques will give much slower access in
one dimension of the matrix than in the
other.
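As an example, the following matrix M,
and its associated bit map BM and reduced Z-vector are given.

$$
M = \begin{bmatrix} 0 & 3 & 0 & 0 \\ 2 & 0 & 5 & 0 \\ 4 & 0 & 0 & 7 \\ 0 & 1 & 0 & 8 \end{bmatrix},
\qquad
BM = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 \end{bmatrix},
\qquad
Z = [3, 2, 5, 4, 7, 1, 8]
$$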

Figure 1 demonstrates a sample bit map supplemented with the row index vector V; the
Z elements are the nonzero elements of the
matrix.

[FIG. 1. Sample bit map. Each row index value indicates the first nonzero element for its row: V(2) points to Z(2), V(3) to Z(4), and V(4) to Z(6).]

The bit map in Figure 1 is a matrix conception of the bit map. To conserve core, instead of using one word for each row of the
bit map, all four rows (16 bits) are compacted into one word as shown in Figure 2
with byte (8 bits) boundaries marked.

[FIG. 2. Bit map of Figure 1 in core: | 0100 1010 | 1001 0101 | . . . , with bytes 1, 2, and 3 marked.]
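
The construction just described can be sketched as follows. This is a minimal illustration, not the authors' Algorithm 1: the function name `build_bit_map` is hypothetical, and the 4 × 4 matrix is chosen to agree with the 16-bit map 0100 1010 1001 0101 and the Z-vector [3, 2, 5, 4, 7, 1, 8] of the example. Rows without nonzero entries are not treated specially here.

```python
def build_bit_map(matrix):
    """Return (bit rows, Z-vector, row index vector V) for a dense matrix.

    V holds, for each row, the 1-based position in Z of that row's
    first nonzero element.
    """
    bit_rows, z, v = [], [], []
    for row in matrix:
        v.append(len(z) + 1)
        bit_rows.append("".join("1" if x != 0 else "0" for x in row))
        z.extend(x for x in row if x != 0)
    return bit_rows, z, v


M = [
    [0, 3, 0, 0],
    [2, 0, 5, 0],
    [4, 0, 0, 7],
    [0, 1, 0, 8],
]

bit_rows, Z, V = build_bit_map(M)
packed = int("".join(bit_rows), 2)   # all four rows compacted into one 16-bit value

print(bit_rows)
print(Z, V)
print(format(packed, "016b"))
```

Packing the rows end to end is what allows the whole 4 × 4 map to occupy two bytes instead of four words.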
From Figure 2, it is simple to see that the
bit map, being the Boolean form of the matrix, will, in fast core, require at least
$$W = I \cdot J / B$$
words, where I and J are the dimensions of the matrix and B is the number of
bits per word; W is rounded up to the nearest
integer. The bit map uses at minimum
$$E_{\text{Bit Map}} = (100/B)\,\%$$
of the storage requirements of the full matrix
for indexing. The additional row index vector
adds $W = I \cdot A / B$ more words, where A is
the number of bits required for an address.
Supplemented with the row index vector,
$$E_{\text{Bit Map + Row Index}} = (100/B)\,(1 + A/J)\,\%$$
of the full matrix is required for the indexing.
Now, if the sparse matrix has fewer than
65,536 nonzero elements, then A can be 16
bits in excess 32,768 notation. In a 32-bit-word machine for example, 16 bits may be
conveniently accessed if the instruction set
has a complement of half-word instructions.
Attention should be given to the number of
bits required for an address to range through
the maximum core size. If this number of
bits is not conveniently manipulated, it will
be necessary to use more than the minimum
amount of core to gain an execution advantage. Execution times for full word instructions are often less than execution times for
half-word instructions. Therefore, when
choosing a convenient number of bits for A,
the number of bits used for an address, it is
important to realize the tradeoff between
core conservation and access time.
Using B = 32 bits (word length), and
A = 16 bits (half-word length), for a 500 ×
500 matrix the bit map and row index vector
require 8313 words, or 3.325% of the 250,000
words for the full matrix; if the matrix is
only 5% dense, another 12,500 words are
required for the nonzero elements; the total
is 20,813 words, or 8.325% of the full matrix.
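
The storage formulas can be evaluated directly. In the sketch below the function names are hypothetical. One observation, offered with caution: with A = 16 the formula gives 250 words for the row index vector and 8063 words in all, whereas the quoted figures of 8313 words and 3.325% are reproduced when each row index entry occupies a full 32-bit word. The text does not say which was intended, so both cases are computed.

```python
def words_needed(bits, bits_per_word):
    """Number of whole words needed to hold the given number of bits."""
    return (bits + bits_per_word - 1) // bits_per_word


def index_words(rows, cols, bits_per_word, address_bits):
    """Words for the bit map plus the row index vector."""
    bit_map = words_needed(rows * cols, bits_per_word)
    row_index = words_needed(rows * address_bits, bits_per_word)
    return bit_map + row_index


def index_percent(cols, bits_per_word, address_bits):
    """E(Bit Map + Row Index) as a percentage of the full matrix."""
    return 100 / bits_per_word * (1 + address_bits / cols)


half_word_total = index_words(500, 500, 32, 16)   # half-word row index entries
full_word_total = index_words(500, 500, 32, 32)   # full-word row index entries
nonzeros = 500 * 500 * 5 // 100                   # 5% density

print(half_word_total, round(index_percent(500, 32, 16), 3))
print(full_word_total, round(index_percent(500, 32, 32), 3))
print(full_word_total + nonzeros)
```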
In order to reference the $M_{ij}$ element, it is
necessary to physically count across to the
jth element in the ith "row" of the bit map.
The correct bit will lie in the
$$S = ((i - 1) \cdot J + j + (B - 1)) / B$$
word of the bit map, the quotient being truncated to an integer.
To isolate the required bit, it will be necessary to either shift the word the necessary
number of bits or mask all the other bits by
a logical operation. If a shift is used, then
repeated shifts perform a row operation when
the bit map is stored by rows. Algorithm 1
(see Appendix) isolates the correct beginning
word of a row in the bit map; a segment of
the code shifts through one entire row, in
preparation for a mathematical manipulation of the row.
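
A minimal sketch of this lookup follows. It is not the authors' Algorithm 1 (which is assembly code in the Appendix); the function names are hypothetical, 8-bit words are assumed so that the 16-bit example map spans two words, and the count of 1 bits starts from the beginning of the map rather than from a row index entry.

```python
def word_index(i, j, cols, bits_per_word):
    """1-based word of the bit map holding element (i, j); i and j are 1-based."""
    return ((i - 1) * cols + j + (bits_per_word - 1)) // bits_per_word


def lookup(i, j, cols, bit_words, bits_per_word, z):
    """Return element (i, j) from a row-ordered bit map packed into words."""
    position = (i - 1) * cols + j                  # 1-based bit position in the map
    s = word_index(i, j, cols, bits_per_word)
    word = bit_words[s - 1]
    offset = (position - 1) % bits_per_word        # bits to the left of the target
    upper = word >> (bits_per_word - 1 - offset)   # shift the target into the low bit
    if upper & 1 == 0:
        return 0                                   # null entry
    # Count the 1 bits up to and including the target to find its place in Z.
    ones = sum(bin(w).count("1") for w in bit_words[: s - 1])
    ones += bin(upper).count("1")
    return z[ones - 1]


bit_words = [0b01001010, 0b10010101]   # the 16-bit example map as two 8-bit words
Z = [3, 2, 5, 4, 7, 1, 8]

print(word_index(3, 4, 4, 8))
print(lookup(3, 4, 4, bit_words, 8, Z))
print(lookup(1, 1, 4, bit_words, 8, Z))
```

A row index vector would let the count begin at the first nonzero element of row i instead of at the start of the map.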
Algorithm 1 with slight alteration will accommodate matrices up to order 100,000.
The restriction occurs in statement 06, where
the multiplication must not result in loss of
significant bits due to exceeding word size.
In practice, the algorithm is limited either by
the index vector being half-words, as indexing is provided for only 65,536 nonzero elements; or by 4095 rows or columns, the
maximum number used in the indexing in
statement 02.
When the bit map is stored by rows, as in
the algorithm above, then to perform a column operation it is necessary to count to the
correct j bit for all I rows. This means executing virtually the entire algorithm I times.
If more than a few column operations are to
be performed, then execution time will become an important factor. The execution
time is dependent on the density of the
sparse matrix, the order of the sparse matrix,
and the number of column operations to be
performed. The time factor is exemplified by
the following:
EXAMPLE 1: A 500 × 500 matrix
exists, and it is necessary to perform 10
column operations when the matrix is
5% dense. The average column
execution time will be that of the 250th
column. Assuming the entire algorithm
is executed for each row, the execution
time will be approximately:

    500 rows × 10 column operations ×
      [(time to locate beginning of each row)
       + .05 density × 500/2 columns × (time to process 1 bits)
       + (1 - .05 density) × 500/2 columns × (time to process 0 bits)
       + 500/2 columns × (time to locate bit in bit map)
       + 500/2 words × (time to locate word in bit map)]

which is about 10 seconds on the IBM
360/65, with additional microseconds incorporated for the mathematical operation
not listed in the coding. Had the same procedure been carried out on the transpose of
the bit map, that is, the bit map is now
column-oriented instead of row-oriented,
then the execution time would have been cut
by a factor of about 500, a considerable time
savings. Not taken into consideration is any
further computer processing, such as updating an index register after each 4095
characters or bytes, if necessary.
If the bit map of the sparse matrix can be
transposed and the data rearranged in less
time than the difference between the column
and row execution times, then the transpose
operation will conserve execution time. In
the above example, the difference between
column and row execution times is about 9.7
seconds.
For certain types of operations the bit map
is ideal. Being in Boolean form, which means
elements are either 1 or 0, true or false, or
plus or minus, the bit map is the most compact form for logical operations, such as
AND, OR, or EXCLUSIVE OR. Thus, if
matrices MA and MB exist, and it is necessary to determine which elements are nonzero in both matrices, it is necessary only to
AND each word of bit map MA with the corresponding word of bit map MB. If the result
is zero, no element covered by that word is present in both matrices; if the result is
nonzero, the indicated elements appear in
both matrices. An EXCLUSIVE OR determines which elements are present in either,
but not both, of the matrices; an OR determines which elements appear in either or
both of the matrices. Logical operations performed on the bit map require about 1/32 of
the execution time for the same logical operation on the full scale matrix, because the
bit map on a 32-bit-word machine condenses
32 pieces of data into 1 word. Additionally,
and often most importantly, the bit map
conserves core storage.
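
These word-wise logical operations are easy to demonstrate. In the sketch below `map_a` is the 16-bit example map used earlier, while `map_b` is a hypothetical second map invented purely for illustration.

```python
map_a = 0b0100101010010101   # bit map of matrix MA (the earlier 4 x 4 example)
map_b = 0b0110001010000111   # bit map of a hypothetical matrix MB

in_both = map_a & map_b        # AND: elements nonzero in both matrices
in_either = map_a | map_b      # OR: elements nonzero in either or both
in_one_only = map_a ^ map_b    # EXCLUSIVE OR: nonzero in exactly one matrix

# The number of 1 bits in the OR is the number of positions that can be
# nonzero in the sum of the two matrices.
sum_count = bin(in_either).count("1")

print(format(in_both, "016b"))
print(format(in_either, "016b"))
print(format(in_one_only, "016b"))
print(sum_count)
```

One machine operation here covers sixteen matrix positions at once, which is the source of the speed advantage described above.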
To determine how many elements will be
present in the sum of two rows, and their
order, an OR is performed on the two rows
of the bit map. Using similar techniques, the
feasibility of rearranging the matrix in a
form more convenient for the user, such as
diagonal form, where nonzero elements appear all along the diagonal, is determined.
Kettler and Weil [15] discuss some of the
aspects of such a rearrangement algorithm.
Many references are found to endorse or
suggest the use of a bit map scheme for
sparse matrices [7, 15-20], but it is particularly difficult to ascertain the exact algorithms utilized, as most authors do not
include these in their papers.
While a bit map scheme appears convenient and fast, it is restricted by the amount
of fast core available for the bit map. In the
case where the sparse matrix is less dense
than the percentage of the full matrix that
the bit map scheme occupies, core storage
will be conserved by switching to an alternative method of indexing.
Givens [21] has suggested that the bit map
scheme would be more attractive to users if
some special instructions were designed and
implemented, to further decrease execution
times. One such instruction Givens references
is CLEAR TO ZERO, which would clear a
large block of core, e.g., the bit map, from a
first to a last address. Another instruction
would be LOAD NEXT NONZERO, which
would fetch the address of the next nonzero
entry of the bit map, given the previous nonzero element, thereby eliminating the necessity of counting through all the zero bits.
These special instructions would be implemented as microprogrammed subroutines
[21]. To define a microprogram, it is necessary to understand that the execution of each
assembly language instruction involves a
specific sequence of transfers of information
from one register in the processor to another;
some of the transfers take place directly, and
some through an adder or other logical circuit. Each of these steps defines a microinstruction and the complete set of steps necessary to execute the assembly language instruction constitutes a microprogram [22].
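
The effect of the proposed LOAD NEXT NONZERO instruction can be imitated in software. The sketch below is only an analogue under stated assumptions: the function name `next_nonzero` is hypothetical, the whole bit map is held in a single Python integer, and positions are numbered from 1 at the leftmost bit. It shows the behavior such an instruction would provide, not how a microprogram would implement it.

```python
def next_nonzero(bit_map, total_bits, after):
    """1-based position of the first 1 bit following position `after`.

    Returns 0 when no further 1 bit exists. Pass after=0 to find the
    first nonzero entry of the map.
    """
    remaining = bit_map & ((1 << (total_bits - after)) - 1)   # drop positions 1..after
    if remaining == 0:
        return 0
    return total_bits - remaining.bit_length() + 1


bit_map = 0b0100101010010101   # the 16-bit example map

positions = []
position = next_nonzero(bit_map, 16, 0)
while position:
    positions.append(position)
    position = next_nonzero(bit_map, 16, position)

print(positions)
```

Each call jumps straight to the next nonzero entry, so the zero bits in between are never examined one at a time.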

III. ADDRESS MAP SCHEME

The address map is similar in form to the
bit map, the main difference being that the
address map stores an address or address displacement for each matrix element. If the
matrix element is zero then a zero address is
stored. The bit map requires only one bit for
each matrix element.
Since an address or address displacement
requires more than one bit for each matrix
element, the address map scheme will require
N times more core storage than the bit map
scheme, where N is the number of bits used
for an address or address displacement. If
address displacements instead of full-length
addresses are used, then the address map
must be augmented by a row index vector,
as with the bit map.
Assuming there are less than 256 nonzero
entries per row, for example, an address displacement would require only 8 bits (a
common character size). If a particular computer allows character operations that are
faster than the access time to an individual
bit map entry, the improved column access
time of the address map can warrant the
increased core expenditure. On a system with
6 bit characters, up to 64 nonzero row entries
can be accommodated.
The overall percentage storage requirement of the full matrix required for the address map with the row index vector will be
$$E_{\text{Address Map}} = (100/B)\,(C + A/J)\,\%$$
where B is the number of bits per word; C is
the number of bits used for an address displacement; A is the number of bits used for
an element of the row index vector; and J is
the number of columns of the matrix. Using
C = 8 bits; A = 32 bits; B = 32 bits; and
J = 1000 columns, the address map and row
index vector require 25.1% of the full matrix, that is 251,000 words compared to 1
million for the full matrix. In addition, if the
matrix is 5% dense, an additional 50,000
words are required for the storage of the
nonzero elements.
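
The 25.1% figure follows directly from the formula, as the short check below shows. The function name is hypothetical; the parameter values are those given in the text for a 1000 × 1000 matrix.

```python
def address_map_percent(bits_per_word, displacement_bits, index_bits, cols):
    """E(Address Map) as a percentage of the full matrix, row index included."""
    return 100 / bits_per_word * (displacement_bits + index_bits / cols)


B, C, A, J = 32, 8, 32, 1000
rows = 1000

percent = address_map_percent(B, C, A, J)
map_words = rows * J * C // B         # one C-bit displacement per matrix element
row_index_words = rows * A // B       # one A-bit entry per row
nonzero_words = rows * J * 5 // 100   # 5% density, one word per nonzero element

print(round(percent, 1))
print(map_words + row_index_words, nonzero_words)
```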
In order to isolate the $M_{ij}$ element, it is
necessary to access the
$$S = C/B\,(i - 1) \cdot J + j$$
character (or byte). In terms of words,
$$S = \{C\,[(i - 1) \cdot J + (j - 1)] + B\} / B,$$
where i and j are respectively the row and
column of interest. If the S character (byte)
is zero, it is a null entry; otherwise, the content of the S character (byte) is added to the
row index element to give the address of the
nonzero element.
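
The lookup can be sketched as follows, under assumptions that go beyond the text: one 8-bit character is stored per matrix element, so the character index is simply (i - 1)·J + j; displacements start at 1 so that 0 can mark a null entry; and each row index entry is taken to be the number of nonzero elements preceding that row. The function names are hypothetical and the matrix is the 4 × 4 example from the bit map section.

```python
def build_address_map(matrix):
    """Return (address map, row index vector, Z-vector) for a dense matrix."""
    address_map = bytearray()
    row_base = []
    z = []
    for row in matrix:
        row_base.append(len(z))           # nonzeros stored before this row
        count = 0
        for x in row:
            if x != 0:
                count += 1                # displacement within the row, from 1
                z.append(x)
                address_map.append(count)
            else:
                address_map.append(0)     # zero displacement marks a null entry
    return address_map, row_base, z


def fetch(i, j, cols, address_map, row_base, z):
    """Return element (i, j); i and j are 1-based."""
    s = (i - 1) * cols + j                # 1-based character index
    displacement = address_map[s - 1]
    if displacement == 0:
        return 0
    return z[row_base[i - 1] + displacement - 1]


M = [
    [0, 3, 0, 0],
    [2, 0, 5, 0],
    [4, 0, 0, 7],
    [0, 1, 0, 8],
]
address_map, row_base, Z = build_address_map(M)

print(list(address_map))
print(row_base, Z)
print(fetch(3, 4, 4, address_map, row_base, Z))
```

Unlike the bit map lookup, no counting of 1 bits is needed: a single character read and one addition locate the element.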
The address map scheme is subject to
many of the same limitations of the bit map
scheme, and requires a larger amount of core
storage for indexing. A sample coding, Algorithm 2, which has the same characteristics
as the example used in the bit map method
(Algorithm 1) illustrates that fewer arithmetic operations than the bit map method
are required when the computer is equipped
with character addressing capabilities. If the
computer used does not allow convenient
arithmetic manipulation of individual characters, then the coding enclosed in brackets
in Algorithm 2 must be added to overcome
this difficulty. The bracketed coding requires
much of the algorithm time, so if a computer
has built-in arithmetic character manipulation, then the algorithm becomes increasingly faster.
With an example similar to Example 1, we
find that the execution time, with the bracketed coding included, is drastically different
from the bit map time. This is primarily because of the easy access to any character. To
access by column instead of by row, only the
first row location of the correct column need
be found. To find the correct location of the
character in row 2, it is sufficient to add just
the column dimension. This process is continued until the end of the matrix is encountered.
For a column manipulation, then, we easily obtain Algorithm 3, similar to Algorithm 2.
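
The column access just described, stepping through the address map by the column dimension, can be sketched as below. This is not Algorithm 3 itself; the function name and the literal data are illustrative, with the address map laid out as one byte per element, displacements starting at 1, and each row index entry holding the number of nonzero elements stored before that row (all assumptions).

```python
def column(j, rows, cols, address_map, row_base, z):
    """Return column j (1-based) by striding through a row-ordered address map."""
    values = []
    s = j - 1                          # 0-based location of column j in row 1
    for i in range(rows):
        displacement = address_map[s]
        if displacement == 0:
            values.append(0)           # null entry
        else:
            values.append(z[row_base[i] + displacement - 1])
        s += cols                      # add the column dimension to reach the next row
    return values


# Address map, row index vector, and Z-vector for a 4 x 4 example matrix.
address_map = bytes([0, 1, 0, 0,
                     1, 0, 2, 0,
                     1, 0, 0, 2,
                     0, 1, 0, 2])
row_base = [0, 1, 3, 5]
Z = [3, 2, 5, 4, 7, 1, 8]

print(column(1, 4, 4, address_map, row_base, Z))
print(column(4, 4, 4, address_map, row_base, Z))
```

Each row costs one addition and one character read, regardless of how many zero entries precede the desired column.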
EXAMPLE 2. As in Example 1, a 500 ×
500 matrix exists with 5% density, and
it is necessary to perform 10 column
operations. It is therefore necessary to
execute Algorithm 3, 10 times, so the
execution time will be approximately

    10 column operations ×
      [(initialization time to locate beginning of each row)
       + 500 rows × (time to locate bit in bit map)
       + (1 - .05 density) × (time to process 0 bits)
       + .05 density × (time to process 1 bits)]

which is about 30 msec on the IBM 360/65,
and has incorporated 2 additional μsec that
were included for the mathematical operation
not listed in the coding. As with Algorithm
1, the limitations are due to the use of half-words for the index vector, and to the use of
an index register. Note that there is a considerable time savings, but at the expense of
computer memory. Again, not taken into
consideration is any further computer processing, other than the above coding, such as
updating index registers, which may be necessary and require more time.
Unlike the bit map scheme, where the entire row of the bit map up to the desired element must be scanned for nonzero entries
before data manipulation can occur, the address map method requires only a reference
to the desired element. Because the storage
location of a data element is found independently of all except the desired address
displacement, the address map method
blends well with the concept of parallel
processing. Parallel processing involves the
simultaneous execution of a sequence of operations by dependent central processing
units. Thus, using the address map method,
4 separate central processing units could simultaneously execute the required arithmetic on 4 different elements of the matrix;
at best, using the bit map method, different
steps in the execution of 1 matrix element
would be shared by the 4 central processing
units. Employing the address map method,
the processing units could work independently, except for the final results; while the
bit map method would require transfers of
information from one processing unit to the
other processing units to execute the shared
steps, which introduces an additional time
lag.
While no references have been found to
explicitly endorse or suggest this method,
and comparatively large core requirements
exist, the address map scheme may prove
useful with some future computer that features both very fast core of a few million
characters and a multitude of parallel processing units. Hoffman and McCormick [22]
state that at present the value of parallel
processing on a large scale is debatable as
far as manipulating sparse matrices, as there
are virtually no available computers with
more than just a few parallel central processing units, and the field is quite unexplored.

IV. ROW-COLUMN SCHEME

Row-column indexing schemes refer to methods relying on paired vectors of some type;
generally one vector contains the nonzero
elements, which are most often ordered by
rows or columns, and the other vector maintains the indexing information. Row-column
indexing schemes are sometimes referred to
as block index, row, or column packing
schemes, depending on the author's description of how the indexing algorithm works
[7, 15, 17, 20, 23-24].
In the simplest, but not the most core- and
time-efficient form, each nonzero element of
the matrix has a corresponding index word
that contains a specified number of bits for
the row designation and another specified
number of bits for the column designation
(Figure 4).
If computations are to be performed in a
row manner, it is highly practical and efficient to order the nonzero entries first by
rows and then by columns. Ordering the entries by rows makes it unnecessary to maintain the row index for every nonzero element;
only the row need be identified for the first
nonzero element of each row, as it is known
that all the following entries up to the next
row indicator belong to the same row. In
order to create the row marker, a check bit,
such as a minus sign bit, can be set in the
first column index word of each row (Figure
5), or as is usually done, an additional and
separate row index vector can be created
(Figure 6). The row index element generally
contains the address or index number of the
first column index for the row. The same system may be applied to ordering the entries
by columns if column operations are to be
performed.

FIG. 3. Sample matrix.

    0  2  0  0
    6  0  4  0
    3  0  0  0
    7  9  0  5

FIG. 4. Indexing with row and column designators.

    Index   Row   Column   Nonzero entry
    V(1)     1      2      Z(1) = 2
    V(2)     2      1      Z(2) = 6
    V(3)     2      3      Z(3) = 4
    V(4)     3      1      Z(4) = 3
    V(5)     4      1      Z(5) = 7
    V(6)     4      2      Z(6) = 9
    V(7)     4      4      Z(7) = 5

FIG. 5. Indexing with row indicator and column designation.

    Index   Row indicator (sign bit)   Column   Nonzero entry
    V(1)              -                  2      Z(1)
    V(2)              -                  1      Z(2)
    V(3)              +                  3      Z(3)
    V(4)              -                  1      Z(4)
    V(5)              -                  1      Z(5)
    V(6)              +                  2      Z(6)
    V(7)              +                  4      Z(7)

FIG. 6. Indexing with row vector and column index vector.

    First column index for each row (half-word):
    VR(1) points to V(1), VR(2) to V(2), VR(3) to V(4), VR(4) to V(5)

    Index   Column (half-word)   Nonzero entry
    V(1)           2             Z(1) = 2
    V(2)           1             Z(2) = 6
    V(3)           3             Z(3) = 4
    V(4)           1             Z(4) = 3
    V(5)           1             Z(5) = 7
    V(6)           2             Z(6) = 9
    V(7)           4             Z(7) = 5

Figures 3 through 6 depict sample vectors
for the row-column schemes described above.
The index vectors are V and VR; the nonzero entries are contained in vector Z. The
data matrix used in Figures 4 through 6 is
displayed in Figure 3. The nonzero entries
of the data matrix are stored by rows, in
order of increasing column number. All index
vectors are full words unless otherwise noted.
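
The row vector and sign bit variants can be sketched in a few lines. The function names are hypothetical, and the 4 × 4 matrix is chosen to agree with the Z-vector 2, 6, 4, 3, 7, 9, 5 of the figures. The sketch assumes every row holds at least one nonzero entry; a sign bit marker cannot otherwise represent an empty row.

```python
def build_row_column(matrix):
    """Return (VR, V, Z): row index vector, column index vector, nonzero entries.

    VR holds, for each row, the 1-based position in V of that row's
    first column index.
    """
    vr, v, z = [], [], []
    for row in matrix:
        vr.append(len(v) + 1)
        for j, x in enumerate(row, start=1):
            if x != 0:
                v.append(j)
                z.append(x)
    return vr, v, z


def row_entries(i, vr, v, z):
    """Return the (column, value) pairs of row i (1-based)."""
    start = vr[i - 1] - 1
    end = vr[i] - 1 if i < len(vr) else len(v)
    return list(zip(v[start:end], z[start:end]))


def sign_bit_index(vr, v):
    """Alternative marker: negate the first column index of each row."""
    firsts = {p - 1 for p in vr}
    return [-c if k in firsts else c for k, c in enumerate(v)]


sample = [
    [0, 2, 0, 0],
    [6, 0, 4, 0],
    [3, 0, 0, 0],
    [7, 9, 0, 5],
]
VR, V, Z = build_row_column(sample)

print(VR, V, Z)
print(row_entries(4, VR, V, Z))
print(sign_bit_index(VR, V))
```

The separate row vector costs a few extra words but lets a row be located directly, whereas the sign bit form must be scanned from the start to find a given row.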
From the above figures it is evident that there
exists a wide possibility of variation in the
row-column scheme of indexing. Further
variations and adaptations can occur as a
result of optimizing peculiar computer characteristics, or as a result of making calculations on special forms of sparse matrices,
such as block matrices.
However, caution is advised, for such
optimizations may result in a useless program whenever system changes occur, and
should therefore only be used when they are
critical economies of the calculations.
In the instance of computer peculiarities,
Smith [17] states that a particular type of
second generation IBM computer did not
utilize the bits of the second word in extended-precision floating-point calculations
that were normally used as the exponent
bits in single precision floating-point calculations. A sparse matrix row-column indexing algorithm was developed that employed these otherwise wasted 8 to 9 bits as
the row or column indices, and could accommodate matrices up to order 255 and 511
respectively.
For the case of a special sparse matrix, the
row-column indexing scheme for a block-diagonal matrix could become a blocked
indexing scheme. The blocked indexing
scheme would be identical to the row-column
method, except that the large sparse matrix
is partitioned into several smaller submatrices (blocks). Then each submatrix is
identified with a separate row-column
scheme of some sort.
A blocked indexing scheme may also be
used to refer to combining several column
indices into one block (word). For example,
one 64-bit word would contain 4 column
indices, each index of 16 bits. When a row
operation is performed, then, 4 nonzero
elements can be readied for processing at
the expense of a loading time for only one
block [17].
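
The packing of four column indices into one word can be sketched as follows. This is a minimal illustration, not the implementation of [17]: the function names `pack_block` and `unpack_block` are hypothetical, and placing the first index in the high-order bits is an assumption, since the text does not fix the bit order.

```python
MASK16 = 0xFFFF  # one 16-bit column index


def pack_block(cols):
    """Pack four 16-bit column indices into one 64-bit integer.

    Assumed layout: the first index occupies the high-order 16 bits.
    """
    if len(cols) != 4:
        raise ValueError("a blocked index word holds exactly 4 column indices")
    word = 0
    for c in cols:
        if not 0 <= c <= MASK16:
            raise ValueError("column index does not fit in 16 bits")
        word = (word << 16) | c
    return word


def unpack_block(word):
    """Isolate the four column indices of one loaded word by shifting and masking."""
    return [(word >> shift) & MASK16 for shift in (48, 32, 16, 0)]


# The column indices shown in Figure 7.
word = pack_block([1027, 1047, 1078, 1095])
cols = unpack_block(word)
```

One load of `word` makes all four indices available; the shifts and masks in `unpack_block` are the arithmetic needed to isolate the required bits.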
It should be noted that for many computers and algorithms more time is required
to load a referenced word for arithmetic
processing than is required to perform the
necessary arithmetic to isolate the required
bits of the referenced word. Likewise, more
time is required to load extended-precision
words than ordinary words. Also, since
most computers are geared to utilize arithmetic data primarily by words, more time
is required to load a half-word for arithmetic processing than is required to load a
full word.
Another major variation, known as delta
or displacement indexing, is also popular,
and is somewhat similar to the address map
form of indexing. For one particular example
of a delta indexing scheme, one 64-bit extended-precision word contains one 16-bit
index and six 8-bit displacements to the index. Therefore, the column indices of 7
elements can be referred to by loading and
processing one extended-precision word,
which can result in both a considerable time
and core savings. For a delta of 8 bits, it is
possible for 2 nonzero entries of the same
row to be a maximum of 255 columns apart.
If elements can appear farther apart than
255 columns, then a greater number of bits
must be allocated for each delta or the
method must be abandoned. To determine
the column number of the first element
paired with the 64-bit index word, the first
16 bits of the index word are used. In order
to determine subsequent column numbers
for any other element paired with the 64-bit
index word, the appropriate delta is added
to the first 16 bits and the sum of deltas in
between.
Smith [17] also states that delta indexing
is more efficient for large order (implied
order about 250) sparse matrices than a
blocked index form. Figures 7 and 8 depict
the blocked and delta indexed word mentioned above, and are equivalent.
EXAMPLE 3. From Figure 7, column
index 3 = 1078. From Figure 8, column
index 3 = 1027 + 20 + 31 = 1078.
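
The delta scheme and Example 3 can be sketched in code. This is an illustration under stated assumptions: the names `pack_delta_word` and `decode_delta_word` are hypothetical, the 16-bit index is assumed to sit in the high-order bits, unused deltas are assumed to be stored as zero, and the number of elements paired with the word is passed explicitly because the text does not say how it is recorded.

```python
def pack_delta_word(base, deltas):
    """Build a 64-bit word from one 16-bit column index and up to six 8-bit deltas.

    Assumed layout: base in bits 48-63, then the deltas from high to low order.
    """
    if not 0 <= base <= 0xFFFF:
        raise ValueError("base column index does not fit in 16 bits")
    if len(deltas) > 6:
        raise ValueError("at most six deltas fit in one word")
    padded = list(deltas) + [0] * (6 - len(deltas))  # unused deltas stored as zero
    word = base
    for d in padded:
        if not 0 <= d <= 0xFF:
            raise ValueError("delta does not fit in 8 bits; elements too far apart")
        word = (word << 8) | d
    return word


def decode_delta_word(word, count):
    """Return the first `count` column indices (1 to 7) held in a delta index word."""
    if not 1 <= count <= 7:
        raise ValueError("a delta index word describes 1 to 7 elements")
    cols = [(word >> 48) & 0xFFFF]            # the first 16 bits give the first column
    for k in range(count - 1):
        delta = (word >> (40 - 8 * k)) & 0xFF
        cols.append(cols[-1] + delta)         # running sum of the deltas in between
    return cols


# The delta index word of Figure 8: index 1027 followed by deltas 20, 31, 17.
word = pack_delta_word(1027, [20, 31, 17])
cols = decode_delta_word(word, 4)   # [1027, 1047, 1078, 1095], as in Figure 7
```

The third entry reproduces Example 3: 1027 + 20 + 31 = 1078.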


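+----------------+----------------+----------------+----------------+
|      1027      |      1047      |      1078      |      1095      |
+----------------+----------------+----------------+----------------+
  Column index 1   Column index 2   Column index 3   Column index 4
                       (16 bits each index)
FIG. 7. Blocked index word.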

For the row-column indexing method,
using a column index for each nonzero
entry and a row index vector, there is a
required minimum for indexing of
$$W = \frac{I}{B}\,(J \cdot T \cdot D + V)$$
words, where I is the
number of rows; J is the number of
columns; T is the number of bits used
for a column index element; D is the
density of the matrix; V is the number
of bits used for a row index element; and
B is the number of bits per word. In
reality, however, for matrices up to
order 65,535 (in excess 32,768 notation),
half-words may be most conveniently
and efficiently used for all the row and
column indices. Half-word indices are
used to increase core savings at a
generally tolerable increase in execution
time; few if any matrices of order 30,000
or greater have been of notable use.
Using half-word indices, then, the above-mentioned indexing scheme requires a
minimum core storage of
$$E_{\text{Row-column}} = \left(\frac{1}{2J} + D\right)\%$$
of the full matrix for indexing.
To access an M_ij element, it is necessary
to refer to the ith row index, which points
to the first nonzero element of the ith row.
The column indices between the ith and
(i+1)st row indices are searched for j. If the
column indices searched do not contain j,
the M_ij element is zero; otherwise the data
element paired with the j column index is
fetched and processed.
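
The access procedure just described can be sketched as follows. The vector names `row_start`, `col_index`, and `values` are illustrative, indices are 0-based, and the extra sentinel entry at the end of `row_start` (marking the end of the last row) is an assumption made for convenience rather than something the text prescribes.

```python
def get_element(row_start, col_index, values, i, j):
    """Return M[i][j] from a row-column (row pointer / column index) layout."""
    # Search only the column indices lying between the ith and (i+1)st row indices.
    for k in range(row_start[i], row_start[i + 1]):
        if col_index[k] == j:
            return values[k]      # data element paired with column index j
        if col_index[k] > j:
            break                 # columns are stored in increasing order
    return 0.0                    # j not present: the element is zero


# A 4 x 4 example matrix:
#   5 0 0 2
#   0 0 3 0
#   0 0 0 0
#   1 0 0 4
row_start = [0, 2, 3, 3, 5]       # position of the first nonzero of each row, plus sentinel
col_index = [0, 3, 2, 0, 3]       # one column index per nonzero entry
values = [5.0, 2.0, 3.0, 1.0, 4.0]

x = get_element(row_start, col_index, values, 0, 3)   # stored element
z = get_element(row_start, col_index, values, 2, 1)   # empty row, so zero
```

A row operation needs only the slice `row_start[i]` to `row_start[i + 1]`, which is why row access is fast, whereas a column operation must perform this search in every row.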
For row operations, as long as the matrix
remains ordered, execution time is very fast.
For more than a few column operations,
however, on a matrix of order greater than
about 200, it is almost always more convenient and efficient to transpose the entire
matrix and reorder all the data elements
before performing the desired arithmetic.
Again, the same situation exists as with the
bit map; if the data and indexing scheme
can be transposed in less time than the difference between the column and row execution times, then the transpose operation will
conserve execution time.
Unlike the bit map and address map
schemes, which have constant core requirements for indexing, the row-column method
has a core requirement for indexing directly
proportional to the matrix density. Since
each nonzero element has a paired column
index, only the number of elements in the
row index vector is constant. For example,
adding two 50 × 50 sparse matrices, MA and
MB, does not in general produce the result
that the total number of resulting nonzero
elements is the sum of the nonzero elements
for each matrix before the matrix addition:
if MA has 250 data elements and MB has
450, the sum of matrices MA and MB will
not, in most cases, have 700 elements. In
the sum of matrices MA and MB, the only
certainty is that there will still be 50 row index
elements. A variable amount of core for indexing creates core allocation difficulties
that may not be readily acceptable to the
user.
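
A small sketch makes the allocation problem concrete. It assumes the same illustrative layout of three vectors (row starts with a trailing sentinel, column indices, values), 0-based indices, and a hypothetical function name `add_row_column`; entries that coincide merge into one, and entries that cancel disappear, so the size of the result is not known in advance.

```python
def add_row_column(a, b):
    """Add two matrices given as (row_start, col_index, values) triples of equal shape."""
    a_start, a_col, a_val = a
    b_start, b_col, b_val = b
    row_start, col_index, values = [0], [], []
    for i in range(len(a_start) - 1):
        p, p_end = a_start[i], a_start[i + 1]
        q, q_end = b_start[i], b_start[i + 1]
        while p < p_end or q < q_end:          # merge the two ordered rows
            if q == q_end or (p < p_end and a_col[p] < b_col[q]):
                col, val = a_col[p], a_val[p]
                p += 1
            elif p == p_end or b_col[q] < a_col[p]:
                col, val = b_col[q], b_val[q]
                q += 1
            else:                              # same column in both matrices
                col, val = a_col[p], a_val[p] + b_val[q]
                p += 1
                q += 1
            if val != 0:                       # cancelled entries are not stored
                col_index.append(col)
                values.append(val)
        row_start.append(len(col_index))
    return row_start, col_index, values


# MA = [[1, 0, 2], [0, 3, 0]] and MB = [[4, 0, -2], [0, 0, 5]], three nonzeros each.
ma = ([0, 2, 3], [0, 2, 1], [1, 2, 3])
mb = ([0, 2, 3], [0, 2, 2], [4, -2, 5])
total = add_row_column(ma, mb)   # three nonzeros result, not six
```

Only the length of the row index vector is the same before and after the addition.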
In comparison to the bit map method, the
row-column indexing method is noted for its
fast execution time, when data elements are
properly ordered, and its ease of programming, even for matrices of very large
order (in the thousands). A wide variety of
references endorse (or imply an endorsement
of) a row-column technique for indexing [15,
17, 25-30], or a block-diagonal method [31-34], especially for particular applications, as
noted in the Introduction, or for special
matrices, such as symmetric matrices. It
should be noted that a symmetric matrix
decreases by almost 50% the core requirements in the row-column technique, both for
the data elements and for the indexing
elements.

+----------------+--------+--------+--------+--------+--------+--------+
|      1027      |   20   |   31   |   17   |        |        |        |
+----------------+--------+--------+--------+--------+--------+--------+
  Column index 1   delta    delta    delta    delta    delta    delta
    (16 bits)                     (8 bits each)
FIG. 8. Delta index word.

Two of the more general sets of algorithms
encountered for processing random, and
some special, sparse matrices and employing
the row-column indexing technique are
MATLAN [29], an IBM product, and Algorithm 408 [30], a more recent private effort.
As these algorithms are readily available
and are of general interest, a particular
coding example is not given for the row-column indexing technique. Both these
algorithms were intended for use on sparse
matrices of order less than about 32,700, and
are more efficient for orders less than (about)
1,000.
MATLAN is a programming system, operating under the control of Operating
System/360, and has a very wide applicability. MATLAN includes many supplementary features, such as different versions
for an all-core problem and for a segmented
problem, three overlay structures for core
storage, and options on precision. A segmented problem exists when portions of the
problem under consideration are stored in
core and on tapes or disks; an all-core problem exists when the storage requirement is
such that the entire problem is stored in fast
memory. Because of the variable precision
option and the all-core or segmented feature,
it is difficult to assess execution times. Array
dimensions are limited to 32,756, which indicates half-words are used for indexing
purposes.
Algorithm 408 uses a variation of the indexing algorithm depicted in Figure 6.
Instead of having the row index vector contain the address or index number of the first
column index for the row, the row index
vector contains the number of stored elements in the row. In addition, the row index
vector is appended to the column index
vector by using the same array name, M.
While the scope of Algorithm 408 is not as
broad as MATLAN, Algorithm 408 has the
distinct advantage of being readily alterable: a section of the reference is devoted to
possible alterations, such as combining three
or more indices to a word of the M array.
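
The difference between the two forms of row index vector can be sketched as below. This is not the code of Algorithm 408: the helper names are hypothetical, and the layout of the combined array `m` (column indices first, row counts appended after them) is an assumption based only on the description above.

```python
from itertools import accumulate


def counts_to_starts(counts):
    """Convert per-row element counts into the position of each row's first column index."""
    return [0] + list(accumulate(counts))


def starts_to_counts(starts):
    """Convert row start positions (with a trailing sentinel) back into per-row counts."""
    return [b - a for a, b in zip(starts, starts[1:])]


def join_m(col_index, counts):
    """Hold both index vectors under one array name (assumed layout)."""
    return list(col_index) + list(counts)


def split_m(m, n_rows):
    """Recover the column indices and the row counts from the combined array."""
    cut = len(m) - n_rows
    return m[:cut], m[cut:]


counts = [2, 1, 0, 2]                   # stored elements in each of four rows
starts = counts_to_starts(counts)       # [0, 2, 3, 3, 5]
m = join_m([0, 3, 2, 0, 3], counts)
```

With counts, locating the first element of row i requires summing the counts of the preceding rows, which suits processing that sweeps through the rows in order.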


Because of the great variation in coding,
at present it is not considered economically
worthwhile to compare actual core storage
and execution times to determine which of
the many different existing algorithms employing the row-column method is the most
efficient or optimal.
A good basis for examining some of the row-column indexing scheme characteristics rests
on using half-word indices, with a row index
vector, for calculations. At worst, the
method (as typified by Algorithm 408) will
utilize less core than the full matrix up to a
density of slightly over 66%. Conservation
of core allocation and execution time increases as the density decreases.
It has been noted that the bit map method
employs approximately 4 % of the full matrix
for indexing. Therefore, it can easily be seen
that when the matrix density falls below
about 4%, the row-column method will
conserve more core than the bit map scheme.
In addition, the advantage of the faster indexing into the data by the row-column
method in this case almost excludes the use
of the bit map, except for special cases, such
as a Boolean problem.

V. THREADED LIST SCHEME

A threaded, or linked list, scheme contains
one element of an array in core for each nonzero element of the sparse matrix. Each
array element in a linked list method has at
least three components: one component
contains the row and column indices; another contains the matrix element (data);
and the third contains the address of, or a
pointer to, the next array element.
If the third component of an array element
were not present, the linked list scheme
would have, at an absolute minimum, the
same core requirement for indexing as the
row-column method. The third component
adds $W = I \cdot J \cdot A \cdot D / B$ more words for indexing,
which gives a minimum total of
$$W = \frac{I}{B}\,\bigl(J\,(T + A)\,D + V\bigr)$$
words for indexing
a threaded list scheme, where I is the number
of rows; J is the number of columns; D is
the density of the matrix; T is the number of
bits used for a column index; V is the number
of bits used for a row index; A is the number
of bits required for an address to range
through the entire amount of core used to
contain the complete threaded list; and B is
the number of bits per word. For any practical application, however, both the row and
column indices must be retained, which
gives an overall minimum core allocation
for indexing of
$$W = \frac{I \cdot J \cdot D\,(T + V + A)}{B}$$
words.
As in the previously discussed methods of
indexing, half-words (16 bits) are used in
practice for both the row and column indices,
which give capabilities of a matrix of order
65,535 (in excess 32,768 notation). In addition, because of the great difficulty and
great time involved in manipulating addresses of less than full word size (refer to
Bit Map Scheme), full words (32 bits) are
conveniently used for addresses. These considerations now require, for the overall minimum core storage for indexing,
$W = 2 \cdot I \cdot J \cdot D$ words. As a percentage (E) of the
full matrix, this is
$$E_{\text{Linked List}} = 2 \cdot D\ \%$$
necessary for indexing.
In order to reference an M_ij element, the
entire threaded list must be searched if the
nonzero elements are stored in a random
manner. Elements can be stored, except for
updates, and accessed more efficiently by
rows and columns, which can reduce access
time to particular elements or rows of elements. Elements need not be stored contiguously for reasonably efficient processing.
In one particular application of a threaded
list scheme, data elements were initially
stored by rows and columns, and a table of
pointers was kept. Each pointer addressed
the beginning element of a group of 8 elements. Any particular item, or row of items,
could be found by a binary search on the
list of pointers. Example 4 typifies the search
for a particular matrix element in this application of linked list indexing.
EXAMPLE 4. Matrix elements are
stored by rows and columns. The
element to be found is in the middle row
of the matrix, so the pointer in the
middle of the pointer list is selected.
The contents of the pointer word
addresses an element of the linked list.
The element is then examined, to
compare the row and column components with the required row and
column numbers. Three separate cases
can now occur:
(1) If the row and column numbers
match, the correct element has been
found.
(2) The rest of the elements in the
group of 8 are searched, and if the
row matches, but not the column, it
is known that the correct group can
probably be found by a search on
the next few pointers about the
pointer last used. If the pointer
indexed an element whose column
number was greater than required,
then the next lower pointer is used.
(3) The rest of the elements in the
group of 8 are searched, and if the
row doesn't match, then a binary
search on the pointers is continued.
In a binary search, if the pointer
indexed an element whose row
number was greater than required,
the next pointer to be selected is
the one halfway between the last
pointer (upper bound pointer in
this case) and the lower bound
pointer (the first pointer in this
case).

When the procedure is iterated, (2) above,
and the appropriate groups are searched, but
the correct row and column cannot be found,
then it is known that the required matrix
element is the null element.
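
The search of Example 4 can be sketched in simplified form. The sketch below is a variation, not the procedure of the application cited: it performs one binary search on the combined (row, column) key of the pointer table instead of a row search followed by a scan of neighboring pointers, and it assumes a freshly built list in which every group holds at most 8 elements. The names `Node`, `build`, and `find` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

GROUP = 8  # one pointer for each group of 8 link elements


@dataclass
class Node:
    row: int
    col: int
    value: float
    next: Optional["Node"] = None   # address component of the link element


def build(triples):
    """Link (row, col, value) triples by rows and columns; return head and pointer table."""
    head, prev, pointers = None, None, []
    for n, (r, c, v) in enumerate(sorted(triples)):
        node = Node(r, c, v)
        if prev is None:
            head = node
        else:
            prev.next = node
        if n % GROUP == 0:
            pointers.append(node)   # pointer to the first element of each group
        prev = node
    return head, pointers


def find(pointers, i, j):
    """Binary search on the pointer table, then a scan of one group."""
    lo, hi = 0, len(pointers)
    while lo < hi:                  # count the pointers whose key is <= (i, j)
        mid = (lo + hi) // 2
        if (pointers[mid].row, pointers[mid].col) <= (i, j):
            lo = mid + 1
        else:
            hi = mid
    if lo == 0:
        return 0.0                  # (i, j) precedes every stored element
    node = pointers[lo - 1]
    for _ in range(GROUP):
        if node is None or (node.row, node.col) > (i, j):
            break
        if (node.row, node.col) == (i, j):
            return node.value
        node = node.next
    return 0.0                      # not found: the null element


# 25 nonzero elements: rows 0-4, even columns 0-8, value 100*row + col + 1.
head, pointers = build([(r, c, 100.0 * r + c + 1) for r in range(5) for c in range(0, 10, 2)])
```

Here `find(pointers, 3, 4)` locates the stored element after inspecting at most 2 of the 4 pointers and 8 link elements, while `find(pointers, 2, 3)` reports zero.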
It should be noted that unless the data
elements are in reasonable order, the binary
search on the pointers is almost useless. The
particular value of a linked list is that there
is no longer the requirement that data elements be stored contiguously: updates, insertions, and deletions of matrix elements
are performed by altering the address component of the appropriate linked elements.
However, a linked list expansion or contraction results in some pointer groups
having a greater number of link elements,
and some other pointer groups having fewer
link elements. The alterable number of link
elements in each pointer group necessitates
a periodic updating of the pointer table. A
pointer table update is vital to the efficiency of the binary search, and may require
a great amount of execution time. The
amount of execution time required for a
pointer table update depends directly on
the number of link elements to be grouped,
as each link element must be inspected in
order to find each successive link element.
For peak efficiency of the binary search,
every group should have the same number
of linked list elements.
Using the additional pointer table to
combat the otherwise slow execution time of
the linked list scheme, one pointer exists
for each 8 nonzero matrix elements. Employing a full word for each pointer, which
is an address, we now have a minimum indexing core requirement of
$W = 2\tfrac{1}{8} \cdot I \cdot J \cdot D$
words, for
$$E_{\text{Linked List}} = 2.125 \cdot D\ \%$$

of the full matrix. This is a much greater
core requirement than the row-column
methods of the previous section require for
any matrix of order greater than three.
Figure 9 depicts a few elements of a linked
list, and the correlation between elements.
A pointer table is not included.

FIG. 9. Linked list elements. [Diagram: link elements at addresses 1051, 1162, and 1273, each with row, column, data element, and next-element address fields.]

Not previously mentioned is the practical
necessity of maintaining a table of available
addresses, so that core allocation remains
conservative during the insertion and deletion of matrix elements. When matrix
elements are deleted, the address of the
deleted link element must be appended to
the table of available addresses. Not only
must the table be maintained in fast core,
but the threaded list scheme additionally
requires a buffer area to be used for the inserted and/or deleted link elements. If such
a buffer area is not used or kept, then core
will not be conserved and the prime advantage of the threaded list will have been
discarded.
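
The role of the table of available addresses can be sketched as follows. This is an illustration only: the class name `ThreadedList`, its parallel arrays, and the use of array positions as "addresses" are assumptions, and the separate buffer area mentioned above is not modeled.

```python
class ThreadedList:
    """Array-backed linked list of matrix elements kept in row, column order."""

    NIL = -1

    def __init__(self):
        self.row, self.col, self.val, self.nxt = [], [], [], []
        self.head = self.NIL
        self.free = []              # table of available addresses

    def _allocate(self, r, c, v):
        if self.free:               # reuse the address of a deleted link element
            a = self.free.pop()
            self.row[a], self.col[a], self.val[a] = r, c, v
        else:                       # otherwise extend the storage by one element
            a = len(self.row)
            self.row.append(r)
            self.col.append(c)
            self.val.append(v)
            self.nxt.append(self.NIL)
        return a

    def _locate(self, r, c):
        """Return (prev, cur), where cur is the first element with key >= (r, c)."""
        prev, cur = self.NIL, self.head
        while cur != self.NIL and (self.row[cur], self.col[cur]) < (r, c):
            prev, cur = cur, self.nxt[cur]
        return prev, cur

    def insert(self, r, c, v):
        prev, cur = self._locate(r, c)
        if cur != self.NIL and (self.row[cur], self.col[cur]) == (r, c):
            self.val[cur] = v       # element already present: update in place
            return cur
        a = self._allocate(r, c, v)
        self.nxt[a] = cur           # only address components change; no data is shifted
        if prev == self.NIL:
            self.head = a
        else:
            self.nxt[prev] = a
        return a

    def delete(self, r, c):
        prev, cur = self._locate(r, c)
        if cur == self.NIL or (self.row[cur], self.col[cur]) != (r, c):
            return False
        if prev == self.NIL:
            self.head = self.nxt[cur]
        else:
            self.nxt[prev] = self.nxt[cur]
        self.free.append(cur)       # append the freed address to the table
        return True

    def items(self):
        out, cur = [], self.head
        while cur != self.NIL:
            out.append((self.row[cur], self.col[cur], self.val[cur]))
            cur = self.nxt[cur]
        return out
```

After a deletion, the next insertion takes its address from `free`, so the storage does not grow; without the table, every insertion would claim a new element and deleted elements would be lost.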
Few references endorse, or suggest endorsement of, the linked list scheme as a
practical method for indexing sparse matrices [15, 34-37]. Only a few sources [15,
38-40] found in the literature survey actually utilized the threaded list scheme;
while the actual algorithms were seldom
described in great detail, the scheme basically followed the designs of Example 4.
Overall, the threaded list technique of indexing into sparse matrices requires a significant amount of execution time for processing
indices, in addition to the core requirements
of a buffer and two separate tables. Inherent
in the method, then, are considerable execution times for processing and considerable
core expenditure, in comparison with the bit
map and row-column schemes for identical
matrices. Offsetting these disadvantages,
however, the linked list scheme has the
distinct advantage of not requiring a significant amount of execution time to update the
linked list by insertion or deletion of single
matrix elements or series of matrix elements.
All other previously discussed indexing
techniques require a shifting of data when
an update is performed, which will take a
great amount of execution time when
numerous matrix elements have to be shifted
to make the appropriate word available for
the update. The linked list scheme is slow for
random processing of matrix elements; however, in many applications items are accessed sequentially by row or column. In
these applications, proper chains of pointers
speed up processing greatly. As with previous methods, a definite symmetry of the
sparse matrix reduces proportionately the
core requirements for indexing.


VI. DIAGONAL OR BAND INDEXING SCHEME

Band and diagonal matrices are special
types of matrices that occur frequently in
electrical engineering, structural engineering,
nuclear engineering and physics,
solutions to differential equations, and a
host of other fields, as mentioned in the
Introduction. Band and diagonal matrices,
while of frequent occurrence, should not be
mistaken as a general case of sparse matrices.
When band or diagonal matrices occur, a
special effort on the part of the user should
be made to adapt his processing and/or indexing algorithms to the case at hand. This
adaptation should be made because of the
inherent simplicity of processing, manipulating, and solving band matrices, and also
because of the opportunity to minimize core
allocation and execution time.
In most cases, band or diagonal matrices
are processed either wholly by rows or columns, and little or no processing of single
elements occurs. For a band matrix, a common manipulation involves decreasing the
band width. In such a manipulation, it is
normal procedure for one entire row (column) to operate on the row (column) immediately above or below it (or to either
side). With such a simple processing sequence, it is evident that only a few rows
(columns) need be maintained in fast core
for immediate use.
If data transmission rates are comparable
to the rate with which rows (columns) are
manipulated, then rows (columns) not in
immediate use can be stored on slower access
devices, such as tapes or disks. Storing data
on tapes or disks frees the more expensive
fast core. In most machine configurations
there is a much larger amount of memory
available in the slower devices. When slow
devices can be used efficiently for processing
band matrices, the capability of manipulating large order sparse matrices is limited
by the maximum allowable execution time
and the desired accuracy limits of the results,
and not by the order of the matrix involved.
To further conserve execution time, but at
the expense of fast memory, the entire band
matrix can be stored in fast core. Preserving
the entire matrix in fast core eliminates the
transmission times between fast core and
auxiliary devices, as well as the time required to restore elements in fast core,
which is done prior to data manipulation
and processing. Another prime advantage
directly involved with data transmission is
the use of overlapping channels in burst or
select mode. However, when the matrix is
fully maintained in fast core, channels will
then be available to other users on multi-user
computers.
If the band matrix has full bands, that is,
no row has any zero elements within the
band, then the total number of elements to
be stored is the band width multiplied by
the number of rows in the matrix. Figure 10
depicts a band matrix with full bands (a
band width of 3 here):

$$
\begin{pmatrix}
-199 & 100.5 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
99 & -199 & 101 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 98.5 & -199 & 101.5 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 98 & -199 & 102 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 97.5 & -199 & 102.5 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 97 & -199 & 103 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 96.5 & -199 & 103.5 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 96 & -199 & 104 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 95.5 & -199
\end{pmatrix}
$$

FIG. 10. Band matrix.
EXAMPLE 5. Figure 10 is the resulting
9 × 9 matrix obtained by using a central
difference approximation (3 points) to
solve the boundary-value differential
equation 2 + 3t^2 = y + y' + y'' using
10 intervals between the points y(t = 0) = 0
and y(t = 1) = 1.
A 5-point interpolation would yield a
band width of 5; 50 intervals would
result in a 49 × 49 matrix. Note that
the augment column, a constant
associated with each row of the matrix,
is not considered here as an integral part
of the sparse matrix. Accuracy of results
depends on the number of intervals,
number of points in the interpolation
formula, and computer round off.
In one particular application of processing
a band matrix by rows (columns), it is convenient and efficient to store elements in full
vectors, one vector for each super- or
subdiagonal of the band matrix. Since the
diagonal has the greatest number of elements,
the vector for the diagonal will be the largest
vector. To avoid double indexing, which
takes greater execution time, an additional
table of addresses is created. Each element
of the address table contains the address of
the first element of the respective vector.
The indexing scheme in the algorithm used
to arithmetically manipulate the band
matrix is then altered to suit the storage
scheme.
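
Storage by diagonals with a table of addresses can be sketched as follows. The class name `BandMatrix`, the 0-based indices, and the choice to hold all the diagonal vectors end to end in one array are assumptions made for illustration; the example values follow the pattern of the entries in Figure 10.

```python
class BandMatrix:
    """n x n band matrix stored as one vector per sub-, main, and superdiagonal."""

    def __init__(self, n, lower, upper):
        self.n, self.lower, self.upper = n, lower, upper
        self.data = []     # the diagonal vectors, stored end to end
        self.start = {}    # table of addresses: first element of each diagonal vector
        for k in range(-lower, upper + 1):        # k = column number minus row number
            self.start[k] = len(self.data)
            self.data.extend([0.0] * (n - abs(k)))  # the main diagonal is the longest

    def _address(self, i, j):
        if not (0 <= i < self.n and 0 <= j < self.n):
            raise IndexError("element outside the matrix")
        k = j - i
        if k < -self.lower or k > self.upper:
            return None                           # outside the band
        return self.start[k] + min(i, j)          # a single index into the data

    def get(self, i, j):
        a = self._address(i, j)
        return 0.0 if a is None else self.data[a]

    def set(self, i, j, v):
        a = self._address(i, j)
        if a is None:
            raise ValueError("element lies outside the band")
        self.data[a] = v


# A band width of 3 (one subdiagonal and one superdiagonal), order 9.
band = BandMatrix(9, lower=1, upper=1)
for i in range(9):
    band.set(i, i, -199.0)
    if i + 1 < 9:
        band.set(i, i + 1, 100.0 + 0.5 * (i + 1))   # superdiagonal: 100.5, 101, ...
    if i > 0:
        band.set(i, i - 1, 100.0 - 0.5 * (i + 1))   # subdiagonal: 99, 98.5, ...
```

The only indexing storage is the three-entry table `start`; the 25 data elements need no row or column indices of their own.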
If, for some reason, it is more convenient
to store elements in a row or column form,
e.g., because of a very difficult or time-consuming arithmetic manipulation, most of
the advantage of employing a band scheme
is lost, and other methods of indexing should
be considered.
Band matrices, as noted above, are unusual from an indexing standpoint because
of the very slight core requirements for
indexing. For the application described
above, only $W = I \cdot V / B$ words are required
for indexing, where I is the number of rows;
V is the number of bits used for a row index
element; and B is the number of bits per
word. As a percentage (E) of the full matrix,
this indexing requirement is
$$E_{\text{Band}} = \frac{100}{J}\ \%$$
where J is the number of columns in the
matrix, when full words are used for the
table of addresses. If half-words are adequate,
this requirement decreases further by one-half.
It should be brought to the attention of
the user that in the instance where bands do
contain zero elements, a decision should be
made whether to employ a band scheme,
which may not be very efficient in use of core
if a large number of null entries exists, or
some other particular scheme, such as a
block-diagonal scheme, which may not conserve execution time.
Many papers [4, 10, 34, 40-43] are concerned with band matrices, primarily, as
said, because of the prevalence of band
matrices in many specific fields of interest.
Also, many algorithms are readily available
for processing band matrices; FOaTRAN M

•

123

[44] being one of the more recent programming packages.

VII. CONCLUSION

In the previous sections four major types of
indexing methods were discussed, three of
which are in general use: the bit map scheme,
the row-column scheme, and the threaded
list scheme. Each major type, of course, has
many variations (the address map method is
not in general use at present, so no variations
occur). The important special case of the
band matrix is discussed as a separate entity,
because it is not a general case of a sparse
matrix, even though it has wide application.
As stated in the Introduction, one of the
major considerations in selecting a particular
indexing method is the amount of fast core
the method requires, in addition to the data
elements. The indexing in the bit map
method requires a fast core allocation of
approximately 4 % of the full matrix; in the
address map method indexing requires about
25 % of the full matrix. The row-column and
threaded list schemes have no definite core
requirements for indexing, and fast memory
for indexing is directly proportional to the
sparse matrix density. The percentage of the
full matrix required for indexing a row-column scheme is about one times the matrix
density, and about twice the density is required for a threaded list scheme.
Previous discussion indicated that an
exact comparison of execution times must
reflect the type of mathematical manipulation being performed on the sparse matrix.
For example, the bit map method is of particular use when the matrix is used to produce an "optimal" ordering, so the matrix
inverse will not have a greatly increased
density. In contrast, the row-column method
is faster than other methods when manipulations involve one row (column) acting on
other rows (columns).
The second important aspect of indexing
scheme selection is the conservation of
execution time. If arithmetic operations are
to be performed on the data, primary consideration should first be given to a row-column method; if Boolean arithmetic or
reordering algorithms are to be performed,
the bit map scheme should be considered
first; and if a great number of data elements
are to be reordered, created, or annihilated,
a threaded list scheme deserves first consideration.
The bit map scheme has a definite core
allocation for indexing, offers a reasonable
row access time, is quite fast in execution
time when row operations are performed, is
core efficient when the matrix density is
greater than 4 %, and allows very fast manipulation of logical (Boolean) operations.
Logical operations can be conveniently used
to determine when arithmetic operations are
to be executed.
As to its disadvantages: the bit map
scheme has extremely poor column access
time when elements are ordered by rows,
which in most cases requires transposing
the bit map and reordering the data elements; it makes poor use of parallel processing, requires considerable time to reorder
data elements, and is not core efficient when
matrix density falls below 4%.
The address map proves advantageous
when character addressing is available,
makes very efficient use of parallel processors, provides ready access to any element,
does not require an extensive amount of execution time (in comparison to the bit map
scheme) to reorder data elements, and exhibits a reasonable row and column execution time.
The primary disadvantages of the address
map method are: a large fast core requirement for indexing; and the relatively large
execution time, in comparison with the
threaded list scheme, to reorder matrix
elements.
Both bit and address maps require significant execution times to transpose the matrix: the map must be transposed, and all
the data elements must be reordered. Execution time to transpose the matrix is
directly proportional to the order of the
matrix and the matrix density.
Primary advantages of the row-column
schemes are: a very fast row access time in
comparison with the bit and address maps;
a relatively fast column access time in comparison to all other methods; conservation of
core with matrices of less than 4% density
when compared to the bit map method; an
increase in efficiency as the order of the
matrix increases, as more complex variations
become more efficient; and faster reordering
than the bit map or address map methods.
The main disadvantages of the row-column scheme are that column access time
and the time required to reorder elements
greatly increase as the matrix order and/or
matrix density increases.
The threaded list technique is the sole
technique that allows a simple and fast executing method of reordering, adding, or
annihilating data elements.
The threaded list scheme exhibits a
variety of disadvantages, the primary ones
being a large core requirement for indexing
in comparison with the row-column method,
a slow access time for rows when elements
are stored by rows, and an even slower access
time for columns compared with the row-column method. The inclusion of orthogonal
links, as discussed by Knuth [35], removes
some of the column access difficulties, but
only at the price of additional storage.
For the special case of band matrices, a
scheme similar to the one described in Part
VI should be used unless either half or more
of the elements within the band width are
null, or the nature of the mathematical
operations to be performed dictates otherwise (as described in Part VI). If the band
matrix scheme cannot be utilized, the user
must decide which characteristics of the
other types of indexing are considered vital
to the solution, and select a method on this
basis.
A final major aspect of indexing the user
must consider concerns the adaptability and
flexibility of programming the selected
scheme, which depends upon the factors
enumerated in the Introduction. The following suggestions and comments concerning
programming flexibility and adaptability
are offered.
None of the major types of indexing
schemes requires double indexing. Double
indexing involves using one register (adder)
to index across the row, and another register
to index down the column. Double indices
have at least three drawbacks: they require
more time than single indices; the computer
may have a built-in limit on the number of
characters or words that can be indexed by
one or both of the registers before a new
index (base) register must be designated; and
registers are at a premium, because of the
extremely fast register to register operation
time, and should be used for more vital arithmetic. In the last analysis, the increased
time involved in double indexing is the
critical factor.
In general, the larger the order of the
matrix, the lower the matrix density. Because of this the row-column method is
preferred for matrices with orders of 1000 or
more, especially when arithmetic manipulations or operations are to be performed.
As the order of the matrix increases, it
becomes more efficient to employ more complex variations of the major types. For instance, the delta indexing scheme (as described in Part IV) conserves a considerable
amount of fast core compared with the
simpler row-column schemes, without a
great increase in execution time, when the
order approaches 1000.
If the matrix requires more fast core than
is available, the user must decide either to
segment the matrix between fast and slow
core, or to reduce the complexity of the
problem. If the problem can be simplified,
or the matrix condensed or partitioned
(blocked), then it is not necessary to segment the matrix between fast and slow core.
Simplifying the matrix involves the real
consideration of whether or not it is economically feasible to reorder rows and/or
columns to produce a new matrix that can
be more efficiently processed. Many schemes
have been developed [7, 16, 18, 27] to attempt such an optimal ordering of matrix
elements. Condensing the matrix involves
the elimination of data elements that produce insignificant or negligible change in the
results. Such condensing can often be done
with reasonable competence by someone
familiar with the nature of the problem to be
solved. If the matrix is of block-diagonal
form, each block can be processed as a
separate entity to produce a composite result.
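
The block-by-block processing described above can be sketched for one simple operation, a matrix-vector product. The representation (a list of dense square blocks lying along the diagonal) and the function name are assumptions chosen for illustration, not a storage format taken from this survey.

```python
def block_diag_matvec(blocks, x):
    """Compute y = A x for a block-diagonal A given as a list of square blocks."""
    y, offset = [], 0
    for block in blocks:
        n = len(block)
        segment = x[offset:offset + n]        # part of x that this block touches
        for row in block:
            y.append(sum(a * b for a, b in zip(row, segment)))
        offset += n
    if offset != len(x):
        raise ValueError("block sizes do not add up to the length of x")
    return y


# Two blocks (2x2 and 1x1) forming a 3x3 block-diagonal matrix.
blocks = [[[2, 1],
           [0, 3]],
          [[4]]]
y = block_diag_matvec(blocks, [1, 2, 3])
```

Each block is handled without reference to the others, so only one block and its slice of the vector need to be in fast core at a time.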
The availability of a virtual memory
processor might lead the user to the erroneous conclusion that the benefits of a
proper indexing algorithm are negated. This
is not so; at some time during the processing
of a sparse matrix the matrix must reside in
physical memory. It then follows that the
fewer the pages occupied by the
sparse matrix, the fewer the page faults
generated, and therefore the less time spent moving the matrix to and from
peripheral paging devices. In other words,
the same benefits accruing from indexing in
an ordinary processor apply in a virtual
memory processor.
When updating of data files is anticipated, the user should designate buffer
storage. When new matrix elements are
introduced, they should be stored in the
buffer area. When a considerable number of
corrections to the data elements has accumulated (about
5%), the matrix is reordered. The
threaded list scheme requires no separate
buffer area, as a buffer is inherent in the indexing scheme.
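
A minimal sketch of the buffering policy described above, assuming that the main area holds elements sorted in row-column order and that "about 5%" is measured against the number of elements already stored. The class, its method names, and the lookup strategy are hypothetical and serve only to make the policy concrete.

```python
import bisect


class BufferedSparseMatrix:
    REORDER_PERCENT = 5          # "about 5%", the figure quoted in the text

    def __init__(self, entries):
        # entries: iterable of ((row, col), value) pairs with unique positions
        self.main = sorted(entries)          # main area, row-column order
        self.buffer = []                     # new or corrected elements
        self.reorders = 0

    def insert(self, row, col, value):
        self.buffer.append(((row, col), value))
        if 100 * len(self.buffer) >= self.REORDER_PERCENT * max(len(self.main), 1):
            self.reorder()

    def reorder(self):
        merged = dict(self.main)
        merged.update(self.buffer)           # later corrections override older values
        self.main = sorted(merged.items())
        self.buffer = []
        self.reorders += 1

    def get(self, row, col):
        key = (row, col)
        for k, v in reversed(self.buffer):   # newest correction wins
            if k == key:
                return v
        pos = bisect.bisect_left(self.main, (key,))
        if pos < len(self.main) and self.main[pos][0] == key:
            return self.main[pos][1]
        return 0.0                           # not stored: implicit zero


m = BufferedSparseMatrix(((i, i), 1.0) for i in range(100))
for k in range(5):
    m.insert(k, k + 1, 2.0)                  # the fifth insert triggers a reorder
```

Lookups must consult the buffer before the main area, which is the price paid for postponing the reordering.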
The segments of coding that contain the
actual indexing algorithm should be programmed in a low-level language, such as
assembly language, to conserve execution
time. High-level languages, such as FORTRAN,
utilize a compiler, which may not produce
the most efficient coding. For instance, if a
division by 32,768 is necessary, the high-level
language may simply create a division by
32,768 in assembly language. If the high-level compiler, however, recognized that a
division of a nonnegative integer by 32,768 (= 2**15) is identical to shifting an
accumulator right 15 bits, the assembly
language version would be a shift right
logical or shift right double logical. The first
version would require significantly more
execution time than the more efficient assembly language program version. A considerable saving is realized when the computation is performed perhaps as many as
several million times in a program.
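
The equivalence that the compiler would need to recognize can be checked directly. The sketch below assumes nonnegative integer operands and uses Python only for illustration; since 32,768 is 2**15, the matching shift count is 15. The function name is hypothetical.

```python
DIVISOR = 32768                          # 2**15
SHIFT = DIVISOR.bit_length() - 1         # base-2 logarithm of the divisor


def divide_by_shift(x):
    """Divide a nonnegative integer by 32,768 using a right shift."""
    return x >> SHIFT


samples = [0, 1, 32767, 32768, 65535, 1_000_000]
# Each tuple holds (operand, quotient by division, quotient by shifting).
results = [(x, x // DIVISOR, divide_by_shift(x)) for x in samples]
```

On a machine where a shift is cheaper than a divide, the saving per operation is multiplied by the number of times the indexing code is executed.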
The user should avoid coding the indexing algorithm as a subroutine,
especially in a high-level language, because
of the added linkage time during program
execution.
While a "fast" algorithm for indexing into
arbitrarily sparse matrices would allow very

Computing Surveys, Vol. 5, No. 2, June 1973

efficient core storage allocation and execution
times for matrix manipulations, it is also
evident that no such single algorithm exists,
at least at present. The advent of array
processors and pipeline computers may
eliminate the desire to handle sparse matrices in any special manner whatsoever.
However, it also appears that no matter how
large, or how fast and sophisticated, computing machines become, users will continue to strive for core storage conservation
and faster execution times. It remains to be
seen if sufficiently sophisticated indexing
algorithms will be developed to accomplish
those goals in array or pipeline machines;
or whether such machines will come into
general use and provide an environment
conducive to developing sparse matrix
indexing schemes.
For the present, the choice of an indexing
algorithm depends upon many considerations, with each major type of indexing
discussed here having particular advantages
and disadvantages. Careful selection of an
algorithm can satisfactorily achieve the
goals of conservation of core memory and
execution time. In addition, whenever there
exists some pattern to the nonzero entries,
the possibility of reorganizing the calculations as a means to handle some sparse
matrices should be carefully considered.

APPENDIX

ALGORITHM 1: BIT MAP SCHEME

    Statement                              Meaning

01           ROW ← i                       i is the row number that will be manipulated
02           RINDEX ← v(i)                 v is the row index vector
03           BITS ← b                      b = number of bits/word
04           ROW ← ROW - 1                 (i - 1)
05           COLS ← J                      J is the number of columns in the matrix
06           ROW ← ROW * COLS              (i - 1) * J
07           SAVE ← ROW                    Save (i - 1) * J
08           ROW ← ROW + BITS - 1          (((i - 1) * J) + b - 1)
09           ROW ← ROW / BITS              S_i = (((i - 1) * J) + b - 1) / b; word contains
                                           the first bit of required row
10           ROWEND ← COLS                 End of row counter (J)
11           START ← ROW                   Starting word of the row
12           ROWEND ← ROWEND AND MASK      Determine correct number of displacement bits;
                                           MASK = mask for maximum displacement bits
13           START ← START * 2**SAVE       Shift to eliminate incorrect bits (from previous row)
14           ROWEND ← ROWEND - SAVE        Correct for eliminated bits
15           GO TO ROWSCAN                 Branch to code to scan row in bit map for 1 bits
                                           Following code scans one entire row of a bit map.
16  COUNT    ROWEND ← BITS                 After first word of row is scanned, the bit
                                           counter (ROWEND) = b
17           ROW ← ROW + 1                 Increment bit map word address by one
18           WORD ← bit word from map      Word of bit map
19  ROWSCAN  WORDBIT ← bit from bit-word   Pick up high order bit from bit word
             (WORD)
20           COLNUM ← COLNUM + 1           Increment column number
                                           Following statements are branch controls.
21           WORDBIT = 1                   Is the bit non-zero?
22           IF YES, GO TO MATH            Yes, an element exists.
23  ENDROW   COLNUM = COLS                 Is the column counter equal to the row counter?
24           IF YES, GO TO END1            Yes, end of row
25           COLNUM = ROWEND               Have we shifted completely through bit map word?
26           IF YES, GO TO COUNT           Yes, fetch another word.
27           GO TO ROWSCAN                 No, scan next bit in word
28  MATH     RINDEX ← RINDEX + 1           RINDEX = address of nonzero element
                                           COLNUM = column number of non-zero element
                                           Perform required operation on element
29           GO TO ENDROW                  Return
30  END1     STOP                          End of operation on the row.
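
The row scan of Algorithm 1 can be sketched in a high-level language as follows. This is an illustration under stated assumptions, not a transcription of the listing: the bit map is taken to be packed row by row with the first bit of each word in the high-order position, the row index vector v is taken to hold the position of each row's first stored nonzero value, and the helper names pack_bit_map and scan_row are hypothetical. Python is used only for illustration; the text recommends assembly language for production use.

```python
def pack_bit_map(dense, bits_per_word):
    """Build (bit map words, packed nonzero values, row index vector)."""
    bits = [1 if x != 0 else 0 for row in dense for x in row]
    bits += [0] * (-len(bits) % bits_per_word)      # pad the last word with zeros
    words = []
    for k in range(0, len(bits), bits_per_word):
        word = 0
        for bit in bits[k:k + bits_per_word]:
            word = (word << 1) | bit                # first bit ends up high-order
        words.append(word)
    values = [x for row in dense for x in row if x != 0]
    row_index, count = [], 0
    for row in dense:
        row_index.append(count)                     # position of row's first value
        count += sum(1 for x in row if x != 0)
    return words, values, row_index


def scan_row(words, values, row_index, i, ncols, bits_per_word):
    """Yield (column, value) for each nonzero of row i; rows and columns are 1-based."""
    first_bit = (i - 1) * ncols                     # SAVE in Algorithm 1
    rindex = row_index[i - 1]                       # RINDEX, taken from v(i)
    for col in range(1, ncols + 1):
        pos = first_bit + col - 1
        word = words[pos // bits_per_word]
        shift = bits_per_word - 1 - pos % bits_per_word
        if (word >> shift) & 1:                     # is the bit non-zero?
            yield col, values[rindex]               # MATH: operate on the element
            rindex += 1


dense = [[0, 5, 0, 0, 7],
         [0, 0, 0, 0, 0],
         [3, 0, 0, 9, 0]]
words, values, row_index = pack_bit_map(dense, bits_per_word=8)
row3 = list(scan_row(words, values, row_index, 3, ncols=5, bits_per_word=8))
```

The sketch recomputes the word and bit position for every column, which keeps it short; the listing instead carries a word pointer and a bit counter forward to avoid the repeated division.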

ALGORITHM 2: ADDRESS MAP SCHEME

    Statement                              Meaning

01           ROW ← i                       i = row
02           RINDEX ← v(i)                 v = row index vector
03           ROW ← ROW - 1                 (i - 1)
04           COLS ← j                      j = # columns
05           ROW ← ROW * COLS              j * (i - 1)
06           ROW ← ROW - 1                 (j * (i - 1)) - 1
07  START    ROW ← ROW + 1                 Increment across row
08           COLNUM ← COLNUM + 1           Increment column #
09           COLNUM > COLS                 End of row
10           IF YES, GO TO ENDROW          Yes, done
11           BYTE ← byte from address map  Pick up partial word
12           BYTE = 0                      Is byte zero?
13           IF YES, GO TO START           Reenter scan process
14           CHECK ← 0                     Zero work area
15           CHECK ← BYTE                  Byte to work area
16  MATH     CHECK ← CHECK + RINDEX        Points to non-zero element
                                           Required operations performed here
17           GO TO START                   Reenter scan process
18  ENDROW   STOP                          Finish
19           END

ALGORITHM 3: ADDRESS MAP SCHEME

    Statement                              Meaning

01           BEGIN ← address of address    Pointer
             map
02           BEGIN ← BEGIN + J             J = column #
03           BEGIN ← BEGIN - 1
04           COLS ← j                      j = # columns
05           ROWS ← i                      i = # rows
06           BEGIN ← BEGIN - COLS
07  START    BEGIN ← BEGIN + COLS          Increment address
08           RINDEX ← v(I)                 Row index vector
09           ROWCTR ← ROWCTR + 1           Increment row counter
10           ROWCTR > COLS                 Passed end of matrix?
11           IF YES, GO TO ENDROW          Yes, passed end
12           BYTE ← byte from address map  Pick up partial word
13           BYTE = 0                      Is byte zero?
14           IF YES, GO TO START           Reenter scan process
15           CHECK ← 0                     Zero work area
16           CHECK ← BYTE                  Byte to work area
17  MATH     CHECK ← CHECK + RINDEX        Points to non-zero element
                                           Required operations performed here
18           GO TO START                   Reenter scan process
19  ENDROW   STOP                          Finish
20           END
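
The two address map scans can be sketched together. The sketch assumes one map byte per matrix position, with zero meaning "no element" and a nonzero byte giving a 1-based displacement that is added to the row's entry in the index vector to locate the stored value (the CHECK ← BYTE + RINDEX step). The data layout, the row_base vector, and the function names are assumptions made for illustration; Python is used only for readability.

```python
def scan_row(address_map, values, row_base, i, ncols):
    """Row scan in the spirit of Algorithm 2: yield (column, value) for row i."""
    start = (i - 1) * ncols
    rindex = row_base[i - 1]                    # RINDEX, taken from v(i)
    for col in range(1, ncols + 1):
        byte = address_map[start + col - 1]
        if byte == 0:                           # zero byte: no element stored
            continue
        yield col, values[rindex + byte]        # CHECK = BYTE + RINDEX


def scan_column(address_map, values, row_base, j, nrows, ncols):
    """Column scan in the spirit of Algorithm 3: yield (row, value) for column j."""
    pos = j - 1                                 # BEGIN + J - 1
    for row in range(1, nrows + 1):
        byte = address_map[pos]
        if byte != 0:
            yield row, values[row_base[row - 1] + byte]
        pos += ncols                            # step down to the next row


# Same 3 x 5 example matrix as a map of per-row displacements.
address_map = bytes([0, 1, 0, 0, 2,
                     0, 0, 0, 0, 0,
                     1, 0, 0, 2, 0])
values = [5, 7, 3, 9]
row_base = [-1, 1, 1]        # index of the slot just before each row's first value

row1 = list(scan_row(address_map, values, row_base, 1, ncols=5))
col4 = list(scan_column(address_map, values, row_base, 4, nrows=3, ncols=5))
```

Because the displacement is relative to the row, a single byte per position suffices as long as no row holds more than 255 nonzero elements under these assumptions.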

[FIG. A1. Flowchart: algorithm 1, bit map scheme.]

[FIG. A2. Flowchart: algorithm 2, address map scheme.]

[FIG. A3. Flowchart: algorithm 3, address map scheme.]

BIBLIOGRAPHY

1. BRAYTON, R., GUSTAVSON, F., AND WILLOUGHBY, R. "Some results on sparse matrices."
   RC2332, IBM Watson Research Center (February 1969), 37-46.
2. LARSEN, L. "A modified inversion procedure for product form of the inverse-linear
   programming codes." Comm. ACM 5, 7 (July 1962), 382-383.
3. LIVESLEY, R. "An analysis of large structural system." Comp. J. 3 (1960), 34-39.
4. MCCORMICK, C. W. "Application of partially banded matrix methods to structural
   analysis." Sparse Matrix Proceedings, R. Willoughby, Ed., IBM Watson Research
   Center, RA11707 (March 1969), 155-158.
5. ORCHARD-HAYS, W. Advanced Linear Programming Techniques. McGraw-Hill, New York,
   1968, 73-82.
6. TEWARSON, R. "On the product form of inverse of sparse matrices." SIAM Review 8
   (1966), 336-342.
7. TEWARSON, R. "Row column permutation of sparse matrices." Comp. J. 10 (1967/68),
   300-305.
8. BRAYTON, R., GUSTAVSON, F., AND WILLOUGHBY, R. "Some results on sparse matrices."
   (Introduction), RC2332, IBM Watson Research Center (February 1969), 1-3.
9. BASHKOW, T. "Network analysis." Mathematical Methods for Digital Computers,
   A. Ralston and A. S. Wilf, Eds., Vol. I, John Wiley and Sons, New York, 1967,
   280-290.
10. TINNEY, W. F. "Comments on using sparsity techniques for power system problems."
    Sparse Matrix Proceedings, R. Willoughby, Ed., IBM Watson Research Center,
    RA11707 (March 1969), 25-34.
11. PALACOL, E. L. "The finite element method of structural analysis." Sparse Matrix
    Proceedings, R. Willoughby, Ed., IBM Watson Research Center, RA11707
    (March 1969), 101-5.
12. RALSTON, A. "Numerical integration methods for the solution of ordinary
    differential equations." Mathematical Methods for Digital Computers, A. Ralston
    and A. S. Wilf, Eds., Vol. I, John Wiley and Sons, New York, 1967, 95-109.
13. ROMANELLI, M. "Runge-Kutta methods for the solution of ordinary differential
    equations." Mathematical Methods for Digital Computers, A. Ralston and
    A. S. Wilf, Eds., Vol. I, John Wiley and Sons, New York, 1967, 110-20.
14. WACHSPRESS, E. "The numerical solution of boundary value problems." Mathematical
    Methods for Digital Computers, A. Ralston and A. S. Wilf, Eds., Vol. I, John
    Wiley and Sons, New York, 1967, 121-7.
15. WEIL, R., JR., AND KETTLER, P. "An algorithm to provide structure for
    decomposition." Sparse Matrix Proceedings, R. Willoughby, Ed., IBM Watson
    Research Center, RA11707 (March 1969), 11-24.
16. GUSTAVSON, F., LINIGER, W., AND WILLOUGHBY, R. "Symbolic generation of an optimal
    Crout algorithm for sparse systems of linear equations." Sparse Matrix
    Proceedings, R. Willoughby, Ed., IBM Watson Research Center, RA11707
    (March 1969), 1-10.
17. SMITH, D. M. "Data logistics for matrix inversion." Sparse Matrix Proceedings,
    R. Willoughby, Ed., IBM Watson Research Center, RA11707 (March 1969), 127-32.
18. SPILLERS, W. R., AND HICKERSON, N. "Optimal elimination for sparse symmetric
    systems as a graph problem." Quar. Appl. Math. 26 (1968), 425-32.
19. STEWARD, D. V. "On an approach to technique for the analysis of the structure of
    large systems of equations." SIAM Rev. 4 (1962), 321-42.
20. TEWARSON, R. P. "The Gaussian elimination and sparse systems." Sparse Matrix
    Proceedings, R. Willoughby, Ed., IBM Watson Research Center, RA11707
    (March 1969), 35-42.
21. GIVENS, W., MCCORMICK, HOFFMAN, et al. "Panel discussion on new and needed work
    and open questions." (Chairman P. Wolfe), Sparse Matrix Proceedings,
    R. Willoughby, Ed., IBM Watson Research Center, RA11707 (March 1969), 159-80.
22. WILKES, M. V. "The growth of interest in microprogramming: a literature survey."
    Comp. Surveys 1, 3 (September 1969), 139-45.
23. ORCHARD-HAYS, W. "MP systems technology for large sparse matrices." Sparse Matrix
    Proceedings, R. Willoughby, Ed., IBM Watson Research Center, RA11707
    (March 1969), 59-64.
24. CHANG, A. "Application of sparse matrix methods in electric power system
    analysis." Sparse Matrix Proceedings, R. Willoughby, Ed., IBM Watson Research
    Center, RA11707 (March 1969), 113-122.
25. BRAYTON, R., GUSTAVSON, F., AND WILLOUGHBY, R. "Some results on sparse matrices."
    IBM Watson Research Center, RC2332 (February 1969), 21-22.
26. CHARTRES, B. A., AND GEUDER, J. C. "Computable error bounds for direct solution
    of linear equations." J. ACM 14, 1 (Jan. 1967), 63-71.
27. FORSYTHE, G. E. "Crout with pivoting." Comm. ACM 3 (1960), 507-8.
28. JENNINGS, A. "A compact storage scheme for the solution of symmetric linear
    simultaneous equations." Comput. J. 9 (1966/67), 281-5.
29. System 360 Matrix Language (MATLAN) Application Description, IBM H20-0479;
    Program Description Manual, IBM H20-0564.
30. MCNAMEE, J. M. "Algorithm 408, a sparse matrix package." (Part I), Comm. ACM 14,
    4 (April 1971), 265-273.
31. DULMAGE, A. L., AND MENDELSOHN, N. S. "On the inversion of sparse matrices."
    Math. Comp. 16 (1962), 494-496.
32. MAYOH, B. H. "A graph technique for inverting certain matrices." Math. Comp. 19
    (1965), 644-646.
33. ROTH, J. P. "An application of algebraic topology: Kron's method of tearing."
    Quar. Appl. Math. 17 (1959), 1-24.
34. SWIFT, G. "A comment on matrix inversion by partition." SIAM Rev. 2 (1960),
    132-33.
35. KNUTH, D. E. The Art of Computer Programming, Vol. I. Addison-Wesley, Reading,
    Mass., 1968, 299-304, 554-556.
36. BERZTISS, A. T. Data Structures: Theory and Practice. Academic Press, New York,
    1971, 276-279.
37. LARCOMBE, M. "A list processing approach to the solution of large sparse sets of
    matrix equations and the factorization of the overall matrix." In Large Sparse
    Sets of Linear Equations, Reid, J. K., Ed., Academic Press, London, 1971.
38. WEIL, R. L., AND KETTLER, P. C. "Rearranging matrices to block-angular form for
    decomposition (and other) algorithms." Management Science 18, 1 (Sept. 1971),
    98-108.
39. GUSTAVSON, F. G. "Some basic techniques for solving sparse systems of linear
    equations." In Sparse Matrices and Their Applications, Rose, D. J., and
    Willoughby, R. A., Eds., Plenum Press, New York, 1972, 41-52.
40. FIKE, C. T. PL/I for Scientific Programmers. Prentice-Hall, Englewood Cliffs,
    N. J., 1970, 108, 180.
41. WILLOUGHBY, R. A. "A survey of sparse matrix technology." IBM Watson Research
    Center, RC3872, May 1972.
42. CUTHILL, E. "Several strategies for reducing the band-width of matrices." In
    Sparse Matrices and Their Applications, Rose, D. J., and Willoughby, R. A., Eds.,
    Plenum Press, New York, 1972, 34-38.
43. TEWARSON, R. P. "Computations with sparse matrices." SIAM Rev. 12, 4 (Oct. 1970),
    527-543.
44. PETTY, J. S. "FORTRAN M: programming package for band matrices and vectors."
    Aerospace Research Labs., Wright-Patterson AFB, Ohio, ARL-69-0064 (April 1969).
45. SPILLERS, W. R. "On Diakoptics: Tearing an arbitrary system." Quar. Appl. Math.
    23 (1965), 188-90.
46. IBM System/360 Model 65 Functional Characteristics, IBM A22-6884-3, File No.
    S360-01.

Is this document relevant probablyIs this document relevant probably
Is this document relevant probablyunyil96
 
Inverted files for text search engines
Inverted files for text search enginesInverted files for text search engines
Inverted files for text search enginesunyil96
 

Más de unyil96 (20)

Xml linking
Xml linkingXml linking
Xml linking
 
Xml data clustering an overview
Xml data clustering an overviewXml data clustering an overview
Xml data clustering an overview
 
Word sense disambiguation a survey
Word sense disambiguation a surveyWord sense disambiguation a survey
Word sense disambiguation a survey
 
Web page classification features and algorithms
Web page classification features and algorithmsWeb page classification features and algorithms
Web page classification features and algorithms
 
The significance of linking
The significance of linkingThe significance of linking
The significance of linking
 
Techniques for automatically correcting words in text
Techniques for automatically correcting words in textTechniques for automatically correcting words in text
Techniques for automatically correcting words in text
 
Strict intersection types for the lambda calculus
Strict intersection types for the lambda calculusStrict intersection types for the lambda calculus
Strict intersection types for the lambda calculus
 
Smart meeting systems a survey of state of-the-art
Smart meeting systems a survey of state of-the-artSmart meeting systems a survey of state of-the-art
Smart meeting systems a survey of state of-the-art
 
Semantically indexed hypermedia linking information disciplines
Semantically indexed hypermedia linking information disciplinesSemantically indexed hypermedia linking information disciplines
Semantically indexed hypermedia linking information disciplines
 
Searching in metric spaces
Searching in metric spacesSearching in metric spaces
Searching in metric spaces
 
Searching in high dimensional spaces index structures for improving the perfo...
Searching in high dimensional spaces index structures for improving the perfo...Searching in high dimensional spaces index structures for improving the perfo...
Searching in high dimensional spaces index structures for improving the perfo...
 
Realization of natural language interfaces using
Realization of natural language interfaces usingRealization of natural language interfaces using
Realization of natural language interfaces using
 
Ontology visualization methods—a survey
Ontology visualization methods—a surveyOntology visualization methods—a survey
Ontology visualization methods—a survey
 
On nonmetric similarity search problems in complex domains
On nonmetric similarity search problems in complex domainsOn nonmetric similarity search problems in complex domains
On nonmetric similarity search problems in complex domains
 
Nonmetric similarity search
Nonmetric similarity searchNonmetric similarity search
Nonmetric similarity search
 
Multidimensional access methods
Multidimensional access methodsMultidimensional access methods
Multidimensional access methods
 
Machine transliteration survey
Machine transliteration surveyMachine transliteration survey
Machine transliteration survey
 
Machine learning in automated text categorization
Machine learning in automated text categorizationMachine learning in automated text categorization
Machine learning in automated text categorization
 
Is this document relevant probably
Is this document relevant probablyIs this document relevant probably
Is this document relevant probably
 
Inverted files for text search engines
Inverted files for text search enginesInverted files for text search engines
Inverted files for text search engines
 

Último

React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...amber724300
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Mark Simos
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
Accelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessAccelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessWSO2
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...Nikki Chapple
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...Karmanjay Verma
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...BookNet Canada
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Kaya Weers
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxfnnc6jmgwh
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxAna-Maria Mihalceanu
 
Digital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentDigital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentMahmoud Rabie
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 

Último (20)

React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
Accelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessAccelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with Platformless
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance Toolbox
 
Digital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentDigital Tools & AI in Career Development
Digital Tools & AI in Career Development
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 

A Survey of Indexing Techniques for Sparse Matrices

UDO W. POOCH AND AL NIEDER
Texas A & M University,* College Station, Texas
* Department of Industrial Engineering.
Computing Surveys, Vol. 5, No. 2, June 1973

A sparse matrix is defined to be a matrix containing a high proportion of elements that are zeros. Sparse matrices of large order are of great interest and application in science and industry; for example, electrical networks, structural engineering, power distribution, reactor diffusion, and solutions to differential equations.

While conclusions within this paper are primarily drawn considering orders of greater than 1000, much is applicable to sparse matrices of smaller orders in the hundreds.

Because of increasing use of large order sparse matrices and the tendency to attempt to solve larger order problems, great attention must be focused on core storage and execution time. Every effort should be made to optimize both computer memory allocation and execution times, as these are the limiting factors that most often dictate the practicality of solving a given problem.

Indexing algorithms are the subject of this paper, as they are generally recognized as the most important factor in fast and efficient processing of large order sparse matrices.

Indexing schemes of main interest are the bit map, address map, row-column, and the threaded list. Major variations of the indexing techniques above mentioned are noted, as well as the particular indexing scheme inherent in diagonal or band matrices.

The concluding section of the paper compares the types of methods, discusses their suitability for different types of processing, and makes suggestions concerning the adaptability and flexibility of the major existing methods of indexing algorithms for application to user problems.

Key Words and Phrases: matrix, sparse matrix, matrix manipulation, indexing.

CR Categories: 5.14, 5.19

I. INTRODUCTION

Computations involving sparse matrices have been of widespread use since the 1950s, becoming increasingly popular with the advent of faster cycle times and larger computer memories. One cycle time is the time required for the central processing unit to send and to receive a data signal from main memory. Systems applications for sparse matrices include electrical networks and power distribution, structural engineering, reactor diffusion, and solutions to differential equations.

A sparse matrix is a matrix having few nonzero elements. Matrix density is defined as the number of nonzero elements of the matrix divided by the total number of elements in the full matrix. Most available references utilizing sparse matrices for calculations [1-8] consider matrices of order 50, or more [9, 10], with densities ranging from 15% to 25% and decreasing steadily as the order increases. This paper will accept these boundary conditions as a strict definition of a sparse matrix.

Brayton, Gustavson, and Willoughby [8] say that a typical large (implied to be in the hundreds) order sparse matrix has 2 to 10 nonzero entries per row. Hays [5] says that an average of 20 nonzero elements per row is not an unreasonably small number in quite large (implied to be around 100 and greater) order. Livesley [1] indicates that an average of 3 or 4 elements
per row in a large (implied to be around 1000) order structural problem is a good estimate.

CONTENTS

I Introduction
II Bit Map Scheme
III Address Map Scheme
IV Row-Column Scheme
V Threaded List Scheme
VI Diagonal or Band Indexing Scheme
VII Conclusion
Appendix A
    Algorithm 1. Bit Map Scheme
    Algorithm 2. Address Map Scheme
    Algorithm 3. Address Map Scheme
Bibliography

Copyright © 1973, Association for Computing Machinery, Inc. General permission to republish, but not for profit, all or part of this material is granted, provided that ACM's copyright notice is given and that reference is made to this publication, to its date of issue, and to the fact that reprinting privileges were granted by permission of the Association for Computing Machinery.

If the order I of the matrix is reasonably small, i.e., about order 50 or less, it would make little difference if the full matrix were kept in core. However, if the sparse matrix is of larger order than about 50, it becomes efficient in terms of execution time and core allocation to store only the nonzero entries of the matrix.

The efficiency of retaining only the nonzero elements becomes obvious in the example of a 500 × 500 matrix with 10% density. With one word of storage allocated for each element, the matrix requires 250,000 words, which is very often more than is physically available. Storing only the nonzero elements requires 25,000 words.
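The 500 × 500 storage comparison above is plain arithmetic. A minimal sketch in Python, assuming one word per stored element as in the text and leaving out any indexing overhead; the function names are illustrative and do not come from the paper:

```python
def full_storage_words(rows: int, cols: int) -> int:
    """Words needed to hold every element, one word per element."""
    return rows * cols


def nonzero_storage_words(rows: int, cols: int, density: float) -> int:
    """Words needed for the nonzero elements only (indexing overhead excluded).

    density is the fraction of nonzero elements, e.g. 0.10 for 10%.
    """
    return round(rows * cols * density)


if __name__ == "__main__":
    print(full_storage_words(500, 500))           # 250000
    print(nonzero_storage_words(500, 500, 0.10))  # 25000
```

Any real scheme adds index storage on top of the second figure, which is what the indexing methods surveyed here try to keep small.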
If the full matrix were multiplied by a similar full matrix, a minimum of 500 × 500 × 500 = 125 × 10^6 arithmetic operations are required, compared to a minimum of (500 × 10%)^3 = 125 × 10^3 arithmetic operations when only the nonzero elements are retained. If both 500 × 500 matrices were to be retained in core as full matrices, core allocation and execution time would be prohibitive on many computers, and the problem would be abandoned as infeasible for computer solution.

By storing the nonzero elements in some reasonable manner, and using logical operations to decide when arithmetic operations are necessary, Brayton et al. [8] relate that both the storage requirements and the required amount of arithmetic can often, in practice, be decreased by a factor of I over the full matrix.

Sparse matrices are classified generally by the arrangement of the nonzero elements. When the matrix is in random form, nonzero elements appear in no specific pattern. A matrix is said to be a band matrix, or in band form, if its elements a_ij = 0 for |i − j| > m (where m is a small integer, and usually m ≪ I) and where the nonzero elements form a band along the main diagonal. The band width is the number of nonzero elements that appear in one row of a band matrix (i.e., 2m + 1). A block-diagonal form occurs when submatrices of nonzero elements appear along the matrix diagonal. In block
form, the matrix has submatrices of nonzero elements that occur in no specific pattern throughout the full matrix. The block dimension is the order of a submatrix in a block or block-diagonal matrix.

In electrical network and power distribution problems, the matrix is generally in random, band, or block-diagonal form, with the elements representing circuit voltages, currents, impedances, power sources, or users [9-10]; in structural engineering applications, the sparse matrix is generally of band or block form, with the band width or block dimension representing the number of joints per floor [3, 11]; in reactor diffusion problems and differential equations, the band form of matrix is most common, with the band width being the number of points used in a point-difference formula [12-14].

This paper, while not concerned with the actual mathematical manipulations of sparse matrices, is primarily concerned with the indexing algorithms employed in such calculations. If the sparse matrix is stored in a haphazard manner, elements can only be retrieved by a search of all the data, which takes much time. If the sparse matrix is stored in some very convenient form, execution time will be much less. Conservation of execution time is of major importance in selecting an indexing algorithm.

Another major consideration in selecting a particular indexing method is the amount of fast core the method requires in addition to that used for the storage of the nonzero data elements. For most applications, a small difference in core allocation between two methods is not a critical factor. In this case, the critical consideration is the execution time difference between the two methods. Since execution times vary greatly with the methods of indexing, an exact comparison of execution times must reflect the type of mathematical manipulation that is to be performed on the sparse matrix.

One last major aspect of indexing algorithm selection concerns the adaptability and flexibility of programming the selected scheme. This depends in great part on the type of machine, business or scientific; machine configuration; operating system capabilities; number of bits per word; access times for peripheral devices; average instruction times; availability of the required instructions; the maximum row or column size to be used; the expected matrix density; and the availability and size of buffers.

As with most applications, the use of a high-level programming language may provide relative ease of implementation for a selected indexing scheme, but such use is frequently accompanied by penalties in execution time and storage requirements. However, on the positive side, use of high-level languages may well result in a minimum of elapsed time for problem solution with a given programming staff, as well as overall minimum cost, considering both personnel and computer usage. Problems involving large order sparse matrices focus their attention on core storage utilization and execution time minimization, and therefore all but eliminate the employment of high-level languages for indexing schemes.

In subsequent sections of this paper, current indexing schemes will be examined in an attempt to isolate a "fast" indexing algorithm, with "fast" being defined as producing an optimization of execution time and core storage for sparse matrices of large order. Particular advantages and disadvantages of each major type of indexing discussed will be brought to the attention of the reader.

Parts II through VI discuss aspects of particular indexing schemes, while Part VII compares the requirements and advantages of the various schemes. Part VII, in conclusion, also makes recommendations concerning the adaptability and flexibility of the major existing indexing algorithms for application to user problems.

The authors have attempted, as much as possible, to make their discussions machine independent. However, the authors made use of an IBM System 360/65 Model I in their research and certain basic aspects of this machine, such as the 32-bit word, are alluded to in the succeeding pages. The interested reader should have little difficulty in adapting the concepts presented to machines of differing architecture.

II. BIT MAP SCHEME

In a bit map scheme, a Boolean form of the matrix M is the basic indexing reference. Whenever a nonzero entry occurs in the sparse matrix, a 1 bit is placed in the bit map, with null entries remaining as zeros in the bit map. The position of each successive nonzero entry is found by counting over to the next 1 bit in the map. More rapid access to any element of a row is achieved by providing an additional row index vector, where each element of that vector is the address of the first nonzero element of each row [16]. An additional column index vector may also be applied for a more rapid column access, but this will also necessitate storing each nonzero entry twice. It should be noted, however, that any machine based on word, rather than bit, addressing techniques will give much slower access in one dimension of the matrix than in the other.

As an example, the following matrix M, and its associated bit map BM and reduced Z-vector is given.

    M =  0 3 0 0      BM =  0 1 0 0      Z = [3, 2, 5, 4, 7, 1, 8]
         2 0 5 0            1 0 1 0
         4 0 0 7            1 0 0 1
         0 1 0 8            0 1 0 1

Figure 1 demonstrates a sample bit map supplemented with the row index vector V; the Z elements are the nonzero elements of the matrix.

[FIG. 1. Sample bit map: the row index vector V(1) to V(4), whose values indicate the first nonzero element for each row, pointing to the Z-vector values Z(1) to Z(7).]

The bit map in Figure 1 is a matrix conception of the bit map. To conserve core, instead of using one word for each row of the bit map, all four rows (16 bits) are compacted into one word as shown in Figure 2 with byte (8 bits) boundaries marked.

[FIG. 2. Bit map of Figure 1 in core, with byte boundaries marked: 0100 1010 | 1001 0101 | ...]

From Figure 2, it is simple to see that the bit map, being the Boolean form of the matrix, will, in fast core, require at least

    W = I · J / B words,

where I and J are the dimensions of the matrix and B is the number of bits per word; W is rounded up to the nearest integer. The bit map uses at minimum

    E_{Bit Map} = (100/B)%

of the storage requirements of the full matrix for indexing. The additional row index vector adds

    W = I · A / B

more words, where A is the number of bits required for an address. Supplemented with the row index vector,

    E_{Bit Map + Row Index} = (100/B)(1 + (A/J))%

of the full matrix is required for the indexing.

Now, if the sparse matrix has less than 65,536 nonzero elements, then A can be 16 bits in excess 32,768 notation. In a 32-bit-word machine, for example, 16 bits may be conveniently accessed if the instruction set has a complement of half-word instructions. Attention should be given to the number of bits required for an address to range through the maximum core size. If this number of bits is not conveniently manipulated, it will be necessary to use more than the minimum amount of core to gain an execution advantage. Execution times for full word instructions are often less than execution times for half-word instructions. Therefore, when choosing a convenient number of bits for A, the number of bits used for an address, it is important to realize the tradeoff between core conservation and access time.

Using B = 32 bits (word length), and A = 16 bits (half-word length), for a 500 × 500 matrix the bit map and row index vector require 8313 words, or 3.325% of the 250,000 words for the full matrix; if the matrix is only 5% dense, another 12,500 words are required for the nonzero elements; the total is 20,813 words, or 8.325% of the full matrix.

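The bit map, Z-vector, and row index vector described in this section can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's Algorithm 1: the bit map is held as a flat list of 0/1 flags instead of packed machine words, the function names are hypothetical, and the 4 × 4 matrix is chosen to agree with the Z-vector [3, 2, 5, 4, 7, 1, 8] and the bit pattern of Figure 2.

```python
import math


def build_bit_map(matrix):
    """Return (bits, z, v) for a dense matrix given as a list of rows.

    bits: flat list of 0/1 flags in row order (the bit map)
    z:    nonzero values in row order (the Z-vector)
    v:    for each row, the 1-based position in z of its first nonzero
          (for an all-zero row this is where the next row's entries start)
    """
    bits, z, v = [], [], []
    for row in matrix:
        v.append(len(z) + 1)
        for value in row:
            bits.append(1 if value != 0 else 0)
            if value != 0:
                z.append(value)
    return bits, z, v


def get_element(bits, z, v, n_cols, i, j):
    """Fetch element (i, j), 1-based, by counting 1 bits along row i."""
    row_start = (i - 1) * n_cols
    if bits[row_start + j - 1] == 0:
        return 0
    ones_before = sum(bits[row_start:row_start + j - 1])
    return z[v[i - 1] - 1 + ones_before]


def index_words(n_rows, n_cols, word_bits, addr_bits):
    """Words for the bit map (I*J/B) and the row index (I*A/B), rounded up."""
    bit_map = math.ceil(n_rows * n_cols / word_bits)
    row_index = math.ceil(n_rows * addr_bits / word_bits)
    return bit_map, row_index


if __name__ == "__main__":
    m = [[0, 3, 0, 0],
         [2, 0, 5, 0],
         [4, 0, 0, 7],
         [0, 1, 0, 8]]
    bits, z, v = build_bit_map(m)
    print("".join(map(str, bits)))           # 0100101010010101
    print(z)                                 # [3, 2, 5, 4, 7, 1, 8]
    print(v)                                 # [1, 2, 4, 6]
    print(get_element(bits, z, v, 4, 3, 4))  # 7
    print(index_words(500, 500, 32, 16))     # (7813, 250)
```

With B = 32 and A = 16 the two formulas give 7,813 + 250 = 8,063 index words for a 500 × 500 matrix. The 8,313 words quoted in the text are reproduced when each row index entry occupies a full word, that is, `index_words(500, 500, 32, 32)`.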
In order to reference the M_ij element, it is necessary to physically count across to the jth element in the ith "row" of the bit map. The correct bit will lie in the

    S_ij = ((i − 1) * J + j + (B − 1)) / B

word of the bit map. To isolate the required bit, it will be necessary to either shift the word the necessary number of bits or mask all the other bits by a logical operation. If a shift is used, then repeated shifts perform a row operation when the bit map is stored by rows.

Algorithm 1 (see Appendix) isolates the correct beginning word of a row in the bit map; a segment of the code shifts through one entire row, in preparation for a mathematical manipulation of the row. Algorithm 1 with slight alteration will accommodate matrices up to order 100,000. The restriction occurs in statement 06, where the multiplication must not result in loss of significant bits due to exceeding word size. In practice, the algorithm is limited either by the index vector being half-words, as indexing is provided for only 65,536 nonzero elements; or by 4095 rows or columns, the maximum number used in the indexing in statement 02.

When the bit map is stored by rows, as in the algorithm above, then to perform a column operation it is necessary to count to the correct j bit for all I rows. This means executing virtually the entire algorithm I times. If more than a few column operations are to be performed, then execution time will become an important factor. The execution time is dependent on the density of the sparse matrix, the order of the sparse matrix, and the number of column operations to be performed. The time factor is exemplified by the following:

EXAMPLE 1: A 500 × 500 matrix exists, and it is necessary to perform 10 column operations when the matrix is 5% dense. The average column execution time will be that of the 250th column.
Assuming the entire algorithm is executed for each row, the execution time will be approximately: 500 rows X 10 column operations [(time to locate beginning of each row) • 113 + .05 density X 500/2 columns X (to process 1 bits) + (1 - .05 density) X 500/2 columns X (time to process 0 bits) + 500/2 columns X (time to locate bit in bit map) + 500/2 words X (time to locate word in bit map)] which is about 10 seconds on the I B M 360/65, with additional microseconds incorporated for the mathematical operation not listed in the coding. Had the same procedure been carried out on the transpose of the bit map, that is, the bit map is now column-oriented instead of row-oriented, then the execution time would have been cut by a factor of about 500, a considerable time savings. Not taken into consideration is any further computer processing, such as updating an index register after each 4095 characters or bytes, if necessary. If the bit map of the sparse matrix can be transposed and the data rearranged in less time than the difference between the column and row execution times, then the transpose operation will conserve execution time. In the above example, the difference between column and row execution times is about 9.7 seconds. For certain types of operations the bit map is ideal. Being in Boolean form, which means elements are either 1 or 0, true or false, or plus or minus, the bit map is the most compact form for logical operations, such as AND, OR, or E X C L U S I V E OR. Thus, if matrices MA and M B exist, and it is necessary to determine which elements are nonzero in both matrices, it is necessary only to A N D each word of bit map MA with the corresponding word of bit map MB. If the result is zero, both are not present; if the result is nonzero, the indicated elements appear in both matrices. 
An EXCLUSIVE OR determines which elements are present in either, but not both, of the matrices; an OR determines which elements appear in either or both of the matrices. Logical operations performed on the bit map require about 1/32 of the execution time for the same logical operation on the full-scale matrix, because the bit map on a 32-bit-word machine condenses 32 pieces of data into 1 word. Additionally, and often most importantly, the bit map conserves core storage.

To determine how many elements will be present in the sum of two rows, and their order, an OR is performed on the two rows of the bit map. Using similar techniques, the feasibility of rearranging the matrix in a form more convenient for the user, such as diagonal form, where nonzero elements appear all along the diagonal, can be determined. Kettler and Weil [15] discuss some of the aspects of such a rearrangement algorithm.

Many references are found to endorse or suggest the use of a bit map scheme for sparse matrices [7, 15-20], but it is particularly difficult to ascertain the exact algorithms utilized, as most authors do not include these in their papers. While a bit map scheme appears convenient and fast, it is restricted by the amount of fast core available for the bit map. In the case where the sparse matrix is less dense than the percentage of the full matrix that the bit map scheme occupies, core storage will be conserved by switching to an alternative method of indexing.

Givens [21] has suggested that the bit map scheme would be more attractive to users if some special instructions were designed and implemented to further decrease execution times. One such instruction Givens references is CLEAR TO ZERO, which would clear a large block of core, e.g., the bit map, from a first to a last address. Another instruction would be LOAD NEXT NONZERO, which would fetch the address of the next nonzero entry of the bit map, given the previous nonzero element, thereby eliminating the necessity of counting through all the zero bits. These special instructions would be implemented as microprogrammed subroutines [21].
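The word-wise logical operations on bit maps described above, together with a software stand-in for the proposed LOAD NEXT NONZERO instruction, can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's Algorithm 1: the helper names `pack_bits` and `next_nonzero`, the 0-based bit positions, and the choice of storing the first flag in the high-order bit of each word are all hypothetical.

```python
B = 32  # bits per word, as in the 32-bit-word examples of the text

def pack_bits(flags):
    """Pack a list of 0/1 flags into B-bit words, first flag in the high-order bit."""
    words = [0] * ((len(flags) + B - 1) // B)
    for k, flag in enumerate(flags):
        if flag:
            words[k // B] |= 1 << (B - 1 - k % B)
    return words

def next_nonzero(words, start):
    """Return the position of the first set bit at or after `start`, or -1 if none."""
    total = len(words) * B
    k = start
    while k < total:
        # Keep only the bits at positions >= k within the current word.
        w = words[k // B] & ((1 << (B - k % B)) - 1)
        if w:
            return (k // B) * B + (B - w.bit_length())
        k = (k // B + 1) * B  # skip the rest of an all-zero word at once
    return -1

# Nonzero patterns of one row of two hypothetical matrices MA and MB
ma = pack_bits([1, 0, 1, 1, 0])
mb = pack_bits([0, 0, 1, 1, 1])

both = [a & b for a, b in zip(ma, mb)]      # nonzero in both matrices
either = [a | b for a, b in zip(ma, mb)]    # pattern of the sum of the rows
only_one = [a ^ b for a, b in zip(ma, mb)]  # nonzero in exactly one matrix
```

Each logical operation touches one word per 32 matrix elements, which is the source of the roughly 1/32 execution-time ratio quoted in the text; `next_nonzero` skips whole zero words instead of counting through individual zero bits.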
To define a microprogram, it is necessary to understand that the execution of each assembly language instruction involves a specific sequence of transfers of information from one register in the processor to another; some of the transfers take place directly, and some through an adder or other logical circuit. Each of these steps defines a microinstruction, and the complete set of steps necessary to execute the assembly language instruction constitutes a microprogram [22].

III. ADDRESS MAP SCHEME

The address map is similar in form to the bit map, the main difference being that the address map stores an address or address displacement for each matrix element. If the matrix element is zero, then a zero address is stored. The bit map requires only one bit for each matrix element. Since an address or address displacement requires more than one bit for each matrix element, the address map scheme will require N times the core storage of the bit map scheme, where N is the number of bits used for an address or address displacement. If address displacements instead of full-length addresses are used, then the address map must be augmented by a row index vector, as with the bit map. Assuming there are fewer than 256 nonzero entries per row, for example, an address displacement would require only 8 bits (a common character size). If a particular computer allows character operations that are faster than the access time to an individual bit map entry, the improved column access time of the address map can warrant the increased core expenditure. On a system with 6-bit characters, up to 64 nonzero row entries can be accommodated.

The overall percentage storage requirement of the full matrix required for the address map with the row index vector will be $E_{\text{Address Map}} = (100/B)(C + A/J)\,\%$, where B is the number of bits per word; C is the number of bits used for an address displacement; A is the number of bits used for an element of the row index vector; and J is the number of columns of the matrix. Using C = 8 bits, A = 32 bits, B = 32 bits, and J = 1000 columns, the address map and row index vector require 25.1% of the full matrix, that is, 251,000 words compared to 1 million for the full matrix. In addition, if the matrix is 5% dense, an additional 50,000 words are required for the storage of the nonzero elements.

In order to isolate the $M_{ij}$ element, it is necessary to access the $S_C = (i - 1) \cdot J + j$ character (or byte). In terms of words, $S_W = \{C[(i - 1) \cdot J + (j - 1)] + B\}/B$, where i and j are respectively the row and column of interest. If the $S_C$ character (byte) is zero, it is a null entry; otherwise, the content of the $S_C$ character (byte) is added to the row index element to give the address of the nonzero element.

The address map scheme is subject to many of the same limitations as the bit map scheme, and requires a larger amount of core storage for indexing. A sample coding, Algorithm 2, which has the same characteristics as the example used in the bit map method (Algorithm 1), illustrates that fewer arithmetic operations than in the bit map method are required when the computer is equipped with character addressing capabilities. If the computer used does not allow convenient arithmetic manipulation of individual characters, then the coding enclosed in brackets in Algorithm 2 must be added to overcome this difficulty. The bracketed coding requires much of the algorithm time, so if a computer has built-in arithmetic character manipulation, the algorithm becomes considerably faster. With an example similar to Example 1, we find that the execution time, with the bracketed coding included, is drastically different from the bit map time. This is primarily because of the easy access to any character.

To access by column instead of by row, only the first row location of the correct column need be found. To find the correct location of the character in row 2, it is sufficient to add just the column dimension. This process is continued until the end of the matrix is encountered. For a column manipulation, then, we easily obtain Algorithm 3, similar to Algorithm 2.

EXAMPLE 2. As in Example 1, a 500 × 500 matrix exists with 5% density, and it is necessary to perform 10 column operations. It is therefore necessary to execute Algorithm 3 10 times, so the execution time will be approximately

10 column operations × [(initialization time to locate beginning of each row)
  + 500 rows × (time to locate bit in bit map)
  + (1 − 0.05 density) × (time to process 0 bits)
  + 0.05 density × (time to process 1 bits)]

which is about 30 msec on the IBM 360/65, and has incorporated 2 additional μsec that were included for the mathematical operation not listed in the coding. As with Algorithm 1, the limitations are due to the use of half-words for the index vector, and to the use of an index register. Note that there is a considerable time savings, but at the expense of computer memory. Again, not taken into consideration is any further computer processing, other than the above coding, such as updating index registers, which may be necessary and require more time.

Unlike the bit map scheme, where the entire row of the bit map up to the desired element must be scanned for nonzero entries before data manipulation can occur, the address map method requires only a reference to the desired element. Because the storage location of a data element is found independently of all except the desired address displacement, the address map method blends well with the concept of parallel processing. Parallel processing involves the simultaneous execution of a sequence of operations by dependent central processing units. Thus, using the address map method, 4 separate central processing units could simultaneously execute the required arithmetic on 4 different elements of the matrix; at best, using the bit map method, different steps in the execution of 1 matrix element would be shared by the 4 central processing units.
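The independence of each lookup noted above can be seen in a short Python sketch of an address map with 8-bit displacements and a row index vector. It is an illustration under assumptions rather than a transcription of Algorithms 2 and 3: the function names, the 0-based row and column numbers, the 4 × 4 sample matrix, and the convention that a displacement d refers to the d-th stored nonzero of its row are all hypothetical.

```python
def build_address_map(dense):
    """Build (amap, row_index, values) from a dense list-of-rows matrix.

    amap holds one byte per matrix element: 0 for a zero element, otherwise
    the 1-based displacement of the element among its row's stored nonzeros.
    row_index[i] is the position in `values` of the first nonzero of row i.
    """
    cols = len(dense[0])
    amap = bytearray(len(dense) * cols)
    row_index, values = [], []
    for i, row in enumerate(dense):
        row_index.append(len(values))
        count = 0
        for j, x in enumerate(row):
            if x != 0:
                count += 1
                assert count < 256, "8-bit displacement would overflow"
                amap[i * cols + j] = count
                values.append(x)
    return amap, row_index, values

def get(amap, row_index, values, cols, i, j):
    """Fetch element (i, j) from a single displacement byte, with no scanning."""
    d = amap[i * cols + j]
    return 0 if d == 0 else values[row_index[i] + d - 1]

def column(amap, row_index, values, rows, cols, j):
    """Walk down column j by adding the column dimension at each step."""
    out, pos = [], j
    for i in range(rows):
        d = amap[pos]
        out.append(0 if d == 0 else values[row_index[i] + d - 1])
        pos += cols
    return out

sample = [[0, 2, 0, 0],
          [6, 0, 4, 0],
          [3, 0, 0, 0],
          [7, 9, 0, 5]]
amap, row_index, values = build_address_map(sample)
```

Because `get` reads only one byte of the map and one row index element, separate processors could evaluate it for different elements without sharing intermediate results.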
Employing the address map method, the processing units could work independently, except for the final results, while the bit map method would require transfers of information from one processing unit to the other processing units to execute the shared steps, which introduces an additional time lag.

While no references have been found to explicitly endorse or suggest this method, and comparatively large core requirements exist, the address map scheme may prove useful with some future computer that features both very fast core of a few million characters and a multitude of parallel processing units. Hoffman and McCormick [22] state that at present the value of parallel processing on a large scale is debatable as far as manipulating sparse matrices is concerned, as there are virtually no available computers with more than just a few parallel central processing units, and the field is quite unexplored.

IV. ROW-COLUMN SCHEME

Row-column indexing schemes refer to methods relying on paired vectors of some type; generally one vector contains the nonzero elements, which are most often ordered by rows or columns, and the other vector maintains the indexing information. Row-column indexing schemes are sometimes referred to as block index, row, or column packing schemes, depending on the author's description of how the indexing algorithm works [7, 15, 17, 20, 23-24].

In the simplest, but not the most core- and time-efficient form, each nonzero element of the matrix has a corresponding index word that contains a specified number of bits for the row designation and another specified number of bits for the column designation (Figure 4). If computations are to be performed in a row manner, it is highly practical and efficient to order the nonzero entries first by rows and then by columns. Ordering the entries by rows makes it unnecessary to maintain the row index for every nonzero element; only the row need be identified for the first nonzero element of each row, as it is known that all the following entries up to the next row indicator belong to the same row. In order to create the row marker, a check bit, such as a minus sign bit, can be set in the first column index word of each row (Figure 5), or, as is usually done, an additional and separate row index vector can be created (Figure 6). The row index element generally contains the address or index number of the first column index for the row. The same system may be applied to ordering the entries by columns if column operations are to be performed.

FIG. 3. Sample matrix.
FIG. 4. Indexing with row and column designators (index words V(1) to V(7), nonzero entries Z(1) to Z(7)).
FIG. 5. Indexing with row indicator (sign bit) and column designation.
FIG. 6. Indexing with row vector VR(1) to VR(4) (first column index for each row, half-word) and column index vector V(1) to V(7) (half-word).

Figures 3 through 6 depict sample vectors for the row-column schemes described above. The index vectors are V and VR; the nonzero entries are contained in vector Z. The data matrix used in Figures 4 through 6 is displayed in Figure 3. The nonzero entries of the data matrix are stored by rows, in order of increasing column number. All index vectors are full words unless otherwise noted.

From the above figures it is evident that there exists a wide possibility of variation in the row-column scheme of indexing. Further variations and adaptations can occur as a result of optimizing for peculiar computer characteristics, or as a result of making calculations on special forms of sparse matrices, such as block matrices. However, caution is advised, for such optimizations may result in a useless program whenever system changes occur, and should therefore be used only when they are critical to the economy of the calculations.

In the instance of computer peculiarities, Smith [17] states that a particular type of second generation IBM computer did not utilize the bits of the second word in extended-precision floating-point calculations that were normally used as the exponent bits in single-precision floating-point calculations. A sparse matrix row-column indexing algorithm was developed that employed these otherwise wasted 8 to 9 bits as the row or column indices, and could accommodate matrices up to order 255 and 511 respectively.

For the case of a special sparse matrix, the row-column indexing scheme for a block-diagonal matrix could become a blocked indexing scheme. The blocked indexing scheme would be identical to the row-column method, except that the large sparse matrix is partitioned into several smaller submatrices (blocks). Then each submatrix is identified with a separate row-column scheme of some sort.
A blocked indexing scheme may also be used to refer to combining several column indices into one block (word). For example, one 64-bit word would contain 4 column indices, each index of 16 bits. When a row operation is performed, then, 4 nonzero elements can be readied for processing at the expense of a loading time for only one block [17]. It should be noted that for many computers and algorithms more time is required to load a referenced word for arithmetic processing than is required to perform the necessary arithmetic to isolate the required bits of the referenced word. Likewise, more time is required to load extended-precision words than ordinary words. Also, since most computers are geared to utilize arithmetic data primarily by words, more time is required to load a half-word for arithmetic processing than is required to load a full word.

Another major variation, known as delta or displacement indexing, is also popular, and is somewhat similar to the address map form of indexing. For one particular example of a delta indexing scheme, one 64-bit extended-precision word contains one 16-bit index and six 8-bit displacements to the index. Therefore, the column indices of 7 elements can be referred to by loading and processing one extended-precision word, which can result in considerable savings of both time and core. For a delta of 8 bits, it is possible for 2 nonzero entries of the same row to be a maximum of 255 columns apart. If elements can appear farther apart than 255 columns, then a greater number of bits must be allocated for each delta or the method must be abandoned. To determine the column number of the first element paired with the 64-bit index word, the first 16 bits of the index word are used. To determine the column number of any subsequent element paired with the 64-bit index word, the first 16 bits, the appropriate delta, and all the deltas in between are added together.

Smith [17] also states that delta indexing is more efficient for large order (implied order about 250) sparse matrices than a blocked index form. Figures 7 and 8 depict the blocked and delta indexed words mentioned above, and are equivalent.

EXAMPLE 3. From Figure 7, column index 3 = 1078. From Figure 8, column index 3 = 1027 + 20 + 31 = 1078.
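The equivalence in Example 3 can be reproduced with a small Python sketch that packs and unpacks both kinds of 64-bit index word. The function names are hypothetical, and the use of a zero delta to mark an unused slot is an assumption made for the illustration (it is safe only because column indices within a row strictly increase); the paper does not say how unused deltas are marked.

```python
def pack_blocked(cols):
    """Pack four 16-bit column indices into one 64-bit word."""
    assert len(cols) == 4 and all(0 <= c < 1 << 16 for c in cols)
    word = 0
    for c in cols:
        word = (word << 16) | c
    return word

def unpack_blocked(word):
    """Recover the four 16-bit column indices, highest-order field first."""
    return [(word >> shift) & 0xFFFF for shift in (48, 32, 16, 0)]

def pack_delta(cols):
    """Pack one 16-bit column index and up to six 8-bit deltas into 64 bits."""
    assert 1 <= len(cols) <= 7 and 0 <= cols[0] < 1 << 16
    deltas = [b - a for a, b in zip(cols, cols[1:])]
    assert all(1 <= d <= 255 for d in deltas), "entries more than 255 columns apart"
    word = cols[0]
    for d in deltas + [0] * (6 - len(deltas)):  # zero marks an unused delta slot
        word = (word << 8) | d
    return word

def unpack_delta(word):
    """Rebuild column indices by accumulating the deltas onto the first index."""
    col = (word >> 48) & 0xFFFF
    cols = [col]
    for k in range(6):
        d = (word >> (40 - 8 * k)) & 0xFF
        if d == 0:
            break
        col += d
        cols.append(col)
    return cols

columns = [1027, 1047, 1078, 1095]          # values shown in Figures 7 and 8
blocked_word = pack_blocked(columns)
delta_word = pack_delta(columns)
```

One delta word can describe up to seven column indices where a blocked word describes four, which is the core saving the text attributes to delta indexing.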
FIG. 7. Blocked index word: column indices 1 to 4 (16 bits each) = 1027, 1047, 1078, 1095.

For the row-column indexing method, using a column index for each nonzero entry and a row index vector, there is a required minimum for indexing of $W = (I/B)(J \cdot T \cdot D + V)$ words, where I is the number of rows; J is the number of columns; T is the number of bits used for a column index element; D is the density of the matrix; V is the number of bits used for a row index element; and B is the number of bits per word. In reality, however, for matrices up to order 65,535 (in excess-32,768 notation), half-words may be most conveniently and efficiently used for all the row and column indices. Half-word indices are used to increase core savings at a generally tolerable increase in execution time; few if any matrices of order 30,000 or greater have been of notable use. Using half-word indices, then, the abovementioned indexing scheme requires a minimum core storage of $E_{\text{Row-column}} = \tfrac{1}{2}(100/J + D)\,\%$ of the full matrix for indexing, with the density D here expressed in percent.

To access an $M_{ij}$ element, it is necessary to refer to the ith row index, which points to the first nonzero element of the ith row. The column indices between the ith and (i + 1)st row indices are searched for j. If the column indices searched do not contain j, the $M_{ij}$ element is zero; otherwise the data element paired with the j column index is fetched and processed.

For row operations, as long as the matrix remains ordered, execution time is very fast. For more than a few column operations, however, on a matrix of order greater than about 200, it is almost always more convenient and efficient to transpose the entire matrix and reorder all the data elements before performing the desired arithmetic.
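The element access just described can be sketched in Python for the variant with a separate row index vector (the arrangement of Figure 6). This is a minimal illustration with hypothetical function names, 0-based indices, and a hypothetical 4 × 4 matrix; the extra sentinel entry appended to the row index vector is an assumption made so that the last row has an upper bound.

```python
def build_row_column(dense):
    """Return (values, col_index, row_index) with entries ordered by row, then column."""
    values, col_index, row_index = [], [], []
    for row in dense:
        row_index.append(len(values))      # position of the row's first nonzero
        for j, x in enumerate(row):
            if x != 0:
                values.append(x)
                col_index.append(j)
    row_index.append(len(values))          # sentinel: one past the last entry
    return values, col_index, row_index

def get(values, col_index, row_index, i, j):
    """Search the column indices of row i for j; a miss means the element is zero."""
    for k in range(row_index[i], row_index[i + 1]):
        if col_index[k] == j:
            return values[k]
        if col_index[k] > j:               # columns are ordered, so stop early
            break
    return 0

def row_entries(values, col_index, row_index, i):
    """Return the (column, value) pairs of row i without any searching."""
    lo, hi = row_index[i], row_index[i + 1]
    return list(zip(col_index[lo:hi], values[lo:hi]))

sample = [[0, 2, 0, 0],
          [6, 0, 4, 0],
          [3, 0, 0, 0],
          [7, 9, 0, 5]]
values, col_index, row_index = build_row_column(sample)
```

A whole row is obtained by slicing, with no search, which is why row operations are fast; a column operation must call `get` once per row, which is why transposing first is recommended when many column operations are needed.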
Again, the same situation exists as with the bit map; if the data and indexing scheme can be transposed in less time than the difference between the column and row execution times, then the transpose operation will conserve execution time.

Unlike the bit map and address map schemes, which have constant core requirements for indexing, the row-column method has a core requirement for indexing directly proportional to the matrix density. Since each nonzero element has a paired column index, only the number of elements in the row index vector is constant. For example, adding two 50 × 50 sparse matrices, MA and MB, does not in general produce a result whose number of nonzero elements is the sum of the numbers of nonzero elements in the two matrices before the addition: if MA has 250 data elements and MB has 450, the sum of matrices MA and MB will not, in most cases, have 700 elements. In the sum of matrices MA and MB, the only surety is that there will still be 50 row index elements. A variable amount of core for indexing creates core allocation difficulties that may not be readily acceptable to the user.

In comparison to the bit map method, the row-column indexing method is noted for its fast execution time, when data elements are properly ordered, and its ease of programming, even for matrices of very large order (in the thousands). A wide variety of references endorse (or imply an endorsement of) a row-column technique for indexing [15, 17, 25-30], or a block-diagonal method [31-34], especially for particular applications, as noted in the Introduction, or for special matrices, such as symmetric matrices.

FIG. 8. Delta index word: column index 1 (16 bits) = 1027, followed by 8-bit deltas 20, 31, 17; the remaining delta fields are unused.

It should be noted that a symmetric matrix decreases by almost 50% the core requirements in the row-column technique, both for the data elements and for the indexing elements.

Two of the more general sets of algorithms encountered for processing random, and some special, sparse matrices and employing the row-column indexing technique are MATLAN [29], an IBM product, and Algorithm 408 [30], a more recent private effort. As these algorithms are readily available and are of general interest, a particular coding example is not given for the row-column indexing technique. Both these algorithms were intended for use on sparse matrices of order less than about 32,700, and are more efficient for orders less than (about) 1,000.

MATLAN is a programming system, operating under the control of Operating System/360, and has a very wide applicability. MATLAN includes many supplementary features, such as different versions for an all-core problem and for a segmented problem, three overlay structures for core storage, and options on precision. A segmented problem exists when portions of the problem under consideration are stored in core and on tapes or disks; an all-core problem exists when the storage requirement is such that the entire problem is stored in fast memory. Because of the variable precision option and the all-core or segmented feature, it is difficult to assess execution times. Array dimensions are limited to 32,756, which indicates that half-words are used for indexing purposes.

Algorithm 408 uses a variation of the indexing algorithm depicted in Figure 6. Instead of having the row index vector contain the address or index number of the first column index for the row, the row index vector contains the number of stored elements in the row. In addition, the row index vector is appended to the column index vector by using the same array name, M.
While the scope of Algorithm 408 is not as broad as that of MATLAN, Algorithm 408 has the distinct advantage of being readily alterable: a section of the reference is devoted to possible alterations, such as combining three or more indices into a word of the M array.

Because of the great variation in coding, at present it is not considered economically worthwhile to compare actual core storage and execution times to determine which of the many different existing algorithms employing the row-column method is the most efficient or optimal.

A good basis for examining some of the row-column indexing scheme characteristics rests on using half-word indices, with a row index vector, for calculations. At worst, the method (as typified by Algorithm 408) will utilize less core than the full matrix up to a density of slightly over 66%. Conservation of core allocation and execution time increases as the density decreases. It has been noted that the bit map method employs approximately 4% of the full matrix for indexing. Therefore, it can easily be seen that when the matrix density falls below about 4%, the row-column method will conserve more core than the bit map scheme. In addition, the advantage of the faster indexing into the data by the row-column method in this case almost excludes the use of the bit map, except for special cases, such as a Boolean problem.

V. THREADED LIST SCHEME

A threaded, or linked list, scheme contains one element of an array in core for each nonzero element of the sparse matrix. Each array element in a linked list method has at least three components: one component contains the row and column indices; another contains the matrix element (data); and the third contains the address of, or a pointer to, the next array element. If the third component of an array element were not present, the linked list scheme would have, at an absolute minimum, the same core requirement for indexing as the row-column method.
The third component adds $W = A \cdot D / B$ more words for indexing, which gives a minimum total of $W = (I/B)((J \cdot T + A)D + V)$ words for indexing a threaded list scheme, where I is the number of rows; J is the number of columns; D is the density of the matrix; T is the number of bits used for a column index; V is the number of bits used for a row index; A is the number of bits required for an address to range through the entire amount of core used to contain the complete threaded list; and B is the number of bits per word. For any practical application, however, both the row and column indices must be retained, which gives an overall minimum core allocation for indexing of $W = I \cdot J \cdot D\,(T + V + A)/B$ words.

As in the previously discussed methods of indexing, half-words (16 bits) are used in practice for both the row and column indices, which gives the capability of handling a matrix of order 65,535 (in excess-32,768 notation). In addition, because of the great difficulty and great time involved in manipulating addresses of less than full-word size (refer to Bit Map Scheme), full words (32 bits) are conveniently used for addresses. These considerations now require, for the overall minimum core storage for indexing, $W = 2 \cdot I \cdot J \cdot D$ words. As a percentage (E) of the full matrix, this is $E_{\text{Linked List}} = 2 \cdot D\,\%$ necessary for indexing.

In order to reference an $M_{ij}$ element, the entire threaded list must be searched if the nonzero elements are stored in a random manner. Elements can be stored, except for updates, and accessed more efficiently by rows and columns, which can reduce access time to particular elements or rows of elements. Elements need not be stored contiguously for reasonably efficient processing.

In one particular application of a threaded list scheme, data elements were initially stored by rows and columns, and a table of pointers was kept. Each pointer addressed the beginning element of a group of 8 elements. Any particular item, or row of items, could be found by a binary search on the list of pointers. Example 4 typifies the search for a particular matrix element in this application of linked list indexing.

EXAMPLE 4. Matrix elements are stored by rows and columns. The element to be found is in the middle row of the matrix, so the pointer in the middle of the pointer list is selected. The content of the pointer word addresses an element of the linked list. The element is then examined, to compare the row and column components with the required row and column numbers. Three separate cases can now occur:

(1) If the row and column numbers match, the correct element has been found.

(2) The rest of the elements in the group of 8 are searched, and if the row matches, but not the column, it is known that the correct group can probably be found by a search on the next few pointers about the pointer last used. If the pointer indexed an element whose column number was greater than required, then the next lower pointer is used.

(3) The rest of the elements in the group of 8 are searched, and if the row does not match, then a binary search on the pointers is continued. In a binary search, if the pointer indexed an element whose row number was greater than required, the next pointer to be selected is the one halfway between the last pointer (upper bound pointer in this case) and the lower bound pointer (the first pointer in this case).

When the procedure of case (2) above is iterated and the appropriate groups are searched, but the correct row and column cannot be found, then it is known that the required matrix element is the null element. It should be noted that unless the data elements are in reasonable order, the binary search on the pointers is almost useless.

The particular value of a linked list is that there is no longer the requirement that data elements be stored contiguously: updates, insertions, and deletions of matrix elements are performed by altering the address component of the appropriate linked elements. However, a linked list expansion or contraction results in some pointer groups having a greater number of link elements, and some other pointer groups having fewer link elements.
The alterable number of link elements in each pointer group necessitates a periodic updating of the pointer table. A pointer table update is vital to the efficiency of the binary search, and may require a great amount of execution time. The amount of execution time required for a pointer table update depends directly on the number of link elements to be grouped, as each link element must be inspected in order to find each successive link element. For peak efficiency of the binary search, every group should have the same number of linked list elements.

Using the additional pointer table to combat the otherwise slow execution time of the linked list scheme, one pointer exists for each 8 nonzero matrix elements. Employing a full word for each pointer, which is an address, we now have a minimum indexing core requirement of $W = 2\tfrac{1}{8} \cdot I \cdot J \cdot D$ words, for $E_{\text{Linked List}} = 2.125 \cdot D\,\%$ of the full matrix. This is a much greater core requirement than the row-column methods of the previous section require for any matrix of order greater than three.

Figure 9 depicts a few elements of a linked list, and the correlation between elements. A pointer table is not included.

FIG. 9. Linked list elements (each element holds the address of the next element, the row, the column, and the data element).

Not previously mentioned is the practical necessity of maintaining a table of available addresses, so that core allocation remains conservative during the insertion and deletion of matrix elements. When matrix elements are deleted, the address of the deleted link element must be appended to the table of available addresses. Not only must the table be maintained in fast core, but the threaded list scheme additionally requires a buffer area to be used for the inserted and/or deleted link elements.
If such a buffer area is not used or kept, then core will not be conserved and the prime advantage of the threaded list will have been discarded.

Few references endorse, or suggest endorsement of, the linked list scheme as a practical method for indexing sparse matrices [15, 34-37]. Only a few sources [15, 38-40] found in the literature survey actually utilized the threaded list scheme; while the actual algorithms were seldom described in great detail, the scheme basically followed the design of Example 4.

Overall, the threaded list technique of indexing into sparse matrices requires a significant amount of execution time for processing indices, in addition to the core requirements of a buffer and two separate tables. Inherent in the method, then, are considerable execution times for processing and considerable core expenditure, in comparison with the bit map and row-column schemes for identical matrices. Offsetting these disadvantages, however, the linked list scheme has the distinct advantage of not requiring a significant amount of execution time to update the linked list by insertion or deletion of single matrix elements or series of matrix elements. All other previously discussed indexing techniques require a shifting of data when an update is performed, which will take a great amount of execution time when numerous matrix elements have to be shifted to make the appropriate word available for the update.

The linked list scheme is slow for random processing of matrix elements; however, in many applications items are accessed sequentially by row or column. In these applications, proper chains of pointers speed up processing greatly. As with previous methods, a definite symmetry of the sparse matrix reduces proportionately the core requirements for indexing.
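The update behavior that distinguishes the threaded list, insertion and deletion by altering pointers together with a table of available addresses, can be sketched in Python. This is a simplified illustration under assumptions, not the scheme of Example 4: the class and method names are hypothetical, array slots stand in for core addresses, the free slots are chained through the same pointer array, and the pointer table with its binary search is omitted, so lookups here are a plain sequential walk.

```python
class ThreadedList:
    """Nonzero elements kept in row-major order through 'next' pointers."""

    def __init__(self, capacity):
        self.row = [0] * capacity
        self.col = [0] * capacity
        self.val = [0.0] * capacity
        self.nxt = list(range(1, capacity)) + [-1]  # all slots start on the free chain
        self.free = 0    # head of the table of available slots
        self.head = -1   # first element of the matrix list (-1 means empty)

    def _find(self, i, j):
        """Return (prev, cur) where cur is the first element with key >= (i, j)."""
        prev, cur = -1, self.head
        while cur != -1 and (self.row[cur], self.col[cur]) < (i, j):
            prev, cur = cur, self.nxt[cur]
        return prev, cur

    def get(self, i, j):
        _, cur = self._find(i, j)
        if cur != -1 and (self.row[cur], self.col[cur]) == (i, j):
            return self.val[cur]
        return 0.0

    def set(self, i, j, x):
        prev, cur = self._find(i, j)
        found = cur != -1 and (self.row[cur], self.col[cur]) == (i, j)
        if x == 0:
            if found:  # delete: unlink and return the slot to the available table
                if prev == -1:
                    self.head = self.nxt[cur]
                else:
                    self.nxt[prev] = self.nxt[cur]
                self.nxt[cur] = self.free
                self.free = cur
            return
        if found:
            self.val[cur] = x
            return
        if self.free == -1:
            raise MemoryError("no available slots")
        slot = self.free  # insert: take a free slot and splice it in; no data is shifted
        self.free = self.nxt[slot]
        self.row[slot], self.col[slot], self.val[slot] = i, j, x
        self.nxt[slot] = cur
        if prev == -1:
            self.head = slot
        else:
            self.nxt[prev] = slot

t = ThreadedList(4)
t.set(1, 2, 5.0)
t.set(0, 1, 2.0)
t.set(1, 0, 6.0)
```

An insertion or deletion changes at most two pointers and moves no stored data, whereas the sequential walk in `_find` shows why random access is slow without an auxiliary pointer table.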
VI. DIAGONAL OR BAND INDEXING SCHEME

Band and diagonal matrices are special types of matrices that occur frequently in electrical engineering, structural engineering, nuclear engineering and physics, solutions to differential equations, and a host of other fields, as mentioned in the Introduction. Band and diagonal matrices, while of frequent occurrence, should not be mistaken as a general case of sparse matrices. When band or diagonal matrices occur, a special effort on the part of the user should be made to adapt his processing and/or indexing algorithms to the case at hand. This adaptation should be made because of the inherent simplicity of processing, manipulating, and solving band matrices, and also because of the opportunity to minimize core allocation and execution time.

In most cases, band or diagonal matrices are processed either wholly by rows or columns, and little or no processing of single elements occurs. For a band matrix, a common manipulation involves decreasing the band width. In such a manipulation, it is normal procedure for one entire row (column) to operate on the row (column) immediately above or below it (or to either side). With such a simple processing sequence, it is evident that only a few rows (columns) need be maintained in fast core for immediate use. If data transmission rates are comparable to the rate with which rows (columns) are manipulated, then rows (columns) not in immediate use can be stored on slower access devices, such as tapes or disks. Storing data on tapes or disks frees the more expensive fast core. In most machine configurations there is a much larger amount of memory available in the slower devices. When slow devices can be used efficiently for processing band matrices, the capability of manipulating large order sparse matrices is limited by the maximum allowable execution time and the desired accuracy limits of the results, and not by the order of the matrix involved.

To further conserve execution time, but at the expense of fast memory, the entire band matrix can be stored in fast core. Preserving the entire matrix in fast core eliminates the transmission times between fast core and auxiliary devices, as well as the time required to restore elements in fast core, which is done prior to data manipulation and processing. Another prime advantage directly involved with data transmission is the use of overlapping channels in burst or select mode. However, when the matrix is fully maintained in fast core, channels will then be available to other users on multi-user computers.

If the band matrix has full bands, that is, no row has any zero elements within the band, then the total number of elements to be stored is the band width multiplied by the number of rows in the matrix. Figure 10 depicts a band matrix with full bands (a band width of 3 here):

[FIG. 10. Band matrix. A 9 X 9 tridiagonal matrix: main diagonal -199 in every row; superdiagonal 100.5, 101, 101.5, 102, 102.5, 103, 103.5, 104; subdiagonal 99, 98.5, 98, 97.5, 97, 96.5, 96, 95.5; zeros elsewhere.]

EXAMPLE 5. Figure 10 is the resulting 9 X 9 matrix obtained by using a central difference approximation (3 points) to solve the boundary-value differential equation 2 + 3t^2 = y + ty' + y'' using 10 intervals between the points y(t = 0) = 0 and y(t = 1) = 1. A 5-point interpolation would yield a band width of 5; 50 intervals would result in a 49 X 49 matrix. Note that the augment column, a constant associated with each row of the matrix, is not considered here as an integral part of the sparse matrix.
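As a check on Example 5, the following Python sketch rebuilds the three diagonals from the 3-point central difference formulas. It assumes the first-derivative term carries a factor t (that is, y + ty' + y'' = 2 + 3t^2), which is what the entries of Figure 10 and the boundary values imply (y = t^2 satisfies both); the function name `band_from_example_5` is hypothetical.

```python
def band_from_example_5(n_intervals=10):
    """Return (sub, diag, sup) diagonals for y + t*y' + y'' = 2 + 3*t**2.

    With h = 1/n and interior points t_i = i/n (i = 1 .. n-1):
      y''    ~ (y[i+1] - 2*y[i] + y[i-1]) * n**2
      t*y'   ~ (i/n) * (y[i+1] - y[i-1]) * n/2 = (i/2) * (y[i+1] - y[i-1])
    so row i has coefficients
      sub  = n**2 - i/2,   diag = 1 - 2*n**2,   sup = n**2 + i/2.
    """
    n = n_intervals
    m = n - 1                                        # order of the matrix
    diag = [1 - 2 * n * n] * m
    sup = [n * n + i / 2 for i in range(1, m)]       # rows 1 .. m-1
    sub = [n * n - i / 2 for i in range(2, m + 1)]   # rows 2 .. m
    return sub, diag, sup


sub, diag, sup = band_from_example_5(10)
print(diag[0], sup[0], sub[0])
```

Working with n**2 and i/2 rather than 1/h**2 avoids floating-point rounding in the coefficients, so the values can be compared exactly with the figure.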
Accuracy of results depends on the number of intervals, number of points in the interpolation formula, and computer round off.

In one particular application of processing a band matrix by rows (columns), it is convenient and efficient to store elements in full vectors, one vector for each super- or sub-diagonal of the band matrix. Since the diagonal has the greatest number of elements, the vector for the diagonal will be the largest vector. To avoid double indexing, which takes greater execution time, an additional table of addresses is created. Each element of the address table contains the address of the first element of the respective vector. The indexing scheme in the algorithm used to arithmetically manipulate the band matrix is then altered to suit the storage scheme. If, for some reason, it is more convenient to store elements in a row or column form, e.g., because of a very difficult or time-consuming arithmetic manipulation, most of the advantage of employing a band scheme is lost, and other methods of indexing should be considered.

Band matrices, as noted above, are unusual from an indexing standpoint because of the very slight core requirements for indexing. For the application described above, only W = I*V/B words are required for indexing, where I is the number of rows, V is the number of bits used for a row index element, and B is the number of bits per word. As a percentage (E) of the full matrix, this indexing requirement is E_Band = 100/J %, where J is the number of columns in the matrix, when full words are used for the table of addresses. If half-words are adequate, this requirement decreases further by one-half.

It should be brought to the attention of the user that in the instance where bands do contain zero elements, a decision should be made whether to employ a band scheme, which may not be very efficient in use of core if a large number of null entries exists, or some other particular scheme, such as a block-diagonal scheme, which may not conserve execution time.

Many papers [4, 10, 34, 40-43] are concerned with band matrices, primarily, as said, because of the prevalence of band matrices in many specific fields of interest.
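The storage by diagonals described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the scheme of any cited package: the names `pack_band`, `band_get`, `storage`, and `start` are hypothetical, diagonals are identified by the offset d = j - i, and indices are 0-based.

```python
def pack_band(dense, lower, upper):
    """Store each diagonal as one contiguous vector plus an address table.

    `lower` and `upper` are the numbers of sub- and super-diagonals kept.
    Returns (storage, start), where start[d] is the address of the first
    element of the diagonal with offset d = j - i.
    """
    n = len(dense)
    storage = []
    start = {}
    for d in range(-lower, upper + 1):
        start[d] = len(storage)
        for k in range(n - abs(d)):          # diagonal d has n - |d| elements
            i, j = (k, k + d) if d >= 0 else (k - d, k)
            storage.append(dense[i][j])
    return storage, start


def band_get(storage, start, i, j):
    """Fetch element (i, j) with a single index computation."""
    d = j - i
    if d not in start:
        return 0                             # outside the band
    return storage[start[d] + min(i, j)]     # position along the diagonal
```

The address table holds one entry per stored diagonal, so its size depends on the band width rather than on the number of nonzero elements, which is why the indexing overhead of a band scheme is so small.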
Also, many algorithms are readily available for processing band matrices; FORTRAN M [44] being one of the more recent programming packages.

VII. CONCLUSION

In the previous sections four major types of indexing methods were discussed, three of which are in general use: the bit map scheme, the row-column scheme, and the threaded list scheme. Each major type, of course, has many variations (the address map method is not in general use at present, so no variations occur). The important special case of the band matrix is discussed as a separate entity, because it is not a general case of a sparse matrix, even though it has wide application.

As stated in the Introduction, one of the major considerations in selecting a particular indexing method is the amount of fast core the method requires, in addition to the data elements. The indexing in the bit map method requires a fast core allocation of approximately 4 % of the full matrix; in the address map method indexing requires about 25 % of the full matrix. The row-column and threaded list schemes have no definite core requirements for indexing, and fast memory for indexing is directly proportional to the sparse matrix density. The percentage of the full matrix required for indexing a row-column scheme is about one times the matrix density, and about twice the density is required for a threaded list scheme.

Previous discussion indicated that an exact comparison of execution times must reflect the type of mathematical manipulation being performed on the sparse matrix. For example, the bit map method is of particular use when the matrix is used to produce an "optimal" ordering, so the matrix inverse will not have a greatly increased density. In contrast, the row-column method is faster than other methods when manipulations involve one row (column) acting on other rows (columns).

The second important aspect of indexing scheme selection is the conservation of execution time.
If arithmetic operations are to be performed on the data, primary consideration should first be given to a row-column method; if Boolean arithmetic or reordering algorithms are to be performed, the bit map scheme should be considered first; and if a great number of data elements are to be reordered, created, or annihilated, a threaded list scheme deserves first consideration.

The bit map scheme has a definite core allocation for indexing, offers a reasonable row access time, is quite fast in execution time when row operations are performed, is core efficient when the matrix density is greater than 4 %, and allows very fast manipulation of logical (Boolean) operations. Logical operations can be conveniently used to determine when arithmetic operations are to be executed. As to its disadvantages: the bit map scheme has extremely poor column access time when elements are ordered by rows, which in most cases requires transposing the bit map and reordering the data elements; it makes poor use of parallel processing, requires considerable time to reorder data elements, and is not core efficient when matrix density falls below 4 %.

The address map proves advantageous when character addressing is available, makes very efficient use of parallel processors, provides ready access to any element, does not require an extensive amount of execution time (in comparison to the bit map scheme) to reorder data elements, and exhibits a reasonable row and column execution time. The primary disadvantages of the address map method are: a large fast core requirement for indexing; and the relatively large execution time, in comparison with the threaded list scheme, to reorder matrix elements.

Both bit and address maps require significant execution times to transpose the matrix: the map must be transposed, and all the data elements must be reordered. Execution time to transpose the matrix is directly proportional to the order of the matrix and the matrix density.
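The core figures quoted in this Conclusion can be put side by side with a small calculation. The Python sketch below simply encodes the approximate percentages stated in the text (about 4 % for the bit map, about 25 % for the address map, about one times the density for row-column, about twice the density for the threaded list); the function names are hypothetical, and the constants depend on word length and on the particular variation used, so the output should be read as a rough guide only.

```python
def index_overhead_percent(density_percent):
    """Approximate indexing core, as a percentage of the full matrix."""
    d = density_percent
    return {
        "bit map": 4.0,            # fixed: one bit per matrix position
        "address map": 25.0,       # fixed: one byte per matrix position
        "row-column": 1.0 * d,     # proportional to matrix density
        "threaded list": 2.0 * d,  # about twice the density
    }


def least_index_core(density_percent):
    """Name of the scheme with the smallest indexing core at this density."""
    overhead = index_overhead_percent(density_percent)
    return min(overhead, key=overhead.get)


for d in (1.0, 2.0, 10.0):
    print(d, least_index_core(d), index_overhead_percent(d))
```

With these figures the row-column scheme needs less indexing core than the bit map below a density of 4 %, and more above it, which is the break-even point the text refers to.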
Primary advantages of the row-column schemes are: a very fast row access time in comparison with the bit and address maps; a relatively fast column access time in comparison to all other methods; conservation of core with matrices of less than 4 % density when compared to the bit map method; an increase in efficiency as the order of the matrix increases, as more complex variations become more efficient; and faster reordering than the bit map or address map methods. The main disadvantages of the row-column scheme are that column access time and the time required to reorder elements greatly increase as the matrix order and/or matrix density increases.

The threaded list technique is the sole technique that allows a simple and fast executing method of reordering, adding, or annihilating data elements. The threaded list scheme exhibits a variety of disadvantages, the primary ones being a large core requirement for indexing in comparison with the row-column method, a slow access time for rows when elements are stored by rows, and an even slower access time for columns compared with the row-column method. The inclusion of orthogonal links, as discussed by Knuth [35], removes some of the column access difficulties, but only at the price of additional storage.

For the special case of band matrices, a scheme similar to the one described in Part VI should be used unless either half or more of the elements within the band width are null, or the nature of the mathematical operations to be performed dictates otherwise (as described in Part VI). If the band matrix scheme cannot be utilized, the user must decide which characteristics of the other types of indexing are considered vital to the solution, and select a method on this basis.

A final major aspect of indexing the user must consider concerns the adaptability and flexibility of programming the selected scheme, which depends upon the factors enumerated in the Introduction.
The following suggestions and comments concerning programming flexibility and adaptability are offered.

None of the major types of indexing schemes requires double indexing. Double indexing involves using one register (adder) to index across the row, and another register to index down the column. Double indices have at least three drawbacks: they require more time than single indices; the computer may have a built-in limit on the number of characters or words that can be indexed by one or both of the registers before a new index (base) register must be designated; and registers are at a premium, because of the extremely fast register to register operation time, and should be used for more vital arithmetic. In the last analysis, the increased time involved in double indexing is the critical factor.

In general, the larger the order of the matrix, the lower the matrix density. Because of this the row-column method is preferred for matrices with orders of 1000 or more, especially when arithmetic manipulations or operations are to be performed.

As the order of the matrix increases, it becomes more efficient to employ more complex variations of the major types. For instance, the delta indexing scheme (as described in Part VI) conserves a considerable amount of fast core compared with the simpler row-column schemes, without a great increase in execution time, when the order approaches 1000.

If the matrix requires more fast core than is available, the user must decide either to segment the matrix between fast and slow core, or to reduce the complexity of the problem. If the problem can be simplified, or the matrix condensed or partitioned (blocked), then it is not necessary to segment the matrix between fast and slow core. Simplifying the matrix involves the real consideration of whether or not it is economically feasible to reorder rows and/or columns to produce a new matrix that can be more efficiently processed. Many schemes have been developed [7, 16, 18, 27] to attempt such an optimal ordering of matrix elements. Condensing the matrix involves the elimination of data elements that produce insignificant or negligible change in the results. Such condensing can often be done with reasonable competence by somebody skilled in the nature of the problem to be solved.
If the matrix is of block-diagonal form, each block can be processed as a separate entity to produce a composite result.

The availability of a virtual memory processor might lead the user to the erroneous conclusion that the benefits of a proper indexing algorithm are negated. This is not so; at some time during the processing of a sparse matrix the matrix must reside in physical memory. It then follows that the fewer the number of pages occupied by the sparse matrix, the fewer the page faults generated, and therefore the less time involved in moving the matrix to and from peripheral paging devices. In other words, the same benefits accruing from indexing in an ordinary processor apply in a virtual memory processor.

When such updating of data files is anticipated, the user should designate buffer storage. When new matrix elements are introduced, they should be stored in the buffer area. When a considerable number of corrections to the data elements exist (about 5 %), then the matrix is reordered. The threaded list scheme requires no separate buffer area, as a buffer is inherent in the indexing scheme.

The segments of coding that contain the actual indexing algorithm should be programmed in a low-level language, such as assembly language, to conserve execution time. High-level languages, such as FORTRAN, utilize a compiler, which may not produce the most efficient coding. For instance, if a division by 32,768 is necessary, the high-level language may simply create a division by 32,768 in assembly language. If the high-level compiler, however, recognized that a division by 32,768 (2 to the 15th power) is identical to shifting an accumulator right 15 bits, the assembly language version would be a shift right logical or shift right double logical. The first version would require significantly more execution time than the more efficient assembly language program version.
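The equivalence behind the division example is easy to verify: 32,768 equals 2 to the 15th power, so for an integer operand the quotient is obtained by shifting right 15 bit positions. The short Python sketch below checks this; the function name is hypothetical, and Python is used only to confirm the arithmetic, not to suggest that the speed difference discussed for assembly language carries over.

```python
def divide_by_32768(x: int) -> int:
    """Integer division by 32,768 expressed as a right shift of 15 bits."""
    return x >> 15


assert 1 << 15 == 32768
for x in (0, 1, 32767, 32768, 65535, 1_000_000):
    assert divide_by_32768(x) == x // 32768
```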
A considerable savings is realized when the computation is performed perhaps as many as several million times in a program. The user should avoid making the indexing algorithm in a subroutine form, especially in a high-level language, because of the added linkage time during program execution.

While a "fast" algorithm for indexing into arbitrarily sparse matrices would allow very efficient core storage allocation and execution times for matrix manipulations, it is also evident that no such single algorithm exists, at least at present. The advent of array processors and pipeline computers may eliminate the desire to handle sparse matrices in any special manner whatsoever. However, it also appears that no matter how large, or how fast and sophisticated, computing machines become, users will continue to strive for core storage conservation and faster execution times. It remains to be seen if sufficiently sophisticated indexing algorithms will be developed to accomplish those goals in array or pipeline machines; or whether such machines will come into general use and provide an environment conducive to developing sparse matrix indexing schemes.

For the present, the choice of an indexing algorithm depends upon many considerations, with each major type of indexing discussed here having particular advantages and disadvantages. Careful selection of an algorithm can satisfactorily achieve the goals of conservation of core memory and execution time. In addition, whenever there exists some pattern to the nonzero entries, the possibility of reorganizing the calculations as a means to handle some sparse matrices should be carefully considered.
APPENDIX

ALGORITHM 1: BIT MAP SCHEME

No.  Label     Statement                      Meaning
01             ROW ← i                        i is the row number that will be manipulated
02             RINDEX ← v(i)                  v is the row index vector
03             BITS ← b                       b = number of bits/word
04             ROW ← ROW - 1                  (i - 1)
05             COLS ← J                       J is the number of columns in the matrix
06             ROW ← ROW * COLS               (i - 1) * J
07             SAVE ← ROW                     Save (i - 1) * J
08             ROW ← ROW + BITS - 1           (((i - 1) * J) + b - 1)
09             ROW ← ROW / BITS               S_i = (((i - 1) * J) + b - 1)/b; word contains the first bit of required row
10             ROWEND ← COLS                  End of row counter (J)
11             START ← ROW                    Starting word of the row
12             ROWEND ← ROWEND AND MASK       Determine correct number of displacement bits; MASK = mask for maximum displacement bits
13             START ← START * 2**SAVE        Shift to eliminate incorrect bits (from previous row)
14             ROWEND ← ROWEND - SAVE         Correct for eliminated bits
15             GO TO ROWSCAN                  Branch to code to scan row in bit map for 1 bits
16   COUNT     ROWEND ← BITS                  Following code scans one entire row of a bit map. After first word of row is scanned, the bit counter (ROWEND) = b
17             ROW ← ROW + 1                  Increment bit map word address by one
18             WORD ← bit word from map       Word of bit map
19   ROWSCAN   WORDBIT ← bit from bit-word    Pick up high order bit from bit word
20             COLNUM ← COLNUM + 1            Increment column number
21             WORDBIT = 1 ?                  Following statements are branch controls. Is the bit non-zero?
22             IF YES, GO TO MATH             Yes, an element exists.
23   ENDROW    COLNUM = COLS ?                Is the column counter equal to the row counter?
24             IF YES, GO TO END1             Yes, end of row
25             COLNUM = ROWEND ?              Have we shifted completely through bit map word?
26             IF YES, GO TO COUNT            Yes, fetch another word.
27             GO TO ROWSCAN                  No, scan next bit in word
28   MATH      RINDEX ← RINDEX + 1            RINDEX = address of nonzero element; COLNUM = column number of non-zero element. Perform required operation on element
29             GO TO ENDROW                   Return
30   END1      STOP                           End of operation on the row.
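The intent of Algorithm 1, scanning one row of a bit map and visiting the data element behind each 1 bit, can be sketched in Python. This is an illustration under stated assumptions rather than a transcription of the register-level steps: the bit map is packed row by row into words of `bits_per_word` bits with the high-order bit first, the nonzero elements are stored in row order in `data`, `row_index[i]` plays the role of v(i), and all names and the 0-based indexing are hypothetical.

```python
def build_bit_map(dense, bits_per_word=32):
    """Pack a dense matrix into (words, row_index, data)."""
    rows, cols = len(dense), len(dense[0])
    b = bits_per_word
    words = [0] * ((rows * cols + b - 1) // b)
    row_index, data = [], []
    for i, row in enumerate(dense):
        row_index.append(len(data))          # address of first nonzero of row i
        for j, x in enumerate(row):
            if x != 0:
                pos = i * cols + j           # bit position in the whole map
                words[pos // b] |= 1 << (b - 1 - pos % b)
                data.append(x)
    return words, row_index, data


def scan_row(words, row_index, data, i, cols, bits_per_word=32):
    """Return (column, value) for every nonzero element of row i."""
    b = bits_per_word
    k = row_index[i]                         # running address into data
    found = []
    for j in range(cols):
        pos = i * cols + j
        if (words[pos // b] >> (b - 1 - pos % b)) & 1:
            found.append((j, data[k]))       # the "MATH" step would go here
            k += 1
    return found
```

A row scan only advances through consecutive bits, whereas a column scan would have to jump `cols` bits at a time and count the 1 bits in between to locate each data element, which is the poor column access noted in the Conclusion.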
ALGORITHM 2: ADDRESS MAP SCHEME

No.  Label     Statement                          Meaning
01             ROW ← i                            i = row
02             RINDEX ← v(i)                      v = row index vector
03             ROW ← ROW - 1                      (i - 1)
04             COLS ← j                           j = # columns
05             ROW ← ROW * COLS                   j*(i - 1)
06             ROW ← ROW - 1                      (j*(i - 1)) - 1
07   START     ROW ← ROW + 1                      Increment across row
08             COLNUM ← COLNUM + 1                Increment column #
09             COLNUM > COLS ?                    End of row
10             IF YES, GO TO ENDROW               Yes, done
11             BYTE ← byte from address map       Pick up partial word
12             BYTE = 0 ?                         Is byte zero?
13             IF YES, GO TO START                Reenter scan process
14             CHECK ← 0                          Zero work area
15             CHECK ← BYTE                       Byte to work area
16   MATH      CHECK ← CHECK + RINDEX             Points to non-zero element. Required operations performed here
17             GO TO START                        Reenter scan process
18   ENDROW    STOP                               Finish
19             END

ALGORITHM 3: ADDRESS MAP SCHEME

No.  Label     Statement                          Meaning
01             BEGIN ← Address of address map     Pointer
02             BEGIN ← BEGIN + J                  J = column #
03             BEGIN ← BEGIN - 1
04             COLS ← j                           j = # columns
05             ROWS ← i                           i = # rows
06             BEGIN ← BEGIN - COLS
07   START     BEGIN ← BEGIN + COLS               Increment address
08             RINDEX ← v(I)                      Row index vector
09             ROWCTR ← ROWCTR + 1                Increment row counter
10             ROWCTR > COLS ?                    Passed end of matrix?
11             IF YES, GO TO ENDROW               Yes, passed end
12             BYTE ← byte from address map       Pick up partial word
13             BYTE = 0 ?                         Is byte zero?
14             IF YES, GO TO START                Reenter scan process
15             CHECK ← 0                          Zero work area
16             CHECK ← BYTE                       Byte to work area
17   MATH      CHECK ← CHECK + RINDEX             Points to non-zero element. Required operations performed here
18             GO TO START                        Reenter scan process
19   ENDROW    STOP                               Finish
20             END
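Algorithms 2 and 3 both step through an address map, one across a row and the other down a column, adding each nonzero byte to the row index to locate the data element. A minimal Python sketch of that idea follows. It rests on assumptions not spelled out in the listings: each byte holds the 1-based position of the element within its row's stored data (0 meaning a zero element), `row_index[i]` stands in for v(i), indices are 0-based, and every name is hypothetical.

```python
def build_address_map(dense):
    """Build (address_map, row_index, data) from a dense matrix."""
    rows, cols = len(dense), len(dense[0])
    address_map = bytearray(rows * cols)     # one byte per matrix position
    row_index, data = [], []
    for i, row in enumerate(dense):
        row_index.append(len(data))          # where row i starts in data
        offset = 0
        for j, x in enumerate(row):
            if x != 0:
                offset += 1                  # 1-based so that 0 means "zero"
                address_map[i * cols + j] = offset   # offset must fit in a byte
                data.append(x)
    return address_map, row_index, data


def scan_row(address_map, row_index, data, i, cols):
    """Step across row i (compare Algorithm 2)."""
    found = []
    for j in range(cols):
        byte = address_map[i * cols + j]
        if byte != 0:
            found.append((j, data[row_index[i] + byte - 1]))
    return found


def scan_column(address_map, row_index, data, j, rows, cols):
    """Step down column j, advancing by one full row each time (compare Algorithm 3)."""
    found = []
    for i in range(rows):
        byte = address_map[i * cols + j]
        if byte != 0:
            found.append((i, data[row_index[i] + byte - 1]))
    return found
```

Because every position carries its own offset, a column scan costs no more than a row scan, which reflects the ready access to any element credited to the address map; the price is one byte of map per matrix position, and in this sketch at most 255 nonzero elements per row.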
[FIG. A1. Flowchart: algorithm 1, bit map scheme.]

[FIG. A2. Flowchart: algorithm 2, address map scheme.]

[FIG. A3. Flowchart: algorithm 3, address map scheme.]
BIBLIOGRAPHY

1. BRAYTON, R., GUSTAVSON, F., AND WILLOUGHBY, R. "Some results on sparse matrices." RC2332, IBM Watson Research Center (February 1969), 37-46.
2. LARSEN, L. "A modified inversion procedure for product form of the inverse-linear programming codes." Comm. ACM 5, 7 (July 1962), 382-383.
3. LIVESLEY, R. "An analysis of large structural system." Comp. J. 3 (1960), 34-39.
4. McCORMICK, C. W. "Application of partially banded matrix methods to structural analysis." Sparse Matrix Proceedings, R. Willoughby, Ed., IBM Watson Research Center, RA11707 (March 1969), 155-158.
5. ORCHARD-HAYS, W. Advanced Linear Programming Techniques. McGraw-Hill, New York, 1968, 73-82.
6. TEWARSON, R. "On the product form of inverse of sparse matrices." SIAM Review 8 (1966), 336-342.
7. TEWARSON, R. "Row column permutation of sparse matrices." Comp. J. 10 (1967/68), 300-305.
8. BRAYTON, R., GUSTAVSON, F., AND WILLOUGHBY, R. "Some results on sparse matrices." (Introduction), RC2332, IBM Watson Research Center (February 1969), 1-3.
9. BASHKOW, T. "Network analysis." Mathematical Methods for Digital Computers, A. Ralston and A. S. Wilf, Eds., Vol. I, John Wiley and Sons, New York, 1967, 280-290.
10. TINNEY, W. F. "Comments on using sparsity techniques for power system problems." Sparse Matrix Proceedings, R. Willoughby, Ed., IBM Watson Research Center, RA11707 (March 1969), 25-34.
11. PALACOL, E. L. "The finite element method of structural analysis." Sparse Matrix Proceedings, R. Willoughby, Ed., IBM Watson Research Center, RA11707 (March 1969), 101-5.
12. RALSTON, A. "Numerical integration methods for the solution of ordinary differential equations." Mathematical Methods for Digital Computers, A. Ralston and A. S. Wilf, Eds., Vol. I, John Wiley and Sons, New York, 1967, 95-109.
13. ROMANELLI, M. "Runge-Kutta methods for the solution of ordinary differential equations." Mathematical Methods for Digital Computers, A. Ralston and A. S. Wilf, Eds., Vol. I, John Wiley and Sons, New York, 1967, 110-20.
14. WACHSPRESS, E. "The numerical solution of boundary value problems." Mathematical Methods for Digital Computers, A. Ralston and A. S. Wilf, Eds., Vol. I, John Wiley and Sons, New York, 1967, 121-7.
15. WEIL, R., JR., AND KETTLER, P. "An algorithm to provide structure for decomposition." Sparse Matrix Proceedings, R. Willoughby, Ed., IBM Watson Research Center, RA11707 (March 1969), 11-24.
16. GUSTAVSON, F., LINIGER, W., AND WILLOUGHBY, R. "Symbolic generation of an optimal Crout algorithm for sparse systems of linear equations." Sparse Matrix Proceedings, R. Willoughby, Ed., IBM Watson Research Center, RA11707 (March 1969), 1-10.
17. SMITH, D. M. "Data logistics for matrix inversion." Sparse Matrix Proceedings, R. Willoughby, Ed., IBM Watson Research Center, RA11707 (March 1969), 127-32.
18. SPILLERS, W. R., AND HICKERSON, N. "Optimal elimination for sparse symmetric systems as a graph problem." Quar. Appl. Math. 26 (1968), 425-32.
19. STEWARD, D. V. "On an approach to techniques for the analysis of the structure of large systems of equations." SIAM Rev. 4 (1962), 321-42.
20. TEWARSON, R. P. "The Gaussian elimination and sparse systems." Sparse Matrix Proceedings, R. Willoughby, Ed., IBM Watson Research Center, RA11707 (March 1969), 35-42.
21. GIVENS, W., McCORMICK, HOFFMAN, et al. "Panel discussion on new and needed work and open questions." (Chairman P. Wolfe), Sparse Matrix Proceedings, R. Willoughby, Ed., IBM Watson Research Center, RA11707 (March 1969), 159-80.
22. WILKES, M. V. "The growth of interest in microprogramming: a literature survey." Comp. Surveys 1, 3 (September 1969), 139-45.
23. ORCHARD-HAYS, W. "MP systems technology for large sparse matrices." Sparse Matrix Proceedings, R. Willoughby, Ed., IBM Watson Research Center, RA11707 (March 1969), 59-64.
24. CHANG, A. "Application of sparse matrix methods in electric power system analysis." Sparse Matrix Proceedings, R. Willoughby, Ed., IBM Watson Research Center, RA11707 (March 1969), 113-122.
25. BRAYTON, R., GUSTAVSON, F., AND WILLOUGHBY, R. "Some results on sparse matrices." IBM Watson Research Center, RC2332 (February 1969), 21-22.
26. CHARTRES, B. A., AND GEUDER, J. C. "Computable error bounds for direct solution of linear equations." J. ACM 14, 1 (Jan. 1967), 63-71.
27. FORSYTHE, G. E. "Crout with pivoting." Comm. ACM 3 (1960), 507-8.
28. JENNINGS, A. "A compact storage scheme for the solution of symmetric linear simultaneous equations." Comput. J. 9 (1966/67), 281-5.
29. System 360 Matrix Language (MATLAN) Application Description, IBM H20-0479; Program Description Manual, IBM H20-0564.
30. McNAMEE, J. M. "Algorithm 408, a sparse matrix package." (Part I), Comm. ACM 14, 4 (April 1971), 265-273.
31. DULMAGE, A. L., AND MENDELSOHN, N. S. "On the inversion of sparse matrices." Math. Comp. 16 (1962), 494-496.
32. MAYOH, B. H. "A graph technique for inverting certain matrices." Math. Comp. 19 (1965), 644-646.
33. ROTH, J. P. "An application of algebraic topology: Kron's method of tearing." Quar. Appl. Math. 17 (1959), 1-24.
34. SWIFT, G. "A comment on matrix inversion by partition." SIAM Rev. 2 (1960), 132-33.
35. KNUTH, D. E. The Art of Computer Programming, Vol. I. Addison-Wesley, Reading, Mass., 1968, 299-304, 554-556.
36. BERZTISS, A. T. Data Structures: Theory and Practice. Academic Press, New York, 1971, 276-279.
37. LARCOMBE, M. "A list processing approach to the solution of large sparse sets of matrix equations and the factorization of the overall matrix." In Large Sparse Sets of Linear Equations, Reid, J. K., Ed., Academic Press, London, 1971.
38. WEIL, R. L., AND KETTLER, P. C. "Rearranging matrices to block-angular form for decomposition (and other) algorithms." Management Science 18, 1 (Sept. 1971), 98-108.
39. GUSTAVSON, F. G. "Some basic techniques for solving sparse systems of linear equations." In Sparse Matrices and Their Applications, Rose, D. J., and Willoughby, R. A., Eds., Plenum Press, New York, 1972, 41-52.
40. FIKE, C. T. PL/I for Scientific Programmers. Prentice-Hall, Englewood Cliffs, N. J., 1970, 108, 180.
41. WILLOUGHBY, R. A. "A survey of sparse matrix technology." IBM Watson Research Center, RC3872, May 1972.
42. CUTHILL, E. "Several strategies for reducing the band-width of matrices." In Sparse Matrices and Their Applications, Rose, D. J., and Willoughby, R. A., Eds., Plenum Press, New York, 1972, 34-38.
43. TEWARSON, R. P. "Computations with sparse matrices." SIAM Rev. 12, 4 (Oct. 1970), 527-543.
44. PETTY, J. S. "FORTRAN M: programming package for band matrices and vectors." Aerospace Research Labs., Wright-Patterson AFB, Ohio, ARL-69-0064 (April 1969).
45. SPILLERS, W. R. "On Diakoptics: Tearing an arbitrary system." Quar. Appl. Math. 23 (1965), 188-90.
46. IBM System/360 Model 65 Functional Characteristics, IBM A22-6884-3, File No. S360-01.