Introduction to Data Structures and Algorithms

Vinay Kumar V, Assistant Professor, CS Department
BCA203T DATA STRUCTURES
MODULE 1:
INTRODUCTION
 Definition
 Elementary data organization
 Data Structures
 Data structures operations
 Abstract data types
 Algorithms complexity
 Time-space tradeoff
 Preliminaries: Mathematical notations and functions
 Algorithmic notations
 Control structures
 Complexity of algorithms
 Asymptotic notations for complexity of algorithms
 String Processing: Definition
 Storing Strings
 String as ADT
 String operations
 Word/Text processing
 Pattern Matching algorithms

 Basic Terminology
1. Data: Data are simply values or sets of values.
 The word "data" was first used to mean "Transmissible and storable computer information" in
1946.
Data Documents: Whenever data needs to be registered, it exists in the form of a data
document. Different kinds of data documents include:
 Data repository
 Data study
 Data set
 Software
 Data paper
 Data base
 Data handbook
 Data journal etc...
Structure: Structure means to purposefully arrange something in a specific way.
 Structure is an arrangement and organization of interrelated elements in a system.
 The arrangement of and relations between the parts of something complex.
 A building or other object constructed from several parts.
2. Data items: Data items refer to a single unit of values.
Data items that are divided into sub-items are called Group items. Ex: An Employee
Name may be divided into three sub items - first name, middle name, and last name.
Data items that cannot be divided into sub-items are called Elementary items.
Ex: R.No, Gender, Age etc...
3. Entity: An entity is a real-world object, either animate or inanimate, that is easily
identifiable and has attributes to which values may be assigned. The values may be either numeric or non-numeric.
Attributes: Name, Age, Sex, SSN
Values: Rohland Gail, 34, F, 134-34-5533
For example, in a college database, students, staff, classes and courses offered can be
considered as entities.
All these entities have some attributes that give them their identity.
4. Field is a single elementary unit of information representing an attribute of an entity.
5. Record is the collection of field values of a given entity.
6. File is the collection of records of the entities in a given entity set.
Each record in a file may contain many field items but the value in a certain field may uniquely
determine the record in the file. Such a field K is called a primary key and the values k1, k2,
….. in such a field are called keys or key values.
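The relationship between fields, records, files and primary keys can be sketched in Java as follows. This is only an illustration: the class names EmployeeRecord and EmployeeFile are hypothetical, and using the SSN as the primary key is an assumption based on the attribute example above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical record type: each field holds the value of one attribute.
final class EmployeeRecord {
    final String ssn;   // assumed primary key field
    final String name;
    final int age;

    EmployeeRecord(String ssn, String name, int age) {
        this.ssn = ssn;
        this.name = name;
        this.age = age;
    }
}

// Hypothetical "file": a collection of records indexed by key value.
final class EmployeeFile {
    private final Map<String, EmployeeRecord> records = new HashMap<>();

    // Stores the record under its key value.
    void add(EmployeeRecord r) {
        records.put(r.ssn, r);
    }

    // Returns the record whose key equals ssn, or null if there is none.
    EmployeeRecord findByKey(String ssn) {
        return records.get(ssn);
    }
}
```

Because the key value determines the record uniquely, a lookup by key returns at most one record.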

Records may also be classified according to length.
A file can have fixed-length records or variable-length records.
 In fixed-length records, all the records contain the same data items with the same amount
of space assigned to each data item.
 In variable-length records, the records of a file may have different lengths.
Example: Student records have variable lengths, since different students take different
numbers of courses. Variable-length records have a minimum and a maximum length.
The above organization of data into fields, records and files may not be complex enough to
maintain and efficiently process certain collections of data. For this reason, data are also
organized into more complex types of structures.
 Data Structures
Data may be organized in many different ways. The logical or mathematical model of a
particular organization of data is called a data structure.
The choice of a particular data model depends on two considerations:
1. It must be rich enough in structure to mirror the actual relationships of the data in the
real world.
2. The structure should be simple enough that one can effectively process the data
whenever necessary.
In computer science, a data structure is a data organization, management and storage format
that enables efficient access and modification. More precisely, a data structure is a collection of data
values, the relationships among them, and the functions or operations that can be applied to the data.
The study of complex data structures includes the following three steps:
1. Logical or mathematical description of the structure
2. Implementation of the structure on a computer
3. Quantitative analysis of the structure, which includes determining the amount of
memory needed to store the structure and the time required to process the structure.
Data Structures Usage
Data structures serve as the basis for abstract data types (ADT). The ADT defines the logical form of
the data type. The data structure implements the physical form of the data type.
Different types of data structures are suited to different kinds of applications, and some are highly
specialized to specific tasks. For example, relational databases commonly use B-tree indexes for data
retrieval, while compiler implementations usually use hash tables to look up identifiers.
Data structures provide a means to manage large amounts of data efficiently for uses such as
large databases and internet indexing services. Usually, efficient data structures are key to designing
efficient algorithms. Some formal design methods and programming languages emphasize data
structures, rather than algorithms, as the key organizing factor in software design. Data structures
can be used to organize the storage and retrieval of information stored in both main
memory and secondary memory.
Data Structures Implementation
Data structures are generally based on the ability of a computer to fetch and store data at any place in
its memory, specified by a pointer—a bit string, representing a memory address, that can be itself
stored in memory and manipulated by the program. Thus, the array and record data structures are
based on computing the addresses of data items with arithmetic operations, while the linked data
structures are based on storing addresses of data items within the structure itself.
The implementation of a data structure usually requires writing a set of procedures that create and
manipulate instances of that structure. The efficiency of a data structure cannot be analyzed
separately from those operations. This observation motivates the theoretical concept of an abstract
data type, a data structure that is defined indirectly by the operations that may be performed on it,
and the mathematical properties of those operations (including their space and time cost).
Classification of Data Structures
Data structures are generally classified into
 Primitive data Structures
 Non-primitive data Structures
1. Primitive data Structures: Primitive data structures are the fundamental data types which
are supported by a programming language. Basic data types such as integer, real, character
and Boolean are known as Primitive data Structures. These data types consist of values that
cannot be divided further, and hence they are also called simple data types.
2. Non-Primitive data Structures: Non-primitive data structures are those data structures
which are created using primitive data structures. Examples of non-primitive data structures
are linked lists, stacks, trees, and graphs, as well as structures for processing complex numbers.
Based on the structure and arrangement of data, non-primitive data structures are further
classified into
1. Linear Data Structure
2. Non-linear Data Structure
1. Linear Data Structure:
A data structure is said to be linear if its elements form a sequence or a linear list. There are
basically two ways of representing such linear structure in memory.
1. One way is to have the linear relationships between the elements represented by means
of sequential memory location. These linear structures are called arrays.
2. The other way is to have the linear relationship between the elements represented by means of
pointers or links. These linear structures are called linked lists.
The common examples of linear data structure are Arrays, Queues, Stacks, Linked lists
Static Data Structure:
In Static data structure the size of the structure is fixed.
Array:
An array is a container which can hold a fixed number of items, and these items should be of the same
type. The simplest type of data structure is a linear (or one-dimensional) array. By a linear array,
we mean a list of a finite number n of similar data elements referenced respectively by a set of n
consecutive numbers, usually 1, 2, 3, ..., n. If we choose the name A for the array, then the elements
of A are denoted by subscript notation
A[1], A[2], A[3], ..., A[N]
Regardless of the notation, the number K in A[K] is called a subscript and A[K] is called a
subscripted variable.
Element − each item stored in an array is called an element.
Index − each location of an element in an array has a numerical index, which is used to identify
the element.

Linear arrays are called one-dimensional arrays because each element in such an array is
referenced by one subscript. A two-dimensional array is a collection of similar data elements
where each element is referenced by two subscripts.
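As a small sketch of the difference between one and two subscripts, the following Java class sums a linear array and a two-dimensional array. The class name ArrayDemo and the sample values are assumptions for illustration. Note that Java subscripts run from 0 to n-1 rather than from 1 to n as in the notation above.

```java
final class ArrayDemo {
    // Linear (one-dimensional) array: each element needs one subscript.
    static int sum(int[] a) {
        int total = 0;
        for (int k = 0; k < a.length; k++) {
            total += a[k];
        }
        return total;
    }

    // Two-dimensional array: each element needs two subscripts.
    static int sum(int[][] m) {
        int total = 0;
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m[i].length; j++) {
                total += m[i][j];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        int[] a = {10, 20, 30};
        int[][] m = {{1, 2, 3}, {4, 5, 6}};
        System.out.println(sum(a)); // prints 60
        System.out.println(sum(m)); // prints 21
    }
}
```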
Dynamic Data Structure:
In a dynamic data structure the size of the structure is not fixed and can be modified during the
operations performed on it.
Stack:
A stack, also called a Last-In First-Out (LIFO) system, is a linear list in which insertions and
deletions can take place only at one end, called the top. This structure is similar in its operation to
a stack of dishes on a spring system, as shown in the figure.
Note that new dishes are inserted only at the top of the stack and dishes can be deleted only
from the top of the stack.

Queue:
A queue, also called a First-In First-Out (FIFO) system, is a linear list in which deletions can
take place only at one end of the list, the front of the list, and insertions can take place only at
the other end of the list, the rear of the list.
This structure operates in much the same way as a line of people waiting at a bus stop, as
pictured in the figure: the first person in line is the first person to board the bus. Another analogy is
with automobiles waiting to pass through an intersection: the first car in line is the first car
through.
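The LIFO and FIFO disciplines can be compared with a short Java sketch. It uses the standard java.util.ArrayDeque class for both structures; the class name StackQueueDemo and its two helper methods are hypothetical and exist only for this illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class StackQueueDemo {
    // Inserts the items in order, then deletes them all from the top (LIFO).
    static int[] throughStack(int[] items) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (int item : items) {
            stack.push(item);             // insert at the top
        }
        int[] out = new int[items.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = stack.pop();         // delete from the top
        }
        return out;
    }

    // Inserts the items at the rear, then deletes them all from the front (FIFO).
    static int[] throughQueue(int[] items) {
        Deque<Integer> queue = new ArrayDeque<>();
        for (int item : items) {
            queue.addLast(item);          // insert at the rear
        }
        int[] out = new int[items.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = queue.removeFirst(); // delete from the front
        }
        return out;
    }
}
```

For the input 1, 2, 3 the stack returns the items in reverse order (3, 2, 1), while the queue returns them in arrival order (1, 2, 3).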
Linked List:
A linked list is a sequence of data structures, which are connected together via links.
Linked List is a sequence of links which contains items. Each link contains a connection to
another link. Linked list is the second most-used data structure after array. Following are the
important terms to understand the concept of Linked List.
Link − each link of a linked list stores a data item called an element.
Next − each link of a linked list contains a reference to the next link, called Next.
Linked List − a linked list contains a reference to its first link, called First.
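The three terms above map directly onto a small Java class. This is a minimal sketch of a singly linked list, assuming integer data items; the names LinkedListDemo, Link, insertFirst and toArray are illustrative choices, not part of any standard API.

```java
final class LinkedListDemo {
    // One link: a data item plus a reference to the next link.
    static final class Link {
        final int data;
        Link next;

        Link(int data) {
            this.data = data;
        }
    }

    private Link first; // reference to the first link; null when the list is empty
    private int size;

    // Inserts a new link in front of the current first link.
    void insertFirst(int data) {
        Link link = new Link(data);
        link.next = first;
        first = link;
        size++;
    }

    // Follows the next references from first and copies the data items out.
    int[] toArray() {
        int[] out = new int[size];
        int i = 0;
        for (Link cur = first; cur != null; cur = cur.next) {
            out[i++] = cur.data;
        }
        return out;
    }
}
```

Unlike an array, no block of consecutive memory locations is needed: the order of the elements is carried entirely by the next references.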
2. Non-linear Data Structure:
A data structure is said to be non-linear if its data are not arranged in a sequence. The
insertion and deletion of data is therefore not possible in a linear fashion. This structure is mainly used to
represent data containing a hierarchical relationship between elements. Trees and graphs are
examples of non-linear data structures.
Trees:
Data frequently contain a hierarchical relationship between various elements. The data
structure which reflects this relationship is called a rooted tree graph or a tree.

Important Terms
Following are the important terms with respect to tree.
 Path − Path refers to the sequence of nodes along the edges of a tree.
 Root −The node at the top of the tree is called root. There is only one root per tree and one path from
the root node to any node.
 Parent −Any node except the root node has one edge upward to a node called parent.
 Child −The node below a given node connected by its edge downward is called its child node.
 Leaf −The node which does not have any child node is called the leaf node.
 Subtree −Subtree represents the descendants of a node.
 Visiting −Visiting refers to checking the value of a node when control is on the node.
 Traversing −Traversing means passing through nodes in a specific order.
 Levels −Level of a node represents the generation of a node. If the root node is at level 0, then its next
child node is at level 1, its grandchild is at level 2, and so on.
 Keys −Key represents a value of a node based on which a search operation is to be carried out for a
node.
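Several of the terms above (root, child, leaf, level, visiting, traversing) can be illustrated with a short Java sketch. The class TreeDemo and its Node type are hypothetical, and the traversal order chosen here (parent before children) is only one of several possible orders.

```java
import java.util.ArrayList;
import java.util.List;

final class TreeDemo {
    static final class Node {
        final String key;
        final List<Node> children = new ArrayList<>();

        Node(String key) {
            this.key = key;
        }

        // Attaches a child below this node and returns this node.
        Node add(Node child) {
            children.add(child);
            return this;
        }

        // A leaf is a node without any child node.
        boolean isLeaf() {
            return children.isEmpty();
        }
    }

    // Traverses the subtree of node, visiting each node once and
    // recording its key together with its level (root = level 0).
    static void traverse(Node node, int level, List<String> out) {
        out.add(node.key + "@" + level);
        for (Node child : node.children) {
            traverse(child, level + 1, out);
        }
    }
}
```

For a root A with children B and C, where B has a child D, the traversal records A@0, B@1, D@2, C@1.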
Graph:
A graph is a pictorial representation of a set of objects where some pairs of objects are connected by
links. The interconnected objects are represented by points termed as vertices, and the links that
connect the vertices are called edges.
Formally, a graph is a pair of sets (V, E), where V is the set of vertices and E is the set of edges,
connecting the pairs of vertices. Take a look at the following graph.

In the above graph,
V = {a, b, c, d, e}
E = {ab, ac, bd, cd, de}
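One common way to store such a graph is an adjacency list, which keeps for every vertex the list of its neighbours. The following Java sketch builds one from the edge set E above. It assumes the graph is undirected (the source does not say), and the class name GraphDemo is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class GraphDemo {
    // Builds an adjacency list from edges written as two-letter strings,
    // e.g. "ab" for an edge between vertex a and vertex b.
    static Map<Character, List<Character>> build(String[] edges) {
        Map<Character, List<Character>> adj = new TreeMap<>();
        for (String e : edges) {
            char u = e.charAt(0);
            char v = e.charAt(1);
            // Assumed undirected: record the edge in both directions.
            adj.computeIfAbsent(u, k -> new ArrayList<>()).add(v);
            adj.computeIfAbsent(v, k -> new ArrayList<>()).add(u);
        }
        return adj;
    }

    public static void main(String[] args) {
        String[] edges = {"ab", "ac", "bd", "cd", "de"};
        System.out.println(build(edges));
    }
}
```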
Data Types & Range
 DATA STRUCTURES OPERATIONS
The data appearing in data structures are processed by means of certain operations.
The following four operations play a major role in this text:
1. Traversing: accessing each record/node exactly once so that certain items in the record
may be processed (This accessing and processing is sometimes called visiting the record).
2. Searching: Finding the location of the desired node with a given key value or finding the
locations of all such nodes which satisfy one or more conditions.
3. Inserting: Adding a new node/record to the structure.
4. Deleting: Removing a node/record from the structure.
Sometimes two or more of these operations are used together in a given situation; e.g., we may want
to delete the record with a given key, which may mean we first need to search for the location of the record.
The following two operations are used in special situations:
1. Sorting: Arranging the records in some logical order (e.g., alphabetically according to
some NAME key, or in numerical order according to some NUMBER key, such as social
security number or account number)
2. Merging: Combining the records in two different sorted files into a single sorted file.
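Searching and merging can be sketched on plain integer arrays as follows. This is a minimal illustration, assuming the "files" are arrays of keys already sorted in ascending order; the class name ListOperations is hypothetical.

```java
final class ListOperations {
    // Searching: traverses the array and returns the location of key, or -1.
    static int search(int[] data, int key) {
        for (int k = 0; k < data.length; k++) {
            if (data[k] == key) {
                return k;
            }
        }
        return -1;
    }

    // Merging: combines two sorted arrays into a single sorted array.
    static int[] merge(int[] a, int[] b) {
        int[] out = new int[a.length + b.length];
        int i = 0, j = 0, k = 0;
        while (i < a.length && j < b.length) {
            // Copy the smaller front element and advance in that array.
            out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
        }
        while (i < a.length) {
            out[k++] = a[i++];   // rest of a
        }
        while (j < b.length) {
            out[k++] = b[j++];   // rest of b
        }
        return out;
    }
}
```

Merging {1, 4, 9} with {2, 3, 10} yields {1, 2, 3, 4, 9, 10}; each element is copied exactly once.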
Example of List Representation
Consider a list L consisting of data items—1, 2, 3, 4, 5, 6, 7, 8, 9 as shown in Fig(a). We can use
any of four data structures to support L—a linear list, a matrix, a tree, or a graph, as given in Fig.
(b), (c) and (d) respectively.

 Abstract Data Types
An abstract data type (ADT) is a type (or class) for objects whose behavior is defined by a set of
values and a set of operations.
The definition of an ADT only mentions what operations are to be performed, but not how these
operations will be implemented. It does not specify how data will be organized in memory or
what algorithms will be used for implementing the operations. It is called “abstract” because it
gives an implementation-independent view. The process of providing only the essentials and
hiding the details is known as abstraction.
The user of a data type does not need to know how that data type is implemented. For example, we
have been using primitive data types like int, float and char knowing only what operations can be
performed on them, without any idea of how they are implemented. So a
user only needs to know what a data type can do, but not how it will be implemented. Think of an
ADT as a black box which hides the inner structure and design of the data type.
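In Java, the split between "what" and "how" is often expressed with an interface and a class that implements it. The sketch below is one possible illustration, not a definitive design: the names IntStack and ArrayIntStack are hypothetical, and the array-based representation is just one of many that would satisfy the same interface.

```java
import java.util.Arrays;

// The ADT: only the operations are named, not how they work.
interface IntStack {
    void push(int x);
    int pop();
    boolean isEmpty();
}

// One possible implementation; the array is hidden from the user.
final class ArrayIntStack implements IntStack {
    private int[] items = new int[4];
    private int size = 0;

    @Override
    public void push(int x) {
        if (size == items.length) {
            items = Arrays.copyOf(items, size * 2); // grow when full
        }
        items[size++] = x;
    }

    @Override
    public int pop() {
        if (size == 0) {
            throw new IllegalStateException("stack is empty");
        }
        return items[--size];
    }

    @Override
    public boolean isEmpty() {
        return size == 0;
    }
}
```

Code written against IntStack keeps working if ArrayIntStack is later replaced by, say, a linked implementation, because it never sees the private array.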
Abstract Data Type Model
ADT Model
There are two different parts of the ADT model: functions (public and private) and the data structure. Both
are contained within the ADT model itself and do not come within the scope of the application program.
The data structure is available to all of the ADT's functions as required, and a function
may call on any other function to accomplish its task.

Data are entered, accessed, modified and deleted through the external application programming
interface. This interface can only access public functions. For each ADT operation there is an
algorithm that performs its specific task. The operation name and parameters are available to the
application and they provide the only interface to the application.
 Algorithm Complexity
Time and space are the two major measures of the efficiency of an algorithm. The complexity of
an algorithm is the function which gives the running time and/or space in terms of the input size.
Suppose X is an algorithm and n is the size of the input data. The time and space used by the algorithm X
are the two main factors which decide the efficiency of X.
 Time Factor − Time is measured by counting the number of key operations, such as comparisons
in a sorting algorithm.
 Space Factor − Space is measured by counting the maximum memory space required by the
algorithm.
The complexity of an algorithm f(n) gives the running time and/or the storage space required by the
algorithm in terms of n as the size of input data.
Space Complexity
Space complexity of an algorithm represents the amount of memory space required by the algorithm in
its life cycle. The space required by an algorithm is equal to the sum of the following two components −
 A fixed part, which is the space required to store certain data and variables that are independent of the
size of the problem. For example, simple variables and constants used, program size, etc.
 A variable part, which is the space required by variables whose size depends on the size of the problem.
For example, dynamic memory allocation, recursion stack space, etc.
Space complexity S(P) of any algorithm P is
S(P) = C + SP(I)
where
C is the fixed part and
SP(I) is the variable part of the algorithm, which depends on instance characteristic I.
Following is a simple example that tries to explain the concept −
Algorithm: SUM (A, B)
Step 1 - START
Step 2 - C ← A + B + 10
Step 3 - Stop
Here we have three variables A, B, and C and one constant. Hence S (P) = 1 + 3.
Now, space depends on data types of given variables and constant types and it will be multiplied
accordingly.
Time Complexity
Time complexity of an algorithm represents the amount of time required by the algorithm to run to
completion.
Time requirements can be defined as a numerical function T(n), where T(n) can be measured as
the number of steps, provided each step consumes constant time.
For example, addition of two n-bit integers takes n steps. Consequently, the total computational time is
T(n) = c * n
where c is the time taken for the addition of two bits.
Here, we observe that T(n) grows linearly as the input size increases.
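The n-bit addition can be written out to make the step count visible. The following Java sketch is an illustration under stated assumptions: the numbers are given as arrays of bits with the least significant bit first, and the class name BitAdder is hypothetical. The loop body runs exactly n times, once per bit position, which is where T(n) = c * n comes from.

```java
final class BitAdder {
    // Adds two n-bit numbers (least significant bit first).
    // Returns n + 1 bits; the extra bit holds the final carry.
    static int[] add(int[] x, int[] y) {
        int n = x.length;               // assumes y.length == n
        int[] sum = new int[n + 1];
        int carry = 0;
        for (int i = 0; i < n; i++) {   // n steps, one per bit position
            int s = x[i] + y[i] + carry;
            sum[i] = s % 2;
            carry = s / 2;
        }
        sum[n] = carry;
        return sum;
    }

    public static void main(String[] args) {
        int[] three = {1, 1, 0};        // binary 011
        int[] five = {1, 0, 1};         // binary 101
        // Prints [0, 0, 0, 1], i.e. binary 1000 = 8.
        System.out.println(java.util.Arrays.toString(add(three, five)));
    }
}
```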
Algorithm Analysis
Efficiency of an algorithm can be analyzed at two different stages, before implementation and
after implementation.
They are the following −
 A Priori Analysis − This is a theoretical analysis of an algorithm. Efficiency of an algorithm is
measured by assuming that all other factors, for example, processor speed, are constant and have
no effect on the implementation.
 A Posteriori Analysis − This is an empirical analysis of an algorithm. The selected algorithm is
implemented using programming language. This is then executed on target computer machine.
In this analysis, actual statistics like running time and space required are collected.
 Time-Space Trade-off
 A time-space tradeoff is a situation in which the time needed to process the data can be
reduced by using more space to store it or, conversely, the space needed can be reduced at the
cost of more computation time. One quantity increases as the other decreases.
 As the relative costs of CPU cycles, RAM space and hard drive space change (hard drive
space has for some time been getting cheaper at a much faster rate than other components
of computers), the appropriate choices for a time-space tradeoff have changed radically.
 Often, by exploiting a time-space tradeoff, a program can be made to run much faster.
 It is a way to solve a problem either:
o in less time, by using more space/memory, or
o in very little space, by spending more time.
Types of Space-Time Trade-off
i. Compressed or Uncompressed data
ii. Re-Rendering or Stored images
iii. Smaller code or loop unrolling
iv. Lookup tables or Recalculation
i. Compressed or Uncompressed data:
 A space-time trade-off can be applied to the problem of data storage.
 If the data is stored uncompressed, it takes more space but less time to access.
 If the data is stored compressed, it takes less space but more time, since the decompression
algorithm must be run.
 There are many instances where it is possible to work directly with compressed data, such as
compressed bitmap indices, where it is faster to work with compression than
without compression.
ii. Re-Rendering or Stored images:
 Storing only the source and rendering it as an image every time the page is requested would
be trading time for space.
More time used but less space
 Storing the images would be trading space for time.
More space used but less time
iii. Smaller Code (with loop) / Larger Code or Loop Unrolling (without loop):
 Smaller code occupies less space in memory but requires more computation time, because
control must jump back to the beginning of the loop at the end of each iteration.
 Larger code or loop unrolling can optimize execution speed at the cost of increased binary
size. It occupies more space in memory but requires less computation time (as we need
not perform jumps back and forth since there is no loop).
iv. Lookup tables or Recalculation:
 In a lookup table, an implementation can include the entire table which reduces computing time but
increases the amount of memory needed.
 It can recalculate i.e., compute table entries as needed, increasing computing time but reducing
memory requirements.
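The lookup-table tradeoff can be sketched with a small Java example that counts the 1-bits in a byte in two ways. The task and the names BitCount, countByLoop and countByTable are assumptions chosen for illustration: the loop version recalculates the answer each time and needs no extra memory, while the table version spends 256 integers of memory so that each query is a single array access.

```java
final class BitCount {
    // More space: a table with the answer for every possible byte value.
    private static final int[] TABLE = new int[256];

    static {
        for (int i = 1; i < 256; i++) {
            // Bits of i = bits of i without its lowest bit + the lowest bit.
            TABLE[i] = TABLE[i >> 1] + (i & 1);
        }
    }

    // Recalculation: no table, but up to eight loop iterations per query.
    static int countByLoop(int b) {
        int count = 0;
        for (int v = b & 0xFF; v != 0; v >>= 1) {
            count += v & 1;
        }
        return count;
    }

    // Lookup: one array access per query.
    static int countByTable(int b) {
        return TABLE[b & 0xFF];
    }
}
```

Both methods return the same values; only the balance between memory use and work per query differs.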
Example
More time, less space:
int a, b;
printf("enter value of a\n");
scanf("%d", &a);
printf("enter value of b\n");
scanf("%d", &b);
b = a + b;    /* b is reused for the result, so no third variable is needed */
printf("output is: %d", b);
More space, less time:
int a, b, c;
printf("enter values of a and b\n");
scanf("%d%d", &a, &b);
printf("output is: %d", c = a + b);    /* extra variable c, fewer statements */
 Mathematical notations and functions
This section gives various mathematical functions which appear very often in the analysis of
algorithms and in computer science in general, together with their notation.
 Floor and Ceiling Functions:
o If x is a real number, then x lies between two integers, which are called the floor
and ceiling of x, i.e. ⌊x⌋ ≤ x ≤ ⌈x⌉.
⌊x⌋ is called the floor of x. It is the greatest integer that is not greater than x.
⌈x⌉ is called the ceiling of x. It is the smallest integer that is not less than x.
o If x is itself an integer, then ⌊x⌋ = ⌈x⌉; otherwise ⌊x⌋ + 1 = ⌈x⌉.
E.g: ⌊3.14⌋ = 3, ⌊-8.5⌋ = -9, ⌊7⌋ = 7
⌈3.14⌉ = 4, ⌈-8.5⌉ = -8, ⌈7⌉ = 7
 Remainder Function (Modular Arithmetic):
If k is any integer and M is a positive integer, then:
k (mod M)
gives the integer remainder when k is divided by M.
E.g: 25(mod 7) = 4
25(mod 5) = 0
 Integer and Absolute Value Functions:
If x is a real number, then the integer function INT(x) converts x into an integer by removing
the fractional part.
E.g: INT(3.14) = 3
INT(-8.5) = -8
The absolute value function ABS(x) or |x| gives the absolute value of x, i.e. it gives the positive
value of x even if x is negative.
E.g: ABS(-15) = 15 or |-15| = 15
ABS(7) = 7 or |7| = 7
ABS(-3.33) = 3.33 or |-3.33| = 3.33
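The functions above have direct counterparts in the Java standard library. The sketch below wraps them in a hypothetical class MathNotation so that the names match the notation of this section; the wrapper names are assumptions, while Math.floor, Math.ceil, Math.floorMod and Math.abs are standard methods. One detail worth noting: Java's % operator can return a negative result for a negative k, whereas Math.floorMod returns a value between 0 and M-1 for positive M.

```java
final class MathNotation {
    static long floor(double x) {
        return (long) Math.floor(x);   // greatest integer not greater than x
    }

    static long ceiling(double x) {
        return (long) Math.ceil(x);    // smallest integer not less than x
    }

    static int mod(int k, int m) {
        return Math.floorMod(k, m);    // remainder in 0..m-1 for m > 0
    }

    static int intPart(double x) {
        return (int) x;                // the cast drops the fractional part
    }

    public static void main(String[] args) {
        System.out.println(floor(-8.5));      // -9
        System.out.println(ceiling(3.14));    // 4
        System.out.println(mod(25, 7));       // 4
        System.out.println(intPart(-8.5));    // -8
        System.out.println(Math.abs(-3.33));  // 3.33
    }
}
```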
 Summation Symbol (Sums):
The symbol which is used to denote summation is a Greek letter Sigma.
Let a_1, a_2, a_3, ..., a_n be a sequence of numbers.
Then the sums a_1 + a_2 + ... + a_n and a_m + a_(m+1) + ... + a_n
will be denoted, respectively, by
Σ(j=1 to n) a_j and Σ(j=m to n) a_j
The letter j in the above expressions is called a dummy index or dummy variable. Other letters
frequently used as dummy variables are i, k, s and t.
 Factorial Function:
The product of the positive integers from 1 to n, inclusive, is denoted by n! (read “n factorial").
That is,
n! = 1 * 2 * 3 * ….. * (n-2) * (n-1) * n
It is also convenient to define 0! = 1.
E.g: 4! = 1 * 2 * 3 * 4 = 24
5! = 5 * 4! = 120
 Permutations:
A permutation of a set of n elements is an arrangement of the elements in a given order. For
example, the permutations of the set consisting of the elements a, b, c are as follows:
abc, acb, bac, bca, cab, cba
One can prove: There are n! permutations of a set of n elements. Accordingly, there are 4! = 24
permutations of a set with 4 elements, 5! = 120 permutations of a set with 5 elements and so on.
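The link between factorials and permutations can be checked with a short Java sketch. The class name Permutations and its methods are hypothetical; the factorial uses a long, so it is only valid for n up to 20 before overflowing.

```java
import java.util.ArrayList;
import java.util.List;

final class Permutations {
    // n! = 1 * 2 * ... * n, with 0! = 1. Valid for 0 <= n <= 20.
    static long factorial(int n) {
        long product = 1;
        for (int i = 2; i <= n; i++) {
            product *= i;
        }
        return product;
    }

    // Lists all arrangements of the characters of s (assumed distinct).
    static List<String> permutations(String s) {
        List<String> out = new ArrayList<>();
        permute("", s, out);
        return out;
    }

    // Picks each remaining character in turn as the next element.
    private static void permute(String prefix, String rest, List<String> out) {
        if (rest.isEmpty()) {
            out.add(prefix);
            return;
        }
        for (int i = 0; i < rest.length(); i++) {
            String remaining = rest.substring(0, i) + rest.substring(i + 1);
            permute(prefix + rest.charAt(i), remaining, out);
        }
    }
}
```

For the string "abc" the method produces the six arrangements listed above, and the size of the result equals factorial(3).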

 Exponents and Logarithms:
An exponent indicates how many times a number is multiplied by itself. Recall the following
definitions for integer exponents (where m is a positive integer):
a^m = a * a * ... * a (m times), a^0 = 1, a^(-m) = 1 / a^m
Exponents are extended to include all rational numbers by defining, for any rational number
m/n,
a^(m/n) = n-th root of a^m = (n-th root of a)^m
Example:
2^4 = 16, 2^(-4) = 1 / 2^4 = 1/16, 125^(2/3) = 5^2 = 25
The concept of logarithms is related to exponents. If b is a positive number, then the logarithm
of any positive number x to the base b is written as log_b x. It represents the exponent to which b
should be raised to get x, i.e. y = log_b x and b^y = x are equivalent statements.
E.g: log_2 8 = 3, since 2^3 = 8
log_10 0.001 = -3, since 10^(-3) = 0.001
log_b 1 = 0, since b^0 = 1
log_b b = 1, since b^1 = b
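Java has no built-in logarithm for an arbitrary base, but one can be sketched with the change-of-base rule log_b x = ln x / ln b. The class name Logs is hypothetical. Because the computation uses floating-point arithmetic, results may differ from the exact value by a tiny rounding error, so comparisons should allow a small tolerance.

```java
final class Logs {
    // Logarithm of x to the base b, assuming b > 0, b != 1 and x > 0.
    static double log(double b, double x) {
        return Math.log(x) / Math.log(b);
    }

    public static void main(String[] args) {
        System.out.println(log(2, 8));        // approximately 3
        System.out.println(log(10, 0.001));   // approximately -3
        System.out.println(Math.pow(2, 3));   // 8.0, the inverse direction
    }
}
```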
 Algorithmic Notations:
An algorithm is a finite step-by-step list of well-defined instructions for solving a particular
problem. The formal definition of an algorithm, which uses the notion of a Turing machine or its
equivalent, is beyond the scope of this text. This section describes the format that is used to present algorithms throughout the text.
From the data structure point of view, following are some important categories of algorithms
 Search − Algorithm to search an item in a data structure.
 Sort − Algorithm to sort items in a certain order.
 Insert − Algorithm to insert item in a data structure.
 Update − Algorithm to update an existing item in a data structure.
 Delete − Algorithm to delete an existing item from a data structure.
Characteristics of an Algorithm
Not all procedures can be called an algorithm. An algorithm should have the following characteristics
 Unambiguous − Algorithm should be clear and unambiguous. Each of its steps (or phases), and
their inputs/outputs should be clear and must lead to only one meaning.
 Input − An algorithm should have 0 or more well-defined inputs.
 Output − An algorithm should have 1 or more well-defined outputs, and should match the desired
output.
 Finiteness − Algorithms must terminate after a finite number of steps.
 Feasibility − Should be feasible with the available resources.
 Independent − An algorithm should have step-by-step directions, which should be independent of
any programming code.

How to Write an Algorithm?
There are no well-defined standards for writing algorithms. Rather, the style is problem and resource
dependent. Algorithms are never written to support a particular programming language.
All programming languages share basic code constructs like loops (do, for,
while), flow-control (if-else), etc. These common constructs can be used to write an algorithm.
We usually write algorithms in a step-by-step manner. Algorithm writing is a
process and is executed after the problem domain is well-defined. That is, we should know the
problem domain for which we are designing a solution.
Problem − (Largest Element in Array) A nonempty array DATA with N numerical values is given.
This algorithm finds the location LOC and the value MAX of the largest element of DATA. The
variable K is used as a counter.
Step 1: [Initialize.] Set K := 1, LOC := 1 and MAX := DATA[1].
Step 2: [Increment counter.] Set K := K+1.
Step 3: [Test counter.] If K > N, then:
Write: LOC, MAX, and Exit.
Step 4: [Compare and update.] If MAX < DATA[K], then:
Set LOC := K and MAX := DATA[K].
Step 5: [Repeat loop.] Go to Step 2.
Flowchart:
CODE:
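public class AlgorithmicNotation
{
    public static void main(String[] args)
    {
        int[] Data = {1, 3, 7, 2, 5, 9};
        int N = Data.length;
        // Java arrays are 0-based: the first element is Data[0], so its location is 0.
        int k = 1, Loc = 0, Max = Data[0];
        while (k < N)
        {
            // Compare and update when a larger element is found.
            if (Max < Data[k])
            {
                Loc = k;
                Max = Data[k];
            }
            k++;
        }
        System.out.println("Loc: " + Loc);
        System.out.println("Max: " + Max);
    }
}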
The following summarizes certain conventions that we will use in presenting our algorithms.
 Identifying Number
Each algorithm is assigned an identifying number
 Steps, Control, Exit
The steps of the algorithm are executed one after the other, beginning with Step 1, unless
indicated otherwise. Control may be transferred to Step n of the algorithm by the statement "Go
to Step n."
If several statements appear in the same step, e.g.,
Set K := 1, LOC := 1 and MAX := DATA[1].
then they are executed from left to right.
The algorithm is completed when the statement
Exit.
is encountered.
 Comments
Each step may contain a comment in brackets which indicates the main purpose of the step. The
comment will usually appear at the beginning or the end of the step.
 Variable Names
Variable names will use capital letters, as in MAX and DATA. Single-letter names of variables
used as counters or subscripts will also be capitalized in the algorithms (K and N, for example),
even though lowercase may be used for these same variables (k and n) in the accompanying
mathematical description and analysis.
 Assignment Statement
Our assignment statements will use the dots-equal notation := that is used in Pascal.
For example: MAX := DATA[1]
assigns the value in DATA[1] to MAX. In the C language, the equal sign = is used for this
operation.
 Input and Output
Data may be input and assigned to variables by means of a Read statement, with the following
form:

Read: Variable names.
Similarly, messages, placed in quotation marks, and data in variables may be output by means
of a Write or Print statement with the following form:
Write: Messages and/or variable names.
 Procedures
The term “procedure” will be used for an independent algorithmic module which solves a
particular problem. Generally, the word “algorithm” will be reserved for the solution of general
problems. The term “procedure” will also be used to describe a certain type of sub-algorithm.
 Control Structures:
A control structure is like a block of programming that analyses variables and chooses a
direction in which to go based on given parameters. The term flow control details the direction
the program takes.
Algorithms and their equivalent computer programs are more easily understood if they mainly
use self-contained modules and three types of logic, or flow of control, called
1. Sequence logic, or sequential flow
2. Selection logic, or conditional flow
3. Iteration logic, or repetitive flow
These three types of logic are discussed below, and in each case we show the equivalent
flowchart.
1. Sequence logic or sequential flow
 Sequential logic, as the name suggests, follows a serial or sequential flow in
which the flow depends on the series of instructions given to the computer.
Unless new instructions are given, the modules are executed in the obvious
sequence.
 The sequence may be given explicitly, by means of numbered steps, or
implicitly, by the order in which the modules are written. Most
processing, even of some complex problems, will generally follow this
elementary flow pattern.
2. Selection logic or conditional flow
 Selection logic uses a number of conditions which decide which one out
of several written modules is executed. The structures which implement this logic are called conditional
structures or If structures.
 These structures can be of three types:
o Single Alternative: This structure has the form:
If condition, then:
[Module A]
[End of If structure]

The logic of this structure is pictured in the figure. If the condition holds, then Module A,
which may consist of one or more statements, is executed; otherwise Module A is
skipped and control transfers to the next step of the algorithm.
o Double Alternative: This structure has the form
If condition, then:
[Module A]
Else:
[Module B]
[End of If structure.]
The logic of this structure is pictured in Figure. As indicated by the How chart, if the
condition holds, then Module A is executed; otherwise Module B is executed.
o Multiple Alternatives: This structure has the form
If condition(1), then:
[Module A1]
Else if condition(2), then:
[Module A2]
.
.
Else if condition(M), then:
[Module AM]
Else:
[Module B]
[End of If structure]
The logic of this structure allows only one of the modules to be executed. Specifically,
either the module which follows the first condition which holds is executed, or the
module which follows the final Else statement is executed. In practice, there will rarely
be more than three alternatives.
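The three conditional structures map directly onto the if, if-else and else-if forms of C. The following is a minimal sketch; the function names and the grade boundaries are hypothetical and chosen only for illustration.

```c
/* Single alternative: Module A runs only when the condition holds. */
int absolute_value(int x)
{
    if (x < 0) {
        x = -x;                 /* Module A */
    }
    return x;
}

/* Double alternative: exactly one of Module A or Module B runs. */
int larger_of(int a, int b)
{
    if (a > b) {
        return a;               /* Module A */
    } else {
        return b;               /* Module B */
    }
}

/* Multiple alternatives: the first condition that holds selects the module;
   the final else is Module B. The boundaries 75/60/40 are assumed values. */
const char *grade_for(int marks)
{
    if (marks >= 75) {
        return "Distinction";   /* Module A1 */
    } else if (marks >= 60) {
        return "First class";   /* Module A2 */
    } else if (marks >= 40) {
        return "Pass";          /* Module A3 */
    } else {
        return "Fail";          /* Module B */
    }
}
```

For example, grade_for(65) fails the first test (65 >= 75) and passes the second, so only Module A2 is executed.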
3. Iteration logic or repetitive flow
The iteration logic employs a loop, which consists of a repeat
statement followed by a module known as the body of the loop.
The two types of these structures are:
o Repeat-for loop: This structure has the form:
Repeat for K = A to N by I:
[Module]
[End of loop]
Here K is the index variable, A is the initial value, N is the end
value and I is the increment. When I is positive, K increases and
the loop ends when K > N; when I is negative, K decreases and the
loop ends when K < N.

o Repeat-while loop: This structure uses a condition to control the
loop and has the form:
Repeat while condition:
[Module]
[End of loop]
Observe that the cycling continues until the condition is false.
We emphasize that there must be a statement before the structure
that initializes the condition controlling the loop and, in order
that the looping may eventually cease, there must be a statement
in the body of the loop that changes the condition.
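The two loop structures correspond to the for and while statements of C. The sketch below is illustrative only; the function names are hypothetical.

```c
/* Repeat-for with a positive increment: Repeat for K = 1 to n by 1. */
int sum_up(int n)
{
    int total = 0;
    for (int k = 1; k <= n; k++) {      /* the loop ends when k > n */
        total += k;
    }
    return total;
}

/* Repeat-for with a negative increment: Repeat for K = n to 1 by -1. */
int sum_down(int n)
{
    int total = 0;
    for (int k = n; k >= 1; k--) {      /* the loop ends when k < 1 */
        total += k;
    }
    return total;
}

/* Repeat-while: value is initialized before the loop and changed inside
   the body, so the condition eventually becomes false. */
int digit_count(int n)
{
    int value = n < 0 ? -n : n;
    int digits = 1;
    while (value >= 10) {
        value /= 10;                    /* changes the loop condition */
        digits++;
    }
    return digits;
}
```

Both sum_up(10) and sum_down(10) visit the same ten values of K, only in opposite order, so they return the same total.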
 Complexity of Algorithms:
The complexity of an algorithm M is the function f(n) which gives the running time and/or
storage space requirement of the algorithm in terms of the size n of the input data. Frequently,
the storage space required by an algorithm is simply a multiple of the data size n. Accordingly,
unless otherwise stated, the term "complexity" refers to the running time of the algorithm.
The three cases one usually investigates in complexity theory are as follows:
1. Worst case: the maximum value of f(n) for any possible input
2. Average case: the expected value of f(n)
3. Best case: the minimum value of f(n) for any possible input
The following example illustrates that the function f(n), which gives the running time
of an algorithm, depends not only on the size n of the input data but also on the particular data.
Example: Linear/ Sequential Search
Linear (sequential) search looks for an item in a list by examining the elements one after
another until the item is found or the end of the list is reached. The search starts at index 0
and proceeds sequentially to index N-1; if the item is found, its position is returned,
otherwise a failure status such as -1 is returned.

Algorithm:
LINEAR_SEARCH (LIST, N, ITEM, LOC, POS)
In this algorithm, LIST is a linear array with N
elements, and ITEM is a given item to be searched.
This algorithm finds the location of ITEM in LIST and
stores it in POS, or sets POS := 0 if the search is
unsuccessful. LOC is the counter used to scan the list.
1. Set POS := 0
2. Set LOC := 1
3. Repeat Steps a and b while LOC <= N:
a. If LIST[LOC] = ITEM, then:
i. Set POS := LOC [get position of item]
ii. Return
b. Otherwise:
i. Set LOC := LOC + 1 [increment counter]
4. [End of loop]
5. Return
Complexity of Linear search:
The complexity of the search algorithm is based on the number of comparisons C,
between ITEM and LIST [LOC]. We seek C (n) for the worst and average case,
where n is the size of the list.
Worst Case: The worst case occurs when ITEM is present at the last location of the list, or it is
not there at all. In either situation, we have
C(n) = n
Thus C(n) = n is the worst-case complexity of the linear (sequential) search algorithm.
Average case: In this case, we assume that ITEM is there in the list, and it can be present at any
position in the list. Accordingly, the number of comparisons can be any of the numbers 1, 2, 3,
4, ..., n, and each number occurs with probability p = 1/n. Then,
C(n) = 1·(1/n) + 2·(1/n) + ... + n·(1/n) = (1 + 2 + ... + n)/n = [n(n + 1)/2]·(1/n) = (n + 1)/2
That is, the average number of comparisons needed to find the location of ITEM is approximately
half the number of elements in the list.
Implementation of Linear/Sequential Search
#include <stdio.h>
#define SIZE 5
/* Search ele[0..SIZE-1] for item; return its index, or -1 if it is absent. */
int LinearSearch(int ele[], int item)
{
    int POS = -1;
    int LOC;
    for (LOC = 0; LOC < SIZE; LOC++) {
        if (ele[LOC] == item) {
            POS = LOC;
            break;
        }
    }
    return POS;
}
int main(void)
{
    int ele[SIZE];
    int i;
    int item;
    int pos;
    printf("\nEnter Items : \n");
    for (i = 0; i < SIZE; i++) {
        printf("Enter ELE[%d] : ", i + 1);
        scanf("%d", &ele[i]);
    }
    printf("\n\nEnter Item To Be Searched : ");
    scanf("%d", &item);
    pos = LinearSearch(ele, item);
    if (pos >= 0) {
        /* Report a 1-based position to the user. */
        printf("\nItem Found At Position : %d\n", pos + 1);
    }
    else
    {
        printf("\nItem Not Found In The List\n");
    }
    return 0;
}
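The comparison counts discussed under "Complexity of Linear search" can be checked by instrumenting the search. The sketch below is an illustration only: the function names are hypothetical, and average_comparisons assumes that the list is non-empty and that its elements are distinct.

```c
#include <stddef.h>

/* Number of comparisons of ITEM with LIST[LOC] made before the search stops. */
size_t comparisons_needed(const int list[], size_t n, int item)
{
    size_t count = 0;
    for (size_t loc = 0; loc < n; loc++) {
        count++;                        /* one comparison per element visited */
        if (list[loc] == item) {
            break;
        }
    }
    return count;
}

/* Average of C over the n successful searches, one for each position.
   Assumes n > 0 and distinct elements. */
double average_comparisons(const int list[], size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += comparisons_needed(list, n, list[i]);
    }
    return (double)total / (double)n;
}
```

For a list of 5 distinct items, searching for the last item or for a missing item takes 5 comparisons, and the average over the 5 successful searches is (5 + 1)/2 = 3.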
 Asymptotic Notations for Complexity of Algorithms:
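1. Big-O Notation (Ο) – gives an asymptotic upper bound on a function; it is most often used to describe the worst-case running time.
2. Omega Notation (Ω) – gives an asymptotic lower bound on a function; it is most often used to describe the best-case running time.
3. Theta Notation (Θ) – gives a tight bound, both upper and lower; it is often quoted for the average-case running time.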

Worst-case or upper bound: Big-O: O(n)
Definition: O(g(n)) = { f(n): there exist positive constants c and n0 such that 0 ≤ f(n) ≤
cg(n) for all n ≥ n0 }
o The "Big-O" notation defines an upper bound function g(n) for f(n), which represents
the time/space complexity of the algorithm on an input of size n.
o In the worst-case analysis, we calculate an upper bound on the running time of an algorithm. We
must know the case that causes the maximum number of operations to be executed.
o For linear search, the worst case happens when the element to be searched for is not present in
the list. When ITEM is not present, the search compares it with all the elements of LIST
one by one.
o Therefore, the worst-case time complexity of linear search is Θ(n), and in particular O(n).
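To make the roles of the constants c and n0 concrete, take the hypothetical running-time function f(n) = 3n + 2 and g(n) = n. Since 3n + 2 ≤ 4n exactly when n ≥ 2, the choice c = 4, n0 = 2 satisfies the definition, so f(n) = O(n). The sketch below tests the inequality over a finite range; such a test cannot prove the bound and only illustrates the definition.

```c
#include <stdbool.h>

/* Hypothetical functions used only for illustration. */
static long f(long n) { return 3 * n + 2; }
static long g(long n) { return n; }

/* Returns true when 0 <= f(n) <= c*g(n) for every n with n0 <= n <= limit. */
bool bound_holds(long c, long n0, long limit)
{
    for (long n = n0; n <= limit; n++) {
        if (f(n) < 0 || f(n) > c * g(n)) {
            return false;
        }
    }
    return true;
}
```

With c = 4 the inequality fails at n = 1 (5 > 4), which is why n0 must be at least 2; with c = 3 it fails for every n, so no choice of n0 helps.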
Best-case or lower bound: Big-Omega: Ω(n)
Definition: f(n) = Ω(g(n)) (read as "f of n is omega of g of n") if there exist positive constants c
and n0 such that 0 ≤ cg(n) ≤ f(n) for all n ≥ n0.
o The omega notation is used when the function g(n) defines a lower bound for the function
f(n).
o In the best case analysis, we calculate lower bound on running time of an algorithm. We
must know the case that causes minimum number of operations to be executed.
o In the linear search problem, the best case occurs when x is present at the first location. The
number of operations in the best case is constant.
o So time complexity in the best case would be Θ(1)

Average-case: Big-Theta: Θ(n)
Definition: f(n) = Θ(g(n)) (read as "f of n is theta of g of n") if there exist positive constants c1,
c2 and n0 such that 0 ≤ c1g(n) ≤ f(n) ≤ c2g(n) for all n ≥ n0.
From the definition it follows that the function g(n) is both an upper bound and a lower
bound for the function f(n) for all values of n ≥ n0. In other words, f(n) = Θ(g(n))
exactly when f(n) = O(g(n)) and f(n) = Ω(g(n)).
o The theta notation is used when the function f(n) is bounded both above and below by the
function g(n).
o In average-case analysis, we take all possible inputs and calculate the computing time for each
of them. We then sum all the calculated values and divide the sum by the total number of inputs.
We must know the distribution of cases.
o For the linear search problem, let us assume that all cases are uniformly distributed, including
the case where the item is not present in the list. So we sum the costs of all the cases and divide
the sum by (n+1). The result grows linearly with n, so the average-case time complexity is Θ(n).
 String Processing: Definition
A string is a finite sequence S of zero or more characters. The number of characters in a string is
called its length. The string with zero characters is called the empty string or the null string.
Specific strings will be denoted by enclosing their characters in single quotation marks.
Example: ‘The End’, ‘To be or not to be’ and ‘’ are strings of length 7, 18 and zero, respectively.
Concatenation: Let S1 and S2 be strings. The string consisting of the characters of S1 followed by
the characters of S2 is called the concatenation of S1 and S2; it will be denoted by S1//S2.
Example: ‘THE’ // ‘END’ = ‘THEEND’. Note that the length of S1//S2 is equal to the sum of the
lengths of S1 and S2.
Substring: A string Y is called a substring of a string S if there exist strings X and Z such that
S = X//Y//Z
If X is an empty string, then Y is called an initial substring of S, and if Z is an empty string then
Y is called a terminal substring of S.
If Y is a substring of S, then the length of Y does not exceed the length of S.
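The definitions of concatenation, initial substring and terminal substring can be tried out on C strings. This is a minimal sketch with hypothetical function names; it is not a standard library interface, although strlen, strcmp, strncmp, memcpy and malloc are standard.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated S1//S2, or NULL if allocation fails.
   The caller frees the result. */
char *concat(const char *s1, const char *s2)
{
    size_t n1 = strlen(s1), n2 = strlen(s2);
    char *out = malloc(n1 + n2 + 1);
    if (out == NULL) {
        return NULL;
    }
    memcpy(out, s1, n1);
    memcpy(out + n1, s2, n2 + 1);       /* also copies the terminating '\0' */
    return out;
}

/* Y is an initial substring of S when S = Y//Z for some string Z. */
bool is_initial_substring(const char *y, const char *s)
{
    return strncmp(s, y, strlen(y)) == 0;
}

/* Y is a terminal substring of S when S = X//Y for some string X. */
bool is_terminal_substring(const char *y, const char *s)
{
    size_t ny = strlen(y), ns = strlen(s);
    return ny <= ns && strcmp(s + (ns - ny), y) == 0;
}
```

Here concat("THE", "END") yields a string of length 3 + 3 = 6, of which ‘THE’ is an initial substring and ‘END’ a terminal substring.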

 Storing Strings:
Strings are stored in three types of structures.
1. Fixed-length structures.
2. Variable-length structures.
3. Linked structures.
1. Record-oriented, fixed-length storage:
 In this storage each line of print is viewed as a record, where all records have the same
length, i.e. each record accommodates the same number of characters.
Example:
/* PROGRAM PRINTING TWO INTEGERS IN DECREASING ORDER */
#include <stdio.h>
int main(void)
{
    int J, K;
    scanf("%d %d", &J, &K);
    if (J > K)
        printf("%d %d\n", J, K);
    else
        printf("%d %d\n", K, J);
    return 0;
}
Advantages
 Ease of accessing data from any given record.
 Ease of updating data in a given record.
Records Stored Sequentially in the Computer
Using a record-oriented fixed-length storage medium the input data will appear in
memory as shown above, where we assume that 200 is the address of the first character
of the program.

Disadvantages
 Time is wasted reading an entire record if most of the storage consists of inessential
blank space.
 Certain records may require more space than is available.
 When a correction consists of more or fewer characters than the original text, changing
a misspelled word requires the entire record to be changed.
Suppose we wanted to insert a new record in the above example. This would require
that all succeeding records be moved to new memory locations. However, this
disadvantage can be easily remedied as indicated in the figure below. That is, one can
use a linear array POINT which gives the address of each successive record, so that the
records need not be stored in consecutive locations in memory. Accordingly, inserting a
new record will require only an update of the array POINT.
Records stored using pointers
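The idea of the POINT array can be sketched in C as follows. The structure, the function names, the record length of 80 characters and the capacity of 16 records are all assumptions made for illustration; POINT is kept here as an array of slot numbers rather than machine addresses.

```c
#include <string.h>

#define RECORD_LEN 80       /* assumed fixed record length */
#define MAX_RECORDS 16      /* assumed capacity */

struct text_store {
    char records[MAX_RECORDS][RECORD_LEN + 1]; /* records, in order of arrival */
    int point[MAX_RECORDS];                    /* POINT: logical order -> slot */
    int count;
};

/* Inserts line so that it becomes logical record number pos (0-based).
   Only the POINT array is shifted; the stored records never move.
   Returns 0 on success, -1 if the store is full or pos is out of range. */
int insert_record(struct text_store *ts, int pos, const char *line)
{
    if (ts->count >= MAX_RECORDS || pos < 0 || pos > ts->count) {
        return -1;
    }
    int slot = ts->count;                       /* next free physical slot */
    strncpy(ts->records[slot], line, RECORD_LEN);
    ts->records[slot][RECORD_LEN] = '\0';
    for (int i = ts->count; i > pos; i--) {     /* shift slot numbers, not text */
        ts->point[i] = ts->point[i - 1];
    }
    ts->point[pos] = slot;
    ts->count++;
    return 0;
}

/* Returns the text of the record at logical position pos. */
const char *record_at(const struct text_store *ts, int pos)
{
    return ts->records[ts->point[pos]];
}
```

A record inserted between two existing records is stored physically after both of them; only the entries of point are rearranged.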
2. Variable length storage
 The storage of variable length strings in memory cells with fixed length can be done
in two general ways.
o One can use a marker, such as two dollar signs ($$), to signal the end of the
string.

o One can list the length of the string as an additional item in the pointer array.
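Both approaches have direct counterparts in C. An ordinary C string uses an end marker (the character '\0' plays the role of the $$ in the text), while a structure holding an address and a length corresponds to listing the length in the pointer array. The names below are hypothetical.

```c
#include <string.h>

/* Approach 1: scan for the end marker to find the length. */
size_t length_by_marker(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0') {
        n++;
    }
    return n;
}

/* Approach 2: the length is stored next to the address of the characters. */
struct counted_string {
    const char *start;      /* address of the first character */
    size_t length;          /* number of characters; no marker is needed */
};

struct counted_string make_counted(const char *s)
{
    struct counted_string cs = { s, strlen(s) };
    return cs;
}
```

With a marker the length must be recomputed by scanning; with a stored length it is available at once, at the cost of one extra item per string.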
3. Linked storage
 Linked storage is used in most extensive word processing applications, where strings are
stored by means of linked lists.
 By a (one-way) linked list, we mean a linearly ordered sequence of memory cells
called nodes, where each node contains an item called a link, which holds the address
of the next node in the list.
Strings may be stored in linked lists as follows.
 Each memory cell is assigned one character or a fixed number of characters, and a link
contained in the cell gives the address of the cell containing the next character or group of
characters in the string.
 For example, consider this famous quotation:
“To be or not to be”
Figure (a) shows how the string would appear in memory with one character per node.
Figure (b) shows how it would appear with four characters per node.
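The layout with four characters per node might be sketched in C as below. The node structure and the function names are assumptions for illustration; unused character positions in the last node are filled with '\0'.

```c
#include <stdlib.h>
#include <string.h>

#define CHARS_PER_NODE 4    /* assumed group size, four characters per node */

struct node {
    char chars[CHARS_PER_NODE]; /* unused positions hold '\0' */
    struct node *link;          /* address of the next node, or NULL */
};

/* Releases every node of the list. */
void free_list(struct node *head)
{
    while (head != NULL) {
        struct node *next = head->link;
        free(head);
        head = next;
    }
}

/* Builds a linked list holding the characters of s.
   Returns NULL for an empty string or if allocation fails. */
struct node *to_linked(const char *s)
{
    struct node *head = NULL, *tail = NULL;
    size_t remaining = strlen(s);
    while (remaining > 0) {
        size_t take = remaining < CHARS_PER_NODE ? remaining : CHARS_PER_NODE;
        struct node *n = malloc(sizeof *n);
        if (n == NULL) {
            free_list(head);
            return NULL;
        }
        memset(n->chars, '\0', sizeof n->chars);
        memcpy(n->chars, s, take);
        n->link = NULL;
        if (tail == NULL) {
            head = n;
        } else {
            tail->link = n;
        }
        tail = n;
        s += take;
        remaining -= take;
    }
    return head;
}

/* Counts the nodes by following the links. */
size_t node_count(const struct node *head)
{
    size_t count = 0;
    for (; head != NULL; head = head->link) {
        count++;
    }
    return count;
}
```

The 18 characters of ‘To be or not to be’ occupy five nodes: four full nodes and a last node holding the final two characters.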
 String as ADT:
The Abstract String is a programmer-defined linear ordering of objects from a finite set of
objects termed an alphabet. The alphabet may be restricted to alphanumeric and punctuation
characters but may also include additional mark-up information such as a new line character.

A string data type typically should have operations to:
• Return the nth character in a string.
• Set the nth character in a string to c.
• Find the length of a string.
• Concatenate two strings.
• Copy a string.
• Delete part of a string.
• Modify and compare strings in other ways.
The following is a set of operations we might want to do on strings.
GETCHAR(str, n)         Returns the nth character in the string
PUTCHAR(str, n, c)      Sets the nth character in the string to c
LENGTH(str)             Returns the number of characters in the string
POS(str1, str2)         Returns the position of the first occurrence of str2 found in str1, or 0 if no match
CONCAT(str1, str2)      Returns a new string consisting of the characters in str1 followed by the characters in str2
SUBSTRING(str, i, m)    Returns a substring of length m starting at position i in string str
DELETE(str, i, m)       Deletes m characters from str starting at position i
INSERT(str1, str2, i)   Changes str1 into a new string with str2 inserted in position i
COMPARE(str1, str2)     Returns an integer indicating how str1 compares with str2 (for example, whether str1 > str2)
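A few of these operations could be realised in C as shown below. This is only a sketch of one possible representation: the String type, the capacity of 256 characters and the function names are assumptions, and positions are counted from 1 as in the table.

```c
#include <string.h>

#define STR_CAPACITY 256    /* assumed maximum length */

typedef struct {
    char data[STR_CAPACITY + 1];
    size_t length;
} String;

/* Initializes str from a C string, truncating at STR_CAPACITY characters. */
void string_init(String *str, const char *text)
{
    size_t n = strlen(text);
    if (n > STR_CAPACITY) {
        n = STR_CAPACITY;
    }
    memcpy(str->data, text, n);
    str->data[n] = '\0';
    str->length = n;
}

/* LENGTH(str) */
size_t string_length(const String *str)
{
    return str->length;
}

/* GETCHAR(str, n): nth character, or '\0' when n is out of range. */
char string_getchar(const String *str, size_t n)
{
    return (n >= 1 && n <= str->length) ? str->data[n - 1] : '\0';
}

/* PUTCHAR(str, n, c): returns 0 on success, -1 when n is out of range. */
int string_putchar(String *str, size_t n, char c)
{
    if (n < 1 || n > str->length) {
        return -1;
    }
    str->data[n - 1] = c;
    return 0;
}

/* COMPARE(str1, str2): negative, zero or positive, as with strcmp. */
int string_compare(const String *a, const String *b)
{
    return strcmp(a->data, b->data);
}
```

Code that uses only these functions does not depend on how the characters are stored, which is the point of treating the string as an abstract data type.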
 String Operations:
A string may be viewed simply as a sequence or linear array of characters. However, groups of
consecutive elements in a string (such as words, phrases and sentences) are called substrings,
and the basic units of access in a string are usually these substrings, not individual
characters.
Substring
Accessing a substring from a given string requires three pieces of information: the name of the string,
the position of the first character of the substring in the given string, and the length of the substring
or the position of the last character of the substring. We call this operation SUBSTRING.
Syntax: SUBSTRING (string, initial, length)
Example: SUBSTRING (‘TO BE OR NOT TO BE’, 4, 7) = ‘BE OR N’
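One way to write SUBSTRING in C is sketched below. The function name and the decision to return NULL when the requested range does not lie inside the string are assumptions; positions are counted from 1 as in the text.

```c
#include <stdlib.h>
#include <string.h>

/* SUBSTRING(s, initial, length) with 1-based positions.
   Returns a newly allocated string, or NULL when the range does not lie
   inside s or allocation fails. The caller frees the result. */
char *substring(const char *s, size_t initial, size_t length)
{
    size_t n = strlen(s);
    if (initial < 1 || initial > n + 1 || length > n - (initial - 1)) {
        return NULL;
    }
    char *out = malloc(length + 1);
    if (out == NULL) {
        return NULL;
    }
    memcpy(out, s + (initial - 1), length);
    out[length] = '\0';
    return out;
}
```

Calling substring("TO BE OR NOT TO BE", 4, 7) copies the characters in positions 4 to 10, giving ‘BE OR N’.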
Indexing
Indexing, also called pattern matching, refers to finding the position where a string pattern P first
appears in a given string text T. We call this operation INDEX and write

Syntax: INDEX (text, pattern)
If the pattern P does not appear in the text T, then INDEX is assigned the value 0. The arguments
"text" and "pattern" can be either string constants or string variables.
Example: Suppose T contains the text ‘HIS FATHER IS THE PROFESSOR’
Then, INDEX (T, ‘THE’) = 7
INDEX (T, ‘THEN’) = 0
INDEX (T, ‘ THE ’) = 14
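In C, INDEX can be sketched on top of the standard library function strstr, converting its result to the 1-based convention used here. The function name index_of is hypothetical.

```c
#include <string.h>

/* INDEX(text, pattern): 1-based position of the first occurrence of
   pattern in text, or 0 if pattern does not occur.
   Note: for an empty pattern strstr reports a match at the start,
   so this sketch returns 1 in that case. */
size_t index_of(const char *text, const char *pattern)
{
    const char *hit = strstr(text, pattern);
    return hit == NULL ? 0 : (size_t)(hit - text) + 1;
}
```

Note that ‘THE’ is first found inside the word FATHER, at position 7, whereas ‘ THE ’ with its surrounding blanks is found only at position 14.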
Concatenation
Let S1 and S2 be strings. The concatenation of S1 and S2, denoted by S1 // S2, is the string
consisting of the characters of S1 followed by the characters of S2.
Syntax (C library): strcat (S1, S2);
Example: S1 = ‘MARK’
S2 = ‘TWAIN’
S1//S2 = ‘MARKTWAIN’
S1// ‘ ’ //S2 = ‘MARK TWAIN’
Length
The number of characters in a string is called its length.
Syntax: LENGTH (string)
Example: LENGTH (‘COMPUTER’) = 8
 Word/Text Processing:
In earlier times, computers processed only simple character data; nowadays they also process
printed text such as letters and articles. The operations usually associated with word processing
are the following:
 Insertion: inserting a string in the middle of the text.
 Deletion: deleting a string from the text.
 Replacement: replacing one string in the text by another.
Insertion
Suppose in a given text T we want to insert a string S so that S begins in position K. We denote
this operation by
Syntax: INSERT (text, position, string)
Example: INSERT (‘ABCDEF’, 3, ‘XYZ’) = ‘ABXYZCDEF’
This insertion function can also be implemented by using string operations:
INSERT (T, K, S) = SUBSTRING (T, 1, K-1) // S // SUBSTRING (T, K, LENGTH (T) - K + 1)
That is, the initial substring of T before position K, which has length K-1, is concatenated with
the string S, and the result is concatenated with the remaining part of T, which has length
LENGTH(T) - (K-1) = LENGTH(T) - K + 1
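The three-part construction of INSERT translates almost literally into C. The sketch below uses a hypothetical function name and assumes that a position outside 1 .. LENGTH(T)+1 is an error reported by returning NULL.

```c
#include <stdlib.h>
#include <string.h>

/* INSERT(t, k, s): s begins at 1-based position k of the result.
   Returns a newly allocated string, or NULL if k is out of range or
   allocation fails. The caller frees the result. */
char *insert_at(const char *t, size_t k, const char *s)
{
    size_t nt = strlen(t), ns = strlen(s);
    if (k < 1 || k > nt + 1) {
        return NULL;
    }
    char *out = malloc(nt + ns + 1);
    if (out == NULL) {
        return NULL;
    }
    memcpy(out, t, k - 1);                  /* SUBSTRING(T, 1, K-1) */
    memcpy(out + (k - 1), s, ns);           /* S */
    memcpy(out + (k - 1) + ns, t + (k - 1),
           nt - (k - 1) + 1);               /* rest of T, with its '\0' */
    return out;
}
```

The third copy moves LENGTH(T) - K + 1 characters plus the terminating '\0', matching the length computed above.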

Deletion
Suppose in a given text T we want to remove the substring which begins in position K and has
length L. We denote this operation by
Syntax: DELETE (text, position, length)
Example: DELETE (‘PRESTON’, 2, 2) = ‘PSTON’
DELETE (‘ABCDEFG’, 2, 4) = ‘AFG’
We assume that nothing is deleted if position K = 0. Thus
DELETE (‘ABCDEFG’, 0, 2) = ‘ABCDEFG’
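DELETE might be written in C as follows. The function name is hypothetical, and two behaviours are assumptions beyond the text: a position beyond the end of T also deletes nothing, and a length that runs past the end of T is cut off at the end.

```c
#include <stdlib.h>
#include <string.h>

/* DELETE(t, k, len): removes len characters starting at 1-based position k.
   Returns a newly allocated string, or NULL if allocation fails.
   The caller frees the result. */
char *delete_at(const char *t, size_t k, size_t len)
{
    size_t nt = strlen(t);
    if (k < 1 || k > nt) {
        k = 1;                          /* position outside T: delete nothing */
        len = 0;
    }
    if (len > nt - (k - 1)) {
        len = nt - (k - 1);             /* assumption: stop at the end of T */
    }
    char *out = malloc(nt - len + 1);
    if (out == NULL) {
        return NULL;
    }
    memcpy(out, t, k - 1);              /* part of T before position k */
    memcpy(out + (k - 1), t + (k - 1) + len,
           nt - (k - 1) - len + 1);     /* part after the deleted span, with '\0' */
    return out;
}
```

With K = 0 the function returns an unchanged copy of T, in line with the convention stated above.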
Replacement
Suppose in a given text T we want to replace the first occurrence of a pattern P1 by a pattern P2.
We will denote this operation by
Syntax: REPLACE (text, pattern1, pattern2)
Example: REPLACE (‘ABXYDEFGH’, ‘XY’, ‘CD’) = ‘ABCDDEFGH’
REPLACE (‘XABYABZ’, ‘BA’, ‘C’) = ‘XABYABZ’
In the second case, the pattern BA does not occur, hence there is no change.
We note that the REPLACE function can be expressed as a deletion followed by an insertion.
It can be executed by using the following three steps:
K := INDEX(T, P1)
T := DELETE(T, K, LENGTH(P1))
T := INSERT(T, K, P2)
The first two steps delete P1 from T, and the third step inserts P2 in the position K from which
P1 was deleted.
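The three steps can be combined into a single C function. In the sketch below the function name is hypothetical, strstr plays the role of INDEX, and an empty P1 is treated as "no match", which is an assumption not covered by the text.

```c
#include <stdlib.h>
#include <string.h>

/* REPLACE(t, p1, p2): replaces the first occurrence of p1 in t by p2.
   Returns a newly allocated string (a plain copy of t when p1 does not
   occur), or NULL if allocation fails. The caller frees the result. */
char *replace_first(const char *t, const char *p1, const char *p2)
{
    size_t nt = strlen(t), n1 = strlen(p1), n2 = strlen(p2);
    const char *hit = (n1 == 0) ? NULL : strstr(t, p1);   /* K := INDEX(T, P1) */
    if (hit == NULL) {
        char *copy = malloc(nt + 1);
        if (copy != NULL) {
            memcpy(copy, t, nt + 1);
        }
        return copy;
    }
    size_t before = (size_t)(hit - t);                    /* equals K - 1 */
    char *out = malloc(nt - n1 + n2 + 1);
    if (out == NULL) {
        return NULL;
    }
    memcpy(out, t, before);                               /* text before P1 */
    memcpy(out + before, p2, n2);                         /* P2 in place of P1 */
    memcpy(out + before + n2, hit + n1,
           nt - before - n1 + 1);                         /* rest of T, with '\0' */
    return out;
}
```

Only the first occurrence is replaced: applied to ‘XABYABZ’ with P1 = ‘AB’ and P2 = ‘C’, the result is ‘XCYABZ’.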
 Pattern Matching Algorithms:
Pattern matching is the problem of deciding whether or not a given string pattern P appears in a
string text T.
We assume that the length of P does not exceed the length of T. There are different algorithms
for this problem. The main goal in designing them is to reduce the time complexity, since the
traditional approach may take a long time to complete the pattern searching task for a longer text.
First Pattern matching algorithm:
In this algorithm we compare a given pattern P with each of the substrings of T, moving from left
to right, until we get a match. In detail, let
Wk = SUBSTRING (T, K, LENGTH (P))
That is, Wk denotes the substring of T having the same length as P and beginning with the Kth
character of T. First we compare P, character by character, with the first substring, W1. If
all the characters are the same, then P = W1 and so P appears in T and INDEX (T, P) = 1. Suppose
some character of P does not match the corresponding character of W1. Then P ≠ W1 and we move
to the next substring W2. That is, we next compare P with W2. If P ≠ W2, then we compare P with
W3, and so on.

The process stops
(a) when we find a match of P with some substring Wk, and so P appears in T and INDEX (T, P) = K, or
(b) when we exhaust all the Wk's with no match, and hence P does not appear in T.
The maximum value MAX of the subscript K is equal to LENGTH (T) – LENGTH (P) + 1
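The first pattern matching algorithm can be sketched in C as follows. The function name is hypothetical; an empty pattern is treated as "not found", which is an assumption. Each value of k corresponds to the substring Wk, and the inner loop compares P with Wk character by character.

```c
#include <string.h>

/* Returns INDEX(T, P): the 1-based position of the first Wk equal to P,
   or 0 when no Wk matches. */
size_t pattern_index(const char *t, const char *p)
{
    size_t nt = strlen(t), np = strlen(p);
    if (np == 0 || np > nt) {
        return 0;
    }
    size_t max = nt - np + 1;           /* MAX = LENGTH(T) - LENGTH(P) + 1 */
    for (size_t k = 1; k <= max; k++) {
        size_t l = 0;
        /* Compare P with Wk, the substring of T starting at position k. */
        while (l < np && p[l] == t[k - 1 + l]) {
            l++;
        }
        if (l == np) {
            return k;                   /* P = Wk */
        }
    }
    return 0;                           /* all Wk exhausted with no match */
}
```

In the worst case every one of the MAX substrings is compared with almost all of P, so the number of character comparisons can grow like LENGTH(P) times MAX.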
