Searching
Session VIII
Dr. V.Umadevi M.Sc(CS &IT). M.Tech (IT).,
M.Phil., PhD., D.Litt.,
Director, Department of Computer Science,
Jairams Arts and Science College, Karur.
Data Structures & Algorithms
Searching
• Basic Search Techniques
• Tree Searching
• General Search Trees
• Hashing
Searching
• Searching is the process of scanning stored data (a table, file or list) to locate a required item.
• The easiest way to search a table is a sequential search,
– i.e. a search that examines the elements of the table or list one by one.
• This can be done by:
for (int i = 0; i < n; i++)
    if (x == a[i])
        break;    /* found at position i; i == n means x is absent */
Terminologies in Search Techniques
• A table or a file is a group of elements called records.
• A key associated with each record is used to differentiate among the
records.
• A key stored within the record itself is called an internal key or embedded key.
• Keys kept in a separate table, together with pointers to the records, are called external keys.
• A key that uniquely identifies each record is called the primary key.
Sequential Search
• Simplest form of search.
• Applicable to data stored in an array or as a linked list.
• The table is scanned entry by entry until either the record is found or the end of the table is reached and it can be concluded that the record is absent.
• This method of traversing the data sequentially to locate the item is called
Linear or Sequential Search.
Algorithm for sequential search:
for (i = 0; i < n; i++)
    if (key == k(i))
        return (i);
return (-1);
• In the worst case (key in the last position, or absent) the search requires n comparisons; in the best case (key in the first position) it requires only one.
• The complexity is therefore
Best case = O(1)
Worst case = O(n).
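The algorithm above can be written as a runnable C function. This sketch assumes the keys sit in an int array k[0..n-1] (so the slide's k(i) becomes k[i]) and, since the notes say sequential search also applies to linked lists, adds a list version over a hypothetical node type `struct lnode`.

```c
#include <stddef.h>

/* Sequential search over an array: returns the index of key, or -1. */
int seq_search(const int k[], int n, int key)
{
    for (int i = 0; i < n; i++)
        if (key == k[i])
            return i;      /* best case: found at i == 0 after one comparison */
    return -1;             /* worst case: all n entries were compared */
}

/* Hypothetical singly linked list node, used only for this sketch. */
struct lnode {
    int key;
    struct lnode *next;
};

/* Sequential search over a linked list: returns the node, or NULL. */
struct lnode *list_search(struct lnode *head, int key)
{
    for (struct lnode *p = head; p != NULL; p = p->next)
        if (p->key == key)
            return p;
    return NULL;
}
```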
Binary Search
• Binary Search technique is used when the items are placed in an array
that is sorted either in ascending or in descending order.
• In this method (for an array in ascending order)
– The key is compared with the middle item of the array.
– If they match, the search ends successfully.
– If the key is smaller, the lower half of the array is searched next.
– If the key is larger, the upper half of the array is searched next.
– This procedure is repeated until the item is found or the remaining range is empty.
Algorithm for Binary Search
l = 1; u = n; done = 'f';
while ((l <= u) && (done == 'f'))
{
    m = (l + u) / 2;
    if (k > f[m])
        l = m + 1;
    else if (k < f[m])
        u = m - 1;
    else                /* k == f[m] */
    {
        i = m;
        done = 't';
    }
}
if (done == 'f')
    printf("\n KEY %d IS NOT FOUND IN THE FILE", k);
else
    printf("\n KEY %d IS FOUND IN POSITION %d", k, i);
• Binary search has a complexity of O(log n), because each comparison halves the part of the array still to be searched.
• This is much faster than linear search, which is O(n).
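A minimal C sketch of the same procedure, assuming a 0-based int array sorted in ascending order (the slide's version uses positions 1..n and a done flag). The function name `binary_search` is illustrative.

```c
/* Iterative binary search on an array sorted in ascending order.
   Returns the index of key, or -1 if it is absent. */
int binary_search(const int f[], int n, int key)
{
    int l = 0, u = n - 1;
    while (l <= u) {
        int m = l + (u - l) / 2;   /* same as (l + u) / 2, but cannot overflow */
        if (key > f[m])
            l = m + 1;             /* continue in the upper half */
        else if (key < f[m])
            u = m - 1;             /* continue in the lower half */
        else
            return m;              /* match */
    }
    return -1;                     /* range exhausted: key is absent */
}
```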
Searching an Ordered Table
• If a table of fixed size is sorted in ascending or descending order of keys,
then efficiency of searching can be improved.
• If the key being searched for is absent, an unsorted file of size n needs n comparisons to establish this.
• In a sorted file, about n/2 comparisons are enough on average to conclude that the key is absent,
• because as soon as an element greater than the key is found (for ascending order), the key is known to be missing.
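The early exit can be sketched in C as follows, assuming an ascending int array. The extra `probes` output is hypothetical, added only to report how many entries were examined before the search stopped.

```c
/* Sequential search in a table sorted in ascending order.
   Returns the index of key, or -1. *probes receives the number of
   table entries examined. */
int ordered_seq_search(const int k[], int n, int key, int *probes)
{
    int c = 0;
    for (int i = 0; i < n; i++) {
        c++;
        if (k[i] >= key) {                 /* first element not smaller than key */
            *probes = c;
            return (k[i] == key) ? i : -1; /* stop: key cannot appear later */
        }
    }
    *probes = c;                           /* every element was smaller than key */
    return -1;
}
```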
• For example, in many applications the response to a request for information may be deferred to the next day.
• All such requests are then collected, sorted by key, and searched overnight.
• Because both the master table and the list of requests are sorted, both can be scanned sequentially together, and each request needs only a few lookups beyond the point where the previous one stopped.
• There is no need to search the entire table for each request.
• This technique is useful when a master file has to be matched against a large transaction file.
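The one-pass idea can be sketched as a merge-style scan, assuming both the master table and the batch of requests are int arrays sorted in ascending order; `batch_search` and its `found` output array are hypothetical names. The whole batch costs O(m + t) steps instead of a full scan per request.

```c
/* Look up every key of a sorted transaction list in a sorted master table
   with a single forward pass over both.
   found[j] is set to the master index of trans[j], or -1 if absent. */
void batch_search(const int master[], int m,
                  const int trans[], int t, int found[])
{
    int i = 0;                       /* current position in the master table */
    for (int j = 0; j < t; j++) {
        while (i < m && master[i] < trans[j])
            i++;                     /* resume from where the last lookup stopped */
        found[j] = (i < m && master[i] == trans[j]) ? i : -1;
    }
}
```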
Indexed Sequential Search
• Efficient method to search in a sorted file.
• An auxiliary table, called an index, is created. Each entry in the index consists of a key kindex and a pointer to the record identified by kindex.
• Assumptions:
• kindex: array of keys in the index.
• pindex: array of pointers from the index to the actual records in the file.
• n: size of the file.
• Sequential search is performed first on the index table. Once the correct index
portion has been found, a second sequential search is performed in a
small portion of the record table itself.
• Deletions are done by flagging deleted entries.
• Insertion is costly, since it can involve shifting many elements.
• An alternative is to keep an overflow area at some other location and link together any inserted records.
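A sketch of the two-stage search in C, under stated assumptions: the records are represented only by their sorted keys k[0..n-1], pindex[i] holds the position in k of the indexed record (a position standing in for a pointer), and the first index entry refers to the first record. The function name is hypothetical.

```c
/* Indexed sequential search. kindex[0..isize-1] holds every few keys of k,
   and pindex[i] is the position in k of the record whose key is kindex[i].
   Returns the position of key in k, or -1. */
int indexed_search(const int k[], int n,
                   const int kindex[], const int pindex[], int isize, int key)
{
    int i;
    /* 1. sequential search of the small index */
    for (i = 0; i < isize && kindex[i] <= key; i++)
        ;
    if (i == 0)
        return -1;                   /* key is smaller than every indexed key */
    int lowlim = pindex[i - 1];
    int hilim = (i == isize) ? n - 1 : pindex[i] - 1;
    /* 2. sequential search of the selected block of the record table */
    for (int j = lowlim; j <= hilim; j++)
        if (k[j] == key)
            return j;
    return -1;
}
```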
Tree Searching
• In a binary search tree, all the left descendants of a node with key 'key' have keys less than 'key', and all the right descendants have keys greater than or equal to 'key'.
• In this method:
• The key is compared with the root.
• If it is equal, the search is successful.
• If the key is smaller, the search continues with the left child.
• If the key is greater, the search continues with the right child.
• These steps are repeated until the search is successful or an empty subtree is reached.
• Algorithm for searching in a binary search tree:
p = tree;
while (p != null && key != k(p))
    p = (key < k(p)) ? left(p) : right(p);
return (p);
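The same loop in C, assuming a hypothetical node type in which k(p), left(p) and right(p) become p->key, p->left and p->right, and null becomes NULL:

```c
#include <stddef.h>

/* Hypothetical BST node used for this sketch. */
struct node {
    int key;
    struct node *left, *right;
};

/* Returns the node containing key, or NULL if it is absent. */
struct node *bst_search(struct node *tree, int key)
{
    struct node *p = tree;
    while (p != NULL && key != p->key)
        p = (key < p->key) ? p->left : p->right;  /* go left or right */
    return p;
}
```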
• The efficiency of the search can be improved by using a sentinel node.
• All null left and right pointers in the tree are made to point to this sentinel instead.
• Before searching, the search key is stored in the sentinel, so the loop is certain to stop and only the test key != k(p) is needed at each node.
• When the search is over, if p equals the sentinel pointer, the search was unsuccessful. Otherwise p points to the desired node.
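A sketch of the sentinel variant, assuming every empty link in the tree (and an empty tree itself) points to one shared sentinel node; the type `struct snode` and the function name are hypothetical.

```c
#include <stddef.h>

/* Hypothetical node type for this sketch. */
struct snode {
    int key;
    struct snode *left, *right;
};

/* Every empty left/right link points to *sentinel instead of NULL.
   Copying the key into the sentinel guarantees the loop stops,
   so each step needs only one comparison for termination. */
struct snode *sentinel_search(struct snode *tree, struct snode *sentinel, int key)
{
    struct snode *p = tree;
    sentinel->key = key;             /* the search can no longer run off the tree */
    while (key != p->key)
        p = (key < p->key) ? p->left : p->right;
    return p;                        /* p == sentinel means "not found" */
}
```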
• A sorted array can be produced by traversing the tree in inorder.
• From a sorted array, a tree can be constructed in two different ways:
1. Take the middle element as the root; the smaller elements form the left subtree and the larger elements form the right subtree, each built the same way.
– This tree will be balanced.
2. Take the first element as the root and make each successive element the right son (or, for descending order, the left son) of its predecessor.
– This tree will be unbalanced: it degenerates into a chain.
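The first (balanced) construction can be sketched recursively in C, together with an inorder traversal that recovers the sorted array. The node type and function names are assumptions for this sketch, and the nodes are never freed.

```c
#include <stdlib.h>

struct tnode {
    int key;
    struct tnode *left, *right;
};

/* Build a balanced BST from sorted a[lo..hi]: the middle element becomes
   the root, smaller elements form the left subtree, larger ones the right. */
struct tnode *build_balanced(const int a[], int lo, int hi)
{
    if (lo > hi)
        return NULL;
    int mid = lo + (hi - lo) / 2;
    struct tnode *t = malloc(sizeof *t);
    if (t == NULL)
        abort();                     /* out of memory */
    t->key = a[mid];
    t->left = build_balanced(a, lo, mid - 1);
    t->right = build_balanced(a, mid + 1, hi);
    return t;
}

/* Inorder traversal: writes the keys to out[] in ascending order and
   returns the next free index. */
int inorder(const struct tnode *t, int out[], int pos)
{
    if (t == NULL)
        return pos;
    pos = inorder(t->left, out, pos);
    out[pos++] = t->key;
    return inorder(t->right, out, pos);
}

/* Number of levels in the tree (0 for an empty tree). */
int height(const struct tnode *t)
{
    if (t == NULL)
        return 0;
    int hl = height(t->left), hr = height(t->right);
    return 1 + (hl > hr ? hl : hr);
}
```

Seven sorted keys give a tree of height 3, whereas the second (chain) construction would give height 7.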
Insertion and Deletion
Algorithm for insertion into a Binary Search Tree
q = null; p = tree;
while (p != null)
{
    if (key == k(p))
        return (p);
    q = p;
    if (key < k(p))
        p = left(p);
    else
        p = right(p);
}
v = maketree(rec, key);
if (q == null)
    tree = v;
else if (key < k(q))
    left(q) = v;
else
    right(q) = v;
return (v);
• The new node is always attached as a leaf below the last node visited, so the ordering of the tree is preserved after insertion.
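A C rendering of the insertion algorithm, assuming integer keys only: the slide's maketree(rec, key) is replaced by a plain malloc, and the caller's tree pointer is passed by address so that an empty tree can be updated. Names are hypothetical.

```c
#include <stdlib.h>

struct node {
    int key;
    struct node *left, *right;
};

/* Search for key while remembering the parent q, then hang a new leaf off q.
   Returns the existing node if key is already present, else the new node. */
struct node *bst_insert(struct node **tree, int key)
{
    struct node *q = NULL, *p = *tree;
    while (p != NULL) {
        if (key == p->key)
            return p;                /* already present: nothing inserted */
        q = p;
        p = (key < p->key) ? p->left : p->right;
    }
    struct node *v = malloc(sizeof *v);
    if (v == NULL)
        abort();                     /* out of memory */
    v->key = key;
    v->left = v->right = NULL;
    if (q == NULL)
        *tree = v;                   /* the tree was empty */
    else if (key < q->key)
        q->left = v;
    else
        q->right = v;
    return v;
}
```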
Deletion
• Deletion removes a node with key 'key' from a binary search tree.
– If the node to be deleted has no sons, it can be deleted without further adjustment to the tree.
– If the node to be deleted has only one subtree, its only son can be moved up to take its place.
– If the node to be deleted has two subtrees, its inorder successor (the leftmost node of its right subtree) must take its place. The successor has no left son, so removing it from its old position falls under one of the first two cases.
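The three cases can be sketched as a recursive C function. This is one common way to write BST deletion (copying the successor's key rather than relinking the successor node), not necessarily the notes' own algorithm. `bst_add` is a small hypothetical helper for building trees; keys equal to a node's key go to its right, as in the tree definition.

```c
#include <stdlib.h>

struct node {
    int key;
    struct node *left, *right;
};

/* Helper: recursive insertion used to build test trees. */
struct node *bst_add(struct node *t, int key)
{
    if (t == NULL) {
        t = malloc(sizeof *t);
        if (t == NULL)
            abort();
        t->key = key;
        t->left = t->right = NULL;
    } else if (key < t->key) {
        t->left = bst_add(t->left, key);
    } else {
        t->right = bst_add(t->right, key);
    }
    return t;
}

/* Delete key from the BST rooted at t and return the new root. */
struct node *bst_delete(struct node *t, int key)
{
    if (t == NULL)
        return NULL;                         /* key not present */
    if (key < t->key) {
        t->left = bst_delete(t->left, key);
    } else if (key > t->key) {
        t->right = bst_delete(t->right, key);
    } else if (t->left == NULL || t->right == NULL) {
        /* no sons, or only one: the son (possibly NULL) moves up */
        struct node *son = (t->left != NULL) ? t->left : t->right;
        free(t);
        return son;
    } else {
        /* two sons: copy the inorder successor's key into this node,
           then delete the successor from the right subtree */
        struct node *s = t->right;
        while (s->left != NULL)
            s = s->left;
        t->key = s->key;
        t->right = bst_delete(t->right, s->key);
    }
    return t;
}
```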
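Efficiency
– Search time varies between O(log n) for a balanced tree and O(n) for a degenerate (chain-like) tree, depending on the structure of the tree.
– For trees built by inserting keys in random order, the average search time is O(log n).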
General Search Trees
– General (non-binary) trees are also used as search tables, particularly for data held on external storage devices. Two kinds are:
– Multiway Search Trees
– Digital Search Trees
Multiway Search Trees
• A multi way search tree of order n is a general tree in which each node has
n or fewer subtrees and contains one fewer key than it has subtrees.
• For example, a node with four subtrees contains three keys.
• Some notation used for multiway search trees:
node(p): the node pointed to by p
numtrees(p): the number of subtrees of node(p); numtrees(p) <= n, the order of the tree
son(p,0), son(p,1), ..., son(p,numtrees(p)-1): the subtrees of node(p)
k(p,0), k(p,1), ..., k(p,numtrees(p)-2): the keys in node(p), in ascending order
Algorithm to Search a Multiway Search Tree
p = tree;
while (p != null)
{
    i = nodesearch(p, key);
    if (i < numtrees(p) - 1 && key == k(p, i))
    {
        position = i;
        return (p);
    }
    p = son(p, i);
}
position = -1;
return (-1);
– The function nodesearch returns the index of the smallest key in node(p) that is greater than or equal to the search argument, or numtrees(p) - 1 if every key in the node is smaller.
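A C sketch of nodesearch and the search loop, assuming a fixed hypothetical order ORDER = 4 and a node layout with numtrees, a key array and a son array. Unlike the pseudocode, which returns -1, it returns NULL when the key is absent, since the result is a pointer.

```c
#include <stddef.h>

#define ORDER 4   /* hypothetical order n of the tree */

/* A node holds numtrees subtrees and numtrees - 1 keys, k[0] < k[1] < ... */
struct mnode {
    int numtrees;
    int k[ORDER - 1];
    struct mnode *son[ORDER];
};

/* Index of the smallest key in node p that is >= key,
   or numtrees - 1 if every key in the node is smaller. */
int nodesearch(const struct mnode *p, int key)
{
    int i = 0;
    while (i < p->numtrees - 1 && p->k[i] < key)
        i++;
    return i;
}

/* Returns the node containing key and stores the key's index in *position;
   returns NULL and sets *position = -1 if key is absent. */
struct mnode *msearch(struct mnode *tree, int key, int *position)
{
    struct mnode *p = tree;
    while (p != NULL) {
        int i = nodesearch(p, key);
        if (i < p->numtrees - 1 && key == p->k[i]) {
            *position = i;
            return p;
        }
        p = p->son[i];               /* descend into the subtree left of k[i] */
    }
    *position = -1;
    return NULL;
}
```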
• To insert a key into a multiway search tree, we first search for it.
• If it is found, a pointer to the node containing it is returned.
• Otherwise, the search ends at a semi-leaf node, and a pointer to that node is returned; the new key is inserted there.
• For example, consider what happens when the keys 73, 77, 84, 86, 87, 84, 85 are inserted, in that order, into the tree in Fig. b.
Hashing
• Hashing is a search technique in which the position of each element is computed directly from its key value.
• The function that performs this calculation is called the hash function.
• A hash function maps a key to an integer. The key-value pairs are stored in an array of buckets.
• The bucket address is calculated as
bucket_address = hash(key) % hash_table_size.
e.g. with a table of size 100, the key 7321 is stored at address 7321 mod 100 = 21.
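The address calculation can be sketched as below, assuming the identity hash(key) = key for integer keys. The string variant uses a simple multiply-by-31 polynomial hash, a common illustrative choice rather than one prescribed by these notes. Both function names are hypothetical.

```c
/* Bucket address for an integer key, assuming hash(key) = key. */
unsigned bucket_address(unsigned key, unsigned table_size)
{
    unsigned hash = key;             /* hypothetical identity hash */
    return hash % table_size;
}

/* Bucket address for a string key using a polynomial hash (assumption). */
unsigned string_bucket(const char *s, unsigned table_size)
{
    unsigned h = 0;
    while (*s != '\0')
        h = h * 31u + (unsigned char)*s++;
    return h % table_size;
}
```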
Hash Tables / Direct Access Tables
• A collection of n elements with unique keys is stored in a direct access table T[m], where every key is a valid index into T.
• The hash function is f(x) = x.
• For a key k, we access T[k]:
• if it contains an element, return it;
• if it doesn't, return null.
• Thus each search takes O(1) time.
• The keys must be unique.
• The range of the keys must be bounded, since the table needs one slot for every possible key.
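A direct access table in C, as a sketch assuming integer keys in 0..M-1 with a hypothetical bound M = 100 and a hypothetical element type; each operation is a single array access, hence O(1).

```c
#include <stddef.h>

#define M 100          /* hypothetical bound: keys must lie in 0..M-1 */

struct element {
    int key;
    const char *data;
};

/* T[k] points to the element with key k, or is NULL if there is none. */
static struct element *T[M];

/* Store x in slot x->key (assumes 0 <= x->key < M). */
void da_insert(struct element *x) { T[x->key] = x; }

/* Return the element with key k, or NULL if absent or out of range. */
struct element *da_search(int k)
{
    return (k >= 0 && k < M) ? T[k] : NULL;
}

void da_delete(int k) { T[k] = NULL; }
```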
Mapping Functions
• The direct access approach is a one-to-one mapping h(k) = k from keys to table positions in the range 1 to m.
• Such a function, which maps each key to a distinct integer, is a perfect hash function.
• But finding a perfect hash function is not always possible.
• In general, h(k) may map several keys to the same integer.
• This situation is called a collision.
Handling the Collisions
Various techniques used to solve collision are:
1. Chaining
2. Rehashing
3. Linear Probing
4. Clustering
5. Quadratic Probing
6. Overflow area
Chaining
• This technique builds a linked list of all the items whose keys hash to the same value.
• During a search, only this short linked list is traversed to find the desired key.
• Any number of collisions can be handled, limited only by available memory.
• The number of items need not be known in advance.
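A sketch of chaining in C, assuming non-negative integer keys, a hypothetical table of 10 buckets and h(x) = x mod 10; new items are pushed onto the front of their bucket's list.

```c
#include <stdlib.h>

#define TABLESIZE 10   /* hypothetical number of buckets */

struct chain {
    int key;
    struct chain *next;
};

static struct chain *bucket[TABLESIZE];   /* every chain starts empty (NULL) */

/* h(x) = x mod TABLESIZE; keys are assumed non-negative. */
static unsigned hash(int key) { return (unsigned)key % TABLESIZE; }

/* Insert at the head of the key's chain (duplicates are not checked). */
void chain_insert(int key)
{
    struct chain *c = malloc(sizeof *c);
    if (c == NULL)
        abort();
    c->key = key;
    c->next = bucket[hash(key)];
    bucket[hash(key)] = c;
}

/* Walk only the chain of the key's bucket. */
struct chain *chain_search(int key)
{
    for (struct chain *c = bucket[hash(key)]; c != NULL; c = c->next)
        if (c->key == key)
            return c;
    return NULL;
}
```

With the keys {89, 18, 49, 58, 69} from the linear probing example, 89, 49 and 69 share bucket 9 and simply form a three-element list.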
Linear Probing
• It is the simplest rehashing scheme.
• The i-th probe examines position (h(x) + f(i)) mod tablesize, where f is linear: f(i) = i.
• So after a collision, the next free cell (wrapping around to the start of the table) is used.
• Example: insert {89, 18, 49, 58, 69} into a table of size 10, with h(x) = x mod 10:
89 → 9, 18 → 8, 49 → 9 again (collision).
Here 49 moves to the next free position, wrapping around to 0.
58 → 8 (occupied; 9 and 0 are also taken) ends up at 1, and 69 → 9 ends up at 2.
• New addresses are quick to calculate.
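The example can be reproduced with the sketch below, assuming non-negative integer keys, a table of 10 slots with -1 marking a free slot, and h(x) = x mod 10; the function names are hypothetical. Search follows the same probe sequence and stops at the first free slot.

```c
#define TABLESIZE 10
#define EMPTY (-1)     /* marks a free slot; keys are assumed non-negative */

/* Insert with h(x) = x mod TABLESIZE and linear probing.
   Returns the slot used, or -1 if the table is full. */
int lp_insert(int table[], int key)
{
    int h = key % TABLESIZE;
    for (int i = 0; i < TABLESIZE; i++) {
        int slot = (h + i) % TABLESIZE;   /* i-th probe: h(x) + i, wrapping around */
        if (table[slot] == EMPTY) {
            table[slot] = key;
            return slot;
        }
    }
    return -1;
}

/* Follow the same probe sequence; a free slot ends an unsuccessful search. */
int lp_search(const int table[], int key)
{
    int h = key % TABLESIZE;
    for (int i = 0; i < TABLESIZE; i++) {
        int slot = (h + i) % TABLESIZE;
        if (table[slot] == EMPTY)
            return -1;
        if (table[slot] == key)
            return slot;
    }
    return -1;
}
```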
Rehashing
• This scheme applies a second hashing operation when there is a collision.
• The function can be a new one or a re-application of the original function.
• If the hash table becomes almost full, a new, larger table is declared and all the elements are moved to it.
• For example, if the original function is h(x) = x mod 7, then h(x) = x mod 17 can be used after rehashing into a table of size 17.
• This is an expensive operation: every one of the n elements must be reinserted, so it takes O(n) time.
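A sketch of rebuilding a table from size 7 to size 17, assuming linear probing inside each table, non-negative integer keys and -1 as the free-slot marker; the struct and function names are hypothetical. Every key's address is recomputed with the new size, which is why the whole table must be walked.

```c
#include <stdlib.h>

#define EMPTY (-1)   /* free-slot marker; keys are assumed non-negative */

/* Hypothetical open-addressing table using linear probing. */
struct table {
    int size;
    int count;
    int *slot;
};

void init(struct table *t, int size)
{
    t->size = size;
    t->count = 0;
    t->slot = malloc(size * sizeof *t->slot);
    if (t->slot == NULL)
        abort();
    for (int i = 0; i < size; i++)
        t->slot[i] = EMPTY;
}

/* Place key at h(x) = x mod size, probing linearly (table must not be full). */
void place(struct table *t, int key)
{
    int s = key % t->size;
    while (t->slot[s] != EMPTY)
        s = (s + 1) % t->size;
    t->slot[s] = key;
    t->count++;
}

int find(const struct table *t, int key)
{
    int s = key % t->size;
    for (int i = 0; i < t->size; i++, s = (s + 1) % t->size) {
        if (t->slot[s] == EMPTY)
            return -1;
        if (t->slot[s] == key)
            return s;
    }
    return -1;
}

/* Rehash: declare a larger table, reinsert all n keys, free the old one. */
void rehash(struct table *t, int newsize)
{
    struct table bigger;
    init(&bigger, newsize);
    for (int i = 0; i < t->size; i++)
        if (t->slot[i] != EMPTY)
            place(&bigger, t->slot[i]);   /* address recomputed with new size */
    free(t->slot);
    *t = bigger;
}
```

With keys 10, 17, 24, 3 and 31, all five hash to 3 under x mod 7 and pile up in slots 3, 4, 5, 6 and 0; after rehashing with x mod 17 they land in five different home slots.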
Clustering
• When keys that hash to different values compete with each other for the same slots in successive rehashes, the result is primary clustering.
• It occurs with a rehash such as linear probing, where the same step is applied repeatedly until an empty slot is found, so occupied slots form growing runs.
• To eliminate it, the rehash can depend on how many times it has been applied: the j-th rehash is rh_j = (rh_(j-1) + j) % tablesize, so the first rehash is (h(x) + 1) % tablesize, the second (h(x) + 3) % tablesize, and so on.
• This removes primary clustering, but different keys that hash to the same value still follow the same rehash path.
• This is called secondary clustering.
• To eliminate it, double hashing is used.
• This involves two hash functions, h1(x) and h2(x).
• The primary function h1 gives the first address. If there is a collision, the secondary function h2 determines the step between successive probes, so keys with the same h1 value usually follow different paths.
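Double hashing can be sketched as follows, assuming a prime table size of 13, h1(x) = x mod 13 and, as a hypothetical secondary function, h2(x) = 7 - (x mod 7), which is never 0. Because the step size depends on the key, keys that collide at the same h1 address generally follow different probe paths.

```c
#define TABLESIZE 13    /* prime, so every step size can reach every slot */
#define EMPTY (-1)      /* keys are assumed non-negative */

static int h1(int key) { return key % TABLESIZE; }
static int h2(int key) { return 7 - key % 7; }   /* hypothetical; always 1..7 */

/* Probe h1, h1 + h2, h1 + 2*h2, ... (mod TABLESIZE).
   Returns the slot used, or -1 if no free slot was found. */
int dh_insert(int table[], int key)
{
    int step = h2(key), s = h1(key);
    for (int i = 0; i < TABLESIZE; i++) {
        if (table[s] == EMPTY) {
            table[s] = key;
            return s;
        }
        s = (s + step) % TABLESIZE;
    }
    return -1;
}

/* Follow the same key-dependent probe path; a free slot ends the search. */
int dh_search(const int table[], int key)
{
    int step = h2(key), s = h1(key);
    for (int i = 0; i < TABLESIZE; i++) {
        if (table[s] == EMPTY)
            return -1;
        if (table[s] == key)
            return s;
        s = (s + step) % TABLESIZE;
    }
    return -1;
}
```

For instance, 1, 14, 27 and 66 all have h1 = 1, but with steps 6, 7, 1 and 4 they land in slots 1, 8, 2 and 5 when inserted in that order.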
Quadratic Probing
• In this method, the rehash function is quadratic in the probe number i.
• Rehash address: $h_i(key) = (h(key) + c\,i^2) \bmod \mathit{tablesize}$, for i = 1, 2, 3, ...
• Rehashing schemes use the originally allocated table space and thus avoid the overhead of linked lists.
• The number of elements must be known in advance in order to choose a suitable table size and function.
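A sketch with c = 1 (an assumption; the notes leave c open), reusing the linear probing example's keys and h(x) = x mod 10. The probe offsets grow as 1, 4, 9, ..., so colliding keys leave a crowded region faster than with linear probing: 49 lands at 0, 58 at 2 and 69 at 3. A prime table size is usually chosen in practice; 10 is kept here only to match the earlier example.

```c
#define TABLESIZE 10
#define EMPTY (-1)     /* keys are assumed non-negative */

/* Probe (h + i*i) mod TABLESIZE for i = 0, 1, 2, ... (c = 1).
   Returns the slot used, or -1 if no free slot was found. */
int qp_insert(int table[], int key)
{
    int h = key % TABLESIZE;
    for (int i = 0; i < TABLESIZE; i++) {
        int s = (h + i * i) % TABLESIZE;
        if (table[s] == EMPTY) {
            table[s] = key;
            return s;
        }
    }
    return -1;
}
```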
Data Structures & Algorithms
36. 36
Overflow Area
• This method divides the pre-allocated table into two sections
– the primary area, to which keys are mapped, and an area for collisions, called the Overflow Area.
• When a collision occurs, a slot in the overflow area is used for the new element, and a link to it from the primary area is established.
• Access is fast, since a colliding element is reached by following links within the same pre-allocated table, without allocating new memory.
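A sketch of an overflow area inside one pre-allocated array, assuming a hypothetical primary area of 7 slots, an overflow area of 5 slots, h(x) = x mod 7 and non-negative integer keys. Each colliding key goes to the next unused overflow slot and is linked into a chain that starts at its home slot.

```c
#define PRIMARY 7       /* hypothetical size of the primary area */
#define OVF_SIZE 5      /* hypothetical size of the overflow area */
#define EMPTY (-1)      /* keys are assumed non-negative */
#define NIL (-1)        /* end-of-chain link */

/* Slots 0..PRIMARY-1 form the primary area; the rest form the overflow area. */
struct slot {
    int key;
    int link;           /* index of the next colliding key, or NIL */
};

static struct slot table[PRIMARY + OVF_SIZE];
static int next_free = PRIMARY;          /* next unused overflow slot */

void ov_init(void)
{
    for (int i = 0; i < PRIMARY + OVF_SIZE; i++) {
        table[i].key = EMPTY;
        table[i].link = NIL;
    }
    next_free = PRIMARY;
}

/* Returns the index used, or -1 if the overflow area is full. */
int ov_insert(int key)
{
    int s = key % PRIMARY;
    if (table[s].key == EMPTY) {         /* no collision: use the home slot */
        table[s].key = key;
        return s;
    }
    if (next_free == PRIMARY + OVF_SIZE)
        return -1;
    int o = next_free++;                 /* collision: take an overflow slot */
    table[o].key = key;
    table[o].link = table[s].link;       /* link it in right after the home slot */
    table[s].link = o;
    return o;
}

/* Check the home slot, then follow the links through the overflow area. */
int ov_search(int key)
{
    for (int s = key % PRIMARY; s != NIL; s = table[s].link)
        if (table[s].key == key)
            return s;
    return -1;
}
```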