Notes on data structures

- 1. Searching, Session VIII. Dr. V. Umadevi, M.Sc. (CS & IT), M.Tech. (IT), M.Phil., Ph.D., D.Litt., Director, Department of Computer Science, Jairams Arts and Science College, Karur. Data Structures & Algorithms
- 2. Searching
  • Basic Search Techniques
  • Tree Searching
  • General Search Trees
  • Hashing
- 3. Searching
  • Searching is a technique in which memory is scanned for the required data.
  • The easiest way to search a table is a sequential search, i.e. a search through all the elements in the table or list.
  • This can be done by:
      for (int i = 0; i < n; i++)
          if (x == a[i])
              break;
- 4. Terminology in Search Techniques
  • A table or a file is a group of elements called records.
  • A key associated with each record is used to differentiate among the records.
  • A key stored within the record is an internal key or embedded key.
  • Keys kept in a separate table that includes pointers to the records are external keys.
  • A set of keys that uniquely identifies each record is the primary key.
- 5. Basic Search Techniques
  • Sequential Search
  • Binary Search
  • Searching an Ordered Table
  • Indexed Sequential Search
  • Interpolation Search
- 6. Sequential Search
  • The simplest form of search.
  • Applicable to data stored in an array or in a linked list.
  • The table is scanned entry by entry, in order, until the record is found or it can be concluded that it is not present.
  • This method of traversing the data sequentially to locate the item is called linear or sequential search.
- 7. Algorithm for sequential search
      for (i = 0; i < n; i++)
          if (key == k(i))
              return (i);
      return (-1);
  • In the worst case the search requires n comparisons; in the best case it requires only one.
  • The complexity is O(1) in the best case and O(n) in the worst case.
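The pseudocode above can be written as a small C function. This is a minimal sketch that assumes the keys are plain `int` values held in an array; the name `sequential_search` is chosen here for illustration and does not come from the slides.

```c
/* Scans k[0..n-1] from the front.
   Returns the index of the first element equal to key, or -1 if absent. */
int sequential_search(const int k[], int n, int key)
{
    for (int i = 0; i < n; i++) {
        if (k[i] == key)
            return i;       /* best case: found at i == 0 after one comparison */
    }
    return -1;              /* worst case: all n elements were compared */
}
```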
- 8. Binary Search
  • Binary search is used when the items are stored in an array that is sorted in ascending or descending order.
  • In this method (described here for ascending order):
    – The key is compared with the middle item of the array.
    – If they match, the search ends successfully.
    – If the key is smaller, the lower half of the array is searched.
    – If the key is greater, the upper half of the array is searched.
    – This procedure is repeated until the item is found or the array is exhausted.
- 9. Algorithm for Binary Search (the file f occupies positions 1 to n)
      l = 1; u = n; done = 'f';
      while ((l <= u) && (done == 'f')) {
          m = (l + u) / 2;
          if (k > f[m])
              l = m + 1;
          else if (k < f[m])
              u = m - 1;
          else {
              i = m;
              done = 't';
          }
      }
      if (done == 'f')
          printf("\n KEY %d IS NOT FOUND IN THE FILE", k);
      else
          printf("\n KEY %d IS FOUND IN POSITION %d", k, i);
  • Binary search has a complexity of O(log n).
  • This is much faster than linear search for large n.
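For comparison, here is a self-contained C sketch of the same idea using the usual 0-based array indexing. The function name `binary_search` and the `lo`/`hi` variable names are assumptions made for this example, and the array is assumed to be sorted in ascending order.

```c
/* Searches the ascending array f[0..n-1] for k.
   Returns the index where k is stored, or -1 if k is absent. */
int binary_search(const int f[], int n, int k)
{
    int lo = 0;
    int hi = n - 1;

    while (lo <= hi) {
        /* Written this way so that lo + hi cannot overflow. */
        int mid = lo + (hi - lo) / 2;

        if (k > f[mid])
            lo = mid + 1;       /* continue in the upper half */
        else if (k < f[mid])
            hi = mid - 1;       /* continue in the lower half */
        else
            return mid;         /* match */
    }
    return -1;                  /* range exhausted */
}
```

Each pass halves the remaining range, which is where the O(log n) bound comes from.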
- 10. Searching an Ordered Table
  • If a table of fixed size is sorted in ascending or descending order of keys, the efficiency of searching can be improved.
  • If the key is absent from an unsorted file of size n, n comparisons are needed to establish that.
  • In a sorted file, about n/2 comparisons are enough on average to conclude that the key is absent, because as soon as an element greater than the key is found (in ascending order) the key is known to be missing.
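The early exit described above might look like this in C. It is a sketch under the assumption that the table is an ascending `int` array; the name `ordered_search` and the `examined` output parameter (added only to make the number of inspected elements visible) are not from the slides.

```c
/* Sequential search in an ascending array a[0..n-1].
   Stops as soon as an element greater than key is seen.
   Stores the number of elements inspected in *examined.
   Returns the index of key, or -1 if key is absent. */
int ordered_search(const int a[], int n, int key, int *examined)
{
    *examined = 0;
    for (int i = 0; i < n; i++) {
        (*examined)++;
        if (a[i] == key)
            return i;
        if (a[i] > key)
            break;          /* every later element is larger still */
    }
    return -1;
}
```

With the table {10, 20, 30, 40, 50}, a search for the absent key 25 inspects three elements (10, 20, 30) instead of all five.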
- 11.
  • For example, in many applications a response to a request for information may be deferred to the next day.
  • All such requests are collected and the searching is done overnight.
  • When both the table and the collected requests are sorted, a sequential pass through both tables requires only a few lookups per request.
  • There is no need to search the entire table for each request.
  • This technique is useful when dealing with a master file and a large transaction file.
- 12. Indexed Sequential Search
  • An efficient method for searching a sorted file.
  • An auxiliary table, called an index, is created. Each element of the index consists of a key kindex and a pointer to the record that kindex refers to.
  • Assumptions:
    – kindex: array of keys in the index.
    – pindex: array of pointers from the index to the actual records in the file.
    – n: size of the file.
  • A sequential search is performed first on the index table. Once the correct index portion has been found, a second sequential search is performed on a small portion of the record table itself.
  • Deletions are done by flagging deleted entries.
  • Insertion involves a lot of shifting of elements.
  • An alternative is to keep an overflow area at some other location and link together any inserted records.
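The two-stage search can be sketched in C as follows. The sketch assumes that the file keys are in an ascending `int` array `k`, that `kindex[j]` is the key stored at file position `pindex[j]`, and that the index entries are themselves in ascending order. Array positions stand in for the record pointers, and the function name is chosen for this example.

```c
/* Stage 1: scan the index for the first entry whose key exceeds `key`.
   Stage 2: scan only the slice of the file between the two neighbouring
   index entries.
   Returns the file position of key, or -1 if key is absent. */
int indexed_sequential_search(const int k[], int n,
                              const int kindex[], const int pindex[],
                              int index_size, int key)
{
    int i = 0;
    while (i < index_size && kindex[i] <= key)
        i++;

    /* The key, if present, lies in k[low..high]. */
    int low  = (i == 0) ? 0 : pindex[i - 1];
    int high = (i == index_size) ? n - 1 : pindex[i] - 1;

    for (int j = low; j <= high; j++) {
        if (k[j] == key)
            return j;
    }
    return -1;
}
```

For a file {3, 8, 14, 21, 27, 35, 42, 50, 61} indexed at every third record (`kindex` = {3, 21, 42}, `pindex` = {0, 3, 6}), a search for 27 scans the index, then inspects only positions 3 to 5.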
- 14. Usage of secondary index
- 15. Tree Searching
  • In a binary search tree, all the left descendants of a node with key 'key' have keys that are less than 'key', and all the right descendants have keys that are greater than or equal to 'key'.
  • In this method:
    – The key is compared with the root.
    – If it is equal, the search is successful.
    – If the key is smaller, it is compared with the left child.
    – If the key is greater, it is compared with the right child.
  • These steps are repeated until the nodes are exhausted or the search is successful.
- 16.
  • The algorithm for searching in a binary search tree is:
      p = tree;
      while (p != null && key != k(p))
          p = (key < k(p)) ? left(p) : right(p);
      return (p);
  • The efficiency of the search can be improved by using a sentinel.
  • All left and right links that would be null point to this sentinel instead.
  • Before searching, the key is stored in the sentinel.
  • After the search, if p equals the sentinel pointer, the search was unsuccessful. Otherwise p points to the desired node.
  • A sorted array can be produced by traversing the tree in inorder.
  • From a sorted array, a tree can be constructed in two different ways:
    1. Taking the middle element as the root and building the left and right subtrees in the same way from the smaller and the larger elements. This tree will be balanced.
    2. Taking the first element as the root and each successive element as the right or left son of its predecessor. This tree will be unbalanced.
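A minimal C sketch of the sentinel variant follows. It assumes integer keys and a single shared `sentinel` node; the `struct node` layout and the name `bst_search` are illustrative assumptions. Every link that would otherwise be null, including the root of an empty tree, must point at `sentinel` for this to work.

```c
#include <stddef.h>

struct node {
    int key;
    struct node *left;
    struct node *right;
};

/* Shared sentinel: every "empty" link in the tree points here. */
static struct node sentinel;

/* Returns the node holding key, or NULL if key is not in the tree. */
struct node *bst_search(struct node *tree, int key)
{
    struct node *p = tree;

    sentinel.key = key;     /* guarantees that the loop stops */
    while (key != p->key)   /* one test per step, no null check needed */
        p = (key < p->key) ? p->left : p->right;

    return (p == &sentinel) ? NULL : p;
}
```

The gain is that the loop tests one condition per step instead of two, since reaching the sentinel always ends in a key match.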
- 17. Insertion and Deletion
  Algorithm for insertion into a binary search tree:
      q = null;
      p = tree;
      while (p != null) {
          if (key == k(p))
              return (p);
          q = p;
          if (key < k(p))
              p = left(p);
          else
              p = right(p);
      }
      v = maketree(rec, key);
      if (q == null)
          tree = v;
      else if (key < k(q))
          left(q) = v;
      else
          right(q) = v;
      return (v);
  • After a new node is inserted, the ordering of the tree is still maintained.
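The insertion pseudocode translates to C roughly as below. This sketch stores only an integer key (the record `rec` from the slide is left out), uses NULL for empty links, and takes the root by address so that an empty tree can be updated; `struct node`, `maketree` as written here, and `bst_insert` are assumptions for illustration.

```c
#include <stdlib.h>

struct node {
    int key;
    struct node *left;
    struct node *right;
};

/* Allocates a leaf holding key; returns NULL if allocation fails. */
static struct node *maketree(int key)
{
    struct node *v = malloc(sizeof *v);
    if (v != NULL) {
        v->key = key;
        v->left = NULL;
        v->right = NULL;
    }
    return v;
}

/* Inserts key unless it is already present.
   Returns the node holding key, or NULL if allocation fails. */
struct node *bst_insert(struct node **tree, int key)
{
    struct node *q = NULL;      /* parent of p */
    struct node *p = *tree;

    while (p != NULL) {
        if (key == p->key)
            return p;           /* already in the tree */
        q = p;
        p = (key < p->key) ? p->left : p->right;
    }

    struct node *v = maketree(key);
    if (v == NULL)
        return NULL;

    if (q == NULL)
        *tree = v;              /* the tree was empty */
    else if (key < q->key)
        q->left = v;
    else
        q->right = v;
    return v;
}
```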
- 18. Deletion
  • Deletion removes the node with key 'key' from a binary search tree.
    – If the node to be deleted has no sons, it may be deleted without further adjustment to the tree.
    – If the node to be deleted has only one subtree, its only son can be moved up to take its place.
    – If the node to be deleted has two subtrees, its inorder successor must take its place.
  Efficiency
    – Search time varies between O(n) and O(log n), depending on the structure of the tree.
    – Average search time is O(log n).
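The three deletion cases can be sketched in C as follows. Note one substitution: rather than relinking the inorder successor into the deleted node's position, this version copies the successor's key into the node and then unlinks the successor, which yields the same ordering for a key-only tree. The names `struct node`, `new_node` and `bst_delete` are assumptions, and all nodes are assumed to come from `malloc`.

```c
#include <stdlib.h>

struct node {
    int key;
    struct node *left;
    struct node *right;
};

/* Helper for building trees; returns NULL if allocation fails. */
struct node *new_node(int key)
{
    struct node *v = malloc(sizeof *v);
    if (v != NULL) {
        v->key = key;
        v->left = NULL;
        v->right = NULL;
    }
    return v;
}

/* Deletes key from the tree, if present, and returns the new root. */
struct node *bst_delete(struct node *root, int key)
{
    struct node *parent = NULL;
    struct node *p = root;

    while (p != NULL && p->key != key) {
        parent = p;
        p = (key < p->key) ? p->left : p->right;
    }
    if (p == NULL)
        return root;                    /* key is not in the tree */

    if (p->left != NULL && p->right != NULL) {
        /* Two subtrees: find the inorder successor s, the leftmost
           node of the right subtree, and copy its key into p. */
        struct node *sp = p;
        struct node *s = p->right;
        while (s->left != NULL) {
            sp = s;
            s = s->left;
        }
        p->key = s->key;
        parent = sp;                    /* now unlink s instead of p */
        p = s;
    }

    /* Here p has at most one son, which moves up to take its place. */
    struct node *child = (p->left != NULL) ? p->left : p->right;
    if (parent == NULL)
        root = child;
    else if (parent->left == p)
        parent->left = child;
    else
        parent->right = child;

    free(p);
    return root;
}
```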
- 19. General Search Trees
  – General non-binary trees are used as search tables, particularly on external storage devices.
  – Multiway Search Trees
  – Digital Search Trees
- 20. Multiway Search Trees
  • A multiway search tree of order n is a general tree in which each node has n or fewer subtrees and contains one fewer key than it has subtrees.
  • For example, a node with four subtrees contains three keys.
  • Some terminology related to multiway search trees:
    – node(p) denotes a node.
    – numtrees(p) denotes the number of subtrees of node(p); numtrees(p) <= n, the order of the tree.
    – son(p,0), son(p,1), …, son(p,numtrees(p)-1) denote the subtrees of node(p).
    – k(p,0), k(p,1), …, k(p,numtrees(p)-2) denote the keys in the node.
- 21. Multiway Search Tree
- 22. Balanced Multiway Search Tree
- 23. Algorithm to Search a Multiway Search Tree
      p = tree;
      if (p == null) {
          position = -1;
          return (-1);
      }
      i = nodesearch(p, key);
      if (i < numtrees(p) - 1 && key == k(p, i)) {
          position = i;
          return (p);
      }
  – If neither test succeeds, the same steps are repeated on the subtree son(p, i).
  – The function nodesearch locates the smallest key in a node that is greater than or equal to the search argument.
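An iterative C sketch of this search is shown below. The node layout (`struct mnode` with a fixed `ORDER` of 4), the linear `nodesearch`, and the choice to return NULL rather than -1 on failure are all assumptions made for the example. Keys within a node are assumed to be in ascending order.

```c
#include <stddef.h>

#define ORDER 4     /* maximum number of subtrees per node (assumed) */

struct mnode {
    int numtrees;                   /* number of subtrees, <= ORDER   */
    int k[ORDER - 1];               /* numtrees - 1 keys, ascending   */
    struct mnode *son[ORDER];       /* subtrees; NULL where empty     */
};

/* Index of the smallest key in p that is >= key,
   or numtrees - 1 if every key in p is smaller. */
static int nodesearch(const struct mnode *p, int key)
{
    int i = 0;
    while (i < p->numtrees - 1 && key > p->k[i])
        i++;
    return i;
}

/* Returns the node containing key and sets *position to its index in
   that node; returns NULL with *position == -1 if key is absent. */
struct mnode *msearch(struct mnode *tree, int key, int *position)
{
    struct mnode *p = tree;

    while (p != NULL) {
        int i = nodesearch(p, key);
        if (i < p->numtrees - 1 && key == p->k[i]) {
            *position = i;
            return p;
        }
        p = p->son[i];              /* descend into the i-th subtree */
    }
    *position = -1;
    return NULL;
}
```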
- 24.
  • To insert into a multiway search tree, we first search for the argument key.
  • If the key is found, a pointer to the node containing it is returned.
  • Otherwise, a pointer to the semileaf node where the key belongs is returned.
  • For example, consider what would happen if the keys were inserted in the order 73, 77, 84, 86, 87, 84, 85 into the following tree (fig. b).
- 25. (Figure: the example tree referred to on the previous slide; the image is not part of this transcript.)
- 26. Hashing
  • Hashing is a search technique in which elements are placed in the table according to a value computed from their key.
  • The function that calculates this value is the hash function.
  • A hash function maps a key to an integer. The key-value pairs are stored in an array of buckets.
  • The bucket address is calculated as bucket_address = hash(key) % hash_table_size.
  • For example, 7321 mod 100 = 21, so key 7321 is stored at address 21.
- 27. Hash Tables / Direct Access Tables
  • A collection of n elements with unique keys is stored in a direct access table T[m].
  • The hash function is f(x) = x.
  • For a key k, we access T[k]:
    – if it contains an element, return it;
    – if it does not, return null.
  • The complexity of this lookup is O(1).
  • The keys must be unique.
  • The range of the keys must be bounded.
- 28. Mapping Functions
  • The direct access approach is a one-to-one mapping from each key k to h(k) in the range (1, m).
  • Such a function is a perfect hashing function: it maps each key to a distinct integer.
  • Finding a perfect hash function is not always possible.
  • Sometimes h(k) maps several keys to the same integer.
  • This situation is called a collision.
- 29. Handling the Collisions
  The techniques and issues covered under collision handling are:
  1. Chaining
  2. Re-Hashing
  3. Linear Probing
  4. Clustering
  5. Quadratic Probing
  6. Overflow Area
- 30. Chaining
  • This technique builds a linked list of all the items whose keys hash to the same value.
  • During a search, this short linked list is traversed to find the desired key.
  • An unlimited number of collisions can be handled.
  • Prior knowledge of the number of items is not necessary.
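A chained table can be sketched in C as an array of list heads. The sketch assumes non-negative integer keys, a fixed `TABLE_SIZE` of 10 and the hash `key % TABLE_SIZE`; the names `struct item`, `chain_find` and `chain_insert` are illustrative. The table is simply an array of pointers that all start as NULL.

```c
#include <stdlib.h>

#define TABLE_SIZE 10

struct item {
    int key;
    struct item *next;      /* next item that hashed to the same bucket */
};

/* Maps a non-negative key to a bucket number. */
static int bucket_of(int key)
{
    return key % TABLE_SIZE;
}

/* Walks the list of the key's bucket; returns the item or NULL. */
struct item *chain_find(struct item *table[TABLE_SIZE], int key)
{
    for (struct item *p = table[bucket_of(key)]; p != NULL; p = p->next) {
        if (p->key == key)
            return p;
    }
    return NULL;
}

/* Adds key at the head of its bucket's list unless already present.
   Returns 0 on success, -1 if memory allocation fails. */
int chain_insert(struct item *table[TABLE_SIZE], int key)
{
    if (chain_find(table, key) != NULL)
        return 0;

    struct item *p = malloc(sizeof *p);
    if (p == NULL)
        return -1;

    int b = bucket_of(key);
    p->key = key;
    p->next = table[b];
    table[b] = p;
    return 0;
}
```

Inserting at the head keeps insertion O(1) once the duplicate check is done; the cost of a lookup is proportional to the length of one bucket's list.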
- 31. Linear Probing
  • The simplest of the re-hashing functions.
  • Here the probe function is linear, i.e. f(i) = i: the i-th probe examines the cell i places after the home address, wrapping around at the end of the table.
  • Once there is a collision, the first free cell found this way is assigned.
  • To insert {89, 18, 49, 58, 69} into a hash table, let the hash function be h(x) = x mod 10:
    – 89 → 9
    – 18 → 8
    – 49 → 9 again. Here 49 is moved to the next immediate position, i.e. to 0.
    – 58 → 1 and 69 → 2, after probing past the occupied cells.
  • New addresses are calculated quickly.
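The example above can be reproduced with a short C sketch. It assumes non-negative integer keys, a table of 10 `int` slots and the marker `EMPTY` (-1) for a free slot; the function names are chosen for this example, and deletion is not handled.

```c
#define TABLE_SIZE 10
#define EMPTY (-1)          /* marker for a free slot (keys are >= 0) */

/* Marks every slot as free. */
void lp_init(int table[TABLE_SIZE])
{
    for (int i = 0; i < TABLE_SIZE; i++)
        table[i] = EMPTY;
}

/* Inserts key at the first free slot at or after its home address.
   Returns the slot used, or -1 if the table is full. */
int lp_insert(int table[TABLE_SIZE], int key)
{
    int home = key % TABLE_SIZE;

    for (int i = 0; i < TABLE_SIZE; i++) {
        int slot = (home + i) % TABLE_SIZE;     /* wrap around */
        if (table[slot] == EMPTY || table[slot] == key) {
            table[slot] = key;
            return slot;
        }
    }
    return -1;
}

/* Follows the same probe sequence as lp_insert.
   Returns the slot holding key, or -1 if key is absent. */
int lp_search(const int table[TABLE_SIZE], int key)
{
    int home = key % TABLE_SIZE;

    for (int i = 0; i < TABLE_SIZE; i++) {
        int slot = (home + i) % TABLE_SIZE;
        if (table[slot] == EMPTY)
            return -1;      /* a free slot ends the probe sequence */
        if (table[slot] == key)
            return slot;
    }
    return -1;
}
```

Inserting 89, 18, 49, 58, 69 in that order places them in slots 9, 8, 0, 1 and 2, matching the slide.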
- 32. Re-Hashing
  • This scheme uses a second hashing operation when there is a collision.
  • The function can be a new one or a re-application of the original function.
  • If the declared hash table is almost full, a new, larger table is declared and the elements are moved into it.
  • Suppose the original function is h(x) = x mod 7; then we can use h(x) = x mod 17 for the new table.
  • Moving the elements is expensive: it takes O(n) running time.
- 33. Clustering
  • Primary clustering occurs when two keys that hash to different values compete with each other in successive rehashes.
  • It arises with linear rehashing, where the rehash is applied repeatedly until an empty slot is found: the first rehash yields h1 = (h(x) + 1) % table_size, the second h2 = (h(x) + 2) % table_size, and so on.
- 34.
  • Different keys that hash to the same value follow the same rehash path.
  • This is called secondary clustering.
  • To eliminate it, double hashing is used.
  • Double hashing involves two hash functions, h1(x) and h2(x).
  • The primary hash function is used first. If there is a collision, the secondary function is applied.
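One common way to apply the second function is to use it as the step size of the probe sequence. The C sketch below takes that approach. The table size 11, the functions `h1(x) = x mod 11` and `h2(x) = 7 - (x mod 7)`, and all names are assumptions picked for the example, not values from the slides. Keys are assumed non-negative and every slot must start as `EMPTY`.

```c
#define TABLE_SIZE 11       /* prime, so any step 1..10 reaches every slot */
#define EMPTY (-1)          /* marker for a free slot (keys are >= 0) */

/* Primary hash: the home address. */
static int h1(int key)
{
    return key % TABLE_SIZE;
}

/* Secondary hash: the step between probes, in 1..7 and never 0. */
static int h2(int key)
{
    return 7 - (key % 7);
}

/* Inserts key; returns the slot used, or -1 if the table is full. */
int dh_insert(int table[TABLE_SIZE], int key)
{
    int home = h1(key);
    int step = h2(key);

    for (int i = 0; i < TABLE_SIZE; i++) {
        int slot = (home + i * step) % TABLE_SIZE;
        if (table[slot] == EMPTY || table[slot] == key) {
            table[slot] = key;
            return slot;
        }
    }
    return -1;
}

/* Returns the slot holding key, or -1 if key is absent. */
int dh_search(const int table[TABLE_SIZE], int key)
{
    int home = h1(key);
    int step = h2(key);

    for (int i = 0; i < TABLE_SIZE; i++) {
        int slot = (home + i * step) % TABLE_SIZE;
        if (table[slot] == EMPTY)
            return -1;
        if (table[slot] == key)
            return slot;
    }
    return -1;
}
```

The keys 22, 33 and 44 all have home address 0, but their steps are 6, 2 and 5, so after the collision 33 goes to slot 2 and 44 to slot 5. Keys with the same home address no longer share a rehash path.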
- 35. Quadratic Probing
  • In this method, the secondary hash function is a quadratic one.
  • Rehash address = h(key) + c·i², taken modulo the table size, where i is the number of the rehash attempt.
  • Rehashing schemes use the originally allocated table space and thus avoid linked list overhead.
  • The number of elements must be known in advance in order to determine the function.
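A small C sketch of the quadratic rehash address follows, using c = 1, a table of 10 slots and h(key) = key mod 10. These values and the function names are assumptions for illustration. Keys are assumed non-negative and every slot must start as `EMPTY`.

```c
#define TABLE_SIZE 10
#define C 1                 /* the constant c in h(key) + c*i*i (assumed) */
#define EMPTY (-1)          /* marker for a free slot (keys are >= 0) */

/* Inserts key at the first free slot of the sequence
   (h(key) + C*i*i) % TABLE_SIZE for i = 0, 1, 2, ...
   Returns the slot used, or -1 if none was found in TABLE_SIZE probes. */
int qp_insert(int table[TABLE_SIZE], int key)
{
    int home = key % TABLE_SIZE;

    for (int i = 0; i < TABLE_SIZE; i++) {
        int slot = (home + C * i * i) % TABLE_SIZE;
        if (table[slot] == EMPTY || table[slot] == key) {
            table[slot] = key;
            return slot;
        }
    }
    return -1;
}

/* Returns the slot holding key, or -1 if key is absent. */
int qp_search(const int table[TABLE_SIZE], int key)
{
    int home = key % TABLE_SIZE;

    for (int i = 0; i < TABLE_SIZE; i++) {
        int slot = (home + C * i * i) % TABLE_SIZE;
        if (table[slot] == EMPTY)
            return -1;
        if (table[slot] == key)
            return slot;
    }
    return -1;
}
```

With the keys 89, 18, 49, 58, 69 this gives slots 9, 8, 0, 2 and 3. Unlike linear probing, a quadratic sequence does not necessarily visit every slot, so `qp_insert` can fail even when the table still has free cells.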
- 36. Overflow Area
  • This method divides the pre-allocated table into two sections: the primary area, to which keys are mapped, and an area for collisions, called the overflow area.
  • When a collision occurs, a slot in the overflow area is used for the new element and a link from the primary area is established.
  • Access can be fast, since the overflow area is allocated in advance.