3. EXAMPLE
• Design a system to store employees' information using their phone number as
key
• Operations: Insert, Search, Delete
• Some possible data structures:
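• Array: O(n) Search and Delete
• Linked List: O(n) Search
• Balanced Binary Search Tree: O(log n) for all operations
• Direct Access Table: O(1) for all operations, but wastes space (one slot per possible phone number)
=> Hash Table: O(1) on average for all operations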
4. BASICS
• Data structure that implements an associative array
• Maps each key to its corresponding value
• Uses a hash function to compute the index of a key-value pair in an array of buckets
• O(1) complexity on average and O(n) in the worst case
5. HASHING
• Distribute the entries (key-value pairs) across an array of buckets
• Hash function: Map data of arbitrary size to data of fixed size
• Two steps:
1. hash = hash_func(key)
2. index = hash % table_size
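The two steps above can be sketched in Python as follows. The polynomial string hash used for hash_func is only an illustrative assumption (the slides do not name a specific function), and bucket_index is a hypothetical helper name.

```python
def hash_func(key: str) -> int:
    """Toy polynomial string hash (an assumption, not the slides' function)."""
    h = 0
    for ch in key:
        # Mask to 32 bits so the hash has a fixed size for any input length.
        h = (h * 31 + ord(ch)) & 0xFFFFFFFF
    return h


def bucket_index(key: str, table_size: int) -> int:
    hash_value = hash_func(key)       # step 1: hash = hash_func(key)
    return hash_value % table_size    # step 2: index = hash % table_size


if __name__ == "__main__":
    for phone in ("0912345678", "0987654321"):
        print(phone, "->", bucket_index(phone, 7))
```

Keeping the two steps separate means the same hash value can be reused when the table size changes; only the modulo has to be recomputed.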
6. CHOOSING A HASH FUNCTION
• Easy to compute
• Uniform Distribution
7. TYPES OF HASH FUNCTION
• Two types:
• Cryptographic hash
• Non-cryptographic hash
• Non-cryptographic hashes provide weaker guarantees than cryptographic hashes in exchange for better performance
• Examples:
• Crypto: BLAKE2b, SHA-512, MD5 (now considered broken), …
• Non-crypto: MurmurHash, xxHash, ...
• Cryptographic hashes aim to provide certain security guarantees
• Main properties of cryptographic hash:
• Deterministic
• Quick
• One-way function
• Avalanche effect
• Collision resistant
• Pre-image attack resistant
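As a rough illustration of the deterministic and avalanche properties listed above, the sketch below uses SHA-512 from Python's standard hashlib module and counts how many output bits change when one input character changes. The helper name bit_difference and the sample inputs are assumptions made for this example.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bits that differ between the SHA-512 digests of a and b."""
    digest_a = hashlib.sha512(a).digest()
    digest_b = hashlib.sha512(b).digest()
    return sum(bin(x ^ y).count("1") for x, y in zip(digest_a, digest_b))


if __name__ == "__main__":
    # Deterministic: the same input always yields the same digest.
    print(bit_difference(b"hello", b"hello"))
    # Avalanche effect: a one-character change flips roughly half of the 512 bits.
    print(bit_difference(b"hello", b"hellp"))
```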
8. COLLISION RESOLUTION
• Collision: two or more keys produce the same hash value (or map to the same bucket index)
• Practically unavoidable
• Handling techniques:
• Separate chaining
• Open addressing
10. COLLISION RESOLUTION
SEPARATE CHAINING
• Each cell of the hash table points to a linked list of the records that hash to the same index
• Advantages:
• Simple to implement
• Hash table never fills up
• Disadvantages:
• Poor cache performance (chain nodes are scattered in memory)
• Space wastage
• Search time can become O(n) if a chain gets long
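A minimal sketch of separate chaining, under some assumptions: Python lists stand in for the linked lists described above, the built-in hash() is used as the hash function, and the class name ChainedHashTable and the default of 7 buckets are chosen only for illustration.

```python
class ChainedHashTable:
    def __init__(self, num_buckets: int = 7):
        # One chain per bucket; a Python list stands in for a linked list.
        self.buckets = [[] for _ in range(num_buckets)]

    def _chain(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, value) -> None:
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:              # key already present: overwrite its value
                chain[i] = (key, value)
                return
        chain.append((key, value))    # chains can grow, so the table never fills up

    def search(self, key):
        for k, v in self._chain(key):  # O(chain length); O(n) if all keys collide
            if k == key:
                return v
        return None

    def delete(self, key) -> bool:
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return True
        return False


if __name__ == "__main__":
    table = ChainedHashTable()
    for key in (7, 14, 21):           # all three keys land in bucket 0 (key % 7)
        table.insert(key, f"employee-{key}")
    print(table.search(14), len(table.buckets[0]))
```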
11. COLLISION RESOLUTION
OPEN ADDRESSING
• All elements are stored in the hash table itself
• Operations:
• Insert: Keep probing until an empty slot is found
• Search: Keep probing until key is found or an empty slot is reached
• Delete: Simply emptying a slot can make later searches fail, because a search stops at the first empty slot. So slots of deleted keys are marked with a special DELETED value
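The three operations above can be sketched as follows. This is a minimal example that assumes linear probing, the built-in hash() as the hash function, and a fixed table size of 7; the names OpenAddressingTable, EMPTY and DELETED are illustrative choices.

```python
EMPTY = object()     # slot that has never held a key
DELETED = object()   # tombstone: slot whose key was deleted


class OpenAddressingTable:
    def __init__(self, size: int = 7):
        self.size = size
        self.keys = [EMPTY] * size
        self.values = [None] * size

    def _probe(self, key):
        start = hash(key) % self.size
        for i in range(self.size):            # visit every slot at most once
            yield (start + i) % self.size

    def insert(self, key, value) -> bool:
        first_free = None
        for idx in self._probe(key):
            slot = self.keys[idx]
            if slot is EMPTY:                 # key cannot be further along
                if first_free is None:
                    first_free = idx
                break
            if slot is DELETED:               # reusable, but keep looking for key
                if first_free is None:
                    first_free = idx
            elif slot == key:                 # key already stored: update it
                self.values[idx] = value
                return True
        if first_free is None:
            return False                      # table is full
        self.keys[first_free] = key
        self.values[first_free] = value
        return True

    def search(self, key):
        for idx in self._probe(key):
            slot = self.keys[idx]
            if slot is EMPTY:                 # stop at a truly empty slot
                return None
            if slot is not DELETED and slot == key:
                return self.values[idx]
        return None

    def delete(self, key) -> bool:
        for idx in self._probe(key):
            slot = self.keys[idx]
            if slot is EMPTY:
                return False
            if slot is not DELETED and slot == key:
                self.keys[idx] = DELETED      # mark, do not empty
                self.values[idx] = None
                return True
        return False
```

With keys 7 and 14 (both hash to slot 0 in a table of size 7), deleting 7 leaves a DELETED marker in slot 0, so a later search for 14 still continues on to slot 1 instead of stopping early.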
15. COLLISION RESOLUTION
OPEN ADDRESSING
Types:
(S = table size, i = 0, 1, 2, ... is the probe number)
• Linear probing: probe the next slot, one step at a time => Clustering
index = [hash(x) + i] % S
• Quadratic probing: on the i-th iteration, probe the slot i^2 positions past hash(x)
index = [hash(x) + i^2] % S
• Double hashing: use a second hash function hash2(x) and, on the i-th iteration, probe the slot i*hash2(x) positions past hash(x)
index = [hash(x) + i*hash2(x)] % S
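To compare the three formulas, the sketch below prints the first few slots each strategy probes for one key. It assumes integer keys, hash(x) = x % S, and a second hash hash2(x) = 1 + (x % (S - 1)); that hash2 is a common choice that never returns 0, but it is an assumption here, not something the slides specify. The function name probe_sequence is also hypothetical.

```python
def probe_sequence(x: int, size: int, kind: str, steps: int = 4) -> list:
    """Return the first `steps` slot indices probed for key x."""
    h = x % size                   # primary hash (assumed: key % S)
    h2 = 1 + (x % (size - 1))      # secondary hash (assumed), never 0
    sequence = []
    for i in range(steps):
        if kind == "linear":
            index = (h + i) % size
        elif kind == "quadratic":
            index = (h + i * i) % size
        elif kind == "double":
            index = (h + i * h2) % size
        else:
            raise ValueError(f"unknown probing kind: {kind}")
        sequence.append(index)
    return sequence


if __name__ == "__main__":
    for kind in ("linear", "quadratic", "double"):
        print(kind, probe_sequence(10, 7, kind))
```

For x = 10 and S = 7 this gives [3, 4, 5, 6] for linear probing, [3, 4, 0, 5] for quadratic probing, and [3, 1, 6, 4] for double hashing. The linear sequence walks through adjacent slots, which is what makes it cache-friendly and also prone to clustering.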
16. COLLISION RESOLUTION
OPEN ADDRESSING
Comparison:
• Linear probing:
• Easy to compute
• Best cache performance
• Suffers from Clustering
• Quadratic probing:
• Falls between linear probing and double hashing in both cache performance and clustering
• Double hashing:
• Poor cache performance
• No clustering
• More computation time
17. COLLISION RESOLUTION
OPEN ADDRESSING
• Advantages:
• Better cache performance
• Better space usage
• Disadvantages:
• Harder to implement
• Hash table may become full
• Clustering
18. DYNAMIC RESIZING
• Load factor = number of entries / number of buckets
• When load factor is too low or too high => Dynamic resizing
• Approaches:
• Complete resizing
• Incremental resizing
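A minimal sketch of the load factor check with complete resizing: when the load factor gets too high, every entry is rehashed into a larger table in one pass. The 0.75 threshold, the doubling growth policy, the use of chaining, and the class name ResizingTable are assumptions made for this example.

```python
class ResizingTable:
    MAX_LOAD = 0.75   # assumed threshold; the slides do not give a value

    def __init__(self, num_buckets: int = 4):
        self.buckets = [[] for _ in range(num_buckets)]
        self.count = 0

    def load_factor(self) -> float:
        # Load factor = number of entries / number of buckets
        return self.count / len(self.buckets)

    def _resize(self, new_size: int) -> None:
        # Complete resizing: rehash every entry into a new table at once.
        old_buckets = self.buckets
        self.buckets = [[] for _ in range(new_size)]
        for chain in old_buckets:
            for key, value in chain:
                self.buckets[hash(key) % new_size].append((key, value))

    def insert(self, key, value) -> None:
        chain = self.buckets[hash(key) % len(self.buckets)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)
                return
        chain.append((key, value))
        self.count += 1
        if self.load_factor() > self.MAX_LOAD:
            self._resize(2 * len(self.buckets))   # assumed policy: double

    def search(self, key):
        for k, v in self.buckets[hash(key) % len(self.buckets)]:
            if k == key:
                return v
        return None
```

The single rehash pass costs O(n), so the insert that triggers it is slow even though inserts remain O(1) on average; incremental resizing spreads that cost over many operations.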
Associative Array is an abstract data type composed of a collection of (key, value) pairs, such that each possible key appears at most once in the collection.
A collision occurs when a newly inserted key maps to an already occupied slot in the hash table.
Example: Hash function = key % 7
Space wastage (separate chaining): some buckets may never be used, and extra space is needed to store the links.
Open addressing: at any point, the size of the table must be greater than or equal to the total number of keys.
Insert can place an item in a DELETED slot, but search does not stop at a DELETED slot.
Example: Hash function = key % 7
Clustering: many consecutive occupied slots form groups, so finding a free slot or searching for an element starts to take longer.
Better space usage: a bucket can store a key that belongs to another bucket if there is a collision, and no extra space is needed for links.
During the resize, allocate the new hash table, but keep the old table unchanged.
In each lookup or delete operation, check both tables.
Perform insertion operations only in the new table.
At each insertion also move r elements from the old table to the new table.
When all elements are removed from the old table, deallocate it.
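The incremental resizing steps above can be sketched as follows. The sketch assumes chaining for both tables, a 0.75 load factor threshold, doubling on resize, and r = 2; the class name IncrementalTable and its helper names are hypothetical.

```python
class IncrementalTable:
    def __init__(self, num_buckets: int = 4, r: int = 2):
        self.new = [[] for _ in range(num_buckets)]
        self.old = None      # old table, present only while a resize is running
        self.old_pos = 0     # next old bucket to drain
        self.count = 0
        self.r = r           # entries moved from old to new per insertion

    @staticmethod
    def _find(table, key):
        chain = table[hash(key) % len(table)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                return chain, i
        return chain, -1

    def _migrate(self) -> None:
        if self.old is None:
            return
        moved = 0
        while self.old_pos < len(self.old) and moved < self.r:
            chain = self.old[self.old_pos]
            if chain:
                key, value = chain.pop()
                self.new[hash(key) % len(self.new)].append((key, value))
                moved += 1
            else:
                self.old_pos += 1
        if self.old_pos == len(self.old):
            self.old = None  # every old bucket is drained: drop the old table

    def insert(self, key, value) -> None:
        if self.old is not None:              # drop a stale copy in the old table
            chain, i = self._find(self.old, key)
            if i >= 0:
                del chain[i]
                self.count -= 1
        chain, i = self._find(self.new, key)  # insert only into the new table
        if i >= 0:
            chain[i] = (key, value)
        else:
            chain.append((key, value))
            self.count += 1
        if self.old is None and self.count / len(self.new) > 0.75:
            self.old, self.old_pos = self.new, 0   # keep the old table as-is
            self.new = [[] for _ in range(2 * len(self.old))]
        self._migrate()                       # move up to r entries per insertion

    def search(self, key):
        for table in (self.new, self.old):    # check both tables
            if table is not None:
                chain, i = self._find(table, key)
                if i >= 0:
                    return chain[i][1]
        return None

    def delete(self, key) -> bool:
        for table in (self.new, self.old):    # check both tables
            if table is not None:
                chain, i = self._find(table, key)
                if i >= 0:
                    del chain[i]
                    self.count -= 1
                    return True
        return False
```

Because each insertion moves only r entries, no single operation pays the full O(n) rehash cost, at the price of checking two tables while the resize is in progress.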