Analysis of Algorithms
Hash Tables
Andres Mendez-Vazquez
October 3, 2014
Outline
1 Basic data structures and operations
2 Hash tables
Hash tables: Concepts
Analysis of hashing under Chaining
3 Hashing Methods
The Division Method
The Multiplication Method
Clustering Analysis of Hashing Functions
A Possible Solution: Universal Hashing
4 Open Addressing
Linear Probing
Quadratic Probing
Double Hashing
5 Exercises
First: About Basic Data Structures
Remark
It is quite interesting to notice that many data structures actually share
similar operations!!!
Yes
If you think of them as ADTs
Examples
Search(S,k)
Example: Search in a BST
[BST diagram: searching for k = 7 in a tree with root 8]
Examples
Insert(S,x)
Example: Insert in a linked list
[Diagram: singly linked list with nodes A–E, a firstNode pointer, a NULL terminator, and a new node K being inserted]
And Again
Delete(S,x)
Example: Delete in a BST
[Diagram: BST with root 8 before and after Delete = 6]
Basic data structures and operations.
Therefore
These are basic structures; it is up to you to read about them.
Chapter 10 Cormen’s book
Hash tables: Concepts
Definition
A hash table or hash map T is a data structure, most commonly an
array, that uses a hash function to efficiently map certain identifiers of
keys (e.g. person names) to associated values.
Advantages
They have the advantage of an expected complexity of O(1 + α) per operation.
Still, be aware of α.
However, if you have a large number of keys, U
Then, it is impractical to store a table of the size of |U|.
Thus, you can use a hash function h : U → {0, 1, ..., m − 1}.
When you have a small universe of keys, U
Remarks
It is not necessary to map the key values.
Key values are direct addresses in the array.
Direct implementation or Direct-address tables.
Operations
1 Direct-Address-Search(T, k)
return T[k]
2 Direct-Address-Insert(T, x)
T [x.key] = x
3 Direct-Address-Delete(T, x)
T [x.key] = NIL
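The three operations above can be sketched directly in C. This is a minimal sketch; the `Item` type, the names, and the table size `M` are hypothetical, assuming keys are integers in {0, ..., M − 1}:

```c
#include <stddef.h>

#define M 100 /* size of the universe of keys (hypothetical) */

typedef struct Item {
    int key; /* satellite data would go here */
} Item;

/* The table is an array indexed directly by key; NULL marks an empty slot. */
static Item *T[M];

Item *direct_address_search(int k)   { return T[k]; }
void  direct_address_insert(Item *x) { T[x->key] = x; }
void  direct_address_delete(Item *x) { T[x->key] = NULL; }
```

Every operation is a single array access, hence O(1) worst case.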
When you have a large universe of keys, U
Then
Then, it is impractical to store a table of the size of |U|.
You can use a special function for mapping
h : U → {0, 1, ..., m − 1}  (1)
Problem
With a large enough universe U, two keys can hash to the same value
This is called a collision.
Collisions
This is a problem
We might try to avoid this by using a suitable hash function h.
Idea
Make h appear “random” enough to avoid collisions altogether (highly
improbable) or at least to minimize their probability.
You still have the problem of collisions
Possible Solutions to the problem:
1 Chaining
2 Open Addressing
Hash tables: Chaining
A Possible Solution
Insert the elements that hash to the same slot into a linked list.
[Diagram: keys from the universe U hashed into slots of T, with colliding keys chained in linked lists]
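A minimal sketch of chaining in C (names and table size hypothetical): each slot heads a singly linked list, and colliding keys are pushed onto the same list.

```c
#include <stdlib.h>

#define NUM_SLOTS 8 /* m: number of slots (hypothetical) */

typedef struct Node {
    int key;
    struct Node *next;
} Node;

static Node *table[NUM_SLOTS]; /* each slot is the head of a linked list */

static unsigned h(int k) { return (unsigned)k % NUM_SLOTS; }

/* Insert at the head of the list for slot h(k): O(1). */
void chain_insert(int k) {
    Node *n = malloc(sizeof *n);
    n->key = k;
    n->next = table[h(k)];
    table[h(k)] = n;
}

/* Search walks only the list of the slot k hashes to: expected O(1 + α). */
Node *chain_search(int k) {
    for (Node *n = table[h(k)]; n != NULL; n = n->next)
        if (n->key == k)
            return n;
    return NULL;
}
```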
Analysis of hashing with Chaining: Assumptions
Assumptions
We have a load factor α = n/m, where m is the size of the hash table
T, and n is the number of elements to store.
Simple uniform hashing property:
Each key is equally likely to hash into any of the m slots.
Thus, if n = n_0 + n_1 + ... + n_{m−1}, we have that E[n_j] = α.
To simplify the analysis, you need to consider two cases
Unsuccessful search
Successful search
Why?
After all
You are always looking for keys when
Searching
Inserting
Deleting
It is clear that we have two possibilities
Finding the key or not finding the key
For this, we have the following theorems
Theorem 11.1
In a hash table in which collisions are resolved by chaining, an unsuccessful
search takes average-case time Θ (1 + α), under the assumption of simple
uniform hashing.
Theorem 11.2
In a hash table in which collisions are resolved by chaining, a successful
search takes average-case time Θ (1 + α) under the assumption of simple
uniform hashing.
Analysis of hashing: Constant time.
Finally
These two theorems tell us that if n = O(m), then
α = n/m = O(m)/m = O(1)
In other words, the search time is constant on average.
Analysis of hashing: Which hash function?
Consider that:
Good hash functions should maintain the property of simple uniform
hashing!
The keys have the same probability 1/m to be hashed to any bucket!!!
A uniform hash function minimizes the likelihood of an overflow when
keys are selected at random.
Then:
What should we use?
If we know that the keys are distributed uniformly in the interval
0 ≤ k < 1, then h(k) = ⌊km⌋.
What if...
Question:
What about something with keys in a normal distribution?
Possible hash functions when the keys are natural numbers
The division method
h(k) = k mod m.
Good choices for m are primes not too close to a power of 2.
The multiplication method
h(k) = ⌊m (kA mod 1)⌋ with 0 < A < 1.
The value of m is not critical.
Easy to implement in a computer.
When they are not, we need to interpret the keys as natural numbers
Keys interpreted as natural numbers
Given a string “pt”, we can say p = 112 and t=116 (ASCII numbers)
ASCII has 128 possible symbols.
Then (128^1 × 112) + (128^0 × 116) = 14452
Nevertheless
This is highly dependent on the origins of the keys!!!
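The computation above can be written as a small routine; a sketch assuming ASCII input (radix 128), with a hypothetical function name:

```c
/* Interpret a string as a natural number in radix 128:
   each character is one radix-128 digit (ASCII has 128 symbols). */
unsigned long key_from_string(const char *s) {
    unsigned long k = 0;
    while (*s != '\0') {
        k = k * 128 + (unsigned char)*s; /* shift digits, append the next one */
        s++;
    }
    return k;
}
```

For long strings this overflows a machine word; in practice the radix expansion is folded into the hash computation modulo m.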
Hashing methods: The division method
Hash function
h(k) = k mod m
Problems with some selections
m = 2^p: h(k) is only the p lowest-order bits of k.
m = 2^p − 1: when k is interpreted as a character string in
radix 2^p, permuting the characters of k does not change the value.
It is better to select
Prime numbers not too close to an exact power of two.
For example, given n = 2000 elements.
We can use m = 701 because it is near to 2000/3 but not near a power
of two.
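A sketch of the division method, exercised with the m = 701 choice above (function name hypothetical):

```c
/* Division method: h(k) = k mod m, with m a prime not too close
   to a power of two (e.g. m = 701 for n = 2000, as above). */
unsigned division_hash(unsigned k, unsigned m) {
    return k % m;
}
```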
Hashing methods: The multiplication method
The multiplication method for creating hash functions has two steps
1 Multiply the key k by a constant A in the range 0 < A < 1 and
extract the fractional part of kA.
2 Then, you multiply the value by m and take the floor,
h(k) = ⌊m (kA mod 1)⌋.
The mod extracts the fractional part:
kA mod 1 = kA − ⌊kA⌋, 0 < A < 1.
Advantages:
m is not critical; normally m = 2^p.
Implementing in a computer
First
First, imagine that a machine word has w bits and that k fits in w bits.
Second
Then, select an s in the range 0 < s < 2^w and assume A = s/2^w.
Third
Now, we multiply k by the number s = A·2^w.
Example
Fourth
The result of that is r1·2^w + r0, a 2w-bit value, where the p most
significant bits of r0 are the desired hash value.
Graphically
[Diagram: the p most significant bits of r0 are extracted as the hash value]
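A sketch of this word-level implementation for w = 32 and m = 2^p. The constant s = 2654435769 corresponds to Knuth's suggestion A ≈ (√5 − 1)/2, the value discussed in Cormen's book; the function name is hypothetical.

```c
#include <stdint.h>

/* Multiplication method with w = 32 and m = 2^p.
   s = floor(A * 2^32) with A = (sqrt(5) - 1) / 2. */
unsigned mult_hash(uint32_t k, unsigned p) {
    const uint32_t s = 2654435769u;
    uint32_t r0 = k * s;   /* low word r0 of the 2w-bit product k * s */
    return r0 >> (32 - p); /* keep the p most significant bits of r0 */
}
```

No division or modulo is needed: one multiply and one shift, which is why m not being critical makes this attractive.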
However
Sooner or Later
We may pick a hash function that does not give us the desired uniform
randomization property
Thus
We are required to analyze the possible clustering of the data by the hash
function
Measuring Clustering through a metric C
Definition
If bucket i contains ni elements, then
C = \frac{m}{n-1}\left(\frac{\sum_{i=1}^{m} n_i^2}{n} - 1\right)  (2)
Properties
1 If C = 1, then you have uniform hashing.
2 If C > 1, it means that the performance of the hash table is slowed
down by clustering by approximately a factor of C.
3 If C < 1, the spread of the elements is more even than uniform!!! Not
going to happen!!!
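Equation (2) is easy to compute from the observed bucket sizes; a minimal sketch (function name hypothetical):

```c
/* Clustering metric C from equation (2):
   C = (m / (n - 1)) * ((sum of n_i^2) / n - 1),
   where n_i is the number of elements in bucket i. */
double clustering(const int *bucket_sizes, int m) {
    long n = 0, sum_sq = 0;
    for (int i = 0; i < m; i++) {
        n += bucket_sizes[i];
        sum_sq += (long)bucket_sizes[i] * bucket_sizes[i];
    }
    return ((double)m / (double)(n - 1)) * ((double)sum_sq / (double)n - 1.0);
}
```

Note that a perfectly even spread gives C < 1, matching property 3: C = 1 is the expected value under uniform random hashing, not the minimum.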
However
Unfortunately
Hash tables do not give a way to measure clustering
Thus, table designers
They should provide some clustering estimation as part of the interface.
Thus
The clustering measure works because it is based on an estimate of the
variance of the distribution of bucket sizes.
Thus
First
If clustering is occurring, some buckets will have more elements than they
should, and some will have fewer.
Second
There will be a wider range of bucket sizes than one would expect from
a random hash function.
Analysis of C: First, keys are uniformly distributed
Consider the following random variable
Consider bucket i containing ni elements, with Xij= I{element j lands in
bucket i}
Then, given
n_i = \sum_{j=1}^{n} X_{ij}  (3)
We have that
E[X_{ij}] = \frac{1}{m}, \quad E[X_{ij}^2] = \frac{1}{m}  (4)
Next
We look at the dispersion of X_{ij}
Var[X_{ij}] = E[X_{ij}^2] - (E[X_{ij}])^2 = \frac{1}{m} - \frac{1}{m^2}  (5)
What about the expected number of elements at each bucket?
E[n_i] = E\left[\sum_{j=1}^{n} X_{ij}\right] = \frac{n}{m} = α  (6)
Then, we have
By the independence of the {X_{ij}}, the scattering of n_i is
Var[n_i] = Var\left[\sum_{j=1}^{n} X_{ij}\right] = \sum_{j=1}^{n} Var[X_{ij}] = n·Var[X_{ij}]
Then
What about the range of the possible number of elements at each
bucket?
Var[n_i] = \frac{n}{m} - \frac{n}{m^2} = α - \frac{α}{m}
But, we have that
E[n_i^2] = E\left[\sum_{j=1}^{n} X_{ij}^2 + \sum_{j=1}^{n} \sum_{k=1, k \neq j}^{n} X_{ij} X_{ik}\right]  (7)
Or
E[n_i^2] = \frac{n}{m} + \sum_{j=1}^{n} \sum_{k=1, k \neq j}^{n} \frac{1}{m^2}  (8)
Thus
We re-express the range in terms of expected values of n_i
E[n_i^2] = \frac{n}{m} + \frac{n(n-1)}{m^2}  (9)
Then
E[n_i^2] - E[n_i]^2 = \frac{n}{m} + \frac{n(n-1)}{m^2} - \frac{n^2}{m^2} = \frac{n}{m} - \frac{n}{m^2} = α - \frac{α}{m}
Then
Finally, we have that
E[n_i^2] = α\left(1 - \frac{1}{m}\right) + α^2  (10)
Then, we have that
Now we build an estimator of the mean of n_i^2, which is part of C:
\frac{1}{n} \sum_{i=1}^{m} n_i^2  (11)
Thus
E\left[\frac{1}{n} \sum_{i=1}^{m} n_i^2\right] = \frac{1}{n} \sum_{i=1}^{m} E[n_i^2]
 = \frac{m}{n}\left(α\left(1 - \frac{1}{m}\right) + α^2\right)
 = \frac{1}{α}\left(α\left(1 - \frac{1}{m}\right) + α^2\right)
 = 1 - \frac{1}{m} + α
Finally
We can plug back into C using the expected value:
E[C] = \frac{m}{n-1}\left(\frac{E\left[\sum_{i=1}^{m} n_i^2\right]}{n} - 1\right)
 = \frac{m}{n-1}\left(1 - \frac{1}{m} + α - 1\right)
 = \frac{m}{n-1}\left(\frac{n}{m} - \frac{1}{m}\right)
 = \frac{m}{n-1} · \frac{n-1}{m}
 = 1
Explanation
Using a hash function that enforces a uniform distribution over the buckets
We get that C = 1, i.e., the best distribution of keys.
Now, we have a really horrible hash function ≡ It hits only
one of every b buckets
Thus
E[X_{ij}] = E[X_{ij}^2] = \frac{b}{m}  (12)
Thus, we have
E[n_i] = αb  (13)
Then, we have
E\left[\frac{1}{n} \sum_{i=1}^{m} n_i^2\right] = \frac{1}{n} \sum_{i=1}^{m} E[n_i^2] = αb - \frac{b}{m} + 1
Finally
We can plug back into C using the expected value:
E[C] = \frac{m}{n-1}\left(\frac{E\left[\sum_{i=1}^{m} n_i^2\right]}{n} - 1\right)
 = \frac{m}{n-1}\left(αb - \frac{b}{m} + 1 - 1\right)
 = \frac{m}{n-1}\left(\frac{nb}{m} - \frac{b}{m}\right)
 = \frac{m}{n-1} · \frac{b(n-1)}{m}
 = b
Explanation
Using a hash function that hits only one of every b buckets
We get that C = b > 1, i.e., a really bad distribution of the keys!!!
Thus, you only need the following quantity to evaluate a hash function:
\frac{1}{n} \sum_{i=1}^{m} n_i^2  (14)
A Possible Solution: Universal Hashing
Issues
In practice, keys are not randomly distributed.
Any fixed hash function might yield retrieval Θ(n) time.
Goal
To find hash functions that produce uniform random table indexes
irrespective of the keys.
Idea
To select a hash function at random from a designed class of functions at
the beginning of the execution.
Hashing methods: Universal hashing
Example
[Diagram: a hash function is chosen at random from a set of hash functions at the beginning of the execution and then used for the hash table]
Definition of universal hash functions
Definition
Let H = {h : U → {0, 1, ..., m − 1}} be a family of hash functions. H is
called a universal family if
\forall x, y \in U, x \neq y : \Pr_{h \in H}(h(x) = h(y)) \leq \frac{1}{m}  (15)
Main result
With universal hashing the chance of collision between distinct keys k and
l is no more than the 1/m chance of collision we would get if the locations
h(k) and h(l) were chosen randomly and independently from the set
{0, 1, ..., m − 1}.
Hashing methods: Universal hashing
Theorem 11.3
Suppose that a hash function h is chosen randomly from a universal
collection of hash functions and has been used to hash n keys into a table
T of size m, using chaining to resolve collisions. If key k is not in the
table, then the expected length E[n_{h(k)}] of the list that key k hashes to is
at most the load factor α = n/m. If key k is in the table, then the expected
length E[n_{h(k)}] of the list containing key k is at most 1 + α.
Corollary 11.4
Using universal hashing and collision resolution by chaining in an initially
empty table with m slots, it takes expected time Θ(n) to handle any
sequence of n INSERT, SEARCH, and DELETE operations containing O(m)
INSERT operations.
Example of Universal Hash
Proceed as follows:
Choose a prime number p large enough so that every possible key k is in
the range [0, ..., p − 1]
Z_p = {0, 1, ..., p − 1} and Z_p^* = {1, ..., p − 1}
Define the following hash function:
h_{a,b}(k) = ((ak + b) mod p) mod m, \forall a \in Z_p^* and b \in Z_p
The family of all such hash functions is:
H_{p,m} = {h_{a,b} : a \in Z_p^* and b \in Z_p}
Important
a and b are chosen randomly at the beginning of the execution.
The class H_{p,m} of hash functions is universal.
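A sketch of one member h_{a,b} of this family (function name hypothetical; in a real table, a and b would be drawn at random once, at the beginning of the execution):

```c
#include <stdint.h>

/* h_{a,b}(k) = ((a*k + b) mod p) mod m, with p prime,
   a in {1, ..., p-1} and b in {0, ..., p-1}. */
unsigned universal_hash(uint64_t k, uint64_t a, uint64_t b,
                        uint64_t p, unsigned m) {
    return (unsigned)(((a * k + b) % p) % m);
}
```

With 64-bit arithmetic, a·k can overflow for very large p and k; the sketch assumes p is small enough (e.g. p = 977) that the product fits.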
Example: Universal hash functions
Example
p = 977, m = 50, a and b random numbers:
h_{a,b}(k) = ((ak + b) mod p) mod m
Example of key distribution
[Plot: keys drawn with mean = 488.5 and dispersion = 5]
Examples with 10, 50, 100, and 200 keys
[Plots: Universal Hashing vs. the Division Method for 10, 50, 100, and 200 keys]
Another Example: Matrix Method
Then
Let us say keys are u bits long.
Say the table size M is a power of 2.
Then an index is b bits long, with M = 2^b.
The h function
Pick h to be a random b-by-u 0/1 matrix, and define h(x) = hx,
where after the inner product we apply mod 2.
Example
h(x) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 1 \\ 1 & 1 & 1 & 0 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} \pmod 2
Proof of being a Universal Family
First, assume that you have two different keys l ≠ m
Without loss of generality, assume the following:
1 l_i ≠ m_i, say l_i = 0 and m_i = 1
2 l_j = m_j \forall j ≠ i
Thus
Column i does not contribute to the final answer h(l) because of
the zero!!!
Now
Imagine that we fix all the other columns of h; then there is only one
answer for h(l)
Now
For the ith column
There are 2^b possible columns when changing the ones and zeros.
Thus, given the randomness of the zeros and ones
The probability that the ith column is the all-zero column
(0, 0, ..., 0)^T  (16)
is equal to
\frac{1}{2^b}  (17)
and this is exactly the case in which h(l) = h(m).
Then
We get the probability
P(h(l) = h(m)) \leq \frac{1}{2^b}  (18)
Implementation of the column*vector mod 2
Code
int product(int row, int vector) {
    int i = row & vector;
    i = i - ((i >> 1) & 0x55555555);            /* parallel popcount of row & vector */
    i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
    i = (((i + (i >> 4)) & 0x0F0F0F0F) * 0x01010101) >> 24;
    return i & 0x00000001;                      /* parity = inner product mod 2 */
}
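The full matrix-method hash applies one such inner product per row of h; a sketch with one u-bit word per matrix row (function names hypothetical):

```c
/* Inner product mod 2 of one matrix row with the key (bit-counting trick). */
static int parity_product(unsigned row, unsigned x) {
    unsigned i = row & x; /* keep the positions where both bits are 1 */
    i = i - ((i >> 1) & 0x55555555); /* parallel popcount */
    i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
    i = (((i + (i >> 4)) & 0x0F0F0F0F) * 0x01010101) >> 24;
    return (int)(i & 1u); /* parity of the popcount = inner product mod 2 */
}

/* h(x) = hx mod 2, one output bit per row; rows[0] yields the most
   significant bit of the b-bit index. */
unsigned matrix_hash(const unsigned *rows, int b, unsigned x) {
    unsigned h = 0;
    for (int i = 0; i < b; i++)
        h = (h << 1) | (unsigned)parity_product(rows[i], x);
    return h;
}
```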
Advantages of universal hashing
Advantages
Universal hashing provides good results on average, independently of
the keys to be stored.
Guarantees that no input will always elicit the worst-case behavior.
Poor performance occurs only when the random choice returns an
inefficient hash function; this has a small probability.
Open addressing
Definition
All the elements occupy the hash table itself.
What is it?
We systematically examine table slots until either we find the desired
element or we have ascertained that the element is not in the table.
Advantages
The advantage of open addressing is that it avoids pointers altogether.
Insert in Open addressing
Extended hash function to probe
Instead of probing the slots in the fixed order 0, 1, 2, ..., m − 1, which
would give Θ (n) search time
Extend the hash function to
h : U × {0, 1, ..., m − 1} → {0, 1, ..., m − 1}
This gives the probe sequence h(k, 0), h(k, 1), ..., h(k, m − 1)
A permutation of 0, 1, 2, ..., m − 1
65 / 81
Hashing methods in Open Addressing
HASH-INSERT(T, k)
1 i = 0
2 repeat
3 j = h (k, i)
4 if T [j] == NIL
5 T [j] = k
6 return j
7 else i = i + 1
8 until i == m
9 error “Hash Table Overflow”
66 / 81
Hashing methods in Open Addressing
HASH-SEARCH(T,k)
1 i = 0
2 repeat
3 j = h (k, i)
4 if T [j] == k
5 return j
6 i = i + 1
7 until T [j] == NIL or i == m
8 return NIL
67 / 81
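The two procedures above can be sketched in C as follows. The table size M, the NIL marker, and the use of linear probing as the concrete h(k, i) are illustrative choices; the pseudocode itself does not fix them.

```c
#define M   8     /* table size (illustrative) */
#define NIL (-1)  /* empty-slot marker (illustrative) */

/* Probe sequence h(k, i); linear probing is used here for concreteness. */
static int h(int k, int i) {
    return (k % M + i) % M;
}

/* HASH-INSERT: returns the slot index, or -1 on table overflow. */
static int hash_insert(int T[], int k) {
    for (int i = 0; i < M; i++) {
        int j = h(k, i);
        if (T[j] == NIL) {
            T[j] = k;
            return j;
        }
    }
    return -1;  /* hash table overflow */
}

/* HASH-SEARCH: returns the slot index of k, or -1 if k is absent. */
static int hash_search(const int T[], int k) {
    for (int i = 0; i < M; i++) {
        int j = h(k, i);
        if (T[j] == k)
            return j;
        if (T[j] == NIL)
            return -1;  /* an empty slot ends the probe sequence */
    }
    return -1;
}
```

Note that the search may stop early at an empty slot: if k were in the table, insertion would have placed it no later than that point in the probe sequence.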
Outline
1 Basic data structures and operations
2 Hash tables
Hash tables: Concepts
Analysis of hashing under Chaining
3 Hashing Methods
The Division Method
The Multiplication Method
Clustering Analysis of Hashing Functions
A Possible Solution: Universal Hashing
4 Open Addressing
Linear Probing
Quadratic Probing
Double Hashing
5 Exercises
68 / 81
Linear probing: Definition and properties
Hash function
Given an ordinary hash function h : U → {0, 1, ..., m − 1}, for
i = 0, 1, ..., m − 1, we get the extended hash function
h(k, i) = (h (k) + i) mod m, (19)
Sequence of probes
Given key k, we first probe T[h (k)], then T[h (k) + 1] and so on until
T[m − 1]. Then, we wrap around to T[0], T[1], ..., up to T[h (k) − 1].
Distinct probes
Because the initial probe determines the entire probe sequence, there are
only m distinct probe sequences.
69 / 81
Linear probing: Definition and properties
Disadvantages
Linear probing suffers from primary clustering:
Long runs of occupied slots build up, increasing the average search
time.
Clusters arise because an empty slot preceded by i full slots gets filled
next with probability (i + 1)/m.
Long runs of occupied slots tend to get longer, and the average
search time increases.
70 / 81
Example
Example using keys uniformly distributed
It was generated using the division method
Then
71 / 81
Example
Example using Gaussian keys
It was generated using the division method
Then
72 / 81
Outline
1 Basic data structures and operations
2 Hash tables
Hash tables: Concepts
Analysis of hashing under Chaining
3 Hashing Methods
The Division Method
The Multiplication Method
Clustering Analysis of Hashing Functions
A Possible Solution: Universal Hashing
4 Open Addressing
Linear Probing
Quadratic Probing
Double Hashing
5 Exercises
73 / 81
Quadratic probing: Definition and properties
Hash function
Given an auxiliary hash function h : U → {0, 1, ..., m − 1}, for
i = 0, 1, ..., m − 1, we get the extended hash function
h(k, i) = (h (k) + c1 i + c2 i^2) mod m, (20)
where c1, c2 are auxiliary constants (with c2 ≠ 0)
Sequence of probes
Given key k, we first probe T[h (k)]; later positions probed are offset
by amounts that depend in a quadratic manner on the probe number
i.
The initial probe determines the entire sequence, and so only m
distinct probe sequences are used.
74 / 81
Quadratic probing: Definition and properties
Advantages
This method works much better than linear probing, but to make full use
of the hash table, the values of c1, c2, and m are constrained.
Disadvantages
If two keys have the same initial probe position, then their probe sequences
are the same, since h(k1, 0) = h(k2, 0) implies h(k1, i) = h(k2, i). This
property leads to a milder form of clustering, called secondary clustering.
75 / 81
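One well-known choice that satisfies the constraint is c1 = c2 = 1/2 with m a power of two: the probe offsets then become the triangular numbers i(i + 1)/2, and the probe sequence visits every slot exactly once. A small sketch, with the table size M = 8 as an illustrative choice:

```c
#define M 8  /* must be a power of two for this choice of c1, c2 */

/* h(k, i) = (h(k) + i/2 + i^2/2) mod M, i.e. the offset at step i is
   the triangular number i(i+1)/2; with M a power of two the probe
   sequence h(k, 0), ..., h(k, M-1) is a permutation of 0, ..., M-1. */
static int h_quad(int k, int i) {
    return (k % M + (i * (i + 1)) / 2) % M;
}
```

This is why the slide says the constants are constrained: an arbitrary (c1, c2, m) triple can cycle through only a fraction of the table and report overflow while empty slots remain.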
Outline
1 Basic data structures and operations
2 Hash tables
Hash tables: Concepts
Analysis of hashing under Chaining
3 Hashing Methods
The Division Method
The Multiplication Method
Clustering Analysis of Hashing Functions
A Possible Solution: Universal Hashing
4 Open Addressing
Linear Probing
Quadratic Probing
Double Hashing
5 Exercises
76 / 81
Double hashing: Definition and properties
Hash function
Double hashing uses a hash function of the form
h(k, i) = (h1(k) + i h2(k)) mod m, (21)
where i = 0, 1, ..., m − 1 and h1, h2 are auxiliary hash functions (normally
drawn from a universal family)
Sequence of probes
Given key k, we first probe T[h1(k)]; successive probe positions are
offset from previous positions by the amount h2(k), modulo m.
Thus, unlike the case of linear or quadratic probing, the probe
sequence here depends in two ways upon the key k, since the initial
probe position, the offset, or both, may vary.
77 / 81
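A common concrete choice is to make m prime and take h1(k) = k mod m and h2(k) = 1 + (k mod m') with m' = m − 1: then h2(k) is never zero and, since m is prime, every probe sequence is a permutation of the whole table. A sketch with the illustrative prime m = 13:

```c
#define M 13  /* prime table size (illustrative) */

static int h1(int k) { return k % M; }
static int h2(int k) { return 1 + (k % (M - 1)); }  /* in {1,...,M-1}, never 0 */

/* h(k, i) = (h1(k) + i*h2(k)) mod M; because M is prime and h2(k) != 0,
   each key's probe sequence visits all M slots exactly once. */
static int h_double(int k, int i) {
    return (h1(k) + i * h2(k)) % M;
}
```

Because the step size h2(k) varies with the key, two keys that collide on the first probe usually separate immediately afterwards, avoiding both primary and secondary clustering.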
Double hashing: Definition and properties
Advantages
When m is prime or a power of 2, double hashing improves over linear
or quadratic probing in that Θ(m^2) probe sequences are used, rather
than Θ(m), since each possible (h1(k), h2(k)) pair yields a distinct
probe sequence.
The performance of double hashing appears to be very close to the
performance of the “ideal” scheme of uniform hashing.
78 / 81
Why?
Look At This
79 / 81
Analysis of Open Addressing
Theorem 11.6
Given an open-address hash table with load factor α = n/m < 1, the
expected number of probes in an unsuccessful search is at most 1/(1 − α),
assuming uniform hashing.
Corollary
Inserting an element into an open-address hash table with load factor α
requires at most 1/(1 − α) probes on average, assuming uniform hashing.
Theorem 11.8
Given an open-address hash table with load factor α < 1, the expected
number of probes in a successful search is at most (1/α) ln (1/(1 − α)),
assuming uniform hashing and assuming that each key in the table is
equally likely to be searched for.
80 / 81
Exercises
From Cormen’s book, chapter 11
11.1-2
11.2-1
11.2-2
11.2-3
11.3-1
11.3-3
81 / 81
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
 
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTINGMANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 

08 Hash Tables

  • 1. Analysis of Algorithms Hash Tables Andres Mendez-Vazquez October 3, 2014 1 / 81
  • 2. Outline 1 Basic data structures and operations 2 Hash tables Hash tables: Concepts Analysis of hashing under Chaining 3 Hashing Methods The Division Method The Multiplication Method Clustering Analysis of Hashing Functions A Possible Solution: Universal Hashing 4 Open Addressing Linear Probing Quadratic Probing Double Hashing 5 Exercises 2 / 81
  • 3. First: About Basic Data Structures Remark It is quite interesting to notice that many data structures actually share similar operations!!! Yes If you think them as ADT 3 / 81
  • 5. Examples Search(S,k) Example: Search in a BST (figure: a BST with keys 1, 3, 4, 6, 7, 8, 10, 13, 14; searching for k = 7) 4 / 81
  • 6. Examples Insert(S,x) Example: Insert in a linked list (figure: a singly linked list with head pointer firstNode, terminated by NULL; inserting node K) 5 / 81
  • 7. And Again Delete(S,x) Example: Delete in a BST (figure: the BST before and after deleting key 6) 6 / 81
  • 8. Basic data structures and operations. Therefore These are basic structures; it is up to you to read about them. Chapter 10 of Cormen’s book 7 / 81
  • 9. Outline 1 Basic data structures and operations 2 Hash tables Hash tables: Concepts Analysis of hashing under Chaining 3 Hashing Methods The Division Method The Multiplication Method Clustering Analysis of Hashing Functions A Possible Solution: Universal Hashing 4 Open Addressing Linear Probing Quadratic Probing Double Hashing 5 Exercises 8 / 81
  • 10. Hash tables: Concepts Definition A hash table or hash map T is a data structure, most commonly an array, that uses a hash function to efficiently map certain identifiers of keys (e.g. person names) to associated values. Advantages They have the advantage of an expected complexity of operations of O(1 + α). Still, be aware of α. However, if you have a large set of keys U, then it is impractical to store a table of the size of |U|. Thus, you can use a hash function h : U → {0, 1, ..., m − 1} 9 / 81
  • 13. When you have a small universe of keys, U Remarks It is not necessary to map the key values. Key values are direct addresses in the array. Direct implementation or Direct-address tables. Operations 1 Direct-Address-Search(T, k) return T[k] 2 Direct-Address-Insert(T, x) T[x.key] = x 3 Direct-Address-Delete(T, x) T[x.key] = NIL 10 / 81
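The three direct-address operations above can be sketched in a few lines. This is a minimal illustration assuming keys are small non-negative integers below a known bound m; the class and method names are illustrative, not from the slides.

```python
# Minimal direct-address table: one slot per possible key, O(1) per operation.
class DirectAddressTable:
    def __init__(self, m):
        self.slots = [None] * m          # slot k holds the element with key k

    def search(self, k):                 # Direct-Address-Search(T, k)
        return self.slots[k]

    def insert(self, x):                 # Direct-Address-Insert(T, x)
        self.slots[x["key"]] = x         # the key is used directly as an index

    def delete(self, x):                 # Direct-Address-Delete(T, x)
        self.slots[x["key"]] = None      # NIL on the slide
```

Note that the table consumes Θ(|U|) space regardless of how many keys are actually stored, which is exactly why the next slides move to hashing for large universes.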
  • 15. When you have a large universe of keys, U Then It is impractical to store a table of the size of |U|. You can use a special function for mapping h : U → {0, 1, ..., m − 1} (1) Problem With a large enough universe U, two keys can hash to the same value. This is called a collision. 11 / 81
  • 18. Collisions This is a problem We might try to avoid this by using a suitable hash function h. Idea Make h appear to be “random” enough to avoid collisions altogether (Highly Improbable) or to minimize the probability of them. You still have the problem of collisions Possible solutions to the problem: 1 Chaining 2 Open Addressing 12 / 81
  • 21. Hash tables: Chaining A Possible Solution Insert the elements that hash to the same slot into a linked list. (figure: universe of keys U mapped into table slots, with colliding keys chained in linked lists) 13 / 81
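Chaining as described above can be sketched as follows; this is a minimal illustration using the division method h(k) = k mod m as the hash function, with Python lists standing in for the linked lists (all names are mine):

```python
# Hash table with collision resolution by chaining.
class ChainedHashTable:
    def __init__(self, m):
        self.m = m
        self.table = [[] for _ in range(m)]     # one chain per slot

    def _h(self, k):
        return k % self.m                       # division method, for illustration

    def insert(self, k, v):
        self.table[self._h(k)].append((k, v))   # add to the slot's chain

    def search(self, k):
        for key, v in self.table[self._h(k)]:   # walk the chain
            if key == k:
                return v
        return None

    def delete(self, k):
        chain = self.table[self._h(k)]
        self.table[self._h(k)] = [(key, v) for key, v in chain if key != k]
```

With n elements in m slots, search walks an expected α = n/m entries, which is the load factor analyzed in the next slides.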
  • 22. Outline 1 Basic data structures and operations 2 Hash tables Hash tables: Concepts Analysis of hashing under Chaining 3 Hashing Methods The Division Method The Multiplication Method Clustering Analysis of Hashing Functions A Possible Solution: Universal Hashing 4 Open Addressing Linear Probing Quadratic Probing Double Hashing 5 Exercises 14 / 81
  • 23. Analysis of hashing with Chaining: Assumptions Assumptions We have a load factor α = n/m, where m is the size of the hash table T and n is the number of elements to store. Simple uniform hashing property: any of the m slots is equally likely to be selected. This means that if n = n_0 + n_1 + ... + n_{m−1}, we have that E[n_j] = α. To simplify the analysis, you need to consider two cases Unsuccessful search Successful search 15 / 81
  • 25. Why? After all You are always looking for keys when Searching Inserting Deleting It is clear that we have two possibilities Finding the key or not finding the key 16 / 81
  • 27. For this, we have the following theorems Theorem 11.1 In a hash table in which collisions are resolved by chaining, an unsuccessful search takes average-case time Θ (1 + α), under the assumption of simple uniform hashing. Theorem 11.2 In a hash table in which collisions are resolved by chaining, a successful search takes average-case time Θ (1 + α) under the assumption of simple uniform hashing. 17 / 81
  • 29. Analysis of hashing: Constant time. Finally These two theorems tell us that if n = O(m), then α = n/m = O(m)/m = O(1). Or search time is constant. 18 / 81
  • 30. Analysis of hashing: Which hash function? Consider that: Good hash functions should maintain the property of simple uniform hashing! The keys have the same probability 1/m to be hashed to any bucket!!! A uniform hash function minimizes the likelihood of an overflow when keys are selected at random. Then: What should we use? If we know that the keys are distributed uniformly in the interval 0 ≤ k < 1, then h(k) = ⌊km⌋. 19 / 81
  • 32. What if... Question: What about something with keys in a normal distribution? 20 / 81
  • 33. Possible hash functions when the keys are natural numbers The division method h(k) = k mod m. Good choices for m are primes not too close to a power of 2. The multiplication method h(k) = ⌊m(kA mod 1)⌋ with 0 < A < 1. The value of m is not critical. Easy to implement in a computer. 21 / 81
  • 35. When they are not, we need to interpret the keys as natural numbers Keys interpreted as natural numbers Given a string “pt”, we can say p = 112 and t = 116 (ASCII numbers). ASCII has 128 possible symbols. Then (128 × 112) + 116 = 14452 Nevertheless This is highly dependent on the origins of the keys!!! 22 / 81
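The radix-128 interpretation above generalizes to strings of any length via Horner's rule; a small sketch (the function name is illustrative):

```python
# Interpret a string key as a natural number in radix 128 (ASCII),
# as in the slide's "pt" example: 112 * 128 + 116 = 14452.
def string_to_natural(key, radix=128):
    n = 0
    for ch in key:
        n = n * radix + ord(ch)   # Horner's rule over the characters
    return n
```

The resulting integer can then be fed to the division or multiplication method on the following slides.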
  • 36. Outline 1 Basic data structures and operations 2 Hash tables Hash tables: Concepts Analysis of hashing under Chaining 3 Hashing Methods The Division Method The Multiplication Method Clustering Analysis of Hashing Functions A Possible Solution: Universal Hashing 4 Open Addressing Linear Probing Quadratic Probing Double Hashing 5 Exercises 23 / 81
  • 37. Hashing methods: The division method Hash function h(k) = k mod m Problems with some selections m = 2^p: h(k) is only the p lowest-order bits. m = 2^p − 1: when k is interpreted as a character string in radix 2^p, permuting the characters of k does not change the value. It is better to select Prime numbers not too close to an exact power of two. For example, given n = 2000 elements, we can use m = 701 because it is near 2000/3 but not near a power of two. 24 / 81
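The division method is a one-liner; the sketch below uses the slide's choice m = 701 for roughly 2000 keys (the function name is mine):

```python
# Division method: h(k) = k mod m, with m a prime not too close to a
# power of two (m = 701 for ~2000 keys, as on the slide).
def h_division(k, m=701):
    return k % m
```

With m = 701 and n = 2000, the load factor is α = 2000/701 ≈ 2.85, so each chain holds about three elements on average.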
  • 40. Outline 1 Basic data structures and operations 2 Hash tables Hash tables: Concepts Analysis of hashing under Chaining 3 Hashing Methods The Division Method The Multiplication Method Clustering Analysis of Hashing Functions A Possible Solution: Universal Hashing 4 Open Addressing Linear Probing Quadratic Probing Double Hashing 5 Exercises 25 / 81
  • 41. Hashing methods: The multiplication method The multiplication method for creating hash functions has two steps 1 Multiply the key k by a constant A in the range 0 < A < 1 and extract the fractional part of kA. 2 Then, multiply the value by m and take the floor: h(k) = ⌊m(kA mod 1)⌋. The mod allows us to extract that fractional part: kA mod 1 = kA − ⌊kA⌋, 0 < A < 1. Advantages: m is not critical, normally m = 2^p. 26 / 81
  • 44. Implementing in a computer First Imagine that the machine word has w bits and k fits in those bits. Second Then, select an s in the range 0 < s < 2^w and assume A = s/2^w. Third Now, we multiply k by the number s = A·2^w. 27 / 81
  • 47. Example Fourth The result is r_1·2^w + r_0, a 2w-bit value, where the p most significant bits of r_0 are the desired hash value. Graphically (figure: the 2w-bit product k·s split into high word r_1 and low word r_0; extract p bits from r_0) 28 / 81
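The word-level recipe above can be sketched directly with integer arithmetic. This is an illustration, not the slides' code; the choice A ≈ (√5 − 1)/2 is Knuth's commonly suggested constant, used here as an assumption:

```python
# Multiplication method for w-bit keys: choose s = A * 2^w, compute the
# 2w-bit product k*s = r1*2^w + r0, and keep the p most significant bits
# of the low word r0 as the hash value.
def h_multiplication(k, p, w=32):
    A = (5 ** 0.5 - 1) / 2          # golden-ratio constant (illustrative choice)
    s = int(A * (1 << w))           # s = A * 2^w, with 0 < s < 2^w
    r0 = (k * s) & ((1 << w) - 1)   # low w bits of the 2w-bit product
    return r0 >> (w - p)            # the p most significant bits of r0
```

Because m = 2^p, the final step is just a shift, which is why the slide says m is not critical and the method is easy to implement.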
  • 49. Outline 1 Basic data structures and operations 2 Hash tables Hash tables: Concepts Analysis of hashing under Chaining 3 Hashing Methods The Division Method The Multiplication Method Clustering Analysis of Hashing Functions A Possible Solution: Universal Hashing 4 Open Addressing Linear Probing Quadratic Probing Double Hashing 5 Exercises 29 / 81
  • 50. However Sooner or later We can pick a hash function that does not give us the desired uniform randomized property Thus We are required to analyze the possible clustering of the data by the hash function 30 / 81
  • 52. Measuring Clustering through a metric C Definition If bucket i contains n_i elements, then C = (m/(n − 1)) · ((1/n) Σ_{i=1}^m n_i² − 1) (2) Properties 1 If C = 1, then you have uniform hashing. 2 If C > 1, it means that the performance of the hash table is slowed down by clustering by approximately a factor of C. 3 If C < 1, the spread of the elements is more even than uniform!!! Not going to happen!!! 31 / 81
  • 54. However Unfortunately Hash tables do not give a way to measure clustering Thus, table designers They should provide some clustering estimation as part of the interface. Thus The reason the clustering measure works is that it is based on an estimate of the variance of the distribution of bucket sizes. 32 / 81
  • 57. Thus First If clustering is occurring, some buckets will have more elements than they should, and some will have fewer. Second There will be a wider range of bucket sizes than one would expect from a random hash function. 33 / 81
  • 59. Analysis of C: First, keys are uniformly distributed Consider the following random variable Consider bucket i containing n_i elements, with X_ij = I{element j lands in bucket i}. Then, given n_i = Σ_{j=1}^n X_ij (3) We have that E[X_ij] = 1/m, E[X_ij²] = 1/m (4) 34 / 81
  • 62. Next We look at the dispersion of X_ij: Var[X_ij] = E[X_ij²] − (E[X_ij])² = 1/m − 1/m² (5) What about the expected number of elements at each bucket? E[n_i] = E[Σ_{j=1}^n X_ij] = n/m = α (6) 35 / 81
  • 64. Then, we have Because of the independence of the {X_ij}, the scattering of n_i is Var[n_i] = Var[Σ_{j=1}^n X_ij] = Σ_{j=1}^n Var[X_ij] = n·Var[X_ij] 36 / 81
  • 65. Then What about the range of the possible number of elements at each bucket? Var[n_i] = n/m − n/m² = α − α/m But, we have that E[n_i²] = E[Σ_{j=1}^n X_ij² + Σ_{j=1}^n Σ_{k=1, k≠j}^n X_ij X_ik] (7) Or E[n_i²] = n/m + Σ_{j=1}^n Σ_{k=1, k≠j}^n 1/m² (8) 37 / 81
  • 68. Thus We re-express the range in terms of expected values of n_i: E[n_i²] = n/m + n(n − 1)/m² (9) Then E[n_i²] − E[n_i]² = n/m + n(n − 1)/m² − n²/m² = n/m − n/m² = α − α/m 38 / 81
  • 70. Then Finally, we have that E[n_i²] = α(1 − 1/m) + α² (10) 39 / 81
  • 71. Then, we have that Now we build an estimator of the mean of the n_i², which is part of C: (1/n) Σ_{i=1}^m n_i² (11) Thus E[(1/n) Σ_{i=1}^m n_i²] = (1/n) Σ_{i=1}^m E[n_i²] = (m/n)(α(1 − 1/m) + α²) = (1/α)(α(1 − 1/m) + α²) = 1 − 1/m + α 40 / 81
  • 73. Finally We can plug back into C using the expected value: E[C] = (m/(n − 1)) · (E[(1/n) Σ_{i=1}^m n_i²] − 1) = (m/(n − 1))(1 − 1/m + α − 1) = (m/(n − 1))(n/m − 1/m) = (m/(n − 1)) · ((n − 1)/m) = 1 41 / 81
  • 74. Explanation Using a hash function that enforces a uniform distribution over the buckets We get that C = 1, i.e. the best distribution of keys 42 / 81
  • 75. Now, we have a really horrible hash function ≡ It hits only one of every b buckets Thus E[X_ij] = E[X_ij²] = b/m (12) Thus, we have E[n_i] = αb (13) Then, we have E[(1/n) Σ_{i=1}^m n_i²] = (1/n) Σ_{i=1}^m E[n_i²] = αb − b/m + 1 43 / 81
  • 78. Finally We can plug back into C using the expected value: E[C] = (m/(n − 1)) · (E[(1/n) Σ_{i=1}^m n_i²] − 1) = (m/(n − 1))(αb − b/m + 1 − 1) = (m/(n − 1))(nb/m − b/m) = (m/(n − 1)) · (b(n − 1)/m) = b 44 / 81
  • 79. Explanation Using a hash function that clusters the keys into one of every b buckets We get that C = b > 1, i.e. a really bad distribution of the keys!!! Thus, you only need the following to evaluate a hash function: (1/n) Σ_{i=1}^m n_i² (14) 45 / 81
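The clustering metric C from equation (2) is straightforward to compute from the observed bucket counts; a small sketch (the function name is mine):

```python
# Clustering metric from the slides:
# C = (m / (n - 1)) * ((1/n) * sum(n_i^2) - 1).
# C ~ 1 indicates uniform hashing; C ~ b indicates a hash function that
# hits only one of every b buckets.
def clustering_metric(bucket_counts):
    m = len(bucket_counts)                            # number of buckets
    n = sum(bucket_counts)                            # number of stored elements
    mean_sq = sum(c * c for c in bucket_counts) / n   # (1/n) * sum n_i^2
    return (m / (n - 1)) * (mean_sq - 1)
```

For example, piling all n = 8 elements into one of m = 4 buckets gives C = (4/7)·(8 − 1) = 4, the factor-of-b slowdown derived on the previous slides.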
  • 81. Outline 1 Basic data structures and operations 2 Hash tables Hash tables: Concepts Analysis of hashing under Chaining 3 Hashing Methods The Division Method The Multiplication Method Clustering Analysis of Hashing Functions A Possible Solution: Universal Hashing 4 Open Addressing Linear Probing Quadratic Probing Double Hashing 5 Exercises 46 / 81
  • 82. A Possible Solution: Universal Hashing Issues In practice, keys are not randomly distributed. Any fixed hash function might yield retrieval Θ(n) time. Goal To find hash functions that produce uniform random table indexes irrespective of the keys. Idea To select a hash function at random from a designed class of functions at the beginning of the execution. 47 / 81
  • 85. Hashing methods: Universal hashing Example (figure: a set of hash functions; one is chosen randomly at the beginning of the execution and used to map keys into the hash table) 48 / 81
  • 86. Definition of universal hash functions Definition Let H = {h : U → {0, 1, ..., m − 1}} be a family of hash functions. H is called a universal family if ∀x, y ∈ U, x ≠ y: Pr_{h∈H}(h(x) = h(y)) ≤ 1/m (15) Main result With universal hashing the chance of collision between distinct keys k and l is no more than the 1/m chance of collision if locations h(k) and h(l) were randomly and independently chosen from the set {0, 1, ..., m − 1}. 49 / 81
  • 88. Hashing methods: Universal hashing Theorem 11.3 Suppose that a hash function h is chosen randomly from a universal collection of hash functions and has been used to hash n keys into a table T of size m, using chaining to resolve collisions. If key k is not in the table, then the expected length E[n_{h(k)}] of the list that key k hashes to is at most the load factor α = n/m. If key k is in the table, then the expected length E[n_{h(k)}] of the list containing key k is at most 1 + α. Corollary 11.4 Using universal hashing and collision resolution by chaining in an initially empty table with m slots, it takes expected time Θ(n) to handle any sequence of n INSERT, SEARCH, and DELETE operations containing O(m) INSERT operations. 50 / 81
  • 90. Example of Universal Hash Proceed as follows: Choose a prime number p large enough so that every possible key k is in the range [0, ..., p − 1]. Let Z_p = {0, 1, ..., p − 1} and Z*_p = {1, ..., p − 1}. Define the following hash function: h_{a,b}(k) = ((ak + b) mod p) mod m, ∀a ∈ Z*_p and b ∈ Z_p. The family of all such hash functions is: H_{p,m} = {h_{a,b} : a ∈ Z*_p and b ∈ Z_p} Important a and b are chosen randomly at the beginning of execution. The class H_{p,m} of hash functions is universal. 51 / 81
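The family H_{p,m} above can be sketched directly: draw a from Z*_p and b from Z_p once, at the start of execution, and return the resulting h_{a,b}. The defaults p = 977 and m = 50 follow the slide's example; the factory name is mine.

```python
import random

# Universal family H_{p,m}: h_{a,b}(k) = ((a*k + b) mod p) mod m,
# with a drawn from Z*_p and b from Z_p at the start of execution.
def make_universal_hash(p=977, m=50, rng=random):
    a = rng.randrange(1, p)   # a in Z*_p = {1, ..., p-1}
    b = rng.randrange(0, p)   # b in Z_p  = {0, ..., p-1}
    return lambda k: ((a * k + b) % p) % m
```

A fresh call to the factory at program start corresponds to the slide's "choose a hash function randomly at the beginning of the execution"; the returned function is then fixed for the run.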
  • 92. Example: Universal hash functions Example p = 977, m = 50, a and b random numbers ha,b(k) = ((ak + b) mod p) mod m 52 / 81
  • 93. Example of key distribution Example, mean = 488.5 and dispersion = 5 53 / 81
  • 94. Example with 10 keys Universal Hashing Vs Division Method 54 / 81
  • 95. Example with 50 keys Universal Hashing Vs Division Method 55 / 81
  • 96. Example with 100 keys Universal Hashing Vs Division Method 56 / 81
  • 97. Example with 200 keys Universal Hashing Vs Division Method 57 / 81
  • 98. Another Example: Matrix Method Then Let us say keys are u bits long. Say the table size M is a power of 2, so an index is b bits long with M = 2^b. The h function Pick h to be a random b-by-u 0/1 matrix, and define h(x) = hx, where after the inner products we apply mod 2. Example (b = 3, u = 4):
      [ 1 0 0 0 ]   [ 1 ]   [ 1 ]
      [ 0 1 1 1 ] · [ 0 ] = [ 1 ]   (mod 2)
      [ 1 1 1 0 ]   [ 1 ]   [ 0 ]
                    [ 0 ]
      h (b × u)       x      h(x)
  58 / 81
  • 101. Proof of being a Universal Family First, assume that you have two different keys l ≠ m Without loss of generality assume the following 1 l_i ≠ m_i, say l_i = 0 and m_i = 1 2 l_j = m_j ∀ j ≠ i Thus The column i does not contribute to the final answer of h(l) because of the zero!!! Now Imagine that we fix all the other columns in h; then there is only one answer for h(l) 59 / 81
  • 104. Now For the ith column There are 2^b possible columns when changing the ones and zeros Thus, given the randomness of the zeros and ones The probability that we get the zero column (0, 0, ..., 0)^T (16) is equal to 1/2^b (17), in which case h(l) = h(m) 60 / 81
  • 106. Then We get the probability P(h(l) = h(m)) ≤ 1/2^b (18) 61 / 81
  • 107. Implementation of the column*vector mod 2 Code
      int product(int row, int vector) {
          int i = row & vector;                      /* elementwise AND */
          i = i - ((i >> 1) & 0x55555555);           /* parallel popcount: */
          i = (i & 0x33333333) + ((i >> 2) & 0x33333333);  /* sum bit pairs, nibbles */
          i = (((i + (i >> 4)) & 0x0F0F0F0F) * 0x01010101) >> 24;
          return i & 0x00000001;                     /* popcount mod 2 */
      }
  62 / 81
  • 108. Advantages of universal hashing Advantages Universal hashing provides good results on average, independently of the keys to be stored. Guarantees that no input will always elicit the worst-case behavior. Poor performance occurs only when the random choice returns an inefficient hash function; this has a small probability. 63 / 81
  • 109. Open addressing Definition All the elements occupy the hash table itself. What is it? We systematically examine table slots until either we find the desired element or we have ascertained that the element is not in the table. Advantages The advantage of open addressing is that it avoids pointers altogether. 64 / 81
  • 112. Insert in Open addressing Extended hash function to probe Instead of probing slots in the fixed order 0, 1, 2, ..., m − 1, which leads to Θ(n) search time Extend the hash function to h : U × {0, 1, ..., m − 1} → {0, 1, ..., m − 1} This gives the probe sequence h(k, 0), h(k, 1), ..., h(k, m − 1) A permutation of 0, 1, 2, ..., m − 1 65 / 81
  • 113. Hashing methods in Open Addressing HASH-INSERT(T, k) 1 i = 0 2 repeat 3 j = h (k, i) 4 if T [j] == NIL 5 T [j] = k 6 return j 7 else i = i + 1 8 until i == m 9 error “Hash Table Overflow” 66 / 81
  • 114. Hashing methods in Open Addressing HASH-SEARCH(T,k) 1 i = 0 2 repeat 3 j = h (k, i) 4 if T [j] == k 5 return j 6 i = i + 1 7 until T [j] == NIL or i == m 8 return NIL 67 / 81
  • 116. Linear probing: Definition and properties Hash function Given an ordinary hash function h : U → {0, 1, ..., m − 1}, for i = 0, 1, ..., m − 1 we get the extended hash function h(k, i) = (h(k) + i) mod m, (19) Sequence of probes Given key k, we first probe T[h(k)], then T[h(k) + 1] and so on until T[m − 1]. Then, we wrap around from T[0] to T[h(k) − 1]. Distinct probes Because the initial probe determines the entire probe sequence, there are only m distinct probe sequences. 69 / 81
  • 119. Linear probing: Definition and properties Disadvantages Linear probing suffers from primary clustering. Long runs of occupied slots build up, increasing the average search time. Clusters arise because an empty slot preceded by i full slots gets filled next with probability (i + 1)/m. Long runs of occupied slots tend to get longer, and the average search time increases. 70 / 81
  • 120. Example Example using keys uniformly distributed It was generated using the division method Then 71 / 81
  • 122. Example Example using Gaussian keys It was generated using the division method Then 72 / 81
  • 125. Quadratic probing: Definition and properties Hash function Given an auxiliary hash function h : U → {0, 1, ..., m − 1}, for i = 0, 1, ..., m − 1 we get the extended hash function h(k, i) = (h(k) + c1·i + c2·i^2) mod m, (20) where c1, c2 are auxiliary constants with c2 ≠ 0 Sequence of probes Given key k, we first probe T[h(k)]; later positions probed are offset by amounts that depend in a quadratic manner on the probe number i. The initial probe determines the entire sequence, and so only m distinct probe sequences are used. 74 / 81
  • 127. Quadratic probing: Definition and properties Advantages This method works much better than linear probing, but to make full use of the hash table, the values of c1,c2, and m are constrained. Disadvantages If two keys have the same initial probe position, then their probe sequences are the same, since h(k1, 0) = h(k2, 0) implies h(k1, i) = h(k2, i). This property leads to a milder form of clustering, called secondary clustering. 75 / 81
  • 130. Double hashing: Definition and properties Hash function Double hashing uses a hash function of the form h(k, i) = (h1(k) + i·h2(k)) mod m, (21) where i = 0, 1, ..., m − 1 and h1, h2 are auxiliary hash functions (normally drawn from a universal family) Sequence of probes Given key k, we first probe T[h1(k)]; successive probe positions are offset from previous positions by the amount h2(k) mod m. Thus, unlike the case of linear or quadratic probing, the probe sequence here depends in two ways upon the key k, since the initial probe position, the offset, or both, may vary. 77 / 81
  • 132. Double hashing: Definition and properties Advantages When m is prime or a power of 2, double hashing improves over linear or quadratic probing in that Θ(m^2) probe sequences are used, rather than Θ(m), since each possible (h1(k), h2(k)) pair yields a distinct probe sequence. The performance of double hashing appears to be very close to the performance of the “ideal” scheme of uniform hashing. 78 / 81
  • 134. Analysis of Open Addressing Theorem 11.6 Given an open-address hash table with load factor α = n/m < 1, the expected number of probes in an unsuccessful search is at most 1/(1 − α), assuming uniform hashing. Corollary Inserting an element into an open-address hash table with load factor α requires at most 1/(1 − α) probes on average, assuming uniform hashing. Theorem 11.8 Given an open-address hash table with load factor α < 1, the expected number of probes in a successful search is at most (1/α) ln(1/(1 − α)), assuming uniform hashing and assuming that each key in the table is equally likely to be searched for. 80 / 81
  • 137. Exercises From Cormen’s book, chapter 11 11.1-2 11.2-1 11.2-2 11.2-3 11.3-1 11.3-3 81 / 81