Se ha denunciado esta presentación.
Se está descargando tu SlideShare. ×
Anuncio
Anuncio
Anuncio
Anuncio
Anuncio
Anuncio
Anuncio
Anuncio
Anuncio
Anuncio
Anuncio
Anuncio
1
CS143: Hash Index
2
What is a Hash Table?
• Hash Table
– Hash function
• h(k): key ‡ integer [0…n]
• e.g., h(‘Susan’) = 7
– Array for keys: ...
3
Why Hash Table?
• Direct access
• Saved space
– Do not reserve a space for
every possible key
• How can we use this idea...
Anuncio
Anuncio
Anuncio
Anuncio
Anuncio
Próximo SlideShare
03.01 hash tables
03.01 hash tables
Cargando en…3
×

Eche un vistazo a continuación

1 de 26 Anuncio

Más Contenido Relacionado

Presentaciones para usted (20)

Anuncio

Similares a Hash (20)

Anuncio

Más reciente (20)

Hash

  1. 1. 1 CS143: Hash Index
  2. 2. 2 What is a Hash Table? • Hash Table – Hash function • h(k): key ‡ integer [0…n] • e.g., h(‘Susan’) = 7 – Array for keys: T[0…n] – Given a key k, store it in T[h(k)] 5 Susan4 James3 2 Neil1 0 h(Susan) = 4 h(James) = 3 h(Neil) = 1
  3. 3. 3 Why Hash Table? • Direct access • Saved space – Do not reserve a space for every possible key • How can we use this idea for a record search in a DBMS?
  4. 4. 4 Hashing for DBMS (Static Hashing) (key, record) . . . Disk blocks (buckets) search key Æ h(key) 0 1 2 3 4
  5. 5. 5 Issues? • What hash function? • Overflow? • How to store records?
  6. 6. 6 Record storage 1. Whole record 2. Key and pointer . . . record . . . record key 1 h(key) h(key)
  7. 7. 7 Overflow and Chaining • Insert h(a) = 1 h(b) = 2 h(c) = 1 h(d) = 0 h(e) = 1 • Delete h(b) = 2 h(c) = 1 0 1 2 3 Do we keep keys sorted inside a block?
  8. 8. 8 Questions • Q: Do we keep keys sorted inside a block? • Q: How much space should we use for “good” performance?
  9. 9. 9 What Hash Function? • Example – Key = ‘x1 x2 … xn’ n byte character string – Have b buckets – h: (x1 + x2 + ….. xn) mod b • Desired property – Uniformity: same # of keys for every bucket • Reference – “The Art of Computer Programming Vol. 3” by Knuth
  10. 10. 10 Major Problem of Static Hashing • How to cope with growth? – Data tends to grow in size – Overflow blocks unavoidable 10 20 30 40 50 60 70 80 90 39 31 35 36 32 38 34 33 overflow blocks hash buckets
  11. 11. 11 Idea 1 • Periodic reorganization – New hash function – Complete reassignment of records – Very expensive 10 20 30 40 50 … 33 10 20 30 40 50 60 70 80 90 33 40 50 60
  12. 12. 12 Idea 2 • Why don’t we split a bucket when it overflows? 0 a 3 b 2 d 1 c e Insert: h(g) = 1
  13. 13. 13 (a) Use i of b bits output by hash function Extendible Hashing (two ideas) 00110101h(K) Æ b use i Æ grows over time
  14. 14. 14 (b) Use directory that maintains pointers to hash buckets (indirection) Extendible Hashing (two ideas) . . . . . . c e hash bucketdirectory h(c)
  15. 15. 15 Example • h(k) is 4 bits; 2 keys/bucket Insert 0111, 1010 1 1 1 0001 1001 1100 0 1 i =
  16. 16. 16 Example continued Insert 0000 1 0001 2 1001 1010 2 1100 00 01 10 11 2i = 0111
  17. 17. 17 Example continued Insert 1011 00 01 10 11 2i = 21001 1010 21100 20111 20000 0001
  18. 18. 18 Questions on Extendible Hashing • Q: What will happen if we have a lot of duplicate search keys?
  19. 19. 19 Duplicate keys • Insert 0001 1 1 1 0001 0 1 i = 0001
  20. 20. 20 Questions on Extendible Hashing • Can we provide minimum space guarantee?
  21. 21. 21 Space Waste 2 1 40001 40000 0000 4i = 3
  22. 22. 22 Extendible Hashing: Deletion • Two options a) No merging of buckets b) Merge buckets and shrink directory if possible
  23. 23. 23 Bucket Merge Example Can we merge a and b? b and c? Can we shrink the directory? 1 0001 2 1001 2 1100 00 01 10 11 2i = a b c
  24. 24. 24 Bucket Merge Condition • Bucket merge condition – Bucket i’s are the same – First (i-1) bits of the hash key are the same • Directory shrink condition – All bucket i’s are smaller than the directory i
  25. 25. 25 Summary of Extendible Hashing • Can handle growing files – No periodic reorganizations • Indirection – Up to 2 disk accesses to access a key • Directory doubles in size – Not too bad if the data is not too large
  26. 26. 26 Hashing vs. Tree • Can an extendible-hash index support? SELECT FROM R WHERE R.A > 5 • Which one is better, B+tree or Extendible hashing? SELECT FROM R WHERE R.A = 5

×