Collision Resolution: Bucket Hashing
Bucket Hashing
Bucket hashing is treating the hash table as a two dimensional array instead of a linear array.
Consider a hash table with S
slots that are divided into B
buckets, with each bucket consisting of S/B
slots.
The hash function assigns each record to the first slot within one of the buckets. If the slot was already occupied then the bucket slots are searched sequentially until an empty slot is found.
If the bucket is completely full, the record will be stored in an overflow bucket
of infinite capacity at the end of the table, which is shared by all buckets. Which makes bucket hashing a form of closed hashing implementation.
An ideal implementation will use a hash function that distributes the records evenly among all buckets so there will be as few records as possible to store in the overflow bucket.
Index | Bucket | Data |
---|---|---|
0 | B0[0] | 30 |
1 | B0[1] | 20 |
2 | B1[0] | |
3 | B1[1] | |
4 | B2[0] | |
5 | B2[1] | |
6 | B3[0] | 18 |
7 | B3[1] | 38 |
8 | B4[0] | |
9 | B4[1] |
Overflow Bucket |
---|
48 |
25 |
Given this bucket hash table for an array of size 10 storing 5 buckets, each bucket having two slots in size, let's demonstrate how this method works in practice. We also have an overflow bucket of infinite size on the right to store records when the buckets in the main hash table are occupied. I will be using mod operation as the hash function.
Let us start by inserting the number 18 as our first record. Since we have 5 buckets, we take mod 5. 18 % 5 is 3.
We put this into the top of B3, which is slot 6 of the hash table.
Now inserting a record for 30. 30 % 5 is 0. 30 goes into B0[0].
Next we insert a record for 38; 38 % 5 is 3 so it will be placed in B3[1].
Next up we have 48. 48 % 5 is 3, but the B3 is already full, hence we store 48 in the first available slot of our overflow bucket.
We can now try with 20. 20 % 5 is 0; B0[0] is occupied hence it will be stored in B0[1].
Now if we insert 25, 25 % 5 is 0 and we know both slots of B0 are occupied now, hence it will end up in our overflow bucket.
When looking for a record, we first take its hash value and search the resulting bucket.
If we search for key value 20, we search in B0, first checking B0[0] which holds a different value, so we check B0[1] and we find our key.
When searching for the key value 25, we look in B0 sequentially. We see it doesn't hold our key value and it is full, hence we look through the overflow bucket. First checking OB[0], then OB[1] and we have found it.
Note that if there are many records in the overflow bucket, this will be an expensive process.
Bucket methods are good for implementing hash tables stored on disk, because the bucket size can be set to the size of a disk block. Whenever search or insertion occurs, the entire bucket is read into memory. Because the entire bucket is then in memory, processing an insert or search operation requires only one disk access, unless the bucket is full. If the bucket is full, then the overflow bucket must be retrieved from disk as well. Naturally, overflow should be kept small to minimize unnecessary disk accesses.