How is hashing done?
A hash table works by placing each value into one of its many storage slots, which allows for fast retrieval later. These storage slots are known as buckets.
The hash function determines which bucket a value is inserted into, and the same function tells us, in constant time, which bucket to look in when the value is retrieved.
Problems with hashing:
Although hashing has the benefit of fast retrieval of data items, it also has some disadvantages. There are two main problems associated with it:
1. Values that are complex and difficult to compare:
- This problem is easy to solve if you reduce each complex value to a key, or hash, that is easy to work with. The easiest way to do this is to generate a number from your value. Ideally the number is unique, because we want to distinguish one value from another. Primes are a useful tool for this.
- Using a prime as the multiplier gives the resulting product the best chance of being unique (though it is not as unique as the prime itself).
- For example, to generate a hash for the string “Dev”, you multiply the numeric code of each letter by a prime number and add the results.
Nevertheless, prime hashing is an old technique. The key point is that you can move on to other hashing schemes as long as the keys they produce are sufficiently unique.
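As a rough sketch of the idea above, the function below multiplies the character code of each letter by a prime and adds the results. The name prime_hash and the default prime of 31 are assumptions chosen for illustration, not part of any standard library.

```python
def prime_hash(text, prime=31):
    """Multiply each character code by a prime and add the results (illustrative only)."""
    total = 0
    for ch in text:
        total += ord(ch) * prime
    return total


print(prime_hash("Dev"))      # 8897
print(prime_hash("Dev", 37))  # a different prime gives a different key
```

Because every character gets the same multiplier here, a rearranged string such as "veD" produces the same number as "Dev", so this simple form is far from collision-free.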
2. Sequential searches are not fast for large data sets:
- If values have to be found by sequential search, performance degrades in direct proportion to the number of values stored. You therefore pay a linear performance cost, O(n), which becomes progressively worse as the number of keys (n) increases.
- Additionally, if you are dealing with strings or other complex types, each check or comparison is itself costly, so the total number of comparisons becomes prohibitively expensive.
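To make the linear cost concrete, here is a small sketch that counts the comparisons a sequential search performs. The helper sequential_search and the sample data are hypothetical; the comparison with Python's built-in set is only meant to show what a hash-based lookup looks like in use.

```python
def sequential_search(values, target):
    """Scan the list from the start and return (found, comparisons)."""
    comparisons = 0
    for value in values:
        comparisons += 1
        if value == target:
            return True, comparisons
    return False, comparisons


names = [f"name{i}" for i in range(1000)]

# Worst case: the target is the last item, so all 1000 strings are compared.
print(sequential_search(names, "name999"))  # (True, 1000)

# A set is hash-based, so a membership test does not scan every item.
name_set = set(names)
print("name999" in name_set)  # True
```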
What is the purpose of 31?
Let’s say you now have a unique identifier for each value, and your container is a fixed array of 16 items.
How do you assign each value to one of these buckets or containers?
- The easiest way to decide where to place a value is to use the generated number, known as the hash key, as its location.
- The key number you get, and therefore the distribution of keys across your array, depends on the prime you used.
- For example, the key for “DEV” would be D * 31 + E * 31 + V * 31 with the prime 31, and a different number, D * 37 + E * 37 + V * 37, with the prime 37. Because the key produced is different, the data would go into a different location depending on the prime used.
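The effect of the multiplier on placement can be sketched as follows, using 31 and 37 as two example primes. The text does not say how a large key is fitted into 16 slots, so the modulo step below is an assumption, as are the names simple_key and bucket_index.

```python
NUM_BUCKETS = 16  # fixed array size taken from the text


def simple_key(text, multiplier):
    """Multiply each character code by the multiplier and add the results."""
    return sum(ord(ch) * multiplier for ch in text)


def bucket_index(text, multiplier):
    """Map the key into the range 0..15 (the modulo step is an assumption)."""
    return simple_key(text, multiplier) % NUM_BUCKETS


print(simple_key("DEV", 31), bucket_index("DEV", 31))  # 6913 1
print(simple_key("DEV", 37), bucket_index("DEV", 37))  # 8251 11
```

For comparison, Java's String.hashCode also uses 31 as its multiplier, but it applies it as a power that depends on each character's position rather than multiplying every character by the same number.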
Avoiding collisions:
It is possible that certain strings will generate the same key. In such cases, the individual hash storage slot can be turned into a linked list, or another type of storage, that holds all the values sharing that key. This is why each individual hash storage slot is called a bucket. To keep the number of collisions to a minimum, we want the generated keys to be as unique as possible.
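A minimal sketch of this bucket idea is shown below. It is not a production hash table: the class name ChainedHashTable is hypothetical, a Python list stands in for the linked list mentioned above, and the hash reuses the simple multiply-by-31 sum with an assumed modulo step.

```python
class ChainedHashTable:
    """Toy hash table: each bucket is a list holding every colliding entry."""

    def __init__(self, num_buckets=16):
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, key):
        # Illustrative hash: character codes times 31, summed, then wrapped.
        return sum(ord(ch) * 31 for ch in key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))  # new key or collision: add to the chain

    def get(self, key, default=None):
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        return default


table = ChainedHashTable()
table.put("DEV", 1)
table.put("VED", 2)  # same characters, so it lands in the same bucket as "DEV"
print(table.get("DEV"), table.get("VED"))  # 1 2
```

Both keys collide, yet each value is still retrievable because the bucket keeps the full key alongside the value and compares it on lookup.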