|
| 1 | +# Hash Table |
| 2 | + |
| 3 | +A hash table (or hash map) is a fundamental data structure in computer science that stores key-value pairs and supports fast lookup, insertion, and deletion by key. |
| 4 | + |
| 5 | +## Overview |
| 6 | +A hash table is a collection of key-value pairs, where each key is mapped to a value. The key is transformed into an index in an array through a process called hashing. This index is used to quickly locate the value associated with the key. |
| 7 | + |
| 8 | +## Key Concepts |
| 9 | +1. *Hash Function:* This function takes an input (the key) and returns an integer (the hash code). The hash code is then reduced to an array index, typically by taking it modulo the number of buckets, and the value is stored at that index. A good hash function distributes keys uniformly across the array. |
| 10 | +2. *Buckets:* Each index in the array is referred to as a bucket. In cases where multiple keys hash to the same index, these keys are stored in the same bucket. |
| 11 | +3. *Collision Handling:* Collisions occur when multiple keys hash to the same index. Common collision handling techniques include: |
| 12 | +   - Chaining: Each bucket points to a linked list of entries that map to the same index. |
| 13 | +   - Open Addressing: When a collision occurs, the algorithm searches for the next open bucket using a probing sequence (linear probing, quadratic probing, or double hashing). |
| 14 | + |
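As a rough illustration of open addressing, the sketch below uses linear probing on a fixed-capacity table. The class name `LinearProbingTable`, the `int` value type, and the omission of resizing and deletion are assumptions made to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical fixed-capacity table: open addressing with linear probing.
class LinearProbingTable {
    struct Slot {
        std::string key;
        int value = 0;
        bool occupied = false;
    };
    std::vector<Slot> slots_;
    std::size_t count_ = 0;

    // Home slot of a key: hash code reduced modulo the capacity.
    std::size_t home(const std::string& key) const {
        return std::hash<std::string>{}(key) % slots_.size();
    }

public:
    explicit LinearProbingTable(std::size_t capacity) : slots_(capacity) {
        assert(capacity > 0);
    }

    // Inserts or updates. Returns false only if the table is full
    // and the key is not already stored.
    bool insert(const std::string& key, int value) {
        std::size_t i = home(key);
        for (std::size_t step = 0; step < slots_.size(); ++step) {
            Slot& s = slots_[i];
            if (!s.occupied) {            // free slot ends the probe
                s.key = key;
                s.value = value;
                s.occupied = true;
                ++count_;
                return true;
            }
            if (s.key == key) {           // existing key: overwrite
                s.value = value;
                return true;
            }
            i = (i + 1) % slots_.size();  // linear probe: try the next slot
        }
        return false;
    }

    // Returns a pointer to the stored value, or nullptr if the key is absent.
    const int* find(const std::string& key) const {
        std::size_t i = home(key);
        for (std::size_t step = 0; step < slots_.size(); ++step) {
            const Slot& s = slots_[i];
            if (!s.occupied) return nullptr;  // empty slot: key cannot be further on
            if (s.key == key) return &s.value;
            i = (i + 1) % slots_.size();
        }
        return nullptr;
    }

    std::size_t size() const { return count_; }
};
```

For example, after `LinearProbingTable t(8); t.insert("apple", 3);`, the call `t.find("apple")` returns a pointer to `3`, while `t.find("pear")` returns `nullptr`.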
| 15 | +## Operations |
| 16 | +**Insertion:** |
| 17 | + |
| 18 | +1. Compute the hash code of the key using the hash function. |
| 19 | +2. Map the hash code to an index in the array. |
| 20 | +3. Store the key-value pair in the corresponding bucket, handling any collision with the table's chosen strategy. If the key is already present, update its value. |
| 21 | + |
| 22 | +**Search:** |
| 23 | + |
| 24 | +1. Compute the hash code of the key. |
| 25 | +2. Map the hash code to an index. |
| 26 | +3. Check the corresponding bucket. If using chaining, search the linked list. If using open addressing, follow the probing sequence until the key is found or an empty bucket is reached. |
| 27 | + |
| 28 | +**Deletion:** |
| 29 | + |
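| 30 | +1. Compute the hash code of the key. |
| 31 | +2. Map the hash code to an index. |
| 32 | +3. Locate the key in the bucket and remove the key-value pair. With chaining, this unlinks one list node. With open addressing, the slot cannot simply be emptied, because that would cut the probe sequence for keys stored after it; mark the slot as deleted (a tombstone) or reinsert the entries that follow it. |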
| 33 | + |
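Deletion under open addressing is the subtle case, so here is a minimal sketch of one common approach, tombstones, on a linear-probing set of strings. The class name `TombstoneSet`, the three-state slot, and the fixed capacity are illustrative assumptions; a production table would also clear out tombstones when it resizes.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical linear-probing set that deletes by leaving tombstones.
class TombstoneSet {
    enum class State { Empty, Occupied, Deleted };
    struct Slot {
        std::string key;
        State state = State::Empty;
    };
    std::vector<Slot> slots_;

    std::size_t home(const std::string& key) const {
        return std::hash<std::string>{}(key) % slots_.size();
    }

    // Index of the slot holding key, or slots_.size() if the key is absent.
    std::size_t locate(const std::string& key) const {
        std::size_t i = home(key);
        for (std::size_t step = 0; step < slots_.size(); ++step) {
            const Slot& s = slots_[i];
            if (s.state == State::Empty) break;  // a truly empty slot ends the probe
            if (s.state == State::Occupied && s.key == key) return i;
            i = (i + 1) % slots_.size();         // tombstones are skipped, not stopped at
        }
        return slots_.size();
    }

public:
    explicit TombstoneSet(std::size_t capacity) : slots_(capacity) {
        assert(capacity > 0);
    }

    // Returns false if the key is already present or the table is full.
    bool insert(const std::string& key) {
        if (locate(key) != slots_.size()) return false;
        std::size_t i = home(key);
        for (std::size_t step = 0; step < slots_.size(); ++step) {
            Slot& s = slots_[i];
            if (s.state != State::Occupied) {    // empty slots and tombstones are reusable
                s.key = key;
                s.state = State::Occupied;
                return true;
            }
            i = (i + 1) % slots_.size();
        }
        return false;
    }

    bool contains(const std::string& key) const {
        return locate(key) != slots_.size();
    }

    // Marks the slot as deleted instead of emptying it, so keys that were
    // placed after it in the same probe sequence stay reachable.
    bool erase(const std::string& key) {
        std::size_t i = locate(key);
        if (i == slots_.size()) return false;
        slots_[i].state = State::Deleted;
        slots_[i].key.clear();
        return true;
    }
};
```

If `erase` reset the slot to `Empty` instead, a later `contains` call could stop at that slot and miss a key stored further along the probe sequence.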
| 34 | +## Complexity |
| 35 | +Average Time Complexity: O(1) for insert, delete, and search operations in a well-implemented hash table with a good hash function and a load factor that avoids excessive collisions. |
| 36 | + |
| 37 | +Worst-Case Time Complexity: O(n) for insert, delete, and search operations if all keys hash to the same index. This is highly unlikely with a good hash function and keys that were not chosen adversarially. |
| 38 | + |
| 39 | +## Load Factor |
| 40 | +The load factor (α) is the ratio of the number of entries to the number of buckets in the hash table. A high load factor can lead to more collisions, affecting performance. A common practice is to resize the hash table (rehash) when the load factor exceeds a certain threshold. |
| 41 | + |
| 42 | + |
| 43 | +α = number of entries / number of buckets |
| 44 | + |
| 45 | + |
| 46 | +**Rehashing:** |
| 47 | + |
| 48 | +When the load factor exceeds a threshold, the hash table is resized (usually doubled) and every existing key is rehashed into the new array, because a key's index depends on the number of buckets. A single resize costs O(n), but doubling the array keeps the amortized cost of insertion at O(1). |
| 49 | + |
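The resize-and-rehash step can be sketched on a chained table as follows. The class name `ResizingTable`, the 0.75 threshold, and the initial bucket count of 4 are assumptions chosen for the example, not values prescribed by the text.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical chained hash table that doubles when it gets too full.
class ResizingTable {
    using Entry = std::pair<std::string, std::string>;
    std::vector<std::list<Entry>> buckets_;
    std::size_t count_ = 0;
    double maxLoadFactor_ = 0.75;  // assumed threshold

    static std::size_t indexFor(const std::string& key, std::size_t bucketCount) {
        return std::hash<std::string>{}(key) % bucketCount;
    }

    // Moves every entry into a new array; indices must be recomputed
    // because they depend on the number of buckets.
    void rehash(std::size_t newBucketCount) {
        std::vector<std::list<Entry>> fresh(newBucketCount);
        for (auto& bucket : buckets_) {
            for (auto& entry : bucket) {
                std::size_t idx = indexFor(entry.first, newBucketCount);
                fresh[idx].push_back(std::move(entry));
            }
        }
        buckets_.swap(fresh);
    }

public:
    explicit ResizingTable(std::size_t initialBuckets = 4) : buckets_(initialBuckets) {
        assert(initialBuckets > 0);
    }

    void insert(const std::string& key, const std::string& value) {
        auto& bucket = buckets_[indexFor(key, buckets_.size())];
        for (auto& entry : bucket) {
            if (entry.first == key) {  // existing key: overwrite, count unchanged
                entry.second = value;
                return;
            }
        }
        bucket.emplace_back(key, value);
        ++count_;
        if (loadFactor() > maxLoadFactor_) {
            rehash(buckets_.size() * 2);  // double, then redistribute
        }
    }

    // Returns a pointer to the stored value, or nullptr if the key is absent.
    const std::string* find(const std::string& key) const {
        const auto& bucket = buckets_[indexFor(key, buckets_.size())];
        for (const auto& entry : bucket) {
            if (entry.first == key) return &entry.second;
        }
        return nullptr;
    }

    double loadFactor() const {
        return static_cast<double>(count_) / static_cast<double>(buckets_.size());
    }
    std::size_t bucketCount() const { return buckets_.size(); }
    std::size_t size() const { return count_; }
};
```

Starting from 4 buckets, inserting ten distinct keys triggers two doublings under these assumptions (at the 4th and 7th insert), leaving 16 buckets and a load factor of 0.625.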
| 50 | +## Practical Considerations |
| 51 | +*Choosing a Hash Function:* A good hash function minimizes collisions and uniformly distributes keys. Common hash functions include division-remainder, multiplication, and universal hashing. |
| 52 | + |
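To make two of these methods concrete, here is a small sketch for 64-bit integer keys. The function names are hypothetical, and the multiplier is one common choice (2^64 divided by the golden ratio, sometimes called Fibonacci hashing); other odd constants can be used. The multiplication variant assumes the bucket count is a power of two, 2^bits.

```cpp
#include <cassert>
#include <cstdint>

// Division-remainder method: index = key mod bucketCount.
// A prime bucketCount is often preferred so that patterns in the keys
// are less likely to line up with the table size.
inline std::uint64_t divisionHash(std::uint64_t key, std::uint64_t bucketCount) {
    assert(bucketCount > 0);
    return key % bucketCount;
}

// Multiplication method for a table of 2^bits buckets (1 <= bits <= 63):
// multiply by a fixed odd constant, let the product wrap modulo 2^64,
// and keep the top `bits` bits as the index.
inline std::uint64_t multiplicationHash(std::uint64_t key, unsigned bits) {
    assert(bits >= 1 && bits <= 63);
    const std::uint64_t kMultiplier = 0x9E3779B97F4A7C15ULL;  // assumed constant: 2^64 / golden ratio
    return (key * kMultiplier) >> (64 - bits);
}
```

With these definitions, `divisionHash(42, 10)` yields `2`, and `multiplicationHash(key, 4)` always yields an index in the range 0 to 15.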
| 53 | +*Dynamic Resizing:* To maintain efficient operations, hash tables are dynamically resized (rehashing) when the load factor becomes too high. |
| 54 | + |
| 55 | +*Memory Usage:* Hash tables can be memory-inefficient if they are sparsely populated. A balance between memory usage and performance needs to be struck. |
| 56 | + |
| 57 | +## Applications |
| 58 | + |
| 59 | +1. Database Indexing: Hash tables are used for indexing databases to allow quick retrieval of records. |
| 60 | +2. Caches: Hash tables map a request key (such as a URL or query) to its cached result, so a cache hit is found in constant time on average. |
| 61 | +3. Symbol Tables in Compilers: Used for fast lookup of identifiers (variables, functions, etc.) during compilation. |
| 62 | +4. Sets: Hash tables can be used to implement set data structures for efficient membership checking. |
| 63 | + |
| 64 | +## Implementation in C++ |
| 65 | + |
| 66 | +```cpp |
| 67 | +#include <iostream> |
| 68 | +#include <vector> |
| 69 | +#include <list> |
| 70 | +#include <string> |
| 71 | + |
| 72 | +class HashTable { |
| 73 | +private: |
| 74 | +    int size;  // number of buckets; must be positive |
| 75 | +    std::vector<std::list<std::pair<std::string, std::string>>> table;  // one chain per bucket |
| 76 | + |
| 77 | +    int hashFunction(const std::string& key) const {  // maps a key to a bucket index in [0, size) |
| 78 | +        std::hash<std::string> hashFunc; |
| 79 | +        return hashFunc(key) % size; |
| 80 | +    } |
| 81 | + |
| 82 | +public: |
| 83 | +    HashTable(int s) : size(s), table(s) {} |
| 84 | + |
| 85 | +    void insert(const std::string& key, const std::string& value) { |
| 86 | +        int index = hashFunction(key); |
| 87 | +        for (auto& kvp : table[index]) { |
| 88 | +            if (kvp.first == key) { |
| 89 | +                kvp.second = value;  // key already present: overwrite its value |
| 90 | +                return; |
| 91 | +            } |
| 92 | +        } |
| 93 | +        table[index].emplace_back(key, value);  // new key: append to the chain |
| 94 | +    } |
| 95 | + |
| 96 | +    std::string search(const std::string& key) const { |
| 97 | +        int index = hashFunction(key); |
| 98 | +        for (const auto& kvp : table[index]) { |
| 99 | +            if (kvp.first == key) { |
| 100 | +                return kvp.second; |
| 101 | +            } |
| 102 | +        } |
| 103 | +        return "Not Found";  // sentinel; cannot be told apart from a stored value "Not Found" |
| 104 | +    } |
| 105 | + |
| 106 | +    bool remove(const std::string& key) { |
| 107 | +        int index = hashFunction(key); |
| 108 | +        for (auto it = table[index].begin(); it != table[index].end(); ++it) { |
| 109 | +            if (it->first == key) { |
| 110 | +                table[index].erase(it);  // safe: we return immediately after erasing |
| 111 | +                return true; |
| 112 | +            } |
| 113 | +        } |
| 114 | +        return false; |
| 115 | +    } |
| 116 | +}; |
| 117 | + |
| 118 | +int main() { |
| 119 | +    HashTable ht(10); |
| 120 | +    ht.insert("key1", "value1"); |
| 121 | +    ht.insert("key2", "value2"); |
| 122 | + |
| 123 | +    std::cout << "Search key1: " << ht.search("key1") << std::endl; // prints "Search key1: value1" |
| 124 | +    std::cout << "Search key2: " << ht.search("key2") << std::endl; // prints "Search key2: value2" |
| 125 | + |
| 126 | +    ht.remove("key1"); |
| 127 | +    std::cout << "Search key1: " << ht.search("key1") << std::endl; // prints "Search key1: Not Found" |
| 128 | + |
| 129 | +    return 0; |
| 130 | +} |
| 131 | +``` |