Hash Table Visualizer
Understand how hash tables work with collision handling. Visualize hash functions, chaining, and open addressing in action.
Understanding Hash Tables
Hash tables (also known as hash maps, dictionaries, or associative arrays) are among the most important data structures in computer science. They provide average-case O(1) operations for insertion, search, and deletion, making them dramatically faster than trees or lists for many use cases. Hash tables power everything from database indexes to programming language implementations to caching systems.
The magic of hash tables lies in the hash function: a function that transforms a key into an array index. Instead of searching through elements, you compute where an element should be and go directly there. This direct access is what enables constant-time operations.
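As a minimal sketch of this idea in Python (the table size of 8 is just an illustrative choice), the index is computed directly from the key, so both store and lookup go straight to the same slot:

```python
TABLE_SIZE = 8  # hypothetical small table for illustration

def hash_index(key: str) -> int:
    # Python's built-in hash() maps the key to an integer;
    # modulo folds it into the valid range of array indices.
    return hash(key) % TABLE_SIZE

slots = [None] * TABLE_SIZE
slots[hash_index("apple")] = "a fruit"   # store directly at the computed index
value = slots[hash_index("apple")]       # retrieve with the same computation
```

No searching happens: insertion and lookup each perform one hash computation and one array access.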
How Hash Tables Work
A hash table consists of an array of slots (buckets) and a hash function. To insert a key-value pair, the hash function computes an index from the key, and the pair is stored at that index. To retrieve a value, the same hash function computes the index, and the value is found directly.
The hash function must be deterministic: the same key always produces the same hash. It should also distribute keys uniformly across the array to minimize collisions. A poor hash function that clusters many keys into few buckets degrades performance significantly.
Common hash functions depend on the key type: string hashes sum character codes with positional weighting, integer hashes use modular arithmetic or bit manipulation, and hashes for complex objects combine the hashes of their components. Cryptographic hash functions like SHA-256 are sometimes used but are typically overkill for hash tables.
Handling Collisions
Collisions occur when two different keys hash to the same index. No hash function can avoid collisions entirely when the key space exceeds the table size (by the pigeonhole principle). Collision handling is essential for correctness.
Chaining (Separate Chaining): Each bucket contains a linked list (or another structure) of all key-value pairs that hash to that index. On collision, the new pair is added to the list. Search traverses the list at the computed index. With good hashing, lists stay short and operations remain fast.
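A minimal chained table might look like the following sketch (Python, with buckets as plain lists standing in for linked lists; class and method names are illustrative):

```python
class ChainedHashTable:
    """Separate chaining: each bucket holds a list of (key, value) pairs."""

    def __init__(self, size: int = 8):
        self.buckets = [[] for _ in range(size)]

    def _index(self, key) -> int:
        return hash(key) % len(self.buckets)

    def put(self, key, value) -> None:
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # key already present: update in place
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # new key (or collision): extend the chain

    def get(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)
```

Even with a deliberately tiny table that forces collisions, every key remains retrievable because each bucket's chain holds all pairs that landed there.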
Open Addressing: All elements live directly in the array. On collision, probe for an empty slot using a defined sequence (linear probing, quadratic probing, or double hashing). The same probe sequence is used for search. Deletion requires special marking to avoid breaking probe sequences.
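A linear-probing sketch illustrates both the probe sequence and the deletion marking (Python; the tombstone sentinel and class names are illustrative):

```python
_DELETED = object()  # tombstone marker so probe sequences are not broken

class LinearProbingTable:
    def __init__(self, size: int = 8):
        self.slots = [None] * size  # each slot: None, _DELETED, or (key, value)

    def _probe(self, key):
        # Linear probing: start at hash(key) % size, step forward one slot at a time.
        start = hash(key) % len(self.slots)
        for step in range(len(self.slots)):
            yield (start + step) % len(self.slots)

    def put(self, key, value) -> None:
        first_free = None                      # first tombstone seen, if any
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is None:                   # end of probe sequence: insert
                self.slots[first_free if first_free is not None else i] = (key, value)
                return
            if slot is _DELETED:
                if first_free is None:
                    first_free = i             # reuse tombstone slots
            elif slot[0] == key:
                self.slots[i] = (key, value)   # update existing key
                return
        if first_free is not None:
            self.slots[first_free] = (key, value)
            return
        raise RuntimeError("table full")

    def get(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is None:                   # empty slot ends the search
                break
            if slot is not _DELETED and slot[0] == key:
                return slot[1]
        raise KeyError(key)

    def delete(self, key) -> None:
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is None:
                break
            if slot is not _DELETED and slot[0] == key:
                self.slots[i] = _DELETED       # mark rather than clear
                return
        raise KeyError(key)
```

Note that `delete` writes a tombstone instead of `None`: clearing the slot outright would end later probe sequences early and make keys stored beyond it unreachable.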
Chaining is simpler and handles high load factors gracefully but uses extra memory for pointers. Open addressing has better cache performance but degrades as the table fills. Both achieve O(1) average case with good hash functions and reasonable load factors.
Load Factor and Resizing
The load factor is the ratio of stored elements to table size. As it increases, collisions become more likely and operations slow down. Chaining handles high load factors better but still degrades; open addressing becomes problematic above 70-80% load.
When load factor exceeds a threshold (typically 0.75), the table resizes—usually doubling in size. All existing elements are rehashed into the new, larger array. Resizing is O(n) but happens infrequently enough that amortized insertion remains O(1).
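The grow-and-rehash step can be sketched on top of a chained table (Python; the 0.75 threshold matches the text, other names are illustrative):

```python
class ResizingTable:
    LOAD_FACTOR_LIMIT = 0.75  # grow when count/size would exceed this

    def __init__(self):
        self.buckets = [[] for _ in range(8)]
        self.count = 0

    def put(self, key, value) -> None:
        if (self.count + 1) / len(self.buckets) > self.LOAD_FACTOR_LIMIT:
            self._resize(2 * len(self.buckets))   # double the table
        bucket = self.buckets[hash(key) % len(self.buckets)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))
        self.count += 1

    def get(self, key):
        for k, v in self.buckets[hash(key) % len(self.buckets)]:
            if k == key:
                return v
        raise KeyError(key)

    def _resize(self, new_size: int) -> None:
        # O(n) rehash: every pair moves to its slot in the larger array.
        old = self.buckets
        self.buckets = [[] for _ in range(new_size)]
        for bucket in old:
            for key, value in bucket:
                self.buckets[hash(key) % new_size].append((key, value))
```

Every element must be rehashed because its index depends on the table size: `hash(key) % 8` and `hash(key) % 16` generally differ.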
Some implementations also shrink when load factor drops too low, reclaiming memory after many deletions. The threshold for shrinking is lower than for growing to avoid thrashing (repeated grow/shrink cycles near the boundary).
Hash Functions in Practice
For integers, a simple modulo by table size works but can be problematic with patterns in keys. Multiplicative hashing (multiply by a constant, take high bits) distributes more uniformly. For best results, use a prime table size with modular hashing.
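One common form of multiplicative hashing is Fibonacci hashing, sketched below with an assumed 32-bit word and a 16-slot table; it shows how patterned keys that collapse under plain modulo still spread out:

```python
def multiplicative_hash(key: int, bits: int = 4) -> int:
    # Fibonacci (multiplicative) hashing: multiply by floor(2^32 / phi),
    # keep the low 32 bits, then take the top `bits` bits as the index.
    # The table size is 2**bits (16 here), so no prime modulus is needed.
    A = 2654435769  # floor(2**32 / golden ratio)
    return ((key * A) & 0xFFFFFFFF) >> (32 - bits)

# Patterned keys (multiples of 16) all collide under plain "key % 16",
# but spread across many slots under multiplicative hashing.
keys = [16 * k for k in range(16)]
modulo_slots = {k % 16 for k in keys}
fib_slots = {multiplicative_hash(k) for k in keys}
```

This is why power-of-two table sizes pair well with multiplicative hashing, while plain modular hashing prefers a prime size.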
For strings, polynomial rolling hash functions treat the characters as coefficients of a polynomial and evaluate it at a chosen base. For a string of characters c0, c1, ..., cn, the formula is often: hash = (c0 * p^n + c1 * p^(n-1) + ... + cn * p^0) mod m, where p is a small prime base and m is the table size.
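In practice the polynomial is evaluated with Horner's rule rather than explicit powers; the sketch below (Python, with an assumed base p = 31 and table size m = 101) shows that both forms agree:

```python
def poly_hash_explicit(s: str, p: int = 31, m: int = 101) -> int:
    # Direct evaluation of the formula: sum of c_i * p^(n-i) mod m,
    # where n = len(s) - 1 and c_i are the character codes.
    n = len(s) - 1
    return sum(ord(c) * pow(p, n - i, m) for i, c in enumerate(s)) % m

def poly_hash_horner(s: str, p: int = 31, m: int = 101) -> int:
    # Equivalent evaluation via Horner's rule: one multiply-add per
    # character, taking mod at each step to keep intermediate values small.
    h = 0
    for c in s:
        h = (h * p + ord(c)) % m
    return h
```

Taking the modulus at every step matters in languages with fixed-width integers, where the explicit powers would overflow for long strings.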
Many languages provide built-in hash functions. Java's Object.hashCode(), Python's hash(), JavaScript's internal hashing—these are carefully crafted for general-purpose use. Custom hash functions are only needed for special cases or when implementing your own hash table.
Time Complexity Analysis
Average case: O(1) for insert, search, and delete. This assumes a good hash function with uniform distribution and a controlled load factor. Each operation computes one hash and accesses one bucket with a short list.
Worst case: O(n) when all keys hash to the same bucket, creating a single long chain. This happens with adversarial input or a poor hash function. Real-world hash tables rarely hit worst case with proper implementation.
Hash tables do not support ordered operations efficiently. Finding the minimum/maximum, iterating in sorted order, or range queries require O(n) scans. For these operations, balanced trees are more appropriate despite higher constant factors.
Applications of Hash Tables
Symbol tables: compilers and interpreters use hash tables to map variable names to their information (type, scope, memory location). Fast lookup is essential for compilation performance.
Caching: memoization stores computed results keyed by input parameters. Database query caches, DNS caches, and CPU caches all use hash-based data structures.
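Memoization is often a one-liner over a hash table; Python's `functools.lru_cache` keeps a dict-backed cache keyed by the call arguments:

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # unbounded hash-table cache of results
def fib(n: int) -> int:
    # Without the cache this recursion is exponential; with it,
    # each n is computed once and every repeat is an O(1) lookup.
    return n if n < 2 else fib(n - 1) + fib(n - 2)
```

The same pattern, a hash table from inputs to computed outputs, underlies the larger caching systems mentioned above.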
Counting and grouping: counting word frequencies, grouping records by category, detecting duplicates—all are natural hash table applications. The key is what you are counting; the value is the count.
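Word-frequency counting is exactly this pattern; Python's `collections.Counter` is a hash table specialized for it:

```python
from collections import Counter

# Key: the word being counted; value: how many times it appears.
text = "the quick brown fox jumps over the lazy dog the end"
counts = Counter(text.split())
```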
Set operations: hash-based sets provide O(1) membership testing. Two-pointer techniques for finding pairs summing to target use hash sets. Graph algorithms track visited nodes in hash sets.
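The pair-sum technique can be sketched with a hash set of values seen so far, giving an O(n) solution instead of the O(n^2) nested-loop check:

```python
def has_pair_with_sum(nums, target) -> bool:
    # For each value, an O(1) set lookup checks whether its complement
    # (target - x) appeared earlier in the sequence.
    seen = set()
    for x in nums:
        if target - x in seen:
            return True
        seen.add(x)
    return False
```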
Implementation Variations
Robin Hood hashing: a variant of open addressing where elements with longer probe sequences "steal" slots from elements with shorter sequences. This reduces variance in probe lengths and improves worst-case performance.
Cuckoo hashing: uses multiple hash functions and tables, so an element can reside in one of several candidate positions. On collision, existing elements are "kicked out" to their alternative positions. It provides O(1) worst-case lookup at the cost of more complex insertion.
Swiss Table (flat_hash_map): Google's high-performance implementation uses SIMD instructions to probe multiple slots in parallel. Achieves excellent cache behavior and has become a reference for modern hash table design.