[algorithm] algorithm diagram note _ hash table

Time:2019-10-23

Linear search runs in O(n) time and binary search in O(log n). Is there a search that runs in O(1)? Yes: the hash table, which finds items in O(1) time on average.

The hash function

A hash function “maps input to numbers”. It must meet a couple of requirements:

  1. It has to be consistent: the same input must always produce the same output.
  2. Ideally, it maps different inputs to different numbers, so that different inputs end up in different locations.
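To make the two requirements concrete, here is a minimal sketch of a toy string hash function. The name simple_hash and the multiplier 31 are our own illustrative choices (this is not Python's built-in hash()). It is consistent because it depends only on the characters of the key, and it folds the result into the range 0 to size - 1 so it can be used directly as an array index.

```python
def simple_hash(key: str, size: int) -> int:
    """Toy hash function (hypothetical): map a string key to an index in [0, size)."""
    total = 0
    for ch in key:
        # Combine characters so that order matters: "ab" and "ba" hash differently.
        total = (total * 31 + ord(ch)) % size
    return total

SIZE = 10
for word in ["apple", "milk", "avocado"]:
    print(word, simple_hash(word, SIZE))
# apple 0
# milk 9
# avocado 3
```

Note that Python's own hash() is consistent within one run of a program, but for strings it is randomized between runs by default (see PYTHONHASHSEED), so never store its values and expect them to match in a later run.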

You can then use the hash function to map each input to a position in an array, which gives you a simple hash table. Hash tables are the first data structure introduced so far in the book that contains additional logic. Arrays and linked lists map directly into memory, but hash tables are more complex: they use a hash function to decide where each element is stored. A hash table is made up of keys and values, and it maps each key to a value.

A hash table is also called a hash map, map, dictionary, or associative array. Python, for example, provides a hash table implementation: the dictionary. Using Python's built-in dictionary:

>>> book = dict() # create an empty dictionary
>>> book["apple"] = 0.67 # add key-value pair
>>> book["milk"] = 1.49 # add key-value pair
>>> book["avocado"] = 1.49 # add key-value pair
>>> print(book) # prints the current hash table
{'apple': 0.67, 'milk': 1.49, 'avocado': 1.49}
>>> print(book["avocado"]) # look up a value by its key
1.49

Application cases

Use hash tables for lookups

Quickly find the value associated with a key.

For instance, a phone book:
The name is the key and the phone number is the value.

DNS resolution:
The domain name is the key and the IP address is the value.
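Both lookups can be sketched with a plain dict. The names and numbers below are made up, the domains use the reserved .example top-level domain, and the addresses come from the 192.0.2.0/24 documentation range, so all of them are placeholders rather than real records. The resolve helper is hypothetical and only illustrates a key-to-value lookup, not how a real DNS resolver works.

```python
# Phone book: the name is the key, the phone number is the value (made-up entries).
phone_book = {}
phone_book["jenny"] = "867-5309"
phone_book["emergency"] = "911"
print(phone_book["jenny"])        # 867-5309

# DNS-style table: domain name -> IP address (placeholder values).
dns = {
    "alpha.example": "192.0.2.10",
    "beta.example": "192.0.2.20",
}

def resolve(domain):
    # dict.get returns None instead of raising KeyError when the key is missing.
    return dns.get(domain)

print(resolve("beta.example"))    # 192.0.2.20
print(resolve("unknown.example")) # None
```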

Preventing duplicates

For example, suppose voting is limited to one vote per person. You can store each voter's identifying information (such as a name or IP address) as a key in a hash table. Before someone votes, check whether that key is already in the table.
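A minimal sketch of the voting check, assuming a voter is identified by a name string. The function name check_voter and the printed messages are our own illustration; the membership test `name in voted` takes O(1) time on average because voted is a hash table.

```python
voted = {}  # key: voter identifier (here, a name); value: True once they have voted

def check_voter(name):
    """Return True if this person may vote now, False if they already voted."""
    if name in voted:            # average O(1) membership test
        print("already voted")
        return False
    voted[name] = True           # remember that this person has voted
    print("let them vote")
    return True

check_voter("tom")   # let them vote
check_voter("mike")  # let them vote
check_voter("mike")  # already voted
```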

Use the hash table as a cache

Caching is a common way to speed things up. Large websites rely heavily on caching, and the cached data is often stored in hash tables. How caching works: the website remembers data instead of recalculating it, which reduces response time and saves the server's computing resources.

When a page is requested, the site first checks whether that page is stored in the hash table. Only when the URL is not in the cache does the server do the processing; it then stores the generated data in the cache and returns it. The next time someone requests the same URL, the cached data can be sent without the server processing it again.

Collisions

Ideally, a hash function maps different inputs to different numbers, but it is almost impossible to write such a hash function.
A collision occurs when two keys are assigned the same position.

There are many ways to handle collisions. The simplest is this: if two keys map to the same position, store a linked list at that position.
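A minimal sketch of this approach (usually called separate chaining), under one stated simplification: each slot holds a Python list of (key, value) pairs that plays the role of the linked list. The class name ChainedHashTable is hypothetical, and the slot number is derived from Python's built-in hash().

```python
class ChainedHashTable:
    """Hash table that resolves collisions by chaining (illustrative sketch)."""

    def __init__(self, size=8):
        # One chain per slot; a Python list stands in for the linked list.
        self.buckets = [[] for _ in range(size)]

    def _index(self, key):
        # Reduce Python's built-in hash() to a slot number.
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:               # key already present: overwrite its value
                bucket[i] = (key, value)
                return
        bucket.append((key, value))    # new key, possibly colliding: extend the chain

    def get(self, key):
        bucket = self.buckets[self._index(key)]
        for k, v in bucket:            # walk the chain stored in this slot
            if k == key:
                return v
        raise KeyError(key)

# Only 2 slots for 3 keys, so at least one collision is guaranteed.
table = ChainedHashTable(size=2)
table.put("apple", 0.67)
table.put("milk", 1.49)
table.put("avocado", 1.49)
print(table.get("avocado"))  # 1.49
```

Which slot each key lands in can differ between runs (string hashing is randomized), but lookups stay correct because get walks the whole chain and compares keys.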

The worst case is when every key lands in the same position: the hash table is empty except for the first position, which holds one long list, so a lookup is as slow as searching a linked list.
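The worst case can be reproduced with a deliberately terrible hash function that sends every key to slot 0. The helpers build_buckets, comparisons_to_find, and worst_hash below are hypothetical names for this sketch; counting key comparisons shows that finding the last key means scanning the whole chain, i.e. linear time.

```python
def build_buckets(keys, size, hash_fn):
    """Distribute keys into `size` chains using hash_fn; return the chains."""
    buckets = [[] for _ in range(size)]
    for key in keys:
        buckets[hash_fn(key) % size].append(key)
    return buckets

def comparisons_to_find(buckets, key, size, hash_fn):
    """Count how many keys are compared before `key` is found in its chain."""
    chain = buckets[hash_fn(key) % size]
    for count, k in enumerate(chain, start=1):
        if k == key:
            return count
    return len(chain)  # key absent: every key in the chain was compared

def worst_hash(key):
    return 0  # every key maps to slot 0

keys = [f"key{i}" for i in range(1000)]
buckets = build_buckets(keys, 100, worst_hash)
print(len(buckets[0]))                                          # 1000
print(comparisons_to_find(buckets, "key999", 100, worst_hash))  # 1000
```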

Good hash functions rarely cause collisions: they spread keys evenly across the positions of the hash table, so no linked list gets too long.

Performance

On average, hash tables perform every operation in O(1) time. O(1) is called constant time. Constant time doesn't mean instant; it means the time taken stays the same no matter how big the hash table is. In the worst case, all hash table operations take O(n), which is linear time.

To keep collisions rare, you need:

  • A low fill factor;

fill factor (usually called the load factor) = number of elements in the hash table / total number of positions in the hash table
The fill factor measures how full the hash table is, and therefore how many of its positions are still empty.
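The formula is a single division; this tiny sketch (the function name fill_factor is our own) works through two cases. With chaining, a table can hold more elements than it has positions, so the fill factor can exceed 1.

```python
def fill_factor(num_elements, num_positions):
    """Fill (load) factor = elements stored / total positions in the table."""
    return num_elements / num_positions

print(fill_factor(7, 10))     # 0.7  -> 7 of 10 positions in use
print(fill_factor(150, 100))  # 1.5  -> more elements than positions (possible with chaining)
```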

As the fill factor grows, you need to add positions to the hash table; this is called resizing. Resizing means allocating new, larger storage (typically about twice the size) and using the hash function to reinsert every element into it, which is expensive. Averaged over all operations, however, hash table operations still take O(1) time, even counting the cost of resizing.

The lower the fill factor, the less likely collisions are and the better the hash table performs. A good rule of thumb is to resize the hash table once the fill factor exceeds 0.7.
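Here is a minimal sketch of the resizing rule, assuming a chained table (Python lists as chains) that doubles its number of positions whenever the fill factor rises above 0.7. The class name ResizingHashTable and the starting size of 4 are hypothetical choices; doubling, rather than adding a fixed number of positions, is what keeps the average cost per insertion constant.

```python
class ResizingHashTable:
    """Chained hash table that doubles in size when the fill factor exceeds 0.7."""

    MAX_FILL = 0.7  # rule-of-thumb threshold from the text

    def __init__(self, size=4):
        self.buckets = [[] for _ in range(size)]
        self.count = 0  # number of stored key-value pairs

    def fill_factor(self):
        return self.count / len(self.buckets)

    def _index(self, key, size):
        return hash(key) % size

    def put(self, key, value):
        bucket = self.buckets[self._index(key, len(self.buckets))]
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # existing key: update, count unchanged
                bucket[i] = (key, value)
                return
        bucket.append((key, value))
        self.count += 1
        if self.fill_factor() > self.MAX_FILL:
            self._resize(2 * len(self.buckets))

    def _resize(self, new_size):
        # Allocate new storage and rehash every pair: the expensive O(n) step.
        new_buckets = [[] for _ in range(new_size)]
        for bucket in self.buckets:
            for k, v in bucket:
                new_buckets[self._index(k, new_size)].append((k, v))
        self.buckets = new_buckets

    def get(self, key):
        for k, v in self.buckets[self._index(key, len(self.buckets))]:
            if k == key:
                return v
        raise KeyError(key)

t = ResizingHashTable()
for i in range(10):
    t.put(i, i * i)
print(len(t.buckets), t.fill_factor())  # 16 0.625
```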

  • A good hash function.

A good hash function distributes values evenly across the array. A bad hash function clusters values together and causes lots of collisions.
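To see the difference, this sketch compares two hypothetical hash functions on 100 similar keys in a table with 10 positions. first_letter_hash looks only at the first character, so keys sharing a prefix pile into one slot; polynomial_hash lets every character contribute. Both functions and the key set are illustrative assumptions, not recommended production hashes.

```python
from collections import Counter

def first_letter_hash(key):
    # Bad: only the first character matters, so similar keys pile up.
    return ord(key[0])

def polynomial_hash(key):
    # Better: every character contributes, and its position matters.
    h = 0
    for ch in key:
        h = h * 31 + ord(ch)
    return h

def bucket_sizes(keys, size, hash_fn):
    """Return how many keys land in each of `size` positions."""
    counts = Counter(hash_fn(k) % size for k in keys)
    return [counts[i] for i in range(size)]

keys = [f"user{i}" for i in range(100)]
SIZE = 10
print("first letter:", bucket_sizes(keys, SIZE, first_letter_hash))
# first letter: [0, 0, 0, 0, 0, 0, 0, 100, 0, 0]
print("polynomial:  ", bucket_sizes(keys, SIZE, polynomial_hash))
# polynomial:   [10, 10, 10, 10, 10, 10, 10, 10, 10, 10]
```

With these particular keys and 10 positions the polynomial hash happens to split them perfectly evenly; in general you should expect a roughly even spread, not a perfect one.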
