HashMap source code line by line analysis, j + oldCap bucket location reallocation formula handwritten verification

Time: 2022-11-25

Note

This article is based on the JDK 8 source code.

Structure of HashMap

[Figure: structure of HashMap]

  1. The array in the figure is the table field, the basic field of HashMap. It is an array that holds nodes, and each slot of the table is called a bucket.
  2. Node is the basic node type in HashMap, used to store a key and its value.
  3. The bucket index is computed as (n - 1) & hash, where n is the length of the table and hash is the hash value of the key.
  4. Different keys can map to the same bucket (a hash collision). In JDK 1.7 and earlier, colliding nodes were chained into a linked list; with many collisions the list in one bucket becomes long and lookups degrade to O(n). Since JDK 1.8, a bucket uses a linked list that is converted to a red-black tree once it holds enough nodes, because a tree lookup is O(log n) and beats a long list.
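To make the bucket index formula concrete, here is a minimal sketch. The class name BucketIndexDemo and the method indexFor are made up for illustration and are not part of the JDK. It shows that, when n is a power of two, (n - 1) & hash gives the same result as a non-negative modulo.

```java
final class BucketIndexDemo {
    // Hypothetical helper: the bucket index formula used by HashMap.
    // It only behaves like a modulo when n is a power of two.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        for (int hash : new int[] {9, 25, -7, 123456789}) {
            // Math.floorMod is the non-negative remainder, shown for comparison.
            System.out.printf("hash=%d -> bucket %d (floorMod: %d)%n",
                    hash, indexFor(hash, n), Math.floorMod(hash, n));
        }
    }
}
```

The bitwise AND avoids a division and also handles negative hash values without a special case.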

key attributes

table

/**
     * The table, initialized on first use, and resized as
     * necessary. When allocated, length is always a power of two.
     * (We also tolerate length zero in some operations to allow
     * bootstrapping mechanics that are currently not needed.)
     *
     * Basic field of HashMap: an array that holds the nodes. Each slot of the table is called a bucket
     */
    transient Node<K,V>[] table;

Node

/**
     * Basic hash bin node, used for most entries.  (See below for
     * TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
     *
     * The basic node in HashMap, used to store a key and its value
     */
    static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;
    }

modCount

This field is not needed to understand the core flow of HashMap. Readers who only care about the core flow can skip it.

/**
     * The number of times this HashMap has been structurally modified
     * Structural modifications are those that change the number of mappings in
     * the HashMap or otherwise modify its internal structure (e.g.,
     * rehash).  This field is used to make iterators on Collection-views of
     * the HashMap fail-fast.  (See ConcurrentModificationException).
     *
     * Records the number of structural modifications. It is incremented whenever a mapping is added or removed; overwriting the value of an existing key does not change it
     * Before a traversal starts, modCount is saved in a local variable mc. After the traversal, if modCount differs from mc, the HashMap was structurally modified during iteration and a ConcurrentModificationException is thrown.
     * For an example of this check, see the forEach method of the EntrySet inner class
     */
    transient int modCount;

For iterator traversal, you can take a look at the forEach method of the EntrySet inner class.

/**
     * Please note that other member properties and member methods of EntrySet in the source code are not shown here
     */
    final class EntrySet extends AbstractSet<Map.Entry<K,V>> {
        public final void forEach(Consumer<? super Map.Entry<K,V>> action) {
            Node<K,V>[] tab;
            if (action == null)
                throw new NullPointerException();
            if (size > 0 && (tab = table) != null) {
                // Record modCount with a local variable mc
                int mc = modCount;
                for (int i = 0; i < tab.length; ++i) {
                    for (Node<K,V> e = tab[i]; e != null; e = e.next)
                        action.accept(e);
                }
                // After the iterator traversal is completed, if modCount and mc are found to be different, it means that the hashMap has been modified during iteration, and an exception will be thrown.
                if (modCount != mc)
                    throw new ConcurrentModificationException();
            }
        }
    }
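The fail-fast check can be observed from outside the class. The following is a small sketch; FailFastDemo and its method names are hypothetical. It adds a new key while iterating over the entry set, which is a structural modification, and compares that with only overwriting values of existing keys, which is not.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

final class FailFastDemo {
    private static Map<String, Integer> sampleMap() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        return map;
    }

    // Adding a new key during forEach changes modCount, so the check after the loop throws.
    static boolean structuralChangeThrows() {
        Map<String, Integer> map = sampleMap();
        try {
            map.entrySet().forEach(e -> map.put("c", 3));
            return false;
        } catch (ConcurrentModificationException ex) {
            return true;
        }
    }

    // Overwriting the value of an existing key leaves modCount untouched, so nothing is thrown.
    static boolean valueOverwriteThrows() {
        Map<String, Integer> map = sampleMap();
        try {
            map.entrySet().forEach(e -> map.put(e.getKey(), 0));
            return false;
        } catch (ConcurrentModificationException ex) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("structural change throws: " + structuralChangeThrows());
        System.out.println("value overwrite throws:   " + valueOverwriteThrows());
    }
}
```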

threshold (resize threshold)

/**
     * The next size value at which to resize (capacity * load factor).
     *
     * Resize threshold, computed as capacity * loadFactor. When size exceeds it, HashMap calls resize() to grow the table
     *
     * @serial
     */
    // (The javadoc description is true upon serialization.
    // Additionally, if the table array has not been allocated, this
    // field holds the initial array capacity, or zero signifying
    // DEFAULT_INITIAL_CAPACITY.)
    int threshold;

loadFactor (load factor)

/**
     * The load factor for the hash table.
     *
     * Load factor: together with the table length, it determines how many entries the HashMap holds before it resizes
     *
     * @serial
     */
    final float loadFactor;

    /**
     * The load factor used when none specified in constructor.
     *
     * The default load factor
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

The load factor measures how full the table is allowed to get before it is resized. Its purpose is to keep the probability of hash collisions low, so that few buckets degrade into long linked lists or trees, while keeping the memory overhead of the table reasonable.

The larger the load factor, the fuller the table gets before resizing. The advantage is better space utilization; the disadvantage is a higher chance of hash collisions.

The smaller the load factor, the emptier the table stays. The advantage is fewer collisions; the disadvantage is more wasted space.

The default load factor, DEFAULT_LOAD_FACTOR = 0.75f, is a trade-off between collision probability and space overhead.
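As a rough illustration of the trade-off, the sketch below computes the resize threshold for a table of 16 buckets under different load factors. ThresholdDemo and threshold are hypothetical names, and the sketch ignores the clamping at the maximum capacity that the real resize() performs.

```java
final class ThresholdDemo {
    // Simplified model: number of entries a table of the given capacity
    // holds before the next insertion triggers a resize.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        int capacity = 16;
        for (float lf : new float[] {0.5f, 0.75f, 1.0f}) {
            System.out.println("loadFactor=" + lf + " -> threshold=" + threshold(capacity, lf));
        }
    }
}
```

With the default 0.75f, a table of 16 buckets is resized when the 13th entry is added, since resize is triggered once size exceeds the threshold of 12.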

Constructor

/**
     * Constructs an empty <tt>HashMap</tt> with the specified initial
     * capacity and the default load factor (0.75).
     *
     * @param initialCapacity the initial capacity. It is kept in threshold until the table is allocated; after that, threshold = capacity * loadFactor
     * @throws IllegalArgumentException if the initial capacity is negative.
     */
    public HashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }

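Notice that HashMap has no capacity field. The constructor rounds the initialCapacity we pass in up to the next power of two with tableSizeFor and stores the result temporarily in threshold. On the first call to resize(), that value becomes the table length, and threshold is then recomputed as capacity * loadFactor.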
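The rounding step is worth seeing in isolation. The sketch below follows the logic of the JDK 8 tableSizeFor method as I understand it; the wrapper class TableSizeForDemo is hypothetical, and the real method is a package-private static method inside HashMap.

```java
final class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Smallest power of two that is >= cap (at least 1, at most MAXIMUM_CAPACITY).
    // The shifts smear the highest set bit of (cap - 1) into all lower bits.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        for (int cap : new int[] {0, 1, 10, 16, 17, 1000}) {
            System.out.println("initialCapacity=" + cap + " -> table length " + tableSizeFor(cap));
        }
    }
}
```

So new HashMap<>(10), for example, ends up with a table of 16 buckets and a threshold of 12 once the first entry is inserted.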

put method

/**
     * Associates the specified value with the specified key in this map.
     * If the map previously contained a mapping for the key, the old
     * value is replaced.
     *
     * put method: delegates to putVal to associate the given value with the given key
     *
     * @param key key with which the specified value is to be associated
     * @param value value to be associated with the specified key
     * @return the previous value associated with <tt>key</tt>, or
     *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
     *         (A <tt>null</tt> return can also indicate that the map
     *         previously associated <tt>null</tt> with <tt>key</tt>.)
     */
    public V put(K key, V value) {
        // The first boolean false means: when the key to be put already exists in the hashMap, the original value will be overwritten directly. Don't care about the second boolean true, it has nothing to do with the main process.
        return putVal(hash(key), key, value, false, true);
    }
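The put method passes hash(key) rather than key.hashCode() directly. The source of hash is not shown in this article, so the following is a sketch of the JDK 8 logic as I understand it, wrapped in a hypothetical class HashSpreadDemo. It XORs the high 16 bits of the hash code into the low 16 bits, so that keys whose hash codes differ only in the high bits do not all land in the same bucket of a small table.

```java
final class HashSpreadDemo {
    // Sketch of the JDK 8 spreading step: a null key hashes to 0,
    // otherwise the high half of hashCode() is mixed into the low half.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16;                         // table length
        Integer a = 0x10000, b = 0x20000;   // Integer.hashCode() is the int value itself
        System.out.println("raw index:    " + ((n - 1) & a.hashCode()) + ", " + ((n - 1) & b.hashCode()));
        System.out.println("spread index: " + ((n - 1) & spread(a)) + ", " + ((n - 1) & spread(b)));
    }
}
```

Without spreading, both keys would fall into bucket 0; with spreading they fall into buckets 1 and 2.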

putVal

/**
     * Implements Map.put and related methods.
     *
     * @param hash hash for key
     * @param key the key
     * @param value the value to put
     * @param onlyIfAbsent if true, don't change existing value If the target key already exists in the hashMap, the original value will not be overwritten
     * @param evict if false, the table is in creation mode. Don't care, it has nothing to do with the main process
     * @return previous value, or null if none
     */
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        // If the table has not been initialized, initialize the table
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        // Bucket position calculation formula: (n - 1) & hash. If the located bucket position is empty, insert node into the bucket position. p points to the bucket position
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        // The bucket position is not empty: either the same key is already stored there or there is a hash collision. Go to the else branch
        else {
            Node<K,V> e; K k;
            // Use the hash value of the key and the equals method to determine whether the key at the bucket position is the same. If they are the same, use e to point to this node
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            // Determine whether the position of the bucket is a tree, if it is a tree, call the method of adding elements to the tree, and then use e to point to this node on the tree
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            // If it is not a tree, it means it is a linked list
            else {
                // Iterate the linked list, binCount is the length count of the linked list
                for (int binCount = 0; ; ++binCount) {
                    // Use e to point to the current element of this iteration. If this iteration, the current element is empty, that is, it has reached the end of the linked list
                    if ((e = p.next) == null) {
                        // Append node to the end of the linked list
                        p.next = newNode(hash, key, value, null);
                        // If the list has reached the treeify threshold, call treeifyBin to convert it into a tree (treeifyBin resizes the table instead while the table is shorter than MIN_TREEIFY_CAPACITY = 64). TREEIFY_THRESHOLD cannot be changed because it is declared static final.
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    // Determine whether the keys of this iteration are the same through the hash value of the key and the equals method. If they are the same, use e to point to this node. Then stop iterating the linked list.
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    // Maintain the current element and prepare for the next iteration
                    p = e;
                }
            }
            // If e is not empty, it means that the key to be added already exists in this bucket, overwriting the original value. This shows that when the onlyIfAbsent option of hashMap is false, when the key is the same, the original value will be directly overwritten
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                // When the onlyIfAbsent option is false, or when the original value is null, it will directly overwrite the original value
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                // Don't care about this method. Leave it to the LinkedHashMap callback.
                afterNodeAccess(e);
                // return the original value
                return oldValue;
            }
        }
        // maintain modification count
        ++modCount;
        // If the expansion threshold is reached, resize
        if (++size > threshold)
            resize();
        // Don't care about this method. Leave it to the LinkedHashMap callback.
        afterNodeInsertion(evict);
        // If the value corresponding to the key is not found, return null
        return null;
    }
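The effect of the onlyIfAbsent flag can be checked through the public API: put passes false, while putIfAbsent passes true. The sketch below uses a hypothetical class PutSemanticsDemo and records the return values of each call, including the case where the existing value is null.

```java
import java.util.HashMap;
import java.util.Map;

final class PutSemanticsDemo {
    // Returns {r1, r2, r3, value of "a", r4, value of "b"} for the calls below.
    static Object[] demo() {
        Map<String, Integer> map = new HashMap<>();
        Integer r1 = map.put("a", 1);          // no previous mapping: returns null
        Integer r2 = map.put("a", 2);          // overwrites 1 with 2: returns 1
        Integer r3 = map.putIfAbsent("a", 3);  // key present with non-null value: keeps 2, returns 2
        map.put("b", null);                    // explicit null value
        Integer r4 = map.putIfAbsent("b", 5);  // old value is null: stores 5, returns null
        return new Object[] {r1, r2, r3, map.get("a"), r4, map.get("b")};
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(demo()));
    }
}
```

The last call corresponds to the condition !onlyIfAbsent || oldValue == null in putVal: a mapping to null is treated as if the value were absent.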

One detail: the treeify threshold TREEIFY_THRESHOLD cannot be modified, because it is declared static final. I was asked about this in an interview.

/**
     * The bin count threshold for using a tree rather than list for a
     * bin.  Bins are converted to trees when adding an element to a
     * bin with at least this many nodes. The value must be greater
     * than 2 and should be at least 8 to mesh with assumptions in
     * tree removal about conversion back to plain bins upon
     * shrinkage.
     *
     * The threshold for converting a linked list into a tree; it cannot be modified because it is declared static final
     */
    static final int TREEIFY_THRESHOLD = 8;
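The condition binCount >= TREEIFY_THRESHOLD - 1 in putVal is easy to misread by one. The sketch below is a simplified model of that loop, not the real code: TreeifyCountDemo and nodesWhenTreeifyTriggered are hypothetical names, and the model assumes every key collides into the same bucket and none is equal to an existing key. It counts how many nodes the bucket holds at the moment treeifyBin would be called.

```java
final class TreeifyCountDemo {
    // Models the list-walking loop of putVal for a bucket that keeps growing.
    // Appending at iteration binCount means the bucket had binCount + 1 nodes before.
    static int nodesWhenTreeifyTriggered(int treeifyThreshold) {
        int nodes = 1;                       // the bucket already holds its head node
        for (int binCount = 0; ; ++binCount) {
            nodes++;                         // reached the tail: append the new node
            if (binCount >= treeifyThreshold - 1) {
                return nodes;                // treeifyBin would be called here
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("treeifyBin is called when the bucket reaches "
                + nodesWhenTreeifyTriggered(8) + " nodes");
    }
}
```

Under this model the call happens when a node is added to a bucket that already holds 8 nodes, which matches the Javadoc wording "adding an element to a bin with at least this many nodes".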

get method

/**
     * Returns the value to which the specified key is mapped,
     * or {@code null} if this map contains no mapping for the key.
     *
     * <p>More formally, if this map contains a mapping from a key
     * {@code k} to a value {@code v} such that {@code (key==null ? k==null :
     * key.equals(k))}, then this method returns {@code v}; otherwise
     * it returns {@code null}.  (There can be at most one such mapping.)
     *
     * <p>A return value of {@code null} does not <i>necessarily</i>
     * indicate that the map contains no mapping for the key; it's also
     * possible that the map explicitly maps the key to {@code null}.
     * The {@link #containsKey containsKey} operation may be used to
     * distinguish these two cases.
     *
     * @see #put(Object, Object)
     */
    public V get(Object key) {
        Node<K,V> e;
        // Find the node according to the specified key and return the value of the node
        return (e = getNode(hash(key), key)) == null ? null : e.value;
    }
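The Javadoc's remark about null return values can be shown with a short sketch (GetNullDemo is a hypothetical class name): get alone cannot tell a missing key from a key mapped to null, while containsKey can.

```java
import java.util.HashMap;
import java.util.Map;

final class GetNullDemo {
    static Map<String, String> sample() {
        Map<String, String> map = new HashMap<>();
        map.put("present", null);   // key exists, value is explicitly null
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = sample();
        // Both lookups return null, for different reasons.
        System.out.println(map.get("present") + " / " + map.get("missing"));
        // containsKey distinguishes the two cases.
        System.out.println(map.containsKey("present") + " / " + map.containsKey("missing"));
    }
}
```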

getNode

/**
     * Implements Map.get and related methods.
     * 
     * According to the specified key, find the node
     *
     * @param hash hash for key
     * @param key the key
     * @return the node, or null if none
     */
    final Node<K,V> getNode(int hash, Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        // If the table is not empty, and the bucket position corresponding to the key is not empty
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & hash]) != null) {
            // Use the hash value of the key and the equals method to determine whether the key at the bucket position is the same
            if (first.hash == hash && // always check first node
                ((k = first.key) == key || (key != null && key.equals(k))))
                // Same as the target key, return the current node
                return first;
            // The bucket position is not empty, indicating that there may be a hash collision, and determine whether the element at the bucket position has a next node
            if ((e = first.next) != null) {
                // Determine whether the bucket position is a tree, if it is a tree, call the method of the tree to find the element
                if (first instanceof TreeNode)
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                // If it is not a tree, it means it is a linked list, and iterates the linked list
                do {
                    // Determine whether the keys of this iteration are the same through the hash value of the key and the equals method
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        // Same as the target key, return the current node
                        return e;
                } while ((e = e.next) != null);
            }
        }
        // The element corresponding to the key is not found, return null
        return null;
    }

resize method

Two common cases of executing the resize() method

In the putVal method of HashMap, if the table is not initialized, resize() will be executed, and then the table will be initialized. Initializing the table is the responsibility of resize.

// If the table has not been initialized, initialize the table        
if ((tab = table) == null || (n = tab.length) == 0)
    n = (tab = resize()).length;

In the putVal method of HashMap, when the amount of stored data is greater than the threshold, the resize() method will be executed.

// If the number of elements in the hashMap reaches the expansion threshold, resize
        if (++size > threshold)
            resize();

Here, size is the number of key-value mappings currently stored in the HashMap. Once size exceeds threshold, resize() is called to grow the table.
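The two cases together determine how the table grows. The following sketch models only the capacity and threshold bookkeeping for a map created with the no-argument constructor; ResizeGrowthDemo and capacityAfter are hypothetical names. It ignores the maximum-capacity limit and the extra resizes that treeifyBin can cause on small tables.

```java
final class ResizeGrowthDemo {
    // Table length after inserting the given number of distinct keys
    // into a default-constructed map, under the simplified model above.
    static int capacityAfter(int entries) {
        int capacity = 0, threshold = 0, size = 0;
        for (int i = 0; i < entries; i++) {
            if (capacity == 0) {          // case 1: table not initialized yet
                capacity = 16;            // DEFAULT_INITIAL_CAPACITY
                threshold = 12;           // 16 * 0.75
            }
            if (++size > threshold) {     // case 2: size exceeds the threshold
                capacity <<= 1;           // double the table
                threshold <<= 1;          // double the threshold
            }
        }
        return capacity;
    }

    public static void main(String[] args) {
        for (int n : new int[] {0, 1, 12, 13, 24, 25}) {
            System.out.println(n + " entries -> table length " + capacityAfter(n));
        }
    }
}
```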

resize method source code

/**
     * Initializes or doubles table size.  If null, allocates in
     * accord with initial capacity target held in field threshold.
     * Otherwise, because we are using power-of-two expansion, the
     * elements from each bin must either stay at same index, or move
     * with a power of two offset in the new table.
     *
     * Initialize hashMap or double the table size of hashMap
     *
     * @return the table
     */
    final Node<K,V>[] resize() {
        // oldTab refers to the old table
        Node<K,V>[] oldTab = table;
        // the length of the old table
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        // oldThr represents the old expansion threshold threshold. threshold = array length * load factor
        int oldThr = threshold;
        int newCap, newThr = 0;
        if (oldCap > 0) {
            // The old table has already reached the maximum capacity: stop growing, raise the threshold to Integer.MAX_VALUE and return the old table
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            // If the doubled length is less than MAXIMUM_CAPACITY (1 << 30), and the old length is at least DEFAULT_INITIAL_CAPACITY (16)
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                // expansion threshold * 2
                newThr = oldThr << 1; // double threshold
        }
        // If the old threshold is greater than 0, the initial capacity is set to the old threshold. This will be used when the table is initialized
        else if (oldThr > 0) // initial capacity was placed in threshold
            newCap = oldThr;
        // A capacity expansion threshold of 0 means using the default value, DEFAULT_INITIAL_CAPACITY = 16, DEFAULT_LOAD_FACTOR = 0.75, so the default capacity expansion threshold is 12
        else {               // zero initial threshold signifies using defaults
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        // newThr is still 0 (the branches above did not set it): compute it as newCap * loadFactor, capped at Integer.MAX_VALUE
        if (newThr == 0) {
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        // Assign the calculated threshold to the threshold attribute
        threshold = newThr;
        // The annotation below only suppresses the rawtypes/unchecked compiler warnings caused by the generic array creation; it does not affect the logic.
        // Create the new table used after the resize
        @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        // Assign the new table to the property of hashMap
        table = newTab;
        // If the old table is not empty, start expanding
        if (oldTab != null) {
            // Iterate through the old table
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
                // e points to the element node at the current bucket position, when the node at the current bucket position is not empty
                if ((e = oldTab[j]) != null) {
                    // Clear the current bucket position of the old table
                    oldTab[j] = null;
                    if (e.next == null)
                        // If the node at the current bucket position is not a linked list or a red-black tree, reassign the bucket position of the node according to the calculation formula of the bucket position
                        newTab[e.hash & (newCap - 1)] = e;
                    // If the node at the current bucket position is a tree, use the tree method to reassign the node on the old tree to the new tree
                    else if (e instanceof TreeNode)
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    // Not a tree, but a linked list
                    else { // preserve order
                        // Split the nodes of this linked list into two linked lists
                        // lo list: nodes whose table index stays the same (j)
                        Node<K,V> loHead = null, loTail = null;
                        // hi list: nodes whose table index changes (to j + oldCap)
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        // iterate through the linked list
                        do {
                            next = e.next;
                            // If (e.hash & oldCap) == 0, the bit of the hash at the oldCap position is 0, so the table index of this node does not change. Append the node to the lo list
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            // Otherwise, it means that the array subscript corresponding to the node needs to be changed. Append the node to the hiHead linked list
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        // If the lo list (nodes whose index is unchanged) is not empty, terminate it and put its head loHead into the same bucket position j
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        // If the hi list (nodes whose index changes) is not empty, terminate it and put its head hiHead into the new bucket position: current index j + length of the old table
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        // return the newly created table
        return newTab;
    }
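The linked-list branch of resize() can be modeled without the node plumbing. The sketch below is a simplified model that works on plain hash values instead of Node objects; SplitDemo and split are hypothetical names. It takes the hashes stored in old bucket j and distributes them to bucket j or bucket j + oldCap, preserving their relative order as the original code does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class SplitDemo {
    // Maps each new bucket index to the hashes that end up there, in original order.
    static Map<Integer, List<Integer>> split(List<Integer> hashes, int j, int oldCap) {
        Map<Integer, List<Integer>> newBuckets = new TreeMap<>();
        for (int h : hashes) {
            // Same test as in resize(): the oldCap bit decides lo (stay) or hi (move).
            int newIndex = ((h & oldCap) == 0) ? j : j + oldCap;
            newBuckets.computeIfAbsent(newIndex, k -> new ArrayList<>()).add(h);
        }
        return newBuckets;
    }

    public static void main(String[] args) {
        // With oldCap = 8, the hashes 1, 9, 17 and 25 all sit in old bucket 1.
        System.out.println(split(Arrays.asList(1, 9, 17, 25), 1, 8));
    }
}
```

In this example 1 and 17 stay in bucket 1, while 9 and 25 move to bucket 9.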

two formulas

(e.hash & oldCap) == 0: determines whether the bucket position needs to be reassigned

e is the current node and oldCap is the length of the old array. If the result of e.hash & oldCap is 0, the array index of the node stays the same. If the result is not 0, the array index of the node must change.

Why can (e.hash & oldCap) == 0 determine whether the bucket position needs to be reassigned?

The new mask newCap - 1 has exactly one more bit set than the old mask oldCap - 1, namely the oldCap bit, so that bit of the hash alone decides whether the index changes. For the full derivation, see: In the rehash method of HashMap expansion, (e.hash & oldCap) == 0 algorithm derivation
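For readers who prefer to check rather than derive, the rule can be verified by brute force. The sketch below uses the hypothetical names SplitFormulaCheck and holdsFor. For every hash in a range it compares the index computed directly in the doubled table with the index predicted by the (hash & oldCap) test.

```java
final class SplitFormulaCheck {
    // True if, for every hash in [0, limit), the index in the doubled table
    // equals j when (hash & oldCap) == 0 and j + oldCap otherwise.
    // oldCap is assumed to be a power of two.
    static boolean holdsFor(int oldCap, int limit) {
        int newCap = oldCap << 1;
        for (int hash = 0; hash < limit; hash++) {
            int j = hash & (oldCap - 1);                         // old bucket index
            int predicted = ((hash & oldCap) == 0) ? j : j + oldCap;
            int actual = hash & (newCap - 1);                    // index in the new table
            if (actual != predicted) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("oldCap=8:  " + holdsFor(8, 1 << 12));
        System.out.println("oldCap=16: " + holdsFor(16, 1 << 12));
    }
}
```

The check works because newCap - 1 equals (oldCap - 1) | oldCap when oldCap is a power of two, so hash & (newCap - 1) is the old index plus either 0 or oldCap.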

j + oldCap bucket position reallocation formula

j is the old bucket position of the node, and oldCap is the length of the old table. That is, the old bucket position + the length of the old table. The result of this formula is the new bucket position of the element after expansion. It can be understood as a formula for redistribution of bucket positions.

Why can it be obtained in this way? Let’s answer with an example.

Suppose a node's key has hash value 9, whose binary representation is 1001. The length of the old table, oldCap, is 8, and the length of the new table, newCap, is 16. Here is the handwritten verification:

[Figure: handwritten verification of the j + oldCap formula for hash = 9, oldCap = 8, newCap = 16]
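The same calculation can be reproduced in a few lines of code. This is a sketch using the example values from the text; the class HandVerification and the helper bits are hypothetical names. The comments spell out the binary arithmetic step by step.

```java
final class HandVerification {
    // Zero-padded 5-bit binary string; only meant for the small values used here (0..31).
    static String bits(int v) {
        String s = Integer.toBinaryString(v);
        return "00000".substring(s.length()) + s;
    }

    public static void main(String[] args) {
        int hash = 9, oldCap = 8, newCap = oldCap << 1;

        int j = hash & (oldCap - 1);         // 01001 & 00111 = 00001 -> old index 1
        int newIndex = hash & (newCap - 1);  // 01001 & 01111 = 01001 -> new index 9
        int moveBit = hash & oldCap;         // 01001 & 01000 = 01000 -> not 0, so the node moves

        System.out.println("hash       = " + bits(hash));
        System.out.println("old index  = " + bits(j) + " (" + j + ")");
        System.out.println("new index  = " + bits(newIndex) + " (" + newIndex + ")");
        System.out.println("hash & oldCap = " + bits(moveBit) + " (" + moveBit + ")");
        System.out.println("j + oldCap = " + (j + oldCap));
    }
}
```

The new index 9 equals the old index 1 plus oldCap 8, as the formula predicts.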

Why does HashMap double its capacity when it resizes?

From the handwritten verification of the bucket reassignment formula j + oldCap, we can see that the formula works precisely because the table length doubles on each resize, which makes computing the new bucket position very cheap. At the same time, newCap = oldCap << 1 (the new table length is the old table length shifted left by one bit) is itself a very efficient bit operation.

In fact, the way the doubling factor and the bucket reassignment formula work together reflects the author's careful thinking and solid mathematical skills.