Java collection container interview questions (2021 latest version)

Time:2021-11-25

Collection container overview

What is a collection

Collection framework: a container architecture for storing data.

The collection framework is a unified, standard architecture for representing and manipulating collections.
Any collection framework contains three parts: external interfaces, interface implementations, and algorithms that operate on collections.

Interfaces: the abstract data types that represent collections. Interfaces let us operate on a collection without caring about its concrete implementation, achieving "polymorphism". In object-oriented programming languages, interfaces commonly serve as specifications.

Implementations: the concrete implementations of the collection interfaces; they are highly reusable data structures.

Algorithms: methods that perform useful computations, such as searching and sorting, on objects that implement the collection interfaces. These algorithms are usually polymorphic: the same method behaves differently depending on which class implements the interface. In effect, algorithms are reusable functions,
which reduces programming effort.

The collection framework lets you focus on the important parts of your program by providing ready-made data structures and algorithms, rather than on low-level plumbing just to make the program work.

Because unrelated APIs can interoperate through these shared interfaces, you avoid writing a lot of adapter or conversion code just to combine those APIs. This improves both development speed and program quality.

Characteristics of collections

The characteristics of collections are as follows:

  • Objects encapsulate data, and we often need to store many objects; collections are containers for storing objects.
  • Use an array when the number of objects is fixed, and a collection when the number is uncertain, because collections are variable-length.

The difference between collections and arrays

  • Arrays have a fixed length; collections are variable-length.
  • Arrays can store primitive types or reference types; collections can only store reference types.
  • The elements of an array must all be of the same type; the objects stored in a collection may be of different types.
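The differences above can be seen in a short sketch. The class and method names below are our own, chosen for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrates the array/collection differences listed above.
public class ArrayVsCollection {
    // An array's length is fixed when it is created, and it can hold primitives.
    static int arrayLength() {
        int[] primitives = new int[2]; // stores ints directly, length fixed at 2
        primitives[0] = 1;
        primitives[1] = 2;
        return primitives.length;
    }

    // A collection grows as elements are added and stores references only.
    static int collectionSize() {
        List<Integer> numbers = new ArrayList<>();
        numbers.add(1); // int is auto-boxed to the Integer reference type
        numbers.add(2);
        numbers.add(3); // no fixed length: the list grows automatically
        return numbers.size();
    }

    public static void main(String[] args) {
        System.out.println(arrayLength());    // 2
        System.out.println(collectionSize()); // 3
    }
}
```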

Data structure: how data is stored inside a container.

There are many kinds of collection containers. Each container has different characteristics because each has a different internal data structure.

As common behavior is repeatedly extracted upward from concrete containers, a collection hierarchy emerges. The principle for using the hierarchy: program against the top-level interfaces, but instantiate the concrete bottom-level classes.

Benefits of using the collection framework

  1. Capacity grows automatically;
  2. It provides high-performance data structures and algorithms, making coding easier and improving program speed and quality;
  3. It allows interoperability between unrelated APIs: collections can be passed back and forth between them;
  4. Collections can easily be extended or adapted, improving code reusability and operability;
  5. Using the JDK's own collection classes reduces the cost of code maintenance and of learning new APIs.

What are the common collection classes?

The Map interface and the Collection interface are the parent interfaces of all collection classes:

  1. The sub-interfaces of the Collection interface include the Set interface and the List interface.
  2. The main implementation classes of the Map interface include HashMap, TreeMap, Hashtable, ConcurrentHashMap and Properties.
  3. The main implementation classes of the Set interface include HashSet, TreeSet, LinkedHashSet, etc.
  4. The main implementation classes of the List interface include ArrayList, LinkedList, Stack and Vector.

What is the difference between List, Set and Map? Do List, Set and Map all inherit from the Collection interface? What are the characteristics of the List, Map and Set interfaces when accessing elements?


Java containers are divided into two categories: Collection and Map. The sub-interfaces of Collection include Set, List and Queue, of which Set and List are the most commonly used. The Map interface is not a sub-interface of Collection.

Collection has two main sub-interfaces: List and Set.

  • List: an ordered container (elements come out in the same order they were stored in). Elements can repeat, multiple null elements can be inserted, and every element has an index. Common implementation classes are ArrayList, LinkedList and Vector.
  • Set: an unordered container (storage order and retrieval order may differ) that cannot hold duplicate elements and allows at most one null element; element uniqueness must be guaranteed. Common implementation classes of the Set interface are HashSet, LinkedHashSet and TreeSet.

A Map is a collection of key-value pairs that stores mappings between keys and values. Keys are unordered and unique; values need not be ordered and may repeat. Map does not inherit from the Collection interface. When retrieving an element from a Map, supplying the key object returns the corresponding value object.

Common implementation classes of Map: HashMap, TreeMap, Hashtable, LinkedHashMap and ConcurrentHashMap.

Collection framework underlying data structure

Collection

  1. List

    ArrayList: object array

    Vector: object array

    LinkedList: doubly linked list (a circular doubly linked list before JDK 1.7)

  2. Set

    HashSet (unordered, unique): implemented on top of HashMap; the underlying HashMap stores the elements.
    LinkedHashSet: LinkedHashSet inherits from HashSet and is internally implemented via a LinkedHashMap, much as LinkedHashMap is itself based on HashMap with small differences.
    TreeSet (ordered, unique): red-black tree (a self-balancing sorted binary tree).

Map

  • HashMap: before JDK 1.8, HashMap consisted of an array plus linked lists; the array is the body of the HashMap, and the linked lists exist mainly to resolve hash conflicts (the "zipper", or chaining, method). Since JDK 1.8, conflict resolution has changed significantly: when a linked list grows beyond a threshold (8 by default), it is converted into a red-black tree to reduce search time.
  • LinkedHashMap: LinkedHashMap inherits from HashMap, so its bottom layer is still the chained hash structure, composed of an array plus linked lists or red-black trees. On top of that, LinkedHashMap adds a doubly linked list so that the structure preserves the insertion order of key-value pairs; the same list also supports access-order iteration.
  • Hashtable: an array plus linked lists; the array is the body, and the linked lists exist mainly to resolve hash conflicts.
  • TreeMap: red-black tree (a self-balancing sorted binary tree).

Which collection classes are thread safe?

  • Vector: adds a synchronization mechanism (thread safety) compared with ArrayList. Because of its low efficiency it is no longer recommended; in web applications, and especially front-end pages, efficiency (page response speed) is usually the priority.
  • Stack: a stack class, first in, last out; it inherits from Vector.
  • Hashtable: a thread-safe counterpart of HashMap.
  • Enumeration: an enumeration, roughly equivalent to an Iterator (a legacy interface, not itself a collection).

What is the fail-fast mechanism of Java collections?

Fail-fast is an error-detection mechanism of Java collections. When multiple threads change the structure of a collection, the fail-fast mechanism may be triggered.

For example, suppose there are two threads (thread 1 and thread 2). Thread 1 traverses the elements of collection A through an iterator, and at some point thread 2 modifies the structure of collection A (a structural modification, not merely a change to the content of an element). The program will then throw a ConcurrentModificationException, triggering the fail-fast mechanism.

Reason: the iterator accesses the collection's contents directly during traversal and uses a modCount variable while doing so. If the collection's structure changes during traversal, the value of modCount changes. Whenever the iterator calls hasNext()/next() to move to the next element, it checks whether modCount equals the expected expectedModCount value; if so, traversal continues, otherwise an exception is thrown and the traversal terminates.

Solutions:

  1. Add synchronized around every place that changes the modCount value during traversal.
  2. Replace ArrayList with CopyOnWriteArrayList.
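A minimal sketch of both the failure and the CopyOnWriteArrayList fix (class and method names are our own; note that even a single thread triggers fail-fast when it structurally modifies the list mid-iteration):

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class FailFastDemo {
    // Returns true if structurally modifying the list during iteration fails fast.
    static boolean failsFast(List<String> list) {
        list.add("a");
        list.add("b");
        try {
            for (String s : list) {
                list.add("c"); // structural modification while an iterator is live
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsFast(new ArrayList<>()));            // true: fail-fast iterator
        System.out.println(failsFast(new CopyOnWriteArrayList<>())); // false: iterates a snapshot
    }
}
```

CopyOnWriteArrayList avoids the exception because its iterator walks a snapshot of the array taken when the iterator was created; writes go to a fresh copy.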

How to ensure that a set cannot be modified?

You can use the Collections.unmodifiableCollection(Collection c) method to create a read-only collection; any operation that would change the collection then throws a java.lang.UnsupportedOperationException.

The example code is as follows:

List<String> list = new ArrayList<>();
list.add("x");
Collection<String> clist = Collections.unmodifiableCollection(list);
clist.add("y"); // throws UnsupportedOperationException at runtime
System.out.println(list.size());

Collection interface

List interface

What is the iterator?

The Iterator interface provides a uniform way to traverse any collection. We can obtain an iterator instance from a collection by calling its iterator() method. Iterator replaces Enumeration in the Java collection framework, and unlike Enumeration it allows the caller to remove elements during iteration.

How does the iterator work? What are the characteristics?

The code used by iterator is as follows:

List<String> list = new ArrayList<>();
Iterator<String> it = list.iterator();
while (it.hasNext()) {
  String obj = it.next();
  System.out.println(obj);
}

Iterator is characterized by one-way traversal only, but it is safer, because it guarantees that a ConcurrentModificationException is thrown if the collection being traversed is structurally modified.

How to remove elements from a collection while traversing?

The only correct way to modify a collection while traversing it is to use the Iterator.remove() method, as follows:

Iterator<Integer> it = list.iterator();
while (it.hasNext()) {
   Integer i = it.next(); // next() must be called before remove()
   // do something
   it.remove();
}

One of the most common error patterns is code like the following:

for (Integer i : list) {
   list.remove(i);
}

Running the erroneous code above throws a ConcurrentModificationException. This is because the foreach statement (for (Integer i : list)) implicitly creates an iterator to traverse the list, while list.remove() modifies the list behind that iterator's back. Java generally does not allow a collection to be modified by one party while another is traversing it.

What is the difference between Iterator and ListIterator?

  • An Iterator can traverse both Set and List collections, while a ListIterator can only traverse a List.
  • An Iterator can only traverse in one direction, while a ListIterator can traverse in both directions (forward/backward).
  • ListIterator extends the Iterator interface and adds extra capabilities, such as adding an element, replacing an element, and getting the index of the previous or next element.
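The bidirectional traversal can be sketched as follows (class and method names are our own):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.ListIterator;

public class ListIteratorDemo {
    // Traverses forward, then backward, returning the visit order.
    static List<String> roundTrip(List<String> list) {
        List<String> visited = new ArrayList<>();
        ListIterator<String> it = list.listIterator();
        while (it.hasNext()) {
            visited.add(it.next());     // forward pass, as a plain Iterator would do
        }
        while (it.hasPrevious()) {
            visited.add(it.previous()); // backward pass: a plain Iterator cannot do this
        }
        return visited;
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b"));
        System.out.println(roundTrip(list)); // [a, b, b, a]
    }
}
```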

What are the different ways to traverse a List? What is the implementation principle of each method? What are the best practices for List traversal in Java?

There are several traversal methods:

  1. For-loop traversal, based on a counter: maintain a counter outside the collection, read the element at each position in turn, and stop after the last element has been read.
  2. Iterator traversal: Iterator is an object-oriented design pattern whose purpose is to hide the characteristics of different data structures behind a uniform traversal interface. Java's collections support the iterator pattern.
  3. Foreach-loop traversal: foreach is implemented internally with an iterator, but neither an iterator nor a counter needs to be declared explicitly. The advantage is concise, error-resistant code; the disadvantage is that only simple traversal is possible, and the collection cannot be modified during traversal (e.g. no deletion or replacement).

Best practice: the Java Collections Framework provides the RandomAccess marker interface to indicate whether a List implementation supports efficient random access.

  • If a list implements this interface, it supports random access, and reading an element by position takes O(1) on average, e.g. ArrayList.
  • If it does not implement the interface, it does not support efficient random access, e.g. LinkedList.

It is recommended to traverse lists that support random access with an ordinary for loop; otherwise prefer an iterator (or foreach).
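The recommendation can be encoded directly with an instanceof check (a small sketch; the class and method names are our own):

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.RandomAccess;

public class TraversalChoice {
    // Picks an indexed for loop for RandomAccess lists, an iterator otherwise.
    static String strategyFor(List<?> list) {
        return (list instanceof RandomAccess) ? "for-loop" : "iterator";
    }

    public static void main(String[] args) {
        System.out.println(strategyFor(new ArrayList<>()));  // for-loop
        System.out.println(strategyFor(new LinkedList<>())); // iterator
    }
}
```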

Talk about the advantages and disadvantages of ArrayList

The advantages of ArrayList are as follows:

  • The bottom layer of ArrayList is an array, which supports random access; ArrayList implements the RandomAccess interface, so lookups by index are very fast.
  • ArrayList is very efficient when appending elements at the end.

The disadvantages of ArrayList are as follows:

  • When deleting an element, the elements after it must be copied (shifted); if many elements must be moved, this costs performance.
  • When inserting an element in the middle, elements must likewise be copied; the drawback is the same as above.

ArrayList is best suited to scenarios with sequential appends and random access.

How to convert between an array and a List?

  • Array to List: use Arrays.asList(array) for the conversion.
  • List to array: use the toArray() method of List.

Code example:

// list to array
List<String> list = new ArrayList<String>();
list.add("123");
list.add("456");
Object[] objects = list.toArray();

// array to list
String[] array = new String[]{"123", "456"};
List<String> stringList = Arrays.asList(array);
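Two caveats about these conversions are worth knowing, sketched below (class and method names are our own): the no-arg toArray() returns Object[] rather than a typed array, and Arrays.asList returns a fixed-size view backed by the array.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ArrayListConversion {
    // Passing a typed array yields a String[]; the no-arg toArray() returns Object[].
    static String[] toTypedArray(List<String> list) {
        return list.toArray(new String[0]);
    }

    // Arrays.asList returns a fixed-size view backed by the array: add/remove fail.
    static boolean viewIsFixedSize(String[] array) {
        List<String> view = Arrays.asList(array);
        try {
            view.add("789");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>(Arrays.asList("123", "456"));
        String[] array = toTypedArray(list);
        System.out.println(array.length);           // 2
        System.out.println(viewIsFixedSize(array)); // true
        // Wrap the view in a new ArrayList to get a resizable, independent copy.
        List<String> copy = new ArrayList<>(Arrays.asList(array));
        copy.add("789");
        System.out.println(copy.size());            // 3
    }
}
```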

What is the difference between ArrayList and LinkedList?

  • Data structure: ArrayList is implemented as a dynamic array, while LinkedList is implemented as a doubly linked list.
  • Random access efficiency: ArrayList is more efficient than LinkedList for random access, because LinkedList stores data linearly and must walk the pointers from one end to find an element.
  • Insertion and deletion efficiency: for operations away from the head and tail, LinkedList is more efficient than ArrayList, because ArrayList insertions and deletions must shift the other elements in the array.
  • Memory footprint: LinkedList uses more memory than ArrayList, because each LinkedList node stores two references in addition to the data: one to the previous element and one to the next.
  • Thread safety: neither ArrayList nor LinkedList is synchronized, i.e. neither guarantees thread safety.

Generally speaking, ArrayList is more recommended when the elements in the collection need to be read frequently, and LinkedList is more recommended when there are many insert and delete operations.

Supplement: the doubly linked list data structure

A doubly linked list, also called a double linked list, is a kind of linked list in which each data node has two pointers, one to its direct successor and one to its direct predecessor. Starting from any node in a doubly linked list, you can therefore easily reach both its predecessor and its successor.

What is the difference between ArrayList and Vector?

  • Both classes implement the List interface (which extends the Collection interface), and both are ordered collections.
  • Thread safety: Vector uses synchronized to achieve thread synchronization and is thread-safe, while ArrayList is not thread-safe.
  • Performance: ArrayList performs better than Vector.
  • Capacity expansion: both ArrayList and Vector adjust capacity dynamically as needed, but Vector doubles its capacity each time, while ArrayList grows by only 50%.

All methods of the Vector class are synchronized. Two threads can safely access a Vector object, but when a single thread accesses a Vector, the code spends a great deal of time on synchronization overhead.

ArrayList is not synchronized, so ArrayList is recommended when thread safety is not required.

Which of ArrayList, LinkedList and Vector is fastest when inserting data? Describe the storage performance and characteristics of ArrayList, Vector and LinkedList.

ArrayList and Vector both store data in an array; the array is kept larger than the actual number of stored elements so that elements can be added and inserted. Both allow elements to be indexed directly by position, but inserting an element involves memory operations such as shifting array elements, so indexing is fast while insertion is slow.

The methods in Vector are modified with synchronized, so Vector is a thread-safe container, but its performance is worse than ArrayList's.

LinkedList stores data in a doubly linked list. Indexing by position requires forward or backward traversal, but inserting only requires updating the links of the neighboring items, so LinkedList inserts faster.

How to use ArrayList in a multithreaded scenario?

ArrayList is not thread-safe. In a multi-threaded scenario, you can convert it into a thread-safe container via the Collections.synchronizedList method before using it, for example:

List<String> synchronizedList = Collections.synchronizedList(list);
synchronizedList.add("aaa");
synchronizedList.add("bbb");

for (int i = 0; i < synchronizedList.size(); i++) {
    System.out.println(synchronizedList.get(i));
}

Why is the elementData of ArrayList declared transient?

The array in ArrayList is defined as follows:

private transient Object[] elementData;

Let’s take another look at the definition of ArrayList:

public class ArrayList<E> extends AbstractList<E>
     implements List<E>, RandomAccess, Cloneable, java.io.Serializable

As you can see, ArrayList implements the Serializable interface, which means ArrayList supports serialization. transient excludes the elementData array from the default serialization; instead, ArrayList overrides writeObject:

private void writeObject(java.io.ObjectOutputStream s) throws java.io.IOException {
    // Write out element count, and any hidden stuff
    int expectedModCount = modCount;
    s.defaultWriteObject();
    // Write out array length
    s.writeInt(elementData.length);
    // Write out all elements in the proper order.
    for (int i = 0; i < size; i++)
        s.writeObject(elementData[i]);
    if (modCount != expectedModCount) {
        throw new ConcurrentModificationException();
    }
}

During each serialization, defaultWriteObject() is first called to serialize the non-transient fields of the ArrayList, and then elementData is traversed so that only the elements actually stored are serialized. This both speeds up serialization and reduces the size of the serialized file.

The difference between List and Set

Both List and Set inherit from the Collection interface.

List features: an ordered container (elements come out in the same order they were stored in). Elements can repeat, multiple null elements can be inserted, and every element has an index. Common implementation classes are ArrayList, LinkedList and Vector.

Set features: an unordered container (storage order and retrieval order may differ) that cannot hold duplicate elements and allows at most one null element; element uniqueness must be guaranteed. Common implementation classes of the Set interface are HashSet, LinkedHashSet and TreeSet.

In addition, a List supports the for loop, i.e. traversal by index, as well as iterators, but a Set can only use iterators, because it is unordered and cannot fetch a desired value by index.

Comparison between Set and List

Set: retrieving elements is inefficient, while deletion and insertion are efficient, because insertion and deletion do not change the positions of the other elements.
List: like an array, a List can grow dynamically. Finding elements by index is efficient, while inserting and deleting is inefficient, because it shifts the positions of the other elements.

Set interface

Tell me about the implementation principle of HashSet?

HashSet is implemented on top of HashMap. The values of the HashSet are stored as the keys of the internal HashMap, and the HashMap's values are all the same constant, PRESENT. The implementation of HashSet is therefore quite simple: HashSet operations essentially delegate to the corresponding methods of the underlying HashMap. HashSet does not allow duplicate values.

How does HashSet check for duplicates? How does HashSet ensure that data cannot be repeated?

When add() is called on a HashSet, the test for whether the element already exists is not just a comparison of hash values; the equals method is consulted as well.
The add() method of HashSet uses the put() method of HashMap.

HashMap keys are unique. From the source code you can see that the value added to a HashSet becomes a key of the HashMap. If the same key is put again, the old value is overwritten by the new one and the old value is returned, so duplicates are never stored (HashMap decides key equality by comparing hashCode first and then equals).

The following is part of the HashSet source code:

private static final Object PRESENT = new Object();
private transient HashMap<E,Object> map;

public HashSet() {
    map = new HashMap<>();
}

public boolean add(E e) {
    //Delegates to HashMap.put; PRESENT is a dummy value that never changes
    return map.put(e, PRESENT)==null;
}
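A short usage sketch (class and method names are our own) showing the put()-based deduplication in action:

```java
import java.util.HashSet;
import java.util.Set;

public class HashSetDedup {
    // Returns the results of two add() calls with the same element.
    static boolean[] addTwice(String element) {
        Set<String> set = new HashSet<>();
        boolean first = set.add(element);  // true: map.put returned null, the key was new
        boolean second = set.add(element); // false: put returned the old PRESENT value
        return new boolean[] { first, second };
    }

    public static void main(String[] args) {
        boolean[] results = addTwice("x");
        System.out.println(results[0]); // true
        System.out.println(results[1]); // false
    }
}
```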

hashCode() and equals()

  1. If two objects are equal, their hashCode values must be the same.
  2. If two objects are equal, calling equals on either of them returns true.
  3. Two objects with the same hashCode value are not necessarily equal.
  4. Therefore, if the equals method is overridden, the hashCode method must be overridden as well.
  5. The default hashCode() generates a distinct value for each object on the heap. If hashCode() is not overridden, two instances of the class will never be considered equal by hash-based containers, even when they hold the same data.
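Rule 4 in practice, assuming a simple value class (Point is our own example): overriding equals and hashCode together lets HashSet detect logical duplicates.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class Point {
    final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }

    // Override both together: equal objects must report equal hash codes,
    // otherwise hash-based containers cannot find logical duplicates.
    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }
    @Override public int hashCode() { return Objects.hash(x, y); }

    public static void main(String[] args) {
        Set<Point> set = new HashSet<>();
        set.add(new Point(1, 2));
        set.add(new Point(1, 2)); // same hashCode, equals true: rejected as duplicate
        System.out.println(set.size()); // 1
    }
}
```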

The difference between == and equals

  1. == determines whether two variables or instances point to the same memory location; equals determines whether the contents at the memory locations pointed to by two variables or instances are the same.
  2. == compares memory addresses, while equals() compares contents (for strings, the character content).
  3. == asks whether the references are the same; equals() asks whether the values are the same.
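The classic demonstration uses strings (class and method names are our own):

```java
public class EqualsVsIdentity {
    static boolean sameReference(String a, String b) { return a == b; }
    static boolean sameContent(String a, String b) { return a != null && a.equals(b); }

    public static void main(String[] args) {
        String a = new String("hello"); // explicitly allocate two distinct objects
        String b = new String("hello");
        System.out.println(sameReference(a, b)); // false: different memory addresses
        System.out.println(sameContent(a, b));   // true: same character content

        String c = "hello";
        String d = "hello";
        System.out.println(sameReference(c, d)); // true: both refer to the interned literal
    }
}
```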

Differences between HashSet and HashMap

  • HashMap implements the Map interface and stores key-value pairs, while HashSet implements the Set interface and stores only objects.
  • HashMap adds entries with put(), while HashSet adds elements with add().
  • HashMap computes the hash code from the key, while HashSet computes it from the element object itself.

Queue

What is BlockingQueue?

java.util.concurrent.BlockingQueue is a queue that, when retrieving or removing an element, waits for the queue to become non-empty, and, when adding an element, waits for space to become available in the queue. The BlockingQueue interface is part of the Java collections framework and is mainly used to implement the producer-consumer pattern. We do not need to worry about waiting for the producer to have available space or for the consumer to have an available object, because this is handled in the BlockingQueue implementation classes. Java provides several BlockingQueue implementations, such as ArrayBlockingQueue, LinkedBlockingQueue, PriorityBlockingQueue, SynchronousQueue, etc.

What is the difference between poll() and remove() in Queue?

  • Similarity: both return the first element of the queue and delete the returned object.
  • Difference: if there is no element, poll() returns null, while remove() throws a NoSuchElementException.

Code example:

Queue<String> queue = new LinkedList<String>();
queue.offer("string"); // add one element
System.out.println(queue.poll());   // "string": the queue is now empty
System.out.println(queue.remove()); // throws NoSuchElementException on the empty queue
System.out.println(queue.size());   // never reached

Map interface

Tell me about the implementation principle of HashMap?

HashMap overview: HashMap is a non-synchronized, hash-table-based implementation of the Map interface. This implementation provides all of the optional map operations and permits null values and the null key. The class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.

HashMap data structure: in the Java programming language there are two basic storage structures, the array and the simulated pointer (reference); all data structures can be built from these two, and HashMap is no exception. HashMap is actually a "linked-list hash" structure, i.e. a combination of an array and linked lists.

HashMap is implemented based on a hash algorithm:

  1. When we put an element into the HashMap, the key's hashCode is rehashed to compute the element's index in the array.
  2. During storage, if a key with the same hash value appears, there are two cases: (1) if the keys are equal, the original value is overwritten; (2) if the keys are different (a collision occurs), the key-value pair is appended to the linked list at that index.
  3. When getting, the index corresponding to the hash value is located directly, and the keys are then compared to find the matching value.
  4. With the above process understood, it is not hard to see how HashMap resolves hash conflicts: the core is to store in an array and put objects with conflicting keys into a linked list, comparing further within the list once a conflict is found.

Note that the implementation of HashMap was optimized in JDK 1.8: when a bucket's linked list holds more than eight nodes, the list is converted into a red-black tree, improving query time from O(n) to O(log n).
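The steps above can be sketched in a few lines. The class below is a hypothetical ChainedMap (our own name, not the JDK's HashMap, and without resizing or treeification) that demonstrates bucket indexing, key overwrite, and chaining on collision:

```java
// Minimal separate-chaining map sketch: an array of buckets, each bucket a
// singly linked list of nodes. Keys are assumed non-null in this sketch.
public class ChainedMap<K, V> {
    private static final int CAPACITY = 16; // power of two, as in HashMap

    private static class Node<K, V> {
        final K key; V value; Node<K, V> next;
        Node(K key, V value, Node<K, V> next) { this.key = key; this.value = value; this.next = next; }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[CAPACITY];

    // Mask the hash into a bucket index; works because CAPACITY is a power of two.
    private int indexFor(K key) { return (CAPACITY - 1) & key.hashCode(); }

    public void put(K key, V value) {
        int i = indexFor(key);
        for (Node<K, V> n = table[i]; n != null; n = n.next) {
            if (n.key.equals(key)) { n.value = value; return; } // equal key: overwrite
        }
        table[i] = new Node<>(key, value, table[i]); // collision: prepend to the chain
    }

    public V get(K key) {
        for (Node<K, V> n = table[indexFor(key)]; n != null; n = n.next) {
            if (n.key.equals(key)) return n.value; // compare keys within the chain
        }
        return null;
    }

    public static void main(String[] args) {
        ChainedMap<String, Integer> map = new ChainedMap<>();
        map.put("a", 1);
        map.put("a", 2); // same key: the original value is overwritten
        map.put("b", 3);
        System.out.println(map.get("a")); // 2
        System.out.println(map.get("b")); // 3
    }
}
```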

What are the differences between HashMap in JDK 1.7 and JDK 1.8? The underlying implementation of HashMap

In Java, there are two simple data structures for storing data: arrays and linked lists. Arrays are easy to address but hard to insert into and delete from; linked lists are hard to address but easy to insert into and delete from. Combining arrays and linked lists plays to the strengths of both, and this combination, called the zipper (chaining) method, resolves hash conflicts.

Before JDK 1.8

Before JDK 1.8, the zipper method was used. Zipper method: combine linked lists with an array, i.e. create an array of linked lists in which each cell of the array is a linked list. When a hash conflict occurs, the conflicting value is added to the linked list.


After JDK 1.8

Compared with earlier versions, JDK 1.8 changed hash-conflict resolution considerably: when the length of a linked list exceeds the threshold (8 by default), the list is converted into a red-black tree to reduce search time.


JDK 1.7 vs JDK 1.8 comparison

JDK1.8 mainly solves or optimizes the following problems:

  1. resize() capacity-expansion optimization.
  2. The red-black tree was introduced to avoid the impact of a single long linked list on query efficiency (see red-black tree algorithms).
  3. The multi-threaded infinite-loop problem was fixed, but HashMap is still not thread-safe: multi-threaded use may lose data.


What is the specific flow of the put method of HashMap?

When we put, the hash of the key is computed first by calling the hash method. The hash method XORs key.hashCode() with key.hashCode() >>> 16: the high 16 bits are kept unchanged (a number XORed with 0 is unchanged) and are also folded into the low 16 bits to reduce collisions. According to the function's comments, because the bucket array size is a power of two, the index is computed as index = (table.length - 1) & hash; without this pre-processing, only a few low-order bits of the hash would take part in indexing. To reduce hash collisions, the designers chose this high/low XOR as a trade-off between speed, utility and quality. In addition, JDK 8 uses the O(log n) tree structure to keep performance acceptable under heavy collision.
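The perturbation and index computation described above can be tried standalone. The spread method below mirrors the JDK 8 hash function; the class name and demo values are our own:

```java
public class HashSpread {
    // JDK 8 style perturbation: XOR the high 16 bits into the low 16 bits.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16; // table length, always a power of two
        String key = "example";
        int hash = spread(key);
        int index = (n - 1) & hash; // masks to 0..n-1, like hash % n for power-of-two n
        System.out.println(index >= 0 && index < n); // true: always a valid bucket index
        System.out.println(spread(null));            // 0: null keys go to bucket 0
    }
}
```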

The execution flow of putVal is traced in the annotated source below.

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

//Implement map.put and related methods
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    //Step ①: if tab is empty, create 
    //The table is uninitialized or has a length of 0. Capacity expansion is required
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    //Step ②: calculate the index and handle null  
    //(n - 1) & hash determines the bucket in which the elements are stored. The bucket is empty, and the newly generated node is placed in the bucket (at this time, the node is placed in the array)
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    //Element already exists in bucket
    else {
        Node<K,V> e; K k;
        //Step ③: the node key exists and directly overwrites the value 
        //The hash value of the first element (node in the array) in the comparison bucket is equal, and the key is equal
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
                //Assign the first element to e and record it with E
                e = p;
        //Step ④: judge that the chain is a red black tree 
        //Hash values are not equal, that is, keys are not equal; Red black tree node
        //If the current element type is treenode, it represents a red black tree, puttreeval returns the node to be stored, and E may be null
        else if (p instanceof TreeNode)
            //Put it in the tree
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        //Step ⑤: the chain is a linked list 
        //Is a linked list node
        else {
            //Insert a node at the end of the linked list
            for (int binCount = 0; ; ++binCount) {
                //Reach the end of the linked list
                
                //Judge whether the pointer at the end of the linked list is empty
                if ((e = p.next) == null) {
                    //Insert a new node at the end
                    p.next = newNode(hash, key, value, null);
                    //Judge whether the length of the linked list reaches the critical value of transforming red black tree, and the critical value is 8
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        //Linked list structure to tree structure
                        treeifyBin(tab, hash);
                    //Jump out of loop
                    break;
                }
                //Judge whether the key value of the node in the linked list is equal to the key value of the inserted element
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    //Equal, jump out of loop
                    break;
                //Used to traverse the linked list in the bucket. Combined with the previous e = p.next, you can traverse the linked list
                p = e;
            }
        }
        //If e != null, the key already exists: overwrite the value (subject to onlyIfAbsent) and return the old value
        if (e != null) { 
            //Record the value of E
            V oldValue = e.value;
            //onlyIfAbsent is false, or the old value is null
            if (!onlyIfAbsent || oldValue == null)
                //Replace old value with new value
                e.value = value;
            //Post access callback
            afterNodeAccess(e);
            //Return old value
            return oldValue;
        }
    }
    //Structural modification
    ++modCount;
    //Step ⑥: expand the capacity if the maximum capacity is exceeded 
    //If the actual size is greater than the threshold, the capacity will be expanded
    if (++size > threshold)
        resize();
    //Post insert callback
    afterNodeInsertion(evict);
    return null;
}

① . check whether the array table is null or empty; if so, execute resize() to create or expand it;

② . compute the array index i from the key's hash. If table[i] == null, create a new node there directly and go to ⑥; if table[i] is not empty, go to ③;

③ . check whether the first element of table[i] has the same key; if so, overwrite the value directly, otherwise go to ④. "Same" here means equal hashCode and equals;

④ . check whether table[i] is a TreeNode, i.e. whether the bucket is a red-black tree; if so, insert the key-value pair into the tree directly, otherwise go to ⑤;

⑤ . traverse the linked list at table[i]; if its length reaches 8, convert the list to a red-black tree and insert there, otherwise insert at the end of the list. If the key is found during the traversal, overwrite its value directly;

⑥ . after a successful insertion, check whether the actual number of key-value pairs exceeds the threshold; if so, expand the capacity.
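To see the behavior of steps ③ and ⑥ from the caller's side, here is a minimal sketch (the key and values are arbitrary examples): put returns the previous value mapped to the key, or null if the key was absent, and a second put with the same key overwrites the value.

```java
import java.util.HashMap;
import java.util.Map;

public class PutDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        // Key absent: a new node is created, putVal returns null
        System.out.println(map.put("k", 1)); // null
        // Key present: the value is overwritten, the old value is returned
        System.out.println(map.put("k", 2)); // 1
        System.out.println(map.get("k"));    // 2
    }
}
```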

How is HashMap capacity expansion implemented?

① In JDK 1.8, HashMap calls the resize method to expand capacity during initialization, or when the number of key-value pairs exceeds the threshold;

② . each expansion doubles the capacity;

③ . after expansion, each node either stays at its original index or moves to the original index plus the old capacity.

In putVal() the resize() method appears twice: it is used for the first initialization of the array, and again when the actual size exceeds the threshold (12 by default for a 16-slot table). During expansion the elements in each bucket are redistributed. This is an optimization in JDK 1.8: in 1.7, after expansion every element's index had to be recomputed from its hash, whereas in 1.8 a node's new position is decided by whether (e.hash & oldCap) is 0 — the element either stays at its original index or moves to the original index plus the old array size.

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;// Oldtab points to the hash bucket array
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {// the old hash bucket array is not empty
        if (oldCap >= MAXIMUM_CAPACITY) {// already at the maximum capacity: pin the threshold at Integer.MAX_VALUE
            threshold = Integer.MAX_VALUE;
            return oldTab;// return the table unchanged
        }//If doubling the capacity stays below the maximum, and oldCap is at least the default 16
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double the threshold
    }
    }
    //The old capacity is 0 but the threshold is positive: the map was built with a capacity argument,
    //and threshold holds the initial capacity (already rounded up to a power of 2); use it as the new capacity
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    //A map created by the no-arg constructor gets the default capacity and threshold: 16 and 16 * 0.75
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    //newThr is still 0: compute the new threshold = newCap * loadFactor
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    //Allocate the new array and install it as the current table
    @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];// new hash bucket array
    table = newTab;// point the member variable table at the new array
    //If the old array was never initialized, resize ends with this initialization; otherwise redistribute the elements so they stay evenly spread
    if (oldTab != null) {
        //Traverse every bucket of the old array
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                //The bucket subscript of the old array is assigned to the temporary variable E, and the reference in the old array is released, otherwise the array cannot be recycled by GC
                oldTab[j] = null;
                //If e.next = = null, it means that there is only one element in the bucket, and there is no linked list or red black tree
                if (e.next == null)
                    //Use the same hash mapping algorithm to add the element to the new array
                    newTab[e.hash & (newCap - 1)] = e;
                //e is a TreeNode and e.next != null: rearrange the elements of the tree
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                //e is the head of a linked list and e.next != null: rearrange the elements of the list
                else { // preserve order
                    //loHead and loTail collect nodes whose index is unchanged after expansion
                    Node<K,V> loHead = null, loTail = null;
                    //hiHead and hiTail collect nodes whose index moves by oldCap after expansion
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    //Traversal linked list
                    do {             
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                //First node of the "low" list: loHead points at the current element e
                                //(not necessarily the first element of the original list) and remains
                                //the head of the list whose index is unchanged
                                loHead = e;
                            else
                                //Append e after the current tail
                                loTail.next = e;
                            //Advance the tail to the current element e
                            //loHead and loTail initially reference the same node, so updating
                            //loTail.next also extends the chain reachable from loHead,
                            //letting loHead link together every node that belongs to this list
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                //First node of the "high" list: hiHead becomes the head of the list whose index moves by oldCap
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    //After traversal, point tail to null, and put the chain header into the corresponding subscript of the new array to form a new mapping.
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
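The (e.hash & oldCap) split described above can be checked with a small standalone sketch (the hash values are arbitrary illustrations): after doubling a 16-slot table, a node's new index is either its old index or old index + oldCap.

```java
public class ResizeSplitDemo {
    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        // Arbitrary example hash values that all map to old index 5
        int[] hashes = {5, 21, 37, 53};
        for (int h : hashes) {
            int oldIdx = h & (oldCap - 1);
            int newIdx = h & (newCap - 1);
            // (h & oldCap) == 0 means the node stays at its old index
            boolean stays = (h & oldCap) == 0;
            System.out.printf("hash=%d old=%d new=%d stays=%b%n", h, oldIdx, newIdx, stays);
            // The new index is either the old index or old index + oldCap
            assert newIdx == (stays ? oldIdx : oldIdx + oldCap);
        }
    }
}
```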

How does HashMap resolve hash conflicts?

A: before answering this question, we first need to know what a hash conflict is, and before understanding hash conflicts, we need to know what a hash is.

What is hash?

Hash is generally translated as "hash" or transliterated directly. A hash algorithm transforms an input of arbitrary length into an output of fixed length, the hash value. This transformation is a compressing map: the space of hash values is usually much smaller than the input space, so different inputs may hash to the same output, and the input cannot be uniquely determined from the hash value. In short, a hash function compresses a message of arbitrary length into a fixed-length message digest.

All hash functions have this basic property: if two hash values computed by the same hash function differ, the inputs must differ; but if the two hash values are the same, the inputs are not necessarily the same.

What is a hash conflict?

When two different input values calculate the same hash value according to the same hash function, we call it collision (hash collision)
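A concrete example of such a collision, using String's real hashCode: "Aa" and "BB" are distinct strings with identical hash codes.

```java
public class CollisionDemo {
    public static void main(String[] args) {
        // Two different inputs, same hash value: a hash collision
        System.out.println("Aa".hashCode());   // 2112
        System.out.println("BB".hashCode());   // 2112
        System.out.println("Aa".equals("BB")); // false
    }
}
```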

Data structure of HashMap

In Java there are two simple data structures for storing data: arrays and linked lists. Arrays are easy to address but hard to insert into and delete from; linked lists are hard to address but easy to insert into and delete from. HashMap therefore combines the two to get the advantages of both, and uses a method called the chain address method (separate chaining) to resolve hash conflicts:


In this way, objects with the same hash value are organized into a linked list under the bucket for that hash value. But compared with the int range returned by hashCode, the initial capacity of HashMap, DEFAULT_INITIAL_CAPACITY = 1 << 4 (i.e. 16, 2 to the 4th power), is far smaller. So if we simply took the remainder of hashCode to pick a bucket, the probability of hash collisions would rise sharply; in the worst case the HashMap degenerates into a single linked list. The hashCode therefore also needs some optimization.

Hash() function

The problem above arises mainly because, with a simple remainder of hashCode, only the low bits of the hashCode take part in the computation and the high bits play no role. The idea is therefore to let the high bits of the hashCode participate as well, further reducing the probability of hash collisions and spreading the data more evenly. This operation is called a perturbation. The hash() function in JDK 1.8 is as follows:

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16); // XOR the high 16 bits into the low 16 bits
}

This is more concise than in JDK 1.7: compared with 4 shift operations and 5 XOR operations (9 perturbations) in 1.7, JDK 1.8 performs only 1 shift operation and 1 XOR operation (2 perturbations).
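The perturbation can be replicated outside the JDK. The hash method below mirrors the JDK 1.8 source quoted above; the key and the table length of 16 are arbitrary examples:

```java
public class HashPerturbDemo {
    // Mirrors the JDK 1.8 HashMap.hash(): XOR the high 16 bits into the low 16 bits
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16; // table length, a power of two (example value)
        String key = "example";
        int h = hash(key);
        // The bucket index keeps only the low bits, but the high bits
        // of hashCode have already been mixed in by the perturbation
        int index = (n - 1) & h;
        System.out.println("bucket index in [0," + (n - 1) + "]: " + index);
        assert index >= 0 && index < n;
    }
}
```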

JDK1.8 new red black tree


Through the chain address method (hash table) above plus the perturbation function, the data is distributed more evenly and hash collisions are reduced. However, when a HashMap holds a large amount of data and the linked list under some bucket has n elements, traversing it costs O(n). To solve this, JDK 1.8 adds the red-black tree data structure to HashMap, which reduces the traversal complexity to O(log n);

summary

Briefly summarize the methods used by HashMap to effectively solve hash conflicts:

  1. Use the chain address method (hash table) to link data with the same hash value;
  2. Use the perturbation function (hash()) to reduce the probability of hash conflicts and spread the data more evenly;
  3. Introduce the red-black tree to further reduce the time complexity of traversal and make lookups faster;

Can any class be used as the key of map?

You can use any class as the key of a map. However, before doing so, you need to consider the following points:

  • If your class overrides the equals() method, it should also override the hashCode() method.
  • All instances of the class need to follow the contract between equals() and hashCode().
  • A field that is not used in equals() should not be used in hashCode() either.
  • The best practice for a user-defined key class is to make it immutable, so that the hashCode() value can be cached for better performance. Immutability also guarantees that hashCode() and equals() will not change in the future, which avoids the problems caused by mutability.
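The immutability advice matters in practice. Here is a minimal sketch with a hypothetical mutable Point key (the class and its fields are invented for illustration): once the key is mutated after insertion, lookups fail because the entry sits in a bucket computed from the old hash.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class MutableKeyDemo {
    // A hypothetical mutable key class, for illustration only
    static class Point {
        int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
        @Override public boolean equals(Object o) {
            return o instanceof Point && ((Point) o).x == x && ((Point) o).y == y;
        }
        @Override public int hashCode() { return Objects.hash(x, y); }
    }

    public static void main(String[] args) {
        Map<Point, String> map = new HashMap<>();
        Point p = new Point(1, 2);
        map.put(p, "value");
        p.x = 99; // mutating the key changes its hashCode
        // The stored node still carries the old hash, so both lookups miss
        System.out.println(map.get(p));               // null
        System.out.println(map.get(new Point(1, 2))); // null
    }
}
```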

Why are wrapper classes such as String and Integer suitable as HashMap keys?

A: the characteristics of wrapper classes such as String and Integer guarantee the immutability and accuracy of the hash value, effectively reducing the probability of hash collisions.

  1. They are all final types, i.e. immutable, which guarantees the key cannot change and its hash value cannot vary between accesses;
  2. They already override equals(), hashCode() and related methods internally and comply with HashMap's contract (if unclear, see the putVal walkthrough above), so hash value computation is not error-prone;

What should I do if I use object as the key of HashMap?

Answer: override the hashCode() and equals() methods.

  1. hashCode() must be overridden because it is used to compute the storage location of the data. Be careful not to exclude key fields of the object from the hash code computation to gain performance: it may be faster, but it can lead to more hash collisions;
  2. equals() must be overridden observing reflexivity, symmetry, transitivity and consistency, and x.equals(null) must return false for any non-null reference x, in order to guarantee the uniqueness of keys in the hash table;

Why doesn’t HashMap directly use the hash value processed by hashcode () as the subscript of table?

A: the hashCode() method returns an int, whose range is -(2^31) ~ 2^31 - 1, about 4 billion possible values, while the capacity of a HashMap ranges from 16 (the default initial value) to 2^30. A HashMap rarely reaches the maximum capacity, and a device could hardly provide that much storage, so the hash value computed by hashCode() may fall outside the array's size range and cannot be used as the storage location directly.

How to solve it?

  1. HashMap implements its own hash() method: through the perturbation it XORs the high and low halves of the hash value, which reduces the collision probability and spreads the data more evenly;
  2. With the array length guaranteed to be a power of 2, the value from hash() is combined with (array length - 1) using bitwise AND (&) to obtain the array index. This is more efficient than the remainder operation; moreover, h & (length - 1) is equivalent to h % length only when the length is a power of 2; and it also solves the problem of "the hash value not matching the array size range";
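Point 2 can be verified directly; the sketch below checks that h & (length - 1) equals h % length for a power-of-two length (for non-negative h — Java's % can return negatives for negative operands):

```java
public class IndexDemo {
    public static void main(String[] args) {
        int length = 16; // a power of two, like HashMap's table length
        for (int h : new int[]{0, 7, 16, 35, 1234567}) {
            // Bit masking and remainder agree when length is a power of two
            assert (h & (length - 1)) == (h % length);
        }
        System.out.println("h & (length - 1) == h % length holds for non-negative h");
    }
}
```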

Why is the length of HashMap to the power of 2

To make HashMap access efficient and minimize collisions, the data should be distributed as evenly as possible, so that the linked list / red-black tree in each bucket is roughly the same length. The key is the algorithm that decides which bucket (linked list / red-black tree) a piece of data is stored in.

How should this algorithm be designed?

We may first think of the % remainder operation. The key point is: in a remainder (%) operation, if the divisor is a power of 2, it is equivalent to a bitwise AND (&) with the divisor minus one (i.e. hash % length == hash & (length - 1) holds provided length is a power of 2). The binary & operation is more efficient than %, which explains why the length of a HashMap is a power of 2.

So why two disturbances?

A: this is to increase the randomness of the low bits of the hash value and make the distribution more even, improving the randomness and uniformity of the resulting array index and ultimately reducing hash conflicts. Two perturbations are enough: the goal of having both the high bits and the low bits participate in the computation has been achieved;

What is the difference between HashMap and hashtable?

  1. Thread safety: HashMap is not thread-safe; Hashtable is thread-safe, its internal methods being mostly modified by synchronized. (If you need thread safety, use ConcurrentHashMap!);
  2. Efficiency: because it has no synchronization overhead, HashMap is somewhat more efficient than Hashtable. Hashtable is essentially obsolete and should not be used in new code;
  3. Support for null keys and values: in HashMap, null can be used as a key (there can be only one such key) and the value of one or more keys can be null. Hashtable throws NullPointerException as soon as a null key or value is put;
  4. Initial capacity and expansion: ① if no initial capacity is specified at creation, Hashtable's default initial size is 11 and each expansion makes the capacity 2n + 1, while HashMap's default initial size is 16 and each expansion doubles the capacity. ② If an initial capacity is given at creation, Hashtable uses the given size directly, while HashMap rounds it up to a power of 2. That is, HashMap always uses a power of 2 as the size of the hash table; we explain why below.
  5. Underlying data structure: since JDK 1.8, HashMap changes how it resolves hash conflicts: when the length of a linked list exceeds the threshold (8 by default), the list is converted to a red-black tree to reduce search time. Hashtable has no such mechanism.
  6. Recommendation: Hashtable's class comment shows it is a legacy class whose use is discouraged. In a single-threaded environment use HashMap instead; if you need multithreading, use ConcurrentHashMap.
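Point 3 is easy to demonstrate: HashMap accepts null keys and values, while Hashtable throws immediately.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

public class NullKeyDemo {
    public static void main(String[] args) {
        Map<String, String> hashMap = new HashMap<>();
        hashMap.put(null, "ok"); // HashMap allows one null key
        hashMap.put("k", null);  // and null values
        System.out.println(hashMap.get(null)); // ok

        try {
            new Hashtable<String, String>().put(null, "boom");
        } catch (NullPointerException e) {
            System.out.println("Hashtable rejects null keys");
        }
    }
}
```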

How do I decide whether to use HashMap or treemap?

For operations such as inserting, deleting and locating elements, HashMap is the best choice. However, if you need to traverse the keys in sorted order, TreeMap is better. Depending on the size of your collection, it may be faster to add elements to a HashMap, then replace it with a TreeMap when ordered key traversal is needed.
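A small sketch of the difference: TreeMap iterates keys in sorted (natural) order regardless of insertion order, which HashMap does not guarantee.

```java
import java.util.Map;
import java.util.TreeMap;

public class TreeMapDemo {
    public static void main(String[] args) {
        Map<String, Integer> tree = new TreeMap<>();
        tree.put("banana", 2);
        tree.put("apple", 1);
        tree.put("cherry", 3);
        // TreeMap iterates keys in sorted (natural) order
        System.out.println(tree.keySet()); // [apple, banana, cherry]
    }
}
```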

Differences between HashMap and concurrenthashmap

  1. ConcurrentHashMap (in JDK 1.7) divides the whole bucket array into segments and protects each segment with a lock. Compared with Hashtable's synchronized lock, the lock granularity is finer and concurrent performance is better, while HashMap has no locking mechanism and is not thread-safe. (Since JDK 1.8, ConcurrentHashMap uses a new implementation based on the CAS algorithm.)
  2. HashMap allows null keys and values; ConcurrentHashMap does not.

What is the difference between concurrent HashMap and hashtable?

The difference between concurrent HashMap and hashtable is mainly reflected in the way of implementing thread safety.

  1. Underlying data structure: in JDK 1.7, ConcurrentHashMap is implemented with a segmented array + linked lists; in JDK 1.8 it adopts the same structure as HashMap 1.8, an array + linked list / red-black tree. Hashtable's underlying structure is similar to HashMap before JDK 1.8: an array (the main body) plus linked lists (mainly to resolve hash conflicts);
  2. How thread safety is achieved (important): ① in JDK 1.7, ConcurrentHashMap partitions the whole bucket array with segment locks; each lock guards only part of the data, so threads accessing different segments do not contend, which improves concurrent access. (16 segments are allocated by default, in theory up to 16 times more efficient than Hashtable.) In JDK 1.8 the concept of segments is abandoned: concurrency is controlled directly on the Node array + linked list + red-black tree structure using synchronized and CAS (synchronized has been heavily optimized since JDK 1.6). The whole thing looks like an optimized, thread-safe HashMap; the Segment data structure is still visible in JDK 1.8, but its attributes are simplified and kept only for compatibility with old versions. ② Hashtable uses a single lock: synchronized on every method, which is very inefficient. While one thread is in a synchronized method, other threads calling any synchronized method block or poll; for example, while one thread is executing put, no other thread can put or even get. The fiercer the contention, the lower the efficiency.

Comparison of the two

Hashtable: (figure omitted)

ConcurrentHashMap in JDK 1.7: (figure omitted)

ConcurrentHashMap in JDK 1.8 (TreeBin: red-black tree node; Node: linked list node): (figure omitted)

A: ConcurrentHashMap combines the advantages of both HashMap and Hashtable. HashMap does not consider synchronization; Hashtable does, but it must lock the entire structure on every synchronized operation. ConcurrentHashMap's locking is finer-grained.

Do you know the underlying implementation of concurrenthashmap? What is the implementation principle?

JDK1.7

Firstly, the data is divided into sections for storage, and then each section of data is equipped with a lock. When a thread occupies the lock to access one section of data, the data of other sections can also be accessed by other threads.

In JDK 1.7, ConcurrentHashMap is implemented with Segment + HashEntry. The structure is as follows:

A ConcurrentHashMap contains a Segment array. A Segment is structured like a HashMap: an array of linked lists. Each Segment contains a HashEntry array, each HashEntry being an element of a linked list; a Segment guards the elements of its HashEntry array, and to modify that array a thread must first acquire the corresponding Segment's lock.

(figure omitted)

  1. The class contains two static inner classes, HashEntry and Segment; the former encapsulates the key-value pairs of the map, the latter acts as the lock;
  2. Segment is a reentrant lock. Each Segment guards the elements of a HashEntry array; to modify the HashEntry array's data, the corresponding Segment lock must be acquired first.

JDK1.8

In JDK 1.8, the bloated Segment design is abandoned and replaced by Node + CAS + synchronized to guarantee concurrency safety. synchronized locks only the head node of the current linked list or red-black tree, so as long as hashes do not collide there is no contention, and efficiency improves considerably.

The structure is as follows: (figure omitted)

Additional source code, in case it is useful.

Insert element flow (reading the full source is recommended):

If the bucket at the corresponding position has not been initialized, CAS is used to insert the data;

else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
    if (casTabAt(tab, i, null, new Node<K,V>(hash, key, value, null)))
        break;                   // no lock when adding to empty bin
}

If the node at the corresponding position is not empty and is not currently being moved, a synchronized lock is taken on that head node. If the node's hash is not less than 0, the linked list is traversed to update a node or insert a new one;

if (fh >= 0) {
    binCount = 1;
    for (Node<K,V> e = f;; ++binCount) {
        K ek;
        if (e.hash == hash &&
            ((ek = e.key) == key ||
             (ek != null && key.equals(ek)))) {
            oldVal = e.val;
            if (!onlyIfAbsent)
                e.val = value;
            break;
        }
        Node<K,V> pred = e;
        if ((e = e.next) == null) {
            pred.next = new Node<K,V>(hash, key, value, null);
            break;
        }
    }
}
  1. If the node is a TreeBin node, the bucket is a red-black tree and the node is inserted via putTreeVal. If binCount is not 0, the put operation changed the data; if the current linked list reaches length 8 it is converted to a red-black tree via treeifyBin. If oldVal is not empty, this was an update with no effect on the element count, and the old value is returned directly;
  2. If a new node was inserted, addCount() is executed to try to update the element count baseCount;

Auxiliary tools

What is the difference between array and ArrayList?

  1. An array can store primitive types as well as objects, while an ArrayList can only store objects.
  2. An array has a fixed size, while an ArrayList grows automatically.
  3. Arrays have far fewer built-in operations than ArrayList; for example addAll, removeAll, iterators and so on exist only on ArrayList.
    For primitive data, collections rely on auto-boxing to reduce coding effort, but this approach is comparatively slow when handling fixed-size primitive data.

How to implement the conversion between array and list?

  • Array to List: Arrays.asList(array);
  • List to array: the toArray() method of List.
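One caveat worth knowing: Arrays.asList returns a fixed-size list backed by the array, so wrap it in an ArrayList if you need to add or remove elements. A sketch:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ConvertDemo {
    public static void main(String[] args) {
        String[] array = {"a", "b", "c"};
        // Arrays.asList returns a fixed-size view backed by the array
        List<String> view = Arrays.asList(array);
        // Wrap it to get a resizable list
        List<String> list = new ArrayList<>(view);
        list.add("d");
        // List back to array
        String[] back = list.toArray(new String[0]);
        System.out.println(back.length); // 4
    }
}
```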

What is the difference between comparable and comparator?

  • The Comparable interface comes from the java.lang package. It has a compareTo(Object obj) method used for sorting.
  • The Comparator interface comes from the java.util package. It has a compare(Object obj1, Object obj2) method used for sorting.

Generally, when a collection needs a custom order, we either override the compareTo method or supply a compare method. When a collection needs two different orders — say sorting a Song object by song title in one place and by singer name in another — we can override compareTo for one order and provide a Comparator for the other, or use two Comparators, one for titles and one for singers. In the latter case we must use the two-argument version of Collections.sort().
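A sketch of both approaches with a hypothetical Song class (the class and its fields are invented for illustration): compareTo gives the natural order by title, while a Comparator supplies the alternative order by singer, passed as the second sorting argument.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortDemo {
    // Hypothetical Song class, for illustration only
    static class Song implements Comparable<Song> {
        final String title, singer;
        Song(String title, String singer) { this.title = title; this.singer = singer; }
        // Natural order (Comparable): by title
        @Override public int compareTo(Song o) { return title.compareTo(o.title); }
        @Override public String toString() { return title + "/" + singer; }
    }

    public static void main(String[] args) {
        List<Song> songs = new ArrayList<>(List.of(
            new Song("Yesterday", "Beatles"), new Song("Help", "Beatles")));
        songs.sort(null); // null comparator: uses compareTo, i.e. sorts by title
        System.out.println(songs.get(0).title); // Help
        // Alternative order supplied as a Comparator: by singer
        songs.sort(Comparator.comparing(s -> s.singer));
    }
}
```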

What is the difference between collections and collections?

  • java.util.Collection is the root collection interface (a top-level interface of the collection hierarchy). It provides general methods for basic operations on collection objects; it has many concrete implementations in the Java class library, and its purpose is to offer a maximally unified way of operating on all kinds of concrete collections. Its direct sub-interfaces include List and Set.
  • Collections is a utility/helper class for collections, providing a series of static methods for sorting, searching, wrapping collections for thread safety, and other operations on collection elements.

How do treemap and TreeSet compare elements when sorting? How do the sort () method in the collections tool class compare elements?

TreeSet requires that the class of the stored objects implement the Comparable interface, which provides the compareTo() method for comparing elements; this method is called back when inserting elements to determine their order. TreeMap requires that the keys of the stored key-value pairs implement the Comparable interface, so elements can be sorted by key.

The sort method of the Collections utility class has two overloaded forms:

The first requires that the objects stored in the container to be sorted implement the Comparable interface, which provides the element comparison;

The second does not require the elements themselves to be comparable, but requires a second argument: an instance of the Comparator interface (its compare method must be overridden to compare elements). This amounts to a temporarily defined sorting rule — an algorithm for comparing element order passed in through an interface — and is also an application of the callback pattern (Java's support for functional programming).

Author: thinkwon
Source:https://blog.csdn.net/ThinkWo…
