A dictionary tree

Time:2022-4-28

What is a dictionary tree

A dictionary tree, also known as a trie or prefix tree, is a tree-shaped data structure that trades space for time. It is typically used to count, sort, and store large numbers of strings, so search engine systems often use it for word frequency statistics on text. Its advantage is that it uses the common prefixes of strings to reduce query time and minimize unnecessary string comparisons, and its query efficiency is higher than that of a hash tree.


At first the idea may not feel intuitive, especially if you have not worked with prefixes before. With prefix problems you might get by with brute-force matching, and when there are only a few strings a hash table or similar structure will do. However, when the strings are long and many of them share the same prefixes, a dictionary tree can greatly reduce memory use and improve efficiency. A typical application: type a few words into a search box and related suggestions appear. It can seem like magic, but the idea behind it is the dictionary tree.


For dictionary trees, there are three important properties:

1: The root node does not contain a character; every other node contains exactly one character. Leaving the root empty lets it serve as the common starting point for all strings.

2: Concatenating the characters on the path from the root to a node gives the string corresponding to that node.

3: The children of any node all hold different characters, so each string corresponds to exactly one path.


Design and implement dictionary tree

Now that we know what a dictionary tree is, let's design one!

The design details vary with the scenario and requirements, but in general a dictionary tree supports insertion, lookup of a whole string, and lookup of a prefix.

Let's start with the simple case, where every string consists only of the 26 lowercase letters. LeetCode problem 208, Implement Trie (Prefix Tree), fits exactly and can serve as an implementation template.

Implement the Trie class:

  • Trie() initializes the prefix tree object.
  • void insert(String word) inserts the string word into the prefix tree.
  • boolean search(String word) returns true if the string word is in the prefix tree (that is, it was inserted before the search); otherwise it returns false.
  • boolean startsWith(String prefix) returns true if some previously inserted string word has prefix as a prefix; otherwise it returns false.

How to design this dictionary tree?

A Trie class needs a root node, root. The node type, TrieNode, can be designed in many ways. Here we simply give each node an array of 26 TrieNode references, one for each character from 'a' to 'z', plus a boolean isEnd that is true when some inserted string ends at this node.

class TrieNode {
    TrieNode son[];
    boolean isEnd;// true if an inserted string ends at this node
    public TrieNode() // initialize the 26 child slots
    {
        son=new TrieNode[26];
    }
}

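Using an array may waste some memory when the character set is large, but 26 consecutive characters are fine here. If you insert big, bit, and bz into a dictionary tree, the result is a root with a single child b; b has children i and z; i has children g and t; and the nodes for g, t, and z carry the end flag.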

[Figure: the dictionary tree after inserting big, bit, and bz]

Now let's look at each operation in detail:

Insert operation: starting from the root node, walk through the string and find the slot for each character. If the slot is empty, create a new TrieNode. For example, when inserting big, a TrieNode is created the first time b is encountered, and the same goes for the characters that follow. Finally, set isEnd to true on the node where the walk stops, marking it as the end node of the string.


The key code for this part is:

TrieNode root;
/** Initialize the trie with an empty root. */
public Trie() {
    root=new TrieNode();
}

/** Inserts a word into the trie. */
public void insert(String word) {
    TrieNode node=root;// cursor that walks down the tree
    for(int i=0;i<word.length();i++) // visit each character
    {
        int index=word.charAt(i)-'a';// slot 0..25 for this character
        if(node.son[index]==null) // empty slot: create the node
        {
            node.son[index]=new TrieNode();
        }
        node=node.son[index];
    }
    node.isEnd=true;// mark the last node as the end of a word
}

Query operation: a query runs against a dictionary tree that has already been built. The process resembles insertion, except that no TrieNode is created. If the walk reaches a slot that is empty (null), return false. If the walk completes, check whether isEnd on the final node is true, that is, whether a string ending at that character was inserted; return true only if it is.

An example may help. Suppose big has been inserted. Searching for ba fails because the TrieNode for the second character, a, is null. Searching for bi also fails: of the nodes created for big, only the node for g has isEnd = true, while isEnd on the node for i is false, so the string bi does not exist.

The corresponding core code of this part is:

public boolean search(String word) {
    TrieNode node=root;
    for(int i=0;i<word.length();i++)
    {
        int index=word.charAt(i)-'a';
        if(node.son[index]==null) // missing node: the word was never inserted
        {
            return false;
        }
        node=node.son[index];
    }
    return node.isEnd==true;
}

Prefix lookup: this is similar to the query, with one difference. If the walk reaches a null node it returns false, but if it gets through the last character it returns true without checking isEnd. In the example above, after inserting big, looking up the prefix bi returns true because a string with that prefix exists.

The corresponding core code is:

public boolean startsWith(String prefix) {
    TrieNode node=root;
    for(int i=0;i<prefix.length();i++)
    {
        int index=prefix.charAt(i)-'a';
        if(node.son[index]==null)
        {
            return false;
        }
        node=node.son[index];
    }
    // every character of the prefix was found, so some inserted word starts with it
    return true;
}

Put together, the code above forms a complete dictionary tree in its most basic version.
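As a rough sketch of how the TrieNode class and the three methods above might be assembled into one class: the private helper find is an addition for this illustration and is not part of the original snippets, and the sketch assumes every input consists only of lowercase letters 'a' to 'z'.

```java
// Array-based trie for lowercase 'a'..'z', assembled from the snippets above.
// Assumption: inputs contain only lowercase letters; anything else is out of scope.
class Trie {
    private static class TrieNode {
        TrieNode[] son = new TrieNode[26]; // one slot per letter
        boolean isEnd;                     // true if an inserted word ends here
    }

    private final TrieNode root = new TrieNode();

    /** Inserts a word, creating nodes along the path as needed. */
    public void insert(String word) {
        TrieNode node = root;
        for (int i = 0; i < word.length(); i++) {
            int index = word.charAt(i) - 'a';
            if (node.son[index] == null) {
                node.son[index] = new TrieNode();
            }
            node = node.son[index];
        }
        node.isEnd = true;
    }

    /** True only if exactly this word was inserted. */
    public boolean search(String word) {
        TrieNode node = find(word);
        return node != null && node.isEnd;
    }

    /** True if any inserted word starts with the prefix. */
    public boolean startsWith(String prefix) {
        return find(prefix) != null;
    }

    // Hypothetical helper (not in the original): walks the path for s
    // and returns the node it ends on, or null if the path breaks off.
    private TrieNode find(String s) {
        TrieNode node = root;
        for (int i = 0; i < s.length(); i++) {
            node = node.son[s.charAt(i) - 'a'];
            if (node == null) {
                return null;
            }
        }
        return node;
    }

    public static void main(String[] args) {
        Trie trie = new Trie();
        trie.insert("big");
        System.out.println(trie.search("big"));    // true
        System.out.println(trie.search("bi"));     // false: isEnd is not set on i
        System.out.println(trie.search("ba"));     // false: no node for a
        System.out.println(trie.startsWith("bi")); // true
    }
}
```

Sharing one walk between search and startsWith is only a design choice to avoid repeating the loop; the two separate loops shown earlier behave the same way.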

Further thoughts on the dictionary tree

The basic dictionary tree class is simple, but in practice it often needs to be extended.

With the 26 letters above, the index for a character is easy to compute from its ASCII code. When the character set is large, an array per node may waste space, so the children can be stored in a HashMap or a List instead. With a List you have to scan the children in order to find a character, while a HashMap can be queried directly. Below is a dictionary tree implemented with HashMap.

Replacing the array with a HashMap leaves the logic unchanged (although a HashMap does not keep its keys in sorted order): wherever the array version checks for null, check whether the map contains the key. The field type is:

Map<Character,TrieNode> sonMap;

The complete code of dictionary tree implemented by HashMap is:

import java.util.HashMap;
import java.util.Map;

public  class Trie{
    class TrieNode{
        Map<Character,TrieNode> sonMap;
        boolean idEnd;
        public TrieNode()
        {
            sonMap=new HashMap<>();
        }
    }
    TrieNode root;
    public Trie()
    {
        root=new TrieNode();
    }
   
    public void insert(String word) {
        TrieNode node=root;
        for(int i=0;i<word.length();i++)
        {
            char ch=word.charAt(i);
            if(!node.sonMap.containsKey(ch)) // no child for ch yet, so create one
            {
                node.sonMap.put(ch,new TrieNode());
            }
            node=node.sonMap.get(ch);
        }
        node.idEnd=true;
    }
    
    public boolean search(String word) {
        TrieNode node=root;
        for(int i=0;i<word.length();i++)
        {
            char ch=word.charAt(i);
            if(!node.sonMap.containsKey(ch))
            {
                return false;
            }
            node=node.sonMap.get(ch);
        }
        return node.idEnd;// must be marked as an end node for the string to exist
    }


    public boolean startsWith(String prefix) {
        TrieNode node=root;
        for(int i=0;i<prefix.length();i++)
        {
            char ch=prefix.charAt(i);
            if(!node.sonMap.containsKey(ch))
            {
                return false;
            }
            node=node.sonMap.get(ch);
        }
        return true;// Just go to the last step
    }
}

As mentioned earlier, the dictionary tree is used for counting, sorting, and storing large numbers of strings. Sorting falls out of the array layout: because the child slots are ordered by character code, reading the children in index order visits the strings in lexicographic order. The idea is a bit like radix sort.
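To make the sorted read-out concrete, here is a minimal sketch assuming the array-based node layout and lowercase input. The class name SortedTrie and the method wordsWithPrefix are made up for this illustration. Starting the traversal at the root lists every word in order; starting it at the node for a prefix gives the kind of suggestion list a search box might show.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: reading words out of an array-based trie in lexicographic order.
// Assumption: words contain only lowercase 'a'..'z'.
class SortedTrie {
    private static class Node {
        Node[] son = new Node[26];
        boolean isEnd;
    }

    private final Node root = new Node();

    public void insert(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            int index = word.charAt(i) - 'a';
            if (node.son[index] == null) {
                node.son[index] = new Node();
            }
            node = node.son[index];
        }
        node.isEnd = true;
    }

    /** Returns every stored word that starts with prefix, in sorted order. */
    public List<String> wordsWithPrefix(String prefix) {
        List<String> result = new ArrayList<>();
        Node node = root;
        for (int i = 0; i < prefix.length(); i++) {
            node = node.son[prefix.charAt(i) - 'a'];
            if (node == null) {
                return result; // nothing stored under this prefix
            }
        }
        collect(node, new StringBuilder(prefix), result);
        return result;
    }

    // Depth-first walk: record the word at this node first, then visit
    // the children in index order, which is alphabetical order.
    private void collect(Node node, StringBuilder path, List<String> out) {
        if (node.isEnd) {
            out.add(path.toString());
        }
        for (int i = 0; i < 26; i++) {
            if (node.son[i] != null) {
                path.append((char) ('a' + i));
                collect(node.son[i], path, out);
                path.deleteCharAt(path.length() - 1); // backtrack
            }
        }
    }
}
```

For example, after inserting bz, bit, big, and apple in that order, wordsWithPrefix("") returns [apple, big, bit, bz] and wordsWithPrefix("bi") returns [big, bit].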

Statistics usually means counting: how many times a word occurs, or how many words share a prefix. Enumerating the tree for every query wastes time; instead, add a counter to TrieNode and update it on each insert. If repeated strings should be counted, increment the counters directly. If strings must be de-duplicated, first confirm that the word is new, and only then increment the prefix counters along its path. The right choice depends on the specific problem.
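One possible shape for such counters, sketched under the assumption that repeated strings are counted every time they are inserted. The names CountingTrie, wordCount, and prefixCount are illustrative and not from the original code.

```java
// Sketch: a trie that counts word occurrences and prefix occurrences.
// Assumptions: lowercase 'a'..'z' only; duplicates are counted, not removed.
class CountingTrie {
    private static class Node {
        Node[] son = new Node[26];
        int prefixCount; // inserted words whose path passes through this node
        int wordCount;   // inserted words that end exactly at this node
    }

    private final Node root = new Node();

    public void insert(String word) {
        Node node = root;
        node.prefixCount++; // every word has the empty prefix
        for (int i = 0; i < word.length(); i++) {
            int index = word.charAt(i) - 'a';
            if (node.son[index] == null) {
                node.son[index] = new Node();
            }
            node = node.son[index];
            node.prefixCount++;
        }
        node.wordCount++;
    }

    /** How many times this exact word was inserted. */
    public int countWord(String word) {
        Node node = find(word);
        return node == null ? 0 : node.wordCount;
    }

    /** How many inserted words start with this prefix. */
    public int countPrefix(String prefix) {
        Node node = find(prefix);
        return node == null ? 0 : node.prefixCount;
    }

    // Walks the path for s; returns null if the path breaks off.
    private Node find(String s) {
        Node node = root;
        for (int i = 0; i < s.length(); i++) {
            node = node.son[s.charAt(i) - 'a'];
            if (node == null) {
                return null;
            }
        }
        return node;
    }
}
```

For the de-duplicated variant, one option would be to call countWord first and skip the insert when it already returns a nonzero value.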

In addition, the dictionary tree is also used in ACM contests to solve maximum XOR problems. This variant is called the 01 dictionary tree; if you are interested you can study it on your own (it may be covered in a later article).
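For readers who want a starting point, here is a minimal sketch of a 01 dictionary tree applied to one common form of the problem: finding the largest XOR of any two numbers in an array. It is not taken from the original article. The class and method names are hypothetical, and it assumes all values are non-negative ints, so only bits 30 down to 0 are stored.

```java
// Sketch: a 01 (binary) trie that stores numbers bit by bit, high bit first.
// Assumption: all values are non-negative ints (bits 30..0).
class BinaryTrie {
    private static final int HIGH_BIT = 30;

    private static class Node {
        Node[] son = new Node[2]; // son[0] for bit 0, son[1] for bit 1
    }

    private final Node root = new Node();

    public void insert(int num) {
        Node node = root;
        for (int bit = HIGH_BIT; bit >= 0; bit--) {
            int b = (num >> bit) & 1;
            if (node.son[b] == null) {
                node.son[b] = new Node();
            }
            node = node.son[b];
        }
    }

    /**
     * Largest value of (num XOR x) over all inserted x.
     * At least one number must have been inserted before calling this.
     */
    public int maxXorWith(int num) {
        Node node = root;
        int result = 0;
        for (int bit = HIGH_BIT; bit >= 0; bit--) {
            int b = (num >> bit) & 1;
            if (node.son[b ^ 1] != null) {
                // The opposite bit exists, so this bit of the XOR can be 1.
                result |= 1 << bit;
                node = node.son[b ^ 1];
            } else {
                node = node.son[b];
            }
        }
        return result;
    }

    /** Largest XOR of any two elements (an element may pair with itself). */
    public static int maxPairXor(int[] nums) {
        BinaryTrie trie = new BinaryTrie();
        int best = 0;
        for (int num : nums) {
            trie.insert(num);
            best = Math.max(best, trie.maxXorWith(num));
        }
        return best;
    }
}
```

The greedy step works because a 1 in a higher bit outweighs anything the lower bits can contribute, so taking the opposite bit whenever it exists never hurts.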

Summary

After reading this article you should have a better understanding of the dictionary tree. Its purpose is to help readers learn the basic dictionary tree and get a first look at its variants and optimizations.

A dictionary tree minimizes unnecessary string comparisons and is used for word frequency statistics and for sorting large numbers of strings. Sorting is built in: a depth-first traversal that visits children in character order yields the strings in sorted order. However, when strings are long and share few prefixes, the dictionary tree has no efficiency advantage, because it still has to visit nodes one character at a time.

Dictionary trees have many real applications: string retrieval, text prediction, autocompletion, spell checking, word frequency statistics, sorting, finding the longest common prefix of strings, prefix matching in string search, and serving as an auxiliary structure for other data structures and algorithms. They are not covered in detail here.

Original work takes effort. If this article helped, please like, follow, and bookmark it, and search for 【bigsai】 on WeChat to follow me and get new content as soon as it is published!