Introduction: The third 83-Line Code Challenge, hosted by Alibaba Cloud in 2021, has ended. More than 20,000 people watched, nearly 4,000 took part, and 85 teams competed. The competition was structured as a series of levels to clear and blended metaverse science fiction with murder-mystery role-play (jubensha, or "script killing") elements. The participating developers had a lot of fun.
The second question, nicknamed the "devil algorithm question", stopped many coding heroes in their tracks.
We invited its author, Liu Lihua of the Alibaba Cloud efficient code platform team, to walk through it: the problem design, the solution strategies, and an analysis of some excellent submissions for your reference.
This question is about string prefix matching. Participants first fetch the data set from OSS and then find all strings that match the specified prefix strings. Why this algorithm? Recently I was using our own development plug-in. When I typed Java API keywords into the code search box, it automatically suggested completions of API names. Inspired by that feature, I designed a question that asks participants to implement prefix matching over Java API names.
Here's a quick plug: our plug-in is the Alibaba Cloud intelligent coding plug-in, which is available on the JetBrains Marketplace. Search for "Alibaba Cloud AI Coding Assistant" in IntelliJ IDEA to download and use it. It provides intelligent code completion and code example search, so developers can write code faster and more fluently.
The hard part of the design was making the question challenging while still ensuring a reasonable pass rate. Similar problems in external question banks are rated medium difficulty, and that level helps keep the pass rate up. We enlarged the evaluation data so that contestants could not pass simply by combining Java's String.startsWith with a double loop. To keep the difficulty in check, however, the evaluation data was split into two kinds. Small data sets have hundreds of thousands of entries, and large data sets have millions. Handling the small data sets well is enough to clear the level.
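For reference, the baseline that the larger data sets were meant to penalize looks roughly like the sketch below. The input and output shapes are assumptions for illustration and not the contest's actual interface: a list of data strings, a list of prefixes, and a map from each prefix to its matches. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical naive baseline: every prefix is checked against every data string. */
final class NaivePrefixMatcher {
    static Map<String, List<String>> match(List<String> data, List<String> prefixes) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String prefix : prefixes) {          // outer loop: each query prefix
            List<String> hits = new ArrayList<>();
            for (String s : data) {               // inner loop: the whole data set
                if (s.startsWith(prefix)) {
                    hits.add(s);
                }
            }
            result.put(prefix, hits);
        }
        return result;
    }
}
```

With N data strings and P prefixes, this makes N * P startsWith calls. The cost grows quickly on the million-entry data sets.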
The scoring system for this question is built on Function Compute. It randomly selects one small-scale data set and one large-scale data set and runs the contestant's code on them in sequence. The code is scored on accuracy, performance cost, and memory consumption.
Introduction to the competition question
OSS data acquisition
The data sets for this contest are stored in OSS, so participants need to fetch them with the OSS SDK. They can learn how to use the SDK from the documentation linked in the code comments. Alternatively, they can quickly browse OSS SDK sample code with the recently released Alibaba Cloud intelligent coding plug-in (Cosy).
As shown in the figure above, participants who want to know how to fetch OSS object data can right-click the API and choose "View Code Examples". The plug-in then quickly finds code examples related to fetching OSS data. Participants only need to copy the relevant parts and adapt them.
Developers can also write the OSS data-fetching code quickly by accepting whole-line suggestions from the intelligent code completion feature, as shown in the figure below.
Contestants' approaches to this question were fairly diverse. They fall mainly into the following categories.
First, implementing the algorithm and data structure yourself
The competition strategy guide mentions the trie, a fairly basic data structure, and a well-implemented trie is enough to pass. However, a trie also has relatively high performance overhead and memory consumption, so its many variants are worth considering. The double-array trie saves space. The radix tree and its variants compress the trie's memory footprint by reducing the depth of the tree. For a comparison of how various trie implementations perform, see the evaluation data in the paper "MergedTrie: Efficient textual indexing".
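A minimal sketch of a plain trie follows. It assumes the data are ordinary Java strings and that duplicate entries may be collapsed. Children are kept in a TreeMap so that a depth-first walk returns matches in lexicographic order. This is the basic structure, not a double-array trie or radix tree, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical plain trie: one node per character, sorted children. */
final class SimpleTrie {
    private static final class Node {
        final Map<Character, Node> children = new TreeMap<>(); // sorted keys give lexicographic output
        boolean terminal;                                      // true if a data string ends here
    }

    private final Node root = new Node();

    void insert(String word) {
        Node cur = root;
        for (int i = 0; i < word.length(); i++) {
            cur = cur.children.computeIfAbsent(word.charAt(i), c -> new Node());
        }
        cur.terminal = true; // duplicates simply set the same flag again
    }

    List<String> withPrefix(String prefix) {
        Node cur = root;
        // Walk down the prefix; cur becomes null if some character is missing.
        for (int i = 0; i < prefix.length() && cur != null; i++) {
            cur = cur.children.get(prefix.charAt(i));
        }
        List<String> out = new ArrayList<>();
        if (cur != null) {
            collect(cur, new StringBuilder(prefix), out);
        }
        return out;
    }

    // Depth-first walk below the prefix node, rebuilding each word in `path`.
    private static void collect(Node node, StringBuilder path, List<String> out) {
        if (node.terminal) {
            out.add(path.toString());
        }
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            path.append(e.getKey());
            collect(e.getValue(), path, out);
            path.setLength(path.length() - 1);
        }
    }
}
```

To use it, insert every data string once and then call withPrefix for each query prefix. The per-node TreeMap with boxed Character keys is exactly the kind of overhead that the compact variants above try to reduce.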
Many contestants also achieved high scores in other ways. They cut down the number of iterations in the double loop and reduced the cost of each prefix comparison.
Second, using the JDK's built-in data structures
The JDK's built-in data structures also perform well. Some contestants used a TreeSet to sort the data and then called its subSet method to retrieve the strings matching each prefix directly. This approach is simple and fast, and it needs little code.
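Below is a sketch of the TreeSet approach, using the same hypothetical interface. It relies on the fact that every string beginning with prefix sorts at or after prefix and before prefix + '\uffff'. This holds provided the data never contains the character U+FFFF, which is true for ASCII API names. The class name is illustrative only.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Hypothetical wrapper: sort once with TreeSet, answer each prefix with a range view. */
final class TreeSetPrefixMatcher {
    private final TreeSet<String> sorted;

    TreeSetPrefixMatcher(Collection<String> data) {
        this.sorted = new TreeSet<>(data); // sorts once and drops duplicates
    }

    List<String> withPrefix(String prefix) {
        // Matches occupy the half-open range [prefix, prefix + '\uffff').
        // Assumption: the data never contains U+FFFF (true for ASCII API names).
        NavigableSet<String> range =
                sorted.subSet(prefix, true, prefix + Character.MAX_VALUE, false);
        return new ArrayList<>(range); // copy the live view into an independent list
    }
}
```

subSet returns a view backed by the set, so no matching is done string by string. The cost is one O(log N) search plus the size of the result.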
Contestant code showcase
Contestants came up with many solutions to this question. For reasons of space, this article shows code snippets from only four contestants.
Contestant code snippet 1
This solution uses String.substring to cut prefixes of different lengths from each string and then looks each one up directly in the result map. It avoids comparing each string against every prefix character by character with String.startsWith, so it is much more efficient. Many contestants used this approach, and some limited the substring lengths. For example, they computed the shortest and longest prefix lengths in advance and generated only prefixes within that range.
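The following sketch shows the substring-lookup idea, again with a hypothetical interface. Each query prefix is a key in the result map. For each data string, only the substrings whose lengths fall between the shortest and longest prefix are looked up. The saving comes from replacing a scan over all P prefixes with at most (maxLen - minLen + 1) hash lookups per string. Each lookup still hashes the substring.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical substring-to-map lookup matcher. */
final class SubstringLookupMatcher {
    static Map<String, List<String>> match(List<String> data, List<String> prefixes) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        int minLen = Integer.MAX_VALUE;
        int maxLen = 0;
        for (String p : prefixes) {
            result.putIfAbsent(p, new ArrayList<>());   // pre-register every prefix as a key
            minLen = Math.min(minLen, p.length());
            maxLen = Math.max(maxLen, p.length());
        }
        for (String s : data) {
            int upper = Math.min(maxLen, s.length());   // a prefix cannot be longer than s
            for (int len = minLen; len <= upper; len++) {
                List<String> bucket = result.get(s.substring(0, len));
                if (bucket != null) {
                    bucket.add(s);                      // s starts with this prefix
                }
            }
        }
        return result;
    }
}
```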
Contestant code snippet 2
This solution first filters the data using the shortest and longest prefix lengths and then sorts the data set. Next, it iterates over the prefix strings to be matched. For each one, it uses binary search to find the lower and upper bounds of the matching range in the sorted array and extracts the data within that range.
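The sort-plus-binary-search approach might be sketched as follows. This version drops only strings shorter than the shortest prefix, since those can never match. The lower bound is the first string not less than the prefix. The upper bound comes from a second binary search on the predicate "greater than the prefix and not starting with it". That predicate is monotone over the sorted array. Names and interface are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical matcher: filter by length, sort once, then two binary searches per prefix. */
final class BinarySearchPrefixMatcher {
    static Map<String, List<String>> match(List<String> data, List<String> prefixes) {
        int minLen = Integer.MAX_VALUE;
        for (String p : prefixes) {
            minLen = Math.min(minLen, p.length());
        }
        final int shortest = minLen;
        // Strings shorter than the shortest prefix can never match: drop them, then sort once.
        String[] sorted = data.stream()
                .filter(s -> s.length() >= shortest)
                .sorted()
                .toArray(String[]::new);

        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String p : prefixes) {
            int from = lowerBound(sorted, p);
            int to = upperBound(sorted, p);
            result.put(p, Arrays.asList(Arrays.copyOfRange(sorted, from, to)));
        }
        return result;
    }

    // First index whose string is >= prefix, i.e. where the matching block would start.
    private static int lowerBound(String[] a, String prefix) {
        int lo = 0, hi = a.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (a[mid].compareTo(prefix) >= 0) hi = mid; else lo = mid + 1;
        }
        return lo;
    }

    // First index past the block of strings that start with prefix.
    private static int upperBound(String[] a, String prefix) {
        int lo = 0, hi = a.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            String s = a[mid];
            if (s.compareTo(prefix) > 0 && !s.startsWith(prefix)) hi = mid; else lo = mid + 1;
        }
        return lo;
    }
}
```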
Contestant code snippet 3
This solution is similar to the sorting approach. It uses the JDK's built-in TreeSet to sort the data and then calls TreeSet's subSet method to extract the data matching each prefix string.
To make the task more of a challenge, some contestants chose not to use TreeSet and implemented the sorting and search methods themselves.
Contestant code snippet 4
This is the code of the first-place contestant. The contestant optimized the trie and used CharNode and ArrayNode to reduce storage consumption. An ArrayNode stores an offset called shift, and an array index plus shift gives the character's ASCII code.
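The contestant's snippet is not reproduced here, so the following is only a sketch of the shift-offset idea under stated assumptions. Each node keeps a children array that covers just the character range its children actually use, and array index plus shift recovers the character code. For brevity, a single hypothetical Node class is used in which a one-slot array plays the role that a CharNode would. The contestant's actual CharNode/ArrayNode layout may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical trie whose child arrays span only [shift, shift + length). */
final class ShiftArrayTrie {
    static final class Node {
        Node[] children;   // children[i] is the child for character (i + shift)
        int shift;         // smallest character code among this node's children
        boolean terminal;  // true if a data string ends here

        Node child(char c) {
            int i = c - shift;
            return (children != null && i >= 0 && i < children.length) ? children[i] : null;
        }

        Node getOrCreate(char c) {
            if (children == null) {                     // first child: a one-slot array
                children = new Node[1];
                shift = c;
            } else if (c < shift) {                     // grow the range downwards
                Node[] grown = new Node[children.length + (shift - c)];
                System.arraycopy(children, 0, grown, shift - c, children.length);
                children = grown;
                shift = c;
            } else if (c - shift >= children.length) {  // grow the range upwards
                children = Arrays.copyOf(children, c - shift + 1);
            }
            int i = c - shift;
            if (children[i] == null) {
                children[i] = new Node();
            }
            return children[i];
        }
    }

    private final Node root = new Node();

    void insert(String word) {
        Node cur = root;
        for (int i = 0; i < word.length(); i++) {
            cur = cur.getOrCreate(word.charAt(i));
        }
        cur.terminal = true;
    }

    List<String> withPrefix(String prefix) {
        Node cur = root;
        for (int i = 0; i < prefix.length() && cur != null; i++) {
            cur = cur.child(prefix.charAt(i));
        }
        List<String> out = new ArrayList<>();
        if (cur != null) {
            collect(cur, new StringBuilder(prefix), out);
        }
        return out;
    }

    // Index order equals character order, so the walk yields lexicographic output.
    private static void collect(Node node, StringBuilder path, List<String> out) {
        if (node.terminal) {
            out.add(path.toString());
        }
        if (node.children == null) {
            return;
        }
        for (int i = 0; i < node.children.length; i++) {
            Node next = node.children[i];
            if (next == null) {
                continue;
            }
            path.append((char) (i + node.shift));
            collect(next, path, out);
            path.setLength(path.length() - 1);
        }
    }
}
```

Compared with a fixed 128-slot array per node, the array here is only as wide as the span between a node's smallest and largest child character. Lookups avoid the boxing and hashing of a map.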
In addition, the contestant did not collect the output strings into a String array. Instead, the contestant defined a ByteBuf that wrote the output directly into a byte array as a JSON string.
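Below is a minimal sketch of writing JSON output straight into a byte array. The class name ByteBuf follows the article's description, but its methods are hypothetical, and it is unrelated to Netty's ByteBuf. It assumes API names are ASCII without control characters and rejects anything else rather than encoding it.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical growable byte buffer that emits a JSON array of ASCII strings. */
final class ByteBuf {
    private byte[] buf = new byte[64];
    private int size;

    private void ensure(int extra) {
        if (size + extra > buf.length) {
            buf = Arrays.copyOf(buf, Math.max(buf.length * 2, size + extra));
        }
    }

    ByteBuf put(byte b) {
        ensure(1);
        buf[size++] = b;
        return this;
    }

    // Assumption: input is printable ASCII, so each char maps to exactly one byte.
    ByteBuf putJsonString(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x20 || c > 0x7F) {
                throw new IllegalArgumentException("unsupported character at index " + i);
            }
        }
        ensure(s.length() * 2 + 2);           // worst case: every char escaped, plus quotes
        buf[size++] = '"';
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"' || c == '\\') {
                buf[size++] = '\\';           // JSON escape for quote and backslash
            }
            buf[size++] = (byte) c;
        }
        buf[size++] = '"';
        return this;
    }

    ByteBuf putJsonArray(Iterable<? extends CharSequence> items) {
        put((byte) '[');
        boolean first = true;
        for (CharSequence s : items) {
            if (!first) {
                put((byte) ',');
            }
            putJsonString(s);
            first = false;
        }
        return put((byte) ']');
    }

    byte[] toByteArray() {
        return Arrays.copyOf(buf, size);
    }

    @Override
    public String toString() {
        return new String(buf, 0, size, StandardCharsets.US_ASCII);
    }
}
```

This approach skips building an intermediate String or list for the whole result, and the bytes can be written to the output stream in one call.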
Many contestants found this level difficult, especially with the large-scale data sets. Precisely because of this, we saw that many did not stop once they had passed. They tried different solutions and kept optimizing their algorithms, even for a gain of only 0.01 points, which moved our team a great deal. After the competition, we also reflected on what we could have done better. For example, the IDE scoring bar did not show specific figures such as memory consumption and performance cost, so some contestants did not know their actual memory usage. We will improve these details in the future. If you have good ideas or suggestions, please let us know!
If you would like to try the challenge after reading this introduction, it is still open to you at https://code83.ide.aliyun.com.
Finally, you are welcome to try the intelligent coding plug-in: https://developer.aliyun.com/tool/cosy. This AI-powered development plug-in offers intelligent code completion and code example search, making coding smooth and helping you get more done with less effort. Search for "Alibaba Cloud AI Coding Assistant" or "Cosy" in IntelliJ IDEA or the JetBrains Marketplace. If you have questions or suggestions while using it, please report them in GitHub Issues, and we will listen carefully.
All levels of the competition are currently open to try at https://code83.ide.aliyun.com/. You are welcome to join.