Uncovering the “underlying logic” of data processing: how a formula engine calculates (Part I)

Time: 2022-5-29

Background

In the information age, the most obvious change we can feel is the explosive growth of data: people accumulate more and more of it, large in volume and varied in kind. Many traditional tools for recording, querying, and summarizing data can no longer meet people’s needs. We need to process these large amounts of data more effectively, so that computers can produce the results people need and data processing becomes more automated and intelligent.

To handle this massive amount of data, many kinds of engines have emerged, such as big data engines, search engines, computing engines, and 3D engines, to solve problems that humans cannot handle by hand. The formula engine, one of the more fundamental of these, is the core part of a calculation program that is responsible for processing data. Below we introduce the basic principles of a formula engine, calculation chains, and the composition of asynchronous functions. Starting from the basic concepts of the formula engine, we will use our spreadsheet component as an example to show how to implement these ideas in JavaScript.

Calculation principles of the formula engine

A computing engine is responsible for collecting statistics from data sources, performing data operations, and managing data, and it returns the computing results that are required. Because data is processed for different purposes, the results to be returned differ, and engines fall into many different categories.

For the computer to recognize the processing operations we need, the language we write must go through a compilation process that translates it into a language the machine can understand.

The compilation process is divided into the phases shown in the following figure:

[Figure: the phases of the compilation process]

The two key stages are lexical analysis and syntax analysis. In these two stages, the input is gradually broken down into pieces that the program can recognize.

After receiving the input, the compiler first performs lexical analysis on it. In this step, the compiler’s task is to identify the words in the source program and detect any that are invalid. The part of the compiler that does this is generally called the lexical analyzer, and its output is usually a sequence of individual tokens.

Taking JavaScript as an example, this process has three main parts: analyzing function parameters, analyzing variable declarations, and analyzing function declarations.

The syntax analysis stage then checks whether the grammatical structure (that is, the statements) of the source program is correct, so syntax errors are usually found here. In this stage, the compiler works on the tokens produced by lexical analysis.

A formula engine processes data in much the same way that a compiler processes a language. In practice, we can build a formula engine similar to the one behind Excel formulas. The approach is to start with lexical analysis, splitting the long formula string into small tokens, then perform syntax analysis, and finally evaluate the resulting syntax tree. Let’s look at how each step is implemented.

Implementation details of the formula engine

Let’s start with formula calculation, which means evaluating a formula string to get the result of the expression. For example, the formula “=1+10*11” evaluates to 111. A person can get the answer at a glance by mental arithmetic, but a computer cannot: even an expression this simple has to be processed step by step before it yields the result we need. To implement this kind of Excel-style calculation, we go through lexical analysis, syntax analysis, and evaluation of the syntax tree.
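Spelled out, the example only gives 111 if multiplication is applied before addition, which is exactly the precedence rule the engine has to reproduce:

```latex
=1+10*11 \;\Rightarrow\; 1 + (10 \times 11) = 1 + 110 = 111
```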

1. Lexical analysis

Take the common formulas shown in the figures as examples.

[Figures: two example formulas]

First comes lexical analysis, in which we split the formula string into an array of tokens. In Excel formulas, an expression contains only operators, symbols, strings, numbers, arrays, references, and names. For example:

Name: SUM

Operators: ( ) : / % +

References: A1, A11, B1

Number: 100
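As a rough illustration of this step, here is a minimal JavaScript tokenizer sketch. The token categories follow the list above, but the `tokenize` name, the `{ type, value }` token shape, and the regular expressions are our own assumptions, not the component’s actual code; whole-column references such as `$C:$C` and array constants are not handled.

```javascript
// Minimal tokenizer sketch for Excel-style formulas (illustrative only).
// Token shape { type, value } and the rules below are assumptions.
const TOKEN_RULES = [
  ["whitespace", /^\s+/],
  ["number", /^\d+(?:\.\d+)?/],
  ["string", /^"[^"]*"/],
  // A cell reference such as A1 or $B$2; the lookahead keeps a function
  // name like "LOG10(" from being read as a reference.
  ["reference", /^\$?[A-Za-z]{1,3}\$?\d+(?![A-Za-z0-9_(])/],
  ["name", /^[A-Za-z_][A-Za-z0-9_.]*/],
  // Two-character comparison operators are tried before single characters.
  ["operator", /^(?:<=|>=|<>|[-+*\/%^&=<>:(),])/],
];

function tokenize(formula) {
  // A formula starts with "="; strip it before scanning.
  const src = formula.startsWith("=") ? formula.slice(1) : formula;
  const tokens = [];
  let pos = 0;
  while (pos < src.length) {
    const rest = src.slice(pos);
    // The first rule that matches at the current position wins.
    const rule = TOKEN_RULES.find(([, re]) => re.test(rest));
    if (!rule) {
      throw new SyntaxError(`Unexpected character "${rest[0]}" at position ${pos}`);
    }
    const [type, re] = rule;
    const text = rest.match(re)[0];
    if (type !== "whitespace") tokens.push({ type, value: text });
    pos += text.length;
  }
  return tokens;
}
```

For a made-up formula such as `=SUM(A1:A11)/B1+100%`, this returns eleven tokens, beginning with `{ type: "name", value: "SUM" }`, `{ type: "operator", value: "(" }`, and `{ type: "reference", value: "A1" }`. Rule order matters: references are tried before names so that `A1` is not mistaken for a function name.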

2. Syntax analysis

Once lexical analysis is complete, we analyze its output further. In formula calculation, syntax analysis is generally handled either with an expression tree or with a stack (that is, reverse Polish notation).

We start with the expression-tree approach.

Syntax analysis: expression tree

The expression-tree approach is based on a binary tree. We first build an expression tree from the result of lexical analysis: the leaf nodes of the tree are operands, and the internal nodes are operators.

[Figure: the expression tree built from the lexical-analysis result]

In this example, the colon has the highest priority, followed by the parentheses, and finally the division sign. Once the tree is built, we are very close to the final result.
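To make the priority rules concrete, here is a hedged recursive-descent sketch that builds such a tree from a token array. It assumes `{ type, value }` tokens like those a simple tokenizer might produce, supports only `+ - * / % :`, function calls, and parentheses, and the node shapes (`binary`, `call`, `ref`, `number`, `percent`) are our own illustration rather than the component’s data structures. Each precedence level is one function, so `:` binds more tightly than `*` and `/`, and a parenthesized group or function call is parsed as a single operand before any division is applied.

```javascript
// Recursive-descent sketch: turns a { type, value } token array into an
// expression tree. Node shapes are illustrative assumptions.
function parse(tokens) {
  let i = 0;
  const isOp = (v) =>
    i < tokens.length && tokens[i].type === "operator" && tokens[i].value === v;
  const expect = (v) => {
    if (!isOp(v)) throw new SyntaxError(`Expected "${v}" at token ${i}`);
    i++;
  };

  // Lowest precedence: + and - (left-associative).
  function additive() {
    let node = multiplicative();
    while (isOp("+") || isOp("-")) {
      const op = tokens[i++].value;
      node = { type: "binary", op, left: node, right: multiplicative() };
    }
    return node;
  }

  // Next: * and / (left-associative).
  function multiplicative() {
    let node = postfix();
    while (isOp("*") || isOp("/")) {
      const op = tokens[i++].value;
      node = { type: "binary", op, left: node, right: postfix() };
    }
    return node;
  }

  // Postfix percent: "50%" becomes a percent node wrapping 50.
  function postfix() {
    let node = range();
    while (isOp("%")) {
      i++;
      node = { type: "percent", operand: node };
    }
    return node;
  }

  // Tightest binary operator: the range colon, as in A1:A11.
  function range() {
    let node = primary();
    while (isOp(":")) {
      i++;
      node = { type: "binary", op: ":", left: node, right: primary() };
    }
    return node;
  }

  // Operands: numbers, references, function calls, parenthesized groups.
  function primary() {
    const tok = tokens[i++];
    if (!tok) throw new SyntaxError("Unexpected end of formula");
    if (tok.type === "number") return { type: "number", value: Number(tok.value) };
    if (tok.type === "reference") return { type: "ref", name: tok.value.toUpperCase() };
    if (tok.type === "name") {
      expect("(");
      const args = [];
      if (!isOp(")")) {
        args.push(additive());
        while (isOp(",")) {
          i++;
          args.push(additive());
        }
      }
      expect(")");
      return { type: "call", name: tok.value.toUpperCase(), args };
    }
    if (tok.type === "operator" && tok.value === "(") {
      const node = additive();
      expect(")"); // the parentheses shape the tree but leave no node behind
      return node;
    }
    throw new SyntaxError(`Unexpected token "${tok.value}"`);
  }

  const tree = additive();
  if (i < tokens.length) throw new SyntaxError(`Unexpected token "${tokens[i].value}"`);
  return tree;
}
```

For the tokens of `SUM(A1:A11)/B1`, the root is the `/` node; its left child is the `SUM` call whose single argument is the `:` node over `A1` and `A11`, and its right child is the reference `B1`.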

We then evaluate the tree recursively: starting from the root node, we descend through the SUM node down to A1:A11, obtain the first result there, and then return the results up the tree level by level.
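The recursion can be sketched as a post-order walk: evaluate a node’s children first, then apply the node’s own operator. Everything here is an assumption for illustration: the node shapes, the `evaluate`/`expandRange` names, the `FUNCTIONS` table (only `SUM` is implemented), and the plain-object cell store in which empty cells count as 0. Division by zero throws `#DIV/0!` instead of returning Excel’s error value.

```javascript
// Post-order evaluation sketch for an expression tree (illustrative only).
// Assumed nodes: number, ref, percent, call, binary { op, left, right }.
// Cells live in a plain object such as { A1: 1, B1: 6 }; empty cells count as 0.

// Convert column letters to a 1-based index ("A" -> 1, "AA" -> 27) and back.
const colIndex = (letters) =>
  [...letters].reduce((n, ch) => n * 26 + ch.charCodeAt(0) - 64, 0);
function colName(n) {
  let s = "";
  while (n > 0) {
    const r = (n - 1) % 26;
    s = String.fromCharCode(65 + r) + s;
    n = Math.floor((n - 1) / 26);
  }
  return s;
}

// Expand a rectangular range such as A1:A11 into the values of its cells.
function expandRange(from, to, cells) {
  const split = (ref) => {
    const m = /^([A-Z]+)(\d+)$/.exec(ref);
    if (!m) throw new Error(`Bad reference ${ref}`);
    return [colIndex(m[1]), Number(m[2])];
  };
  const [c1, r1] = split(from);
  const [c2, r2] = split(to);
  const values = [];
  for (let c = Math.min(c1, c2); c <= Math.max(c1, c2); c++) {
    for (let r = Math.min(r1, r2); r <= Math.max(r1, r2); r++) {
      values.push(cells[colName(c) + r] ?? 0);
    }
  }
  return values;
}

// Hypothetical function table; range arguments arrive as arrays, so SUM flattens them.
const FUNCTIONS = new Map([
  ["SUM", (...args) => args.flat().reduce((total, v) => total + v, 0)],
]);

function evaluate(node, cells) {
  switch (node.type) {
    case "number":
      return node.value;
    case "ref":
      return cells[node.name] ?? 0;
    case "percent":
      return evaluate(node.operand, cells) / 100;
    case "call": {
      // Children first: evaluate every argument, then apply the function.
      const fn = FUNCTIONS.get(node.name);
      if (!fn) throw new Error(`Unknown function ${node.name}`);
      return fn(...node.args.map((arg) => evaluate(arg, cells)));
    }
    case "binary": {
      // A range is not computed arithmetically; it expands to a list of values.
      if (node.op === ":") return expandRange(node.left.name, node.right.name, cells);
      const left = evaluate(node.left, cells);
      const right = evaluate(node.right, cells);
      switch (node.op) {
        case "+": return left + right;
        case "-": return left - right;
        case "*": return left * right;
        case "/":
          // Excel would return the #DIV/0! error value; this sketch throws instead.
          if (right === 0) throw new Error("#DIV/0!");
          return left / right;
      }
      break;
    }
  }
  throw new Error(`Cannot evaluate node of type ${node.type}`);
}
```

With A1 through A11 holding 1 to 11 and B1 holding 6, the tree for `SUM(A1:A11)/B1` evaluates to 66 / 6 = 11: the recursion first reaches the `:` leaf pair, returns the eleven values to `SUM`, and only then performs the division at the root.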

[Figure: recursive evaluation of the expression tree]

This is how a formula is calculated with an expression tree.

Syntax analysis: reverse Polish algorithm

In the syntax analysis stage, the reverse Polish algorithm produces a stack, that is, a reverse Polish expression. The core idea is to convert the infix expression we normally write into a postfix expression. Parentheses only indicate the order of operations; they are not elements that take part in the calculation, so they can be dropped during the infix-to-postfix conversion. The computer then evaluates the postfix expression in code.
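A common way to do this conversion is Dijkstra’s shunting-yard algorithm; we use it here as an illustrative choice, since the article does not say which conversion method the component uses. The sketch handles only `+ - * /`, parentheses, and numeric operands given as strings, and the `toPostfix`/`evalPostfix` names are our own. Note how `(` and `)` steer the operator stack but never reach the output.

```javascript
// Shunting-yard sketch: infix tokens -> postfix (reverse Polish) tokens.
// Only + - * / and parentheses are handled; operands are numeric strings.
const PRECEDENCE = new Map([["+", 1], ["-", 1], ["*", 2], ["/", 2]]);

function toPostfix(tokens) {
  const output = [];
  const ops = []; // operator stack
  for (const tok of tokens) {
    if (PRECEDENCE.has(tok)) {
      // Left-associative: first pop operators of higher or equal precedence.
      while (ops.length > 0 && PRECEDENCE.has(ops[ops.length - 1]) &&
             PRECEDENCE.get(ops[ops.length - 1]) >= PRECEDENCE.get(tok)) {
        output.push(ops.pop());
      }
      ops.push(tok);
    } else if (tok === "(") {
      ops.push(tok);
    } else if (tok === ")") {
      while (ops.length > 0 && ops[ops.length - 1] !== "(") output.push(ops.pop());
      if (ops.length === 0) throw new SyntaxError("Mismatched ')'");
      ops.pop(); // drop "(": parentheses never reach the output
    } else {
      output.push(tok); // operand
    }
  }
  while (ops.length > 0) {
    const op = ops.pop();
    if (op === "(") throw new SyntaxError("Mismatched '('");
    output.push(op);
  }
  return output;
}

// Evaluate postfix tokens with a stack: operands are pushed, and each operator
// pops its two operands and pushes the result.
function evalPostfix(postfix) {
  const stack = [];
  for (const tok of postfix) {
    if (!PRECEDENCE.has(tok)) {
      stack.push(Number(tok));
      continue;
    }
    if (stack.length < 2) throw new SyntaxError(`Missing operand for "${tok}"`);
    const right = stack.pop();
    const left = stack.pop();
    switch (tok) {
      case "+": stack.push(left + right); break;
      case "-": stack.push(left - right); break;
      case "*": stack.push(left * right); break;
      case "/": stack.push(left / right); break;
    }
  }
  if (stack.length !== 1) throw new SyntaxError("Malformed expression");
  return stack[0];
}
```

For “=1+10*11”, `toPostfix(["1", "+", "10", "*", "11"])` gives `1 10 11 * +`, and `evalPostfix` on that sequence gives 111. Popping operators of equal precedence keeps `8 - 3 - 2` left-associative (`8 3 - 2 -`).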

[Figure: an expression tree and its reverse Polish form]

Here we show a tree converted into the corresponding reverse Polish form.
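Converting a tree into reverse Polish form amounts to a post-order traversal: visit the left subtree, then the right subtree, then emit the operator. The sketch below assumes a simplified node shape (`{ op, left, right }` for operators, `{ value }` for operands) and a hypothetical `treeToPostfix` helper.

```javascript
// Post-order traversal sketch: left subtree, right subtree, then the operator.
// Assumed node shape: operators are { op, left, right }, operands are { value }.
function treeToPostfix(node, out = []) {
  if (node.op !== undefined) {
    treeToPostfix(node.left, out);
    treeToPostfix(node.right, out);
    out.push(node.op);
  } else {
    out.push(String(node.value));
  }
  return out;
}

// The tree for 1+10*11: "+" at the root, "*" as its right child.
const exampleTree = {
  op: "+",
  left: { value: 1 },
  right: { op: "*", left: { value: 10 }, right: { value: 11 } },
};
console.log(treeToPostfix(exampleTree).join(" ")); // 1 10 11 * +
```

The tree’s shape already encodes the precedence, so no parentheses are needed in the output: the tree for (1+2)*3 yields `1 2 + 3 *`.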

Binary tree recursion vs. reverse Polish algorithm

Compared with recursively evaluating a tree, reverse Polish notation follows the usual step-by-step order of mathematical calculation more closely. But when formulas are calculated in a real project, which approach copes better with complex cases?

Let’s look at a formula with multiple levels of nesting:

[Figure: a multi-level nested formula]

This formula uses the SUMIFS function to sum multiple columns, and it is equivalent to the following:

=SUMIFS($C:$C,$B:$B,$A1)+SUMIFS($D:$D,$B:$B,$A1)+….

Clearly, the nested formula is more concise. With binary-tree recursion, you only need to inspect the parent and child nodes of the SUMIFS node to handle multi-column summation, which takes just one line of code.

With the reverse Polish algorithm, however, the code starts calculating as soon as it encounters SUM, and at that point it is hard to tell that this SUM actually sits inside the innermost brackets. The problem can be solved, but not in the simplest way.

Comparison results

Compared with the stack approach, the tree approach is easier to extend and enhance, and it handles complex formulas more easily. This is a distinct advantage when dealing with large numbers of formulas and complex calculations.

Summary

Having walked through how formulas are parsed and calculated, we will next cover calculation chains and asynchronous functions in the formula engine: how to handle the directed dependency graph when dealing with complex formulas, what the calcondemand solution is, and some creative uses of asynchronous functions in front-end and back-end calculation.
