Lua Performance Optimization Tips (II): Basic Facts

Time:2022-5-2

Before running any code, Lua translates (precompiles) the source code into an internal format. This format is a sequence of virtual machine instructions, similar to the machine code executed by a real CPU. The internal format is then interpreted by C code that is essentially a while loop containing a large switch statement, with one case per instruction.

As you may have learned elsewhere, Lua has used a register-based virtual machine since version 5.0. The “registers” of this virtual machine are not real CPU registers, because those are not portable and are very limited in number. Instead, Lua uses a stack (implemented as an array plus a few indices) to provide registers. Each active function has an activation record, that is, a segment of the stack in which the function stores its registers. Therefore, each function has its own registers [1]. A function can use up to 250 registers, because each instruction has only 8 bits to reference a register.
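To make the idea of a dispatch loop over register-based instructions concrete, here is a toy sketch written in Lua itself. It is not Lua's real implementation: the function `run`, the instruction layout (a table of opcode name plus operands), and the tiny opcode set are all invented for illustration, and an if/elseif chain stands in for the C switch.

```lua
-- Toy register-based VM: a sketch of the idea only, not Lua's real interpreter.
-- The opcode names and the instruction layout are assumptions made for this example.
local function run(code, constants, nregs)
    local reg = {}                      -- this call's registers (its "activation record")
    for r = 0, nregs - 1 do reg[r] = 0 end
    local pc = 1
    while true do                       -- main interpreter loop
        local ins = code[pc]
        pc = pc + 1
        local op, a, b, c = ins[1], ins[2], ins[3], ins[4]
        if op == "LOADK" then           -- reg[a] = constants[b]
            reg[a] = constants[b]
        elseif op == "ADD" then         -- reg[a] = reg[b] + reg[c]
            reg[a] = reg[b] + reg[c]
        elseif op == "RETURN" then      -- stop and hand back reg[a]
            return reg[a]
        else
            error("unknown opcode: " .. tostring(op))
        end
    end
end

-- Hand-assembled equivalent of: local a, b = 10, 32; a = a + b; return a
local program = {
    { "LOADK", 0, 1 },
    { "LOADK", 1, 2 },
    { "ADD", 0, 0, 1 },
    { "RETURN", 0 },
}
local result = run(program, { 10, 32 }, 2)
print(result) --> 42
```

Each operand of the toy ADD instruction is just an index into the register array, which is why an operation on values already held in registers needs no extra instructions to fetch them.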

Because so many registers are available, the Lua precompiler can keep all local variables in registers. As a result, access to local variables is very fast. For example, if a and b are local variables, the statement

a = a + b


generates only one instruction:

ADD 0 0 1


(assuming a and b are held in registers 0 and 1, respectively). In contrast, if both a and b are global variables, the code becomes:

GETGLOBAL 0 0 ; a
GETGLOBAL 1 1 ; b
ADD 0 0 1
SETGLOBAL 0 0 ; a


This leads to the most important performance rule in Lua programming: use local variables!

If you need to squeeze more performance out of a program, there are many places to apply this rule. For example, if you call a function inside a long loop, you can assign the function to a local variable beforehand. The following code:


for i = 1, 1000000 do
    local x = math.sin(i)
end


runs 30% slower than the following version:

local sin = math.sin
for i = 1, 1000000 do
    local x = sin(i)
end
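The exact difference depends on the Lua version and the machine, so it is worth measuring on your own setup. The following is a minimal sketch of such a measurement using the standard `os.clock` function; the helper name `time_it` and the two wrapper functions are hypothetical names introduced only for this example.

```lua
-- Hypothetical helper: returns the CPU seconds used by f(), plus f's result.
local function time_it(f)
    local start = os.clock()
    local result = f()
    return os.clock() - start, result
end

local N = 1000000

local function with_global_lookup()
    local x
    for i = 1, N do
        x = math.sin(i)      -- fetches global `math`, then indexes `sin`, every iteration
    end
    return x
end

local function with_local_alias()
    local sin = math.sin     -- looked up once, then kept in a register
    local x
    for i = 1, N do
        x = sin(i)
    end
    return x
end

local t_global, r_global = time_it(with_global_lookup)
local t_local, r_local = time_it(with_local_alias)
print(string.format("global lookup: %.3f s", t_global))
print(string.format("local alias:   %.3f s", t_local))
```

Both loops compute the same values, so any difference in the reported times comes from how `sin` is reached, not from the work done.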


Accessing external local variables (that is, upvalues: local variables of an enclosing function) is not as fast as accessing local variables directly, but it is still faster than accessing global variables. For example, the following code snippet:

function foo (x)
    for i = 1, 1000000 do
        x = x + math.sin(i)
    end
    return x
end

print(foo(10))


can be optimized by declaring sin once, outside foo:

local sin = math.sin
function foo (x)
    for i = 1, 1000000 do
        x = x + sin(i)
    end
    return x
end

print(foo(10))


The second version is 30% faster than the first.
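Since an upvalue is still slower to reach than a true local, one further step is to copy the upvalue into a local variable inside the function before the loop. The sketch below only shows the pattern and checks that both variants compute the same result; the function names are hypothetical, the loop is shortened to 1000 iterations, and whether the gain is measurable depends on the Lua version and the workload.

```lua
local sin = math.sin           -- becomes an upvalue of the functions below

local function foo_upvalue(x)
    for i = 1, 1000 do
        x = x + sin(i)         -- `sin` is read as an upvalue on every iteration
    end
    return x
end

local function foo_local(x)
    local sin = sin            -- copy the upvalue into a register once
    for i = 1, 1000 do
        x = x + sin(i)         -- `sin` is now a plain local
    end
    return x
end

print(foo_upvalue(10) == foo_local(10)) --> true
```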

 

Although Lua’s compiler is very efficient compared with compilers for other languages, compilation is still a heavy task. Therefore, avoid compiling code at run time (for example, with the loadstring function) whenever possible, unless the code really is that dynamic, such as code entered by users. Only in rare cases do you need to compile code dynamically.

For example, the following code creates a table of functions that return the constant values 1 through 100000:


local lim = 100000
local a = {}
for i = 1, lim do
    a[i] = loadstring(string.format("return %d", i))
end

print(a[10]()) --> 10


It takes 1.4 seconds to execute this code.

By using closures, we can avoid dynamic compilation. The following code does the same work in only one tenth of the time:


function fk (k)
    return function () return k end
end

local lim = 100000
local a = {}
for i = 1, lim do a[i] = fk(i) end

print(a[10]()) --> 10
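The timings quoted above will differ on other machines and Lua versions. A minimal sketch for comparing the two approaches side by side follows; `compile`, `build_compiled`, `build_closures`, and `time_it` are hypothetical names introduced for this example. It assumes that `loadstring` is available (Lua 5.1) or, failing that, that `load` accepts a string (Lua 5.2 and later).

```lua
-- `loadstring` exists in Lua 5.1; later versions compile strings with `load`.
local compile = loadstring or load

local function fk(k)
    return function() return k end
end

-- Builds the table by compiling one tiny chunk per entry.
local function build_compiled(lim)
    local a = {}
    for i = 1, lim do
        a[i] = compile(string.format("return %d", i))
    end
    return a
end

-- Builds the same table with closures, so nothing is compiled at run time.
local function build_closures(lim)
    local a = {}
    for i = 1, lim do a[i] = fk(i) end
    return a
end

-- Hypothetical helper: returns the CPU seconds used by f(...), plus f's result.
local function time_it(f, ...)
    local start = os.clock()
    local result = f(...)
    return os.clock() - start, result
end

local lim = 100000
local t_compiled, compiled = time_it(build_compiled, lim)
local t_closures, closures = time_it(build_closures, lim)
print(string.format("loadstring: %.3f s", t_compiled))
print(string.format("closures:   %.3f s", t_closures))
print(compiled[10]() == closures[10]()) --> true
```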