Lua Performance Optimization Tips (IV): About Strings

Time: 2022-4-30

As with tables, understanding how Lua implements strings lets you use them more efficiently.

Lua's string implementation differs in two important ways from what most other scripting languages do. First, all strings in Lua are internalized [1]: Lua keeps a single copy of any given string. Whenever a new string appears, Lua checks whether a copy already exists and, if so, reuses it. Internalization makes operations such as string comparison and table indexing very fast, but it slows down string creation. (Newer versions such as Lua 5.3 and 5.4 internalize only short strings; long strings skip this check.)

Second, variables in Lua never hold strings; they hold only references to them. This makes many string operations faster. In Perl, for example, when you write something like $x = $y and $y holds a string, the assignment copies the string's contents from $y's buffer into $x's buffer. If the string is long, this is expensive. In Lua, the same assignment copies only a pointer.
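The following sketch illustrates the reference semantics described above. The 10 MB size and the one million iterations are arbitrary choices. The measured time will vary by machine, but it does not depend on how long the string is, because each assignment copies only a reference.

```lua
-- Sketch: assigning a string in Lua copies a reference, not the characters.
-- The 10 MB size and the iteration count are arbitrary, for illustration only.
local big = string.rep("x", 10 * 1024 * 1024)   -- one 10 MB string

local start = os.clock()
local alias
for _ = 1, 1000000 do
    alias = big          -- copies only a reference; no bytes are duplicated
end
print(string.format("1e6 assignments: %.3f s", os.clock() - start))

-- Strings are immutable, so sharing them is safe: ".." builds a new string
-- and leaves the one referenced by `big` untouched.
alias = alias .. "!"
print(#big, #alias)      -- prints 10485760 and 10485761
```

Because no operation can modify a string in place, Lua never needs to copy one defensively on assignment.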

However, this reference-based implementation slows down one particular form of string concatenation. In Perl, the statements $s = $s . "x" and $s .= "x" are very different. The first creates a copy of $s with "x" appended to its end. The second simply appends "x" to the end of the internal buffer that $s maintains. The cost of the second form is therefore independent of the string's length (assuming the buffer has room for the appended text). Inside a loop, the difference between the two statements becomes the difference between a linear and a quadratic algorithm. For example, the following loop takes about five minutes to read a 5MB file:


$x = "";
while (<>)
{
    $x = $x . $_;
}


If we change

$x = $x . $_

to

$x .= $_

the time drops to 0.1 seconds!
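A quick count shows where the quadratic cost comes from. Assume, for illustration, that the input has n lines of roughly equal length ℓ. In the slow form, iteration i builds a brand-new string of length iℓ, so it copies about iℓ bytes. In the fast form, each iteration copies only the ℓ bytes of the new line (given enough room in the buffer):

```latex
\text{slow: } \sum_{i=1}^{n} i\,\ell = \ell\,\frac{n(n+1)}{2} = O(\ell n^2)
\qquad
\text{fast: } \sum_{i=1}^{n} \ell = n\,\ell = O(\ell n)
```

Doubling the input roughly quadruples the work in the slow form but only doubles it in the fast form.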


Lua does not offer the second, faster option, because its variables have no buffers associated with them. We therefore need an explicit buffer: a table that holds the string pieces. The following loop reads the same 5MB file in 0.28 seconds. That is not as fast as Perl, but it is still quite good:


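local t = {}
for line in io.lines() do        -- read standard input one line at a time
    t[#t + 1] = line             -- append each line to the buffer table
end
local s = table.concat(t, "\n")  -- join all pieces in a single pass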
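As a self-contained sketch of the same technique, the code below builds one string from generated pieces in two ways instead of reading a file with io.lines(). The helper names build_naive and build_buffered and the value of n are illustrative assumptions. Absolute timings depend on the machine and Lua version, so only the relative difference is meaningful.

```lua
-- Sketch comparing two ways to build one string from many pieces.
-- build_naive and build_buffered are hypothetical helpers used only here.
local function build_naive(n)
    local s = ""
    for i = 1, n do
        s = s .. "line " .. i .. "\n"      -- copies all of s on every iteration
    end
    return s
end

local function build_buffered(n)
    local t = {}
    for i = 1, n do
        t[#t + 1] = "line " .. i .. "\n"   -- store the piece; s is never copied
    end
    return table.concat(t)                 -- one pass joins all pieces
end

local n = 10000
local t0 = os.clock()
local a = build_naive(n)
local t1 = os.clock()
local b = build_buffered(n)
local t2 = os.clock()

assert(a == b)  -- both approaches produce the same string
print(string.format("naive: %.3f s, buffered: %.3f s", t1 - t0, t2 - t1))
```

table.concat also accepts an optional separator as its second argument, which is how the file-reading loop joins lines with "\n" without storing the newlines in the table.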


[1] Internalization, more commonly called string interning.