Delve into parsing expressions (LPeg) in Lua

Time:2021-11-25

Using a pattern

This example shows a very simple but complete program that builds and uses a pattern:

 

local lpeg = require "lpeg"

-- matches a word followed by end-of-string
p = lpeg.R"az"^1 * -1

print(p:match("hello"))        --> 6
print(lpeg.match(p, "hello"))  --> 6
print(p:match("1 hello"))      --> nil

 

The pattern is simply a sequence of one or more lowercase letters followed by the end of the string (-1). The program calls match both as a method and as a function. In both successful cases, match returns the index of the first character after the matched text, which here is the string length plus one.
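To see what the trailing -1 contributes, here is a small sketch that runs the same word pattern with and without it. The names word and anchored are introduced only for this illustration.

```lua
local lpeg = require "lpeg"

local word = lpeg.R"az"^1      -- one or more lowercase letters
local anchored = word * -1     -- the same, but must also reach end of string

print(word:match("hello world"))      --> 6   (matches the prefix "hello")
print(anchored:match("hello world"))  --> nil (input remains after "hello")
print(anchored:match("hello"))        --> 6
```

Without -1 the match succeeds on any prefix, so the end-of-string check is what makes the pattern accept whole words only.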

 

Name-value lists

 

This example parses a list of name-value pairs and returns a table with those pairs:

 

lpeg.locale(lpeg)   -- adds locale entries into 'lpeg' table

local space = lpeg.space^0
local name = lpeg.C(lpeg.alpha^1) * space
local sep = lpeg.S(",;") * space
local pair = lpeg.Cg(name * "=" * space * name) * sep^-1
local list = lpeg.Cf(lpeg.Ct("") * pair^0, rawset)
t = list:match("a=b, c = hi; next = pi")  --> { a = "b", c = "hi", next = "pi" }

 

Each pair has the format name = name, followed by an optional separator (a comma or a semicolon). The pair pattern encloses the pair in a group capture, so that the two names become the values of a single capture. The list pattern then folds these captures. It starts with an empty table, created by a table capture that matches the empty string. Then, for each capture (a pair of names), it applies rawset over the accumulator (the table) and the capture values (the pair of names). rawset returns the table itself, so the accumulator is always the table.
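The fold can be pictured as the chain of rawset calls below. This is a hand-written sketch of what the fold does for the sample input, not LPeg's internal code; acc is a name chosen for the illustration.

```lua
-- The accumulator starts as the empty table produced by lpeg.Ct("")
local acc = {}

-- Each pair capture supplies (name, value). rawset returns the table,
-- which becomes the accumulator for the next step.
acc = rawset(acc, "a", "b")
acc = rawset(acc, "c", "hi")
acc = rawset(acc, "next", "pi")

print(acc.a, acc.c, acc.next)   --> b   hi   pi
```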


Splitting a string

The following code builds a pattern that splits a string, using a given pattern sep as the separator:

 

function split (s, sep)
  sep = lpeg.P(sep)
  local elem = lpeg.C((1 - sep)^0)
  local p = elem * (sep * elem)^0
  return lpeg.match(p, s)
end

 

First, the function ensures that sep is a proper pattern. The pattern elem is a repetition of zero or more arbitrary characters, as long as there is no match against the separator. It also captures its match. The pattern p matches a list of elements separated by sep.

If the split produces too many values, it may overflow the maximum number of values that a Lua function can return. In that case, we can collect the values in a table:

 

function split (s, sep)
  sep = lpeg.P(sep)
  local elem = lpeg.C((1 - sep)^0)
  local p = lpeg.Ct(elem * (sep * elem)^0)   -- make a table capture
  return lpeg.match(p, s)
end
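A short usage sketch of the table-returning split may help. The inputs are made up for illustration; note that an empty element between two adjacent separators is kept as an empty string.

```lua
local lpeg = require "lpeg"

-- table-returning split, as in the second version above
local function split (s, sep)
  sep = lpeg.P(sep)
  local elem = lpeg.C((1 - sep)^0)
  local p = lpeg.Ct(elem * (sep * elem)^0)
  return lpeg.match(p, s)
end

local t = split("a,b,,c", ",")
print(#t, t[1], t[2], t[4])          --> 4   a   b   c
print(t[3] == "")                    --> true

-- the separator may be any pattern: comma or semicolon plus optional spaces
local u = split("x, y;z", lpeg.S",;" * lpeg.P" "^0)
print(table.concat(u, "|"))          --> x|y|z
```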

 

Pattern search

The basic match works only in anchored mode. If we want to find a pattern anywhere in a string, we must write a pattern that matches anywhere.

Because patterns are composable, we can write a function that, given an arbitrary pattern p, returns a new pattern that searches for p anywhere in a string. There are several ways to do the search. One of them is this:

 

function anywhere (p)
  return lpeg.P{ p + 1 * lpeg.V(1) }
end

 

This grammar has a straightforward reading: match p, or skip one character and try again.

If we want to know where the pattern is in the string (instead of knowing only that it is there somewhere), we can add position captures to the pattern:

 

local I = lpeg.Cp()
function anywhere (p)
  return lpeg.P{ I * p * I + 1 * lpeg.V(1) }
end

print(anywhere("world"):match("hello world!"))   --> 7   12

Another method of this search is as follows:

 

local I = lpeg.Cp()
function anywhere (p)
  return (1 - lpeg.P(p))^0 * I * p * I
end

 

Again the pattern has a straightforward reading: it skips as many characters as possible while not matching p, and then matches p (plus the appropriate position captures).
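The two search strategies can be compared side by side. In this sketch the function names anywhere_grammar and anywhere_loop and the sample input are chosen for illustration; the pattern bodies are the two versions given above.

```lua
local lpeg = require "lpeg"
local I = lpeg.Cp()

-- grammar-based search: match p, or skip one character and retry
local function anywhere_grammar (p)
  return lpeg.P{ I * p * I + 1 * lpeg.V(1) }
end

-- loop-based search: skip characters while p does not match, then match p
local function anywhere_loop (p)
  return (1 - lpeg.P(p))^0 * I * p * I
end

local digits = lpeg.R"09"^1
print(anywhere_grammar(digits):match("abc 123 def"))  --> 5   8
print(anywhere_loop(digits):match("abc 123 def"))     --> 5   8
print(anywhere_loop(digits):match("no digits here"))  --> nil
```

Both report the start of the match and the position just after it, and both fail with nil when the pattern occurs nowhere in the subject.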

If we want to look for a pattern only at word boundaries, we can use the following transformer:

 

local t = lpeg.locale()

function atwordboundary (p)
  return lpeg.P{
    [1] = p + t.alpha^0 * (1 - t.alpha)^1 * lpeg.V(1)
  }
end
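The difference from a plain search shows up when the target also occurs inside a longer word. The sketch below is illustrative: the helper anywhere and the sample text are assumptions added here, and a position capture is put in front of the target so the result is the start of the match.

```lua
local lpeg = require "lpeg"
local t = lpeg.locale()

local function atwordboundary (p)
  return lpeg.P{
    [1] = p + t.alpha^0 * (1 - t.alpha)^1 * lpeg.V(1)
  }
end

-- plain search, for comparison
local function anywhere (p)
  return lpeg.P{ p + 1 * lpeg.V(1) }
end

local cat = lpeg.Cp() * "cat"
print(anywhere(cat):match("concat cat"))        --> 4 (inside "concat")
print(atwordboundary(cat):match("concat cat"))  --> 8 (the separate word)
```

Note that the transformer only constrains where the match starts; it does not require the match to end at a word boundary.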

 

Balanced parentheses

The following pattern matches only strings with balanced parentheses:

 

b = lpeg.P{ "(" * ((1 - lpeg.S"()") + lpeg.V(1))^0 * ")" }

 

Reading the first (and only) rule of the grammar: a balanced string is an open parenthesis, followed by zero or more repetitions of either a non-parenthesis character or a balanced string (lpeg.V(1)), followed by the matching closing parenthesis.
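A few sample inputs show how the rule behaves. The inputs are made up for illustration. Because the pattern is not anchored at the end, it accepts a balanced prefix; appending -1 requires the whole subject to be one balanced group.

```lua
local lpeg = require "lpeg"

local b = lpeg.P{ "(" * ((1 - lpeg.S"()") + lpeg.V(1))^0 * ")" }

print(b:match("(a(b)c)"))         --> 8   (whole string is balanced)
print(b:match("(a)(b)"))          --> 4   (only the prefix "(a)" is matched)
print(b:match("(a(b c"))          --> nil (never closed)
print((b * -1):match("(a)(b)"))   --> nil (input remains after "(a)")
```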

Global replace

The next example does a job somewhat similar to string.gsub. It receives a subject string, a pattern, and a replacement value, and substitutes the replacement value for every occurrence of the pattern in the subject:

 

function gsub (s, patt, repl)
  patt = lpeg.P(patt)
  patt = lpeg.Cs((patt / repl + 1)^0)
  return lpeg.match(patt, s)
end

 

As in string.gsub, the replacement value can be a string, a function, or a table.
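Here is a brief usage sketch with a function and with string replacements. The sample inputs are illustrative. Unlike string.gsub, this version returns only the resulting string, not a substitution count.

```lua
local lpeg = require "lpeg"

local function gsub (s, patt, repl)
  patt = lpeg.P(patt)
  patt = lpeg.Cs((patt / repl + 1)^0)
  return lpeg.match(patt, s)
end

local word = lpeg.R"az"^1
-- function replacement: called with the matched text
print(gsub("hello world", word, string.upper))   --> HELLO WORLD
-- string replacement with an LPeg pattern
print(gsub("a1b22c", lpeg.R"09"^1, "#"))          --> a#b#c
-- a plain string is turned into a literal pattern by lpeg.P
print(gsub("hello world", "o", "0"))              --> hell0 w0rld
```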

Comma separated values (CSV)

This example breaks a string into comma-separated values and returns all the fields:

 

local field = '"' * lpeg.Cs(((lpeg.P(1) - '"') + lpeg.P'""' / '"')^0) * '"' +
                    lpeg.C((1 - lpeg.S',\n"')^0)

local record = field * (',' * field)^0 * (lpeg.P'\n' + -1)

function csv (s)
  return lpeg.match(record, s)
end

 

A field is either a quoted field (which may contain any character except a single double quote; a double quote inside the field is written as two double quotes, which are replaced by one) or an unquoted field (which cannot contain commas, newlines, or double quotes). A record is a list of fields separated by commas, ending with a newline or the end of the string (-1).

As written, the pattern returns each field as an independent result. If we add a table capture to the definition of record, the pattern returns a single table containing all the fields instead:

 

local record = lpeg.Ct(field * (',' * field)^0) * (lpeg.P'\n' + -1)
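The following sketch runs the table-capturing record on one made-up line that contains a quoted field with embedded quotes and a comma, plus an empty field.

```lua
local lpeg = require "lpeg"

local field = '"' * lpeg.Cs(((lpeg.P(1) - '"') + lpeg.P'""' / '"')^0) * '"' +
                    lpeg.C((1 - lpeg.S',\n"')^0)
local record = lpeg.Ct(field * (',' * field)^0) * (lpeg.P'\n' + -1)

local t = record:match('a,"b ""quoted"", c",,d\n')
print(#t)            --> 4
print(t[1])          --> a
print(t[2])          --> b "quoted", c
print(t[3] == "")    --> true (empty field between the two commas)
print(t[4])          --> d
```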

 


UTF-8 and Latin 1

It is not difficult to use LPeg to convert a string from UTF-8 encoding to Latin 1 (ISO 8859-1):

 

-- convert a two-byte UTF-8 sequence to a Latin 1 character
local function f2 (s)
  local c1, c2 = string.byte(s, 1, 2)
  return string.char(c1 * 64 + c2 - 12416)
end

local utf8 = lpeg.R("\0\127")
           + lpeg.R("\194\195") * lpeg.R("\128\191") / f2

local decode_pattern = lpeg.Cs(utf8^0) * -1

 

In this code, the definition of utf8 is already restricted to the Latin 1 range (code points 0 to 255). Any encoding outside this range (as well as any invalid encoding) will not match the pattern.

Because decode_pattern requires the pattern to match the whole input (because of the -1 at its end), any invalid string simply fails to match, without any useful information about the problem. We can improve this by redefining decode_pattern as follows:

 

local function er (_, i) error("invalid encoding at position " .. i) end

local decode_pattern = lpeg.Cs(utf8^0) * (-1 + lpeg.P(er))

 

Now, if the pattern utf8^0 stops before the end of the string, an appropriate error function is called.
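A short run of the decoder on one valid and one out-of-range input is sketched below. The sample strings are illustrative, and pcall is used here only to show the error message instead of aborting.

```lua
local lpeg = require "lpeg"

-- convert a two-byte UTF-8 sequence to a Latin 1 character
local function f2 (s)
  local c1, c2 = string.byte(s, 1, 2)
  return string.char(c1 * 64 + c2 - 12416)
end

local utf8 = lpeg.R("\0\127")
           + lpeg.R("\194\195") * lpeg.R("\128\191") / f2

local function er (_, i) error("invalid encoding at position " .. i) end
local decode_pattern = lpeg.Cs(utf8^0) * (-1 + lpeg.P(er))

-- "café" in UTF-8: é is the two bytes 195 169, which is Latin 1 byte 233
local latin1 = decode_pattern:match("caf\195\169")
print(latin1 == "caf\233")   --> true

-- the euro sign (bytes 226 130 172) is outside Latin 1, so er is called
local ok, msg = pcall(decode_pattern.match, decode_pattern, "ab\226\130\172")
print(ok, msg)               -- false, plus a message naming position 3
```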

UTF-8 and Unicode

We can extend the previous patterns to handle all Unicode code points. Of course, we cannot translate them to Latin 1 or any other one-byte encoding. Instead, the translation results in an array with the code points represented as numbers. Here is the complete code:
 

-- decode a two-byte UTF-8 sequence
local function f2 (s)
  local c1, c2 = string.byte(s, 1, 2)
  return c1 * 64 + c2 - 12416
end

-- decode a three-byte UTF-8 sequence
local function f3 (s)
  local c1, c2, c3 = string.byte(s, 1, 3)
  return (c1 * 64 + c2) * 64 + c3 - 925824
end

-- decode a four-byte UTF-8 sequence
local function f4 (s)
  local c1, c2, c3, c4 = string.byte(s, 1, 4)
  return ((c1 * 64 + c2) * 64 + c3) * 64 + c4 - 63447168
end

local cont = lpeg.R("\128\191")   -- continuation byte

local utf8 = lpeg.R("\0\127") / string.byte
           + lpeg.R("\194\223") * cont / f2
           + lpeg.R("\224\239") * cont * cont / f3
           + lpeg.R("\240\244") * cont * cont * cont / f4

local decode_pattern = lpeg.Ct(utf8^0) * -1

 

 

 

Lua's long strings

A long string in Lua starts with the pattern [=*[ and ends at the first occurrence of ]=*] with exactly the same number of equal signs. If the opening bracket is followed by a newline, that newline is discarded (that is, it is not part of the string).

To match a long string in Lua, the pattern must capture the first repetition of equal signs and then, whenever it finds a candidate for closing the string, check whether it has the same number of equal signs.

 

equals = lpeg.P"="^0
open = "[" * lpeg.Cg(equals, "init") * "[" * lpeg.P"\n"^-1
close = "]" * lpeg.C(equals) * "]"
closeeq = lpeg.Cmt(close * lpeg.Cb("init"), function (s, i, a, b) return a == b end)
-- named longstring rather than string, so Lua's string library is not overwritten
longstring = open * lpeg.C((lpeg.P(1) - closeeq)^0) * close / 1

The open pattern matches [=*[, capturing the repetition of equal signs in a group named init; it also discards an optional newline, if present. The close pattern matches ]=*], also capturing the repetition of equal signs. The closeeq pattern first matches close; then it uses a back capture to recover the capture made by the previous open, which is named init; finally it uses a match-time capture to check whether both captures are equal. The longstring pattern starts with open, goes as far as possible until closeeq matches, and then matches the final close. The final numbered capture (/ 1) simply discards the capture made by close.
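The sketch below exercises the pattern on a few made-up inputs. It uses local variables, and the final pattern is called longstring so that Lua's string library stays untouched.

```lua
local lpeg = require "lpeg"

local equals = lpeg.P"="^0
local open = "[" * lpeg.Cg(equals, "init") * "[" * lpeg.P"\n"^-1
local close = "]" * lpeg.C(equals) * "]"
local closeeq = lpeg.Cmt(close * lpeg.Cb("init"),
                         function (s, i, a, b) return a == b end)
local longstring = open * lpeg.C((lpeg.P(1) - closeeq)^0) * close / 1

-- level 0: no equal signs
print(longstring:match("[[abc]]"))                        --> abc
-- level 2: the inner ]=] has the wrong level and stays in the content;
-- the newline right after the opening bracket is dropped
print(longstring:match("[==[\nhello]=]world]==] tail"))   --> hello]=]world
-- the only closing bracket has level 0, but the string opened at level 1
print(longstring:match("[=[never closed]]"))              --> nil
```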

Arithmetic expressions

This example is a complete parser and evaluator for simple arithmetic expressions. We write it in two styles.

The first approach builds a syntax tree and then traverses the tree to compute the value of the expression:

 

-- Lexical elements
local Space = lpeg.S(" \n\t")^0
local Number = lpeg.C(lpeg.P"-"^-1 * lpeg.R("09")^1) * Space
local TermOp = lpeg.C(lpeg.S("+-")) * Space
local FactorOp = lpeg.C(lpeg.S("*/")) * Space
local Open = "(" * Space
local Close = ")" * Space

-- Grammar
local Exp, Term, Factor = lpeg.V"Exp", lpeg.V"Term", lpeg.V"Factor"
G = lpeg.P{ Exp,
  Exp = lpeg.Ct(Term * (TermOp * Term)^0);
  Term = lpeg.Ct(Factor * (FactorOp * Factor)^0);
  Factor = Number + Open * Exp * Close;
}

G = Space * G * -1

-- Evaluator
function eval (x)
  if type(x) == "string" then
    return tonumber(x)
  else
    local op1 = eval(x[1])
    for i = 2, #x, 2 do
      local op = x[i]
      local op2 = eval(x[i + 1])
      if (op == "+") then op1 = op1 + op2
      elseif (op == "-") then op1 = op1 - op2
      elseif (op == "*") then op1 = op1 * op2
      elseif (op == "/") then op1 = op1 / op2
      end
    end
    return op1
  end
end

-- Parser/Evaluator
function evalExp (s)
  local t = lpeg.match(G, s)
  if not t then error("syntax error", 2) end
  return eval(t)
end

-- small example
print(evalExp"3 + 5*9 / (1+1) - 12")   --> 13.5

 

The second style computes the value of the expression on the fly, without building a syntax tree. The following grammar takes this approach (it assumes the same lexical elements as above):

 

-- Auxiliary function
function eval (v1, op, v2)
  if (op == "+") then return v1 + v2
  elseif (op == "-") then return v1 - v2
  elseif (op == "*") then return v1 * v2
  elseif (op == "/") then return v1 / v2
  end
end

-- Grammar
local V = lpeg.V
G = lpeg.P{ "Exp",
  Exp = lpeg.Cf(V"Term" * lpeg.Cg(TermOp * V"Term")^0, eval);
  Term = lpeg.Cf(V"Factor" * lpeg.Cg(FactorOp * V"Factor")^0, eval);
  Factor = Number / tonumber + Open * V"Exp" * Close;
}

-- small example
print(lpeg.match(G, "3 + 5*9 / (1+1) - 12"))   --> 13.5

 

Note the use of the fold (accumulator) capture. To compute the value of an expression, the accumulator starts with the value of the first term, and then, for each repetition, eval is applied to the accumulator, the operator, and the new term.
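Unlike the tree-building version, the fold-based grammar above is not wrapped in Space * G * -1, so it accepts a valid prefix and ignores whatever follows. The sketch below adds that wrapper and checks left-to-right folding; the name Anchored and the sample inputs are assumptions made for illustration.

```lua
local lpeg = require "lpeg"

-- Lexical elements (same as in the listings above)
local Space = lpeg.S(" \n\t")^0
local Number = lpeg.C(lpeg.P"-"^-1 * lpeg.R("09")^1) * Space
local TermOp = lpeg.C(lpeg.S("+-")) * Space
local FactorOp = lpeg.C(lpeg.S("*/")) * Space
local Open = "(" * Space
local Close = ")" * Space

-- Folding function: combines the accumulator with one (operator, operand) pair
local function eval (v1, op, v2)
  if (op == "+") then return v1 + v2
  elseif (op == "-") then return v1 - v2
  elseif (op == "*") then return v1 * v2
  elseif (op == "/") then return v1 / v2
  end
end

local V = lpeg.V
local G = lpeg.P{ "Exp",
  Exp = lpeg.Cf(V"Term" * lpeg.Cg(TermOp * V"Term")^0, eval);
  Term = lpeg.Cf(V"Factor" * lpeg.Cg(FactorOp * V"Factor")^0, eval);
  Factor = Number / tonumber + Open * V"Exp" * Close;
}

-- Hypothetical wrapper: skip leading spaces and require end of input
local Anchored = Space * G * -1

print(Anchored:match("10 - 4 - 3"))    --> 3   (folds left to right: (10 - 4) - 3)
print(Anchored:match("2 * (3 + 4)"))   --> 14
print(G:match("1 + "))                 --> 1   (prefix "1" accepted, rest ignored)
print(Anchored:match("1 + "))          --> nil (trailing "+ " is rejected)
```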
