A detailed look at the two ways of pattern matching in OpenResty

Time:2021-10-16

Preface

This article introduces the two kinds of pattern matching available in OpenResty.

First of all, OpenResty gives you two families of APIs: OpenResty's own API (the ngx.* functions, mainly based on the FFI API), and the functions of the native Lua language.

For the purposes of this article, the pattern matching functions that belong to these two families are ngx.re.find and string.find respectively.

Both functions serve the same purpose: they search the subject string for the given pattern. If a match is found, they return the start and end positions of the match; otherwise they return nil, so both receiving variables end up nil. Note that you only get both positions if you assign the result to two variables. With a single variable you get just the start position, or nil.
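As a small illustration of the return values described above, the following plain-Lua sketch uses string.find only (per the text, ngx.re.find reports positions in the same way). The sample string and the variable names are made up for the example.

```lua
-- Illustrative subject string; any string would do.
local subject = "Mozilla/5.0 360SE"

-- Two variables receive the 1-based start and end positions of the match.
-- The 4th argument (plain = true) searches for the literal text "360".
local from, to = string.find(subject, "360", 1, true)
print(from, to)

-- With a single variable, only the start position is kept.
local start_only = string.find(subject, "360", 1, true)
print(start_only)

-- No match: the call yields nil, so both variables end up nil.
local miss_from, miss_to = string.find(subject, "Chrome", 1, true)
print(miss_from, miss_to)
```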

Even if you are familiar with Lua, it is no longer recommended to use Lua's own pattern functions such as string.find. First, because the implementations differ, Lua's pattern matching generally performs worse than ngx.re.*. Second, Lua patterns are not standard regular expressions (they follow neither POSIX nor Perl syntax), while ngx.re.* is built on PCRE and accepts Perl-compatible regular expressions, which are far more widely used and more expressive.
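To make the syntax difference concrete, here is a short plain-Lua sketch of two places where Lua patterns differ from ordinary regular expressions. The sample strings are assumptions chosen for the illustration.

```lua
local uri = "/tools/list"

-- "|" is an ordinary character in a Lua pattern, so this searches for the
-- literal text "admin|tools" and finds nothing.
local alt = string.find(uri, "admin|tools")
print(alt)

-- The usual workaround is to test each alternative separately.
local hit = string.find(uri, "^/admin/") or string.find(uri, "^/tools/")
print(hit)

-- Character classes are written with "%" (%d, %a, %s) rather than "\".
local from, to = string.find("id=42", "%d+")
print(from, to)
```

With ngx.re.find the first search could be written as a single regex with alternation, which is one reason the PCRE-based API is the more expressive of the two.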

Another important reason is that, compared with string.*, which has to process the pattern from scratch on every call, the ngx.re.* API provided by OpenResty can cache the compiled pattern (using the "o" option) and enable JIT through the "j" option to improve performance further (PCRE JIT support is required).

string.find

Although string.find is no longer the recommended choice (as the saying goes, the earlier wave dies on the beach), I still want to introduce it briefly, because I am still using it myself (I explain why later).

-- syntax (string.find has no err return value)
from, to = string.find(s, pattern [, init [, plain]])

-- context
init_worker_by_lua*, set_by_lua*, rewrite_by_lua*, access_by_lua*, content_by_lua*, header_filter_by_lua*, body_filter_by_lua*, log_by_lua*, ngx.timer.*, balancer_by_lua*, ssl_certificate_by_lua*, ssl_session_fetch_by_lua*, ssl_session_store_by_lua*

-- example
string.find(ngx.var.http_user_agent, "360")

The example above matches any User-Agent that contains "360". On a hit, the returned values are the start and end positions of the matched substring, counted from the left and starting at 1. To see this, print the result with ngx.say. First write the following code:

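-- Define the variable (only the start position is kept).
-- A missing User-Agent header yields nil, so fall back to "".
local var = string.find(ngx.var.http_user_agent or "", "360")

-- Output (ngx.say prints a nil argument as "nil")
ngx.say("var=", var)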

Put it under the /example location of the nginx site:

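location = /example {
    access_by_lua_block {
        -- A missing User-Agent header yields nil, so fall back to "".
        local var = string.find(ngx.var.http_user_agent or "", "360")
        -- ngx.say prints a nil argument as "nil", so a miss does not raise an error.
        ngx.say("var=", var)
    }
}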

Then use curl to test the response:

# Send a request to /example with the User-Agent set to "360"
curl example.com/example -A "360"

# The response contains the string written by ngx.say.
# "360" matches at the very start of the User-Agent, so the start position is 1.
var=1

ngx.re.find

The advantages of ngx.re.find have been described above. Here is its basic syntax (for more details, see the official documentation), followed by what is needed to benefit from those advantages (pattern caching through the "o" option and PCRE JIT).

-- syntax
from, to, err = ngx.re.find(subject, regex, options?, ctx?, nth?)
 
-- context
init_worker_by_lua*, set_by_lua*, rewrite_by_lua*, access_by_lua*, content_by_lua*, header_filter_by_lua*, body_filter_by_lua*, log_by_lua*, ngx.timer.*, balancer_by_lua*, ssl_certificate_by_lua*, ssl_session_fetch_by_lua*, ssl_session_store_by_lua*
 
-- example
ngx.re.find(ngx.var.http_user_agent, "360", "jo")

To use ngx.re.* and get the best performance, three conditions must be met: build with the --with-pcre-jit option to enable PCRE JIT support; have lua-resty-core available (installing OpenResty itself takes care of this); and load it in the init_by_lua* section with require 'resty.core.regex', which brings in the lua-resty-core implementation of the regex API. Then make it a habit to pass the "jo" options in your code: "j" turns on PCRE JIT and "o" turns on the pattern cache, as in the example above.

Rewritten with ngx.re.find, the Lua code from the previous example becomes:

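-- Define the variable (only the start position is kept).
-- A missing User-Agent header yields nil, so fall back to "".
local var = ngx.re.find(ngx.var.http_user_agent or "", "360", "jo")

-- Output (ngx.say prints a nil argument as "nil")
ngx.say("var=", var)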
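The example above ignores the third return value of ngx.re.find. A slightly more defensive sketch checks err before using the positions. It assumes the code runs inside an OpenResty handler, where the ngx table exists; the helper name find_in_ua is hypothetical.

```lua
-- Hypothetical helper: returns the match positions, or nil when there is
-- no match or the regex could not be run.
local function find_in_ua(ua, regex)
    -- ngx.var.http_user_agent is nil when the header is absent.
    if not ua then
        return nil
    end

    -- "j" turns on PCRE JIT, "o" caches the compiled regex.
    local from, to, err = ngx.re.find(ua, regex, "jo")
    if err then
        ngx.log(ngx.ERR, "ngx.re.find failed: ", err)
        return nil
    end
    return from, to
end

-- Usage inside a handler such as content_by_lua_block:
-- local from, to = find_in_ua(ngx.var.http_user_agent, "360")
-- ngx.say("from=", from, " to=", to)
```

Logging the error rather than ignoring it makes a bad regex or a PCRE runtime failure visible in the nginx error log instead of looking like an ordinary miss.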

The pitfall I ran into

Finally, let me explain why I am still using string.find. The reason is embarrassing: it is not that I don't want to use ngx.re.find, but that I can't. I used the following code:

if (ngx.re.find(ngx.var.request_uri, "^/admin/", "jo") ~= nil or ngx.re.find(ngx.var.request_uri, "^/tools/", "jo") ~= nil) then
 return ngx.exit(ngx.HTTP_CLOSE)
end

Then I found that this match tripped me up. Tested in isolation, the code denied access to /admin/xxx and /tools/xxx as expected, but it stopped working as soon as I put it into my full configuration. I am fairly sure the rest of my code is not at fault, because everything works once I replace ngx.re.find with string.find.
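For reference, a string.find replacement of that check might look like the following sketch. The function name is_blocked is an assumption, not the author's actual code. "^" anchors a Lua pattern at the start of the string just as it does in a regex, and neither prefix contains any Lua magic characters.

```lua
-- Hypothetical helper: true when the URI starts with /admin/ or /tools/.
local function is_blocked(uri)
    -- "^" anchors the Lua pattern to the start of the subject string.
    return string.find(uri, "^/admin/") ~= nil
        or string.find(uri, "^/tools/") ~= nil
end

-- Usage inside access_by_lua_block (ngx is only available in OpenResty):
-- if is_blocked(ngx.var.request_uri) then
--     return ngx.exit(ngx.HTTP_CLOSE)
-- end

print(is_blocked("/admin/users"))
print(is_blocked("/offline/admin/"))
```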

To check whether the regex itself was to blame, I also ran the following tests:

if (ngx.var.request_uri == "/test1/") then
 if (ngx.re.find("/admin/test/", "^/admin/", "jo") ~= nil) then
  ngx.say("1=" .. ngx.re.find("/admin/test/", "^/admin/", "jo"))
 end
elseif (ngx.var.request_uri == "/test2/") then
 if (ngx.re.find("/admintest/", "^/admin/", "jo") ~= nil) then
  ngx.say("2=" .. ngx.re.find("/admintest/", "^/admin/", "jo"))
 end
elseif (ngx.var.request_uri == "/test3/") then
 if (ngx.re.find("/artic/", "^/admin/", "jo") ~= nil) then
  ngx.say("3=" .. ngx.re.find("/artic/", "^/admin/", "jo"))
 end
elseif (ngx.var.request_uri == "/test4/") then
 if (ngx.re.find("/artic", "^/admin/", "jo") ~= nil) then
  ngx.say("4=" .. ngx.re.find("/artic", "^/admin/", "jo"))
 end
elseif (ngx.var.request_uri == "/test5/") then
 if (ngx.re.find("/offline/admin/", "^/admin/", "jo") ~= nil) then
  ngx.say("5=" .. ngx.re.find("/offline/admin/", "^/admin/", "jo"))
 end
elseif (ngx.var.request_uri == "/test6/") then
 if (ngx.re.find("/offline/", "^/admin/", "jo") ~= nil) then
  ngx.say("6=" .. ngx.re.find("/offline/", "^/admin/", "jo"))
 end
elseif (ngx.var.request_uri == "/test7/") then
 if (ngx.re.find("/admin/", "^/admin/", "jo") ~= nil) then
  ngx.say("7=" .. ngx.re.find("/admin/", "^/admin/", "jo"))
 end
elseif (ngx.var.request_uri == "/test8/") then
 if (ngx.re.find("/adm/in", "^/admin/", "jo") ~= nil) then
  ngx.say("8=" .. ngx.re.find("/adm/in", "^/admin/", "jo"))
 end
else
 if (ngx.var.request_uri == "/test9/") then
  if (ngx.re.find("/admin", "^/admin/", "jo") ~= nil) then
   ngx.say("9=" .. ngx.re.find("/admin", "^/admin/", "jo"))
  end
 end
end

The test results show that the regex is written correctly: judging from the output, ^/admin/ matches only URIs of the form /admin/xxx.
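The nine branches of that test can also be organized as a table-driven check. The sketch below is one assumed way to do so, not the author's code. It takes the matcher as a parameter so that it runs in plain Lua with string.find; inside OpenResty, a wrapper that calls ngx.re.find(subject, "^/admin/", "jo") could be passed instead.

```lua
-- Subjects from the test, paired with whether "^/admin/" should match.
local cases = {
    { "/admin/test/",    true  },
    { "/admintest/",     false },
    { "/artic/",         false },
    { "/artic",          false },
    { "/offline/admin/", false },
    { "/offline/",       false },
    { "/admin/",         true  },
    { "/adm/in",         false },
    { "/admin",          false },
}

-- Runs every case through `matcher` and returns the number of mismatches.
local function run_cases(matcher)
    local failures = 0
    for i, case in ipairs(cases) do
        local subject, expected = case[1], case[2]
        local matched = matcher(subject) ~= nil
        if matched ~= expected then
            failures = failures + 1
            print(("case %d failed: %s"):format(i, subject))
        end
    end
    return failures
end

-- Plain-Lua matcher; for this pattern the Lua syntax and the regex agree.
local function lua_matcher(subject)
    return string.find(subject, "^/admin/")
end

print("failures: " .. run_cases(lua_matcher))
```

Keeping the expected result next to each subject makes it obvious which case breaks if the two matchers ever disagree.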

Summary

That is all for this article. I hope it is a useful reference for your study or work. If you have any questions, feel free to leave a comment.