Replacing regular expressions: a call for joint development of the hyperscript expression

Regular expressions are hard to write, and their complex grammar keeps newcomers away. To solve this problem, I invite you to develop the hyperscript expression (hereinafter HSE) with me. HSE expresses regular-expression syntax in tag (markup) form, for example:
Regular expression: \d{2}-\d{5}
Equivalent HSE: <rep=2><digit></rep>-<rep=5><digit></rep>
Regular expression: <(.*)>.*<\/\1>
Equivalent HSE: &lt;<rem(><*><any></*></)>&gt;<*><any></*>&lt;/<rem=1>&gt;
Regular expression: ^Chapter [1-9][0-9]{0,1}
Equivalent HSE: <@start>Chapter <in>1-9</in><rep=0,1><in>0-9</in></rep>
Regular expression: (\w)+[@]{1}((\w)+[\.]){1,3}(\w)+
Equivalent HSE: <+><word></+><rep=1>@</rep><rep=1,3><word+>.</rep><word+>
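To see what the four regular expressions above actually match, here is a minimal sketch using Python's standard re module. Python is only a stand-in engine here; the variable names and sample strings are illustrative assumptions, not part of the original proposal.

```python
import re

# The four regular expressions from the examples above, unchanged.
two_five = re.compile(r"\d{2}-\d{5}")
tag_pair = re.compile(r"<(.*)>.*<\/\1>")
chapter = re.compile(r"^Chapter [1-9][0-9]{0,1}")
mail = re.compile(r"(\w)+[@]{1}((\w)+[\.]){1,3}(\w)+")

# Sample inputs are made up for illustration.
two_five_ok = two_five.fullmatch("12-34567") is not None
tag_name = tag_pair.search("<b>bold</b>").group(1)
chapter_text = chapter.match("Chapter 12").group(0)
chapter_zero = chapter.match("Chapter 05")
mail_ok = mail.fullmatch("user@mail.example.com") is not None

print(two_five_ok)   # True
print(tag_name)      # b
print(chapter_text)  # Chapter 12
print(chapter_zero)  # None
print(mail_ok)       # True
```

The second pattern relies on the back-reference \1, which must repeat whatever the first group captured, so the closing tag has to carry the same name as the opening tag.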
HSE syntax notes
<> Escape characters. Write a literal < as &lt; and a literal > as &gt;.
<@start> Matches the start of the input string. If the multiline attribute of the HSE object is set, <@start> also matches the position after <crlf> or <cr>.
<@over> Matches the end of the input string. If the multiline attribute of the HSE object is set, <@over> also matches the position before <crlf> or <cr>.
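The effect of a multiline setting on the start and end anchors can be sketched with the regex equivalents ^ and $ in Python. This is an analogy under assumptions: Python's re.MULTILINE flag only treats "\n" as a line break, whereas the text above also mentions <cr>, and the sample text is made up.

```python
import re

text = "alpha one\nbeta two"

# Without the multiline flag, ^ and $ only anchor to the whole string.
first_default = re.findall(r"^\w+", text)
last_default = re.findall(r"\w+$", text)

# With the flag, they also anchor after and before each line break.
first_multiline = re.findall(r"^\w+", text, re.MULTILINE)
last_multiline = re.findall(r"\w+$", text, re.MULTILINE)

print(first_default)    # ['alpha']
print(last_default)     # ['two']
print(first_multiline)  # ['alpha', 'beta']
print(last_multiline)   # ['one', 'two']
```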
<*></*> Matches the enclosed subexpression zero or more times. For example, 'z<*>o</*>' matches "z" and "zoo". Equivalent to <least=0></least>.
<+></+> Matches the enclosed subexpression one or more times. For example, 'z<+>o</+>' matches "zo" and "zoo", but not "z". Equivalent to <least=1></least>.
<sel></sel> Matches the enclosed subexpression zero or one time. For example, 'do<sel>es</sel>' matches "do" and "does". Equivalent to <rep=0,1></rep>.
<rep=n></rep> n is a nonnegative integer. Matches exactly n times. For example, '<rep=2>o</rep>' cannot match the 'o' in "Bob", but it matches the two o's in "food".
<least=n></least> n is a nonnegative integer. Matches at least n times. For example, '<least=2>o</least>' cannot match the 'o' in "Bob", but it matches all the o's in "fooood".
<rep=m,n></rep> Both m and n are nonnegative integers, where n >= m. Matches at least m and at most n times. For example, '<rep=1,3>o</rep>' matches the first three o's in "fooooood". Note that there must be no space between the comma and the two numbers.
ng attribute When ng is added to any of the quantifier tags above, matching becomes non-greedy. A non-greedy pattern matches as little of the searched string as possible, while the default greedy pattern matches as much as possible. For example, for the string "oooo", '<+ ng>o</+>' matches a single "o", while '<+>o</+>' matches all the o's.
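The quantifier tags and the ng attribute correspond to the regex quantifiers *, +, {n}, {n,}, {m,n} and the lazy suffix ?. A minimal Python sketch of that behaviour follows; the sample strings and variable names are illustrative assumptions.

```python
import re

# zero or more: like <*>o</*>
star = re.findall(r"zo*", "z zo zoo")

# exactly n: like <rep=2>o</rep>
exact_miss = re.search(r"o{2}", "Bob")
exact_hit = re.search(r"o{2}", "food").group()

# at least n: like <least=2>o</least>
at_least = re.search(r"o{2,}", "fooood").group()

# between m and n: like <rep=1,3>o</rep>
bounded = re.search(r"o{1,3}", "fooooood").group()

# greedy versus non-greedy: like <+>o</+> and <+ ng>o</+>
greedy = re.search(r"o+", "oooo").group()
lazy = re.search(r"o+?", "oooo").group()

print(star)        # ['z', 'zo', 'zoo']
print(exact_miss)  # None
print(exact_hit)   # oo
print(at_least)    # oooo
print(bounded)     # ooo
print(greedy)      # oooo
print(lazy)        # o
```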
<anything> or <any> Matches any single character except <crlf>. To match any character including <crlf>, use a pattern like '<in><any><crlf></in>'.
<rem(>pattern</)> or <rem>pattern</rem> Matches pattern and captures the match. The captured match can be retrieved from the resulting matches collection, using the submatches collection or the $0…$9 properties.
<(>pattern</)> Matches pattern but does not capture the match; that is, it is a non-capturing match and is not stored for later use. This is useful when combining parts of a pattern with <or>. For example, 'industr<(>y<or>ies</)>' is a more economical expression than 'industry<or>industries'.
<eq>pattern</eq> Positive lookahead. Matches the searched string at any point where a string matching pattern begins. This is a non-capturing match, that is, the match is not stored for later use. For example, 'Windows <eq>95<or>98<or>NT<or>2000</eq>' matches "Windows" in "Windows 2000", but not "Windows" in "Windows 3.1". Lookahead does not consume characters: after a match occurs, the search for the next match starts immediately after the last match, not after the characters that made up the lookahead.
<neq>pattern</neq> Negative lookahead. Matches the searched string at any point where a string not matching pattern begins. This is a non-capturing match, that is, the match is not stored for later use.
x<or>y Matches x or y. For example, 'z<or>food' matches "z" or "food". '<(>z<or>f</)>ood' matches "zood" or "food".
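Capturing groups, non-capturing groups, lookahead, and alternation behave as follows in an ordinary regex engine. This is a minimal Python sketch using the regex counterparts of the tags above; the sample strings and variable names are assumptions for illustration.

```python
import re

# Capturing group with alternation: the bracketed part is stored.
captured = re.search(r"(z|f)ood", "good food")

# Non-capturing group: groups the alternatives without storing them.
grouped = re.search(r"industr(?:y|ies)", "heavy industries")

# Positive lookahead: "Windows " matches only if a listed version follows.
ahead_hit = re.search(r"Windows (?=95|98|NT|2000)", "Windows 2000")
ahead_miss = re.search(r"Windows (?=95|98|NT|2000)", "Windows 3.1")

# Negative lookahead: matches only if none of the listed versions follows.
negative_hit = re.search(r"Windows (?!95|98|NT|2000)", "Windows 3.1")

print(captured.group(0), captured.group(1))  # food f
print(grouped.group(0), grouped.groups())    # industries ()
print(repr(ahead_hit.group(0)))              # 'Windows '
print(ahead_miss)                            # None
print(repr(negative_hit.group(0)))           # 'Windows '
```

Note that the lookahead match stops before the version number: the characters checked by the lookahead are not part of the match.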
<in></in> Character set. Matches any one of the enclosed characters. For example, '<in>abc</in>' matches the 'a' in "plain".
<nin></nin> Negative character set. Matches any character that is not enclosed. For example, '<nin>abc</nin>' matches the 'p' in "plain".
<in>a-z</in> Character range. Matches any character within the specified range.
<nin>a-z</nin> Negative character range. Matches any character that is not within the specified range.
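The four character-set forms map onto regex bracket expressions. A short Python sketch, with made-up sample strings and variable names:

```python
import re

# <in>abc</in> and <nin>abc</nin>
in_set = re.search(r"[abc]", "plain").group()
not_in_set = re.search(r"[^abc]", "plain").group()

# <in>a-z</in> and <nin>a-z</nin>
in_range = re.findall(r"[a-z]", "Ab3d")
not_in_range = re.findall(r"[^a-z]", "Ab3d")

print(in_set)        # a
print(not_in_set)    # p
print(in_range)      # ['b', 'd']
print(not_in_range)  # ['A', '3']
```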
<border></border> Matches a word boundary, that is, the position between a word and a space.
<nborder></nborder> Matches a non-word boundary (the negation of <border>).
<control=x> Matches the control character indicated by x. For example, <control=m> matches Control-M, that is, a carriage return. The value of x must be in the range A-Z or a-z; otherwise, <control> is treated as a <nothing> character.
<digit> Matches a digit character. The forms <digit+>, <digit*>, and <digit?> are also available; the same applies to the tags below.
<ndigit> Matches a non-digit character.
<page> Matches a form feed.
<crlf> Matches a newline character.
<cr> Matches a carriage return.
<blank> Matches any whitespace character, including space, tab, form feed, and so on.
<nblank> Matches any non-whitespace character.
<tab> Matches a tab.
<vtab> Matches a vertical tab.
<word> Matches any word character, including the underscore.
<nword> Matches any non-word character.
<hex=n> Matches n, where n is a hexadecimal escape value. The value must be exactly two digits long.
<call=num> Matches num, where num is a positive integer: a back-reference to a captured match. For example, '<rem><any></rem><call=1>' matches two consecutive identical characters.
<oct=n> Matches n, where n is an octal escape value.
<unicode=n> Matches n, where n is a Unicode character written as four hexadecimal digits. For example, <unicode=00a9> (\u00a9 in a regular expression) matches the copyright symbol (©).
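The numeric escapes and the back-reference have direct regex counterparts. A small Python sketch, assuming Python's re syntax (\xhh, three-digit octal, \uhhhh, and \1) as the comparison point; the variable names and inputs are illustrative. Python's re has no control-character escape of the \cX kind, so that tag is left out here.

```python
import re

# Hexadecimal 41 and octal 101 both denote the letter A.
hex_match = re.fullmatch(r"\x41", "A")
octal_match = re.fullmatch(r"\101", "A")

# Four hexadecimal digits: U+00A9 is the copyright symbol.
unicode_match = re.fullmatch(r"\u00a9", "\u00a9")

# Back-reference: any character followed by the same character again.
doubled = re.search(r"(.)\1", "hello")

print(hex_match is not None)      # True
print(octal_match is not None)    # True
print(unicode_match is not None)  # True
print(doubled.group(0))           # ll
```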
<nothing> Matches the empty string. It is used in alternations; for example, <nothing><or>a<or>b<or>c<or>d stands for a, b, c, d, or nothing (the empty string).
<total> Forces the entire string to match. For example, '<total>hs<in>def</in>' matches "hsd", but not the "hsd" in "hsdb".
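To make the tag notation more concrete, here is a rough sketch of how a small subset of HSE could be translated into a Python regular expression. Everything in it is an assumption for illustration: the function name hse_to_regex, the choice of supported tags (<digit>, <word>, <any>, <@start>, <@over>, <in>, <*>, <+>, <rep=...>, <total>), and the mapping to Python's re syntax. It is not a reference implementation of HSE, and any other tag raises an error.

```python
import re

# Hypothetical tag-to-regex table for a few of the simple tags listed above.
SIMPLE_TAGS = {"digit": r"\d", "word": r"\w", "any": ".",
               "@start": "^", "@over": "$"}

# A token is a tag, an escaped angle bracket, or a run of literal text.
TOKEN = re.compile(r"<([^<>]+)>|&lt;|&gt;|[^<&]+|&")


def hse_to_regex(hse: str) -> str:
    """Translate a small HSE subset to a Python regex (illustrative only)."""
    parts = []            # regex fragments, in order
    closers = []          # quantifier suffix for each open <*>, <+>, <rep=...>
    in_set = False        # True between <in> and </in>
    whole_string = False  # True once <total> has been seen
    for match in TOKEN.finditer(hse):
        token, tag = match.group(0), match.group(1)
        if tag is None:
            text = {"&lt;": "<", "&gt;": ">"}.get(token, token)
            # Inside a set, keep ranges such as 1-9 unescaped.
            parts.append(text if in_set else re.escape(text))
        elif tag in SIMPLE_TAGS:
            parts.append(SIMPLE_TAGS[tag])
        elif tag == "total":
            whole_string = True
        elif tag == "in":
            parts.append("[")
            in_set = True
        elif tag == "/in":
            parts.append("]")
            in_set = False
        elif tag in ("*", "+"):
            parts.append("(?:")
            closers.append(tag)
        elif tag.startswith("rep="):
            parts.append("(?:")
            closers.append("{" + tag[len("rep="):] + "}")
        elif tag in ("/*", "/+", "/rep"):
            parts.append(")" + closers.pop())
        else:
            raise ValueError(f"unsupported tag: <{tag}>")
    body = "".join(parts)
    return r"\A(?:" + body + r")\Z" if whole_string else body


code_re = hse_to_regex("<rep=2><digit></rep>-<rep=5><digit></rep>")
chapter_re = hse_to_regex(
    "<@start>Chapter <in>1-9</in><rep=0,1><in>0-9</in></rep>")
total_re = hse_to_regex("<total>hs<in>def</in>")

print(re.fullmatch(code_re, "12-34567") is not None)  # True
print(re.match(chapter_re, "Chapter 12") is not None)  # True
print(re.search(total_re, "hsd") is not None)          # True
print(re.search(total_re, "hsdb") is not None)         # False
```

Each paired quantifier tag is turned into a non-capturing group followed by its suffix, which is why the closing tag pops the suffix that the opening tag pushed. A real implementation would also need the capture, lookahead, and ng forms, which this sketch deliberately omits.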
