Learning notes: regular expressions

Time: 2020-9-19

[basic review]

a. Character classes
    [...] any character inside the brackets
    [^...] any character not inside the brackets
    . any character except newline and other Unicode line terminators
    \w any ASCII word character, i.e. [a-zA-Z0-9_]
    \W any character that is not an ASCII word character, i.e. [^a-zA-Z0-9_]
    \s any Unicode whitespace character, such as 0x0020
    \S any character that is not a Unicode whitespace character
    \d [0-9]
    \D [^0-9]
b. Repetition
    {n,m} at least n and at most m times
    {n,} at least n times
    {n} exactly n times
    ? zero or one time (optional), equivalent to {0,1} (/do(es)?/ matches "do" and "does")
    + one or more times, equivalent to {1,} (/zo+/ matches "zo" and "zoo", but not "z")
    * zero or more times, equivalent to {0,} (/zo*/ matches "z" and "zoo")
c. Modifiers
    i case-insensitive matching
    g global match (find every match instead of stopping at the first)
    m multiline mode (^ and $ also match at the start and end of each line)
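The review items above can be checked quickly in a console. This is only a minimal sketch; the sample strings and variable names are made up for illustration.

```javascript
// Character classes
const isWord = /^\w+$/.test('user_01');        // true: letters, digits and underscore only
const hasSpace = /\s/.test('no-space');        // false: the string has no whitespace
const digits = 'tel: 555-0199'.match(/\d+/g);  // ["555", "0199"]

// Repetition
const year = /^\d{4}$/.test('2020');           // true: exactly four digits
const zo = 'z zo zoo'.match(/zo*/g);           // ["z", "zo", "zoo"]

// Modifiers
const ci = /regex/i.test('RegEx');             // true: i ignores case
const lines = 'a1\nb2'.match(/^\w/gm);         // ["a", "b"]: with m, ^ matches at each line start

console.log(isWord, hasSpace, digits, year, zo, ci, lines);
```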

[related methods]

js
    1. search()   string.search(reg) returns the index of the first match, or -1 if there is none
    2. match()    string.match(reg) returns an array of match results, or null if there is none
    3. replace()  string.replace(reg, str) returns the string after replacement
    4. split()    string.split(reg) returns the array of substrings
php
    1. preg_match($reg, $string, $matches);
    2. preg_match_all($reg, $string, $matches);
    3. preg_replace($reg, $replacement, $string);
    4. preg_split($reg, $string);
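As a rough illustration of the four JS methods listed above, the sketch below runs each of them on one made-up query string. The variable names are illustrative only.

```javascript
const text = 'id=7076&mtag=1';

const pos = text.search(/\d+/);             // 3: index of the first digit
const miss = text.search(/xyz/);            // -1: no match
const first = text.match(/(\w+)=(\d+)/);    // ["id=7076", "id", "7076"]: whole match plus groups
const all = text.match(/\d+/g);             // ["7076", "1"]: with g, every match and no groups
const replaced = text.replace(/\d+/, '0');  // "id=0&mtag=1": without g, only the first match changes
const parts = text.split(/[=&]/);           // ["id", "7076", "mtag", "1"]

console.log(pos, miss, first, all, replaced, parts);
```

Note that match() behaves differently with and without the g modifier: without it, the array holds the match and its groups; with it, the array holds all matches.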

[advanced topics]

1、 Non-greedy repetition (match as little as possible): add a ? after the quantifier

var text = 'aaa'; text.match(/a+/); => ["aaa"]
var text = 'aaa'; text.match(/a+?/); => ["a"]
var text = 'aaab'; text.match(/a+?b/); => ["aaab"] // matching still starts at the first possible position in the string

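A common place where the greedy/non-greedy difference matters is matching paired delimiters. The sketch below uses a made-up HTML fragment to show it.

```javascript
const html = '<b>one</b> and <b>two</b>';

// Greedy: .+ runs to the end and backtracks only as far as the LAST </b>
const greedy = html.match(/<b>.+<\/b>/)[0];  // "<b>one</b> and <b>two</b>"

// Non-greedy: .+? stops at the FIRST </b> it can reach
const lazy = html.match(/<b>.+?<\/b>/)[0];   // "<b>one</b>"

console.log(greedy, lazy);
```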
[thinking 1]
1. Write a regex that matches tiffany or milly
2. Write a regex that matches http or https
3. Write a regex that matches java or javascript
4. Write a regex that matches "tiffany or milly" followed by "like" followed by "java or javascript"

2、 Alternation |, grouping () and references

1. Alternation: similar to "or"; matches either the item on the left or the item on the right
    var text = 'milly'; text.match(/tiffany|milly/); => ["milly"]
    var text = 'sela'; text.match(/tiffany|milly/); => null
    [tips]: alternatives are tried from left to right, and the first one that matches wins, even if an alternative further right would give a better match
    var text = 'ab'; text.match(/a|ab/); => ["a"]
2. Grouping: combine separate items into one subexpression;
    var text = 'javascript'; text.match(/java(script)?/); => ["javascript", "script"]
    [tips]
    1. Grouping defines a sub-pattern inside the complete pattern, so the part of the target string matched by the sub-pattern in parentheses can be extracted;
    2. A later part of an expression may refer back to an earlier subexpression of the same expression (\1, \2, ...); the number is the position of the left parenthesis. The reference is not to the subexpression's pattern but to the text that matched it.
    3. The regular expression remembers the text matched by each subexpression
    4. Grouping with (?:...) does not generate a reference
    var quato = /[a|b][^'"]*[a|b]/, text = 'agggb'; text.match(quato); => ["agggb"]
    var quato = /([a|b])[^'"]*\1/, text = 'agggb'; text.match(quato); => null // \1 must repeat the "a" matched by group 1, but the text ends in "b"
    var quato = /"([^"]*)"/, text = 'dddde'; text.replace(quato, ' "$1" '); => "dddde" // the text contains no double quote, so nothing is replaced
    var quato = /(\?|&)id\=\d+(.*)/, text = 'w.midea.com?id=7076&mtag=1'; text.replace(quato, '$1id=7078$2'); => "w.midea.com?id=7078&mtag=1"
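The tips on references can be illustrated with a small sketch. The pattern and sample strings below are made up for illustration and are not taken from the notes.

```javascript
// \1 repeats the TEXT matched by group 1, so the closing quote
// must be the same character as the opening quote.
const quoted = /(['"])[^'"]*\1/;
const same = quoted.test(`'abc'`);   // true: opens and closes with '
const mixed = quoted.test(`'abc"`);  // false: opens with ' but closes with "

// (?:...) groups without capturing, so the result has no group entry
const plain = 'javascript'.match(/java(?:script)?/);  // ["javascript"]

// $1 and $2 in the replacement string refer to the captured text
const swapped = 'milly tiffany'.replace(/(\w+) (\w+)/, '$2 $1');  // "tiffany milly"

console.log(same, mixed, plain.length, swapped);
```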

3、 Specifying the match position

Elements like ^ do not match a particular character; they specify where a match may occur, and are sometimes called anchors
1. ^ start of string, $ end of string
     var text = 'This'; text.match(/hi/); => ["hi"]
     var text = 'This'; text.match(/^hi/); => null
2. \b matches a word boundary (inside a character class, [\b] is the backspace literal instead)
     var text = 'This is Regex'; text.match(/\bis\b/); => ["is"]
     var text = 'This is Regex'; text.match(/\bi\b/); => null
3. \B matches a position that is not a word boundary
     var text = 'This is Regex'; text.match(/\Bis\B/); => null
     var text = 'This is Regex'; text.match(/Re\B/); => ["Re"]
4. (?=...) is a positive lookahead assertion: the expression in parentheses must match at this position, but the text it matches is not included in the result
    var text = 'javaScript'; text.match(/java(Script)*(?=\:)/); => null
    var text = 'javaScript:'; text.match(/java(Script)*(?=\:)/); => ["javaScript", "Script"]
    var text = 'java:'; text.match(/java(Script)*(?=\:)/); => ["java", undefined]
5. (?!...) is a negative lookahead assertion: the expression in parentheses must not match at this position
    var text = 'javaScript'; text.match(/java(Script)*(?!\:)/); => ["javaScript", "Script"]
    var text = 'javaScript:'; text.match(/java(Script)*(?!\:)/); => ["java", undefined]
    var text = 'java:'; text.match(/java(Script)*(?!\:)/); => null
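The sketch below puts anchors, word boundaries and lookahead to work on made-up sample strings. It also shows the same backtracking effect as the 'javaScript:' example above: a negative lookahead can succeed on a shorter match than expected.

```javascript
// ^ and $ together force the whole string to match
const onlyDigits = /^\d+$/;
const ok = onlyDigits.test('7076');     // true
const bad = onlyDigits.test('id7076');  // false

// \b restricts the match to the standalone word
const wholeWord = 'This is Regex'.replace(/\bis\b/, 'was');  // "This was Regex"

const css = 'width: 100px; height: 50%';

// Positive lookahead: digits followed by "px"; "px" is not part of the result
const px = css.match(/\d+(?=px)/)[0];  // "100"

// Negative lookahead backtracks: "100" is followed by "px", but "10" is not
const surprise = css.match(/\d+(?!px)/)[0];  // "10"

// Also forbidding a following digit stops the engine from shortening the number
const notPx = css.match(/\d+(?!\d|px)/)[0];  // "50"

console.log(ok, bad, wholeWord, px, surprise, notPx);
```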

[thinking 1 reference]

1. /tiffany|milly/
var text = 'milly'; text.match(/tiffany|milly/); => ["milly"]
var text = 'sela'; text.match(/tiffany|milly/); => null
2. /http[s]?/
var text = 'http'; text.match(/http[s]?/); => ["http"]
3. /java(script)?/
var text = 'javascript'; text.match(/java[script]?/); => ["javas"] // wrong: [script] is a character class and matches a single character
var text = 'javascript'; text.match(/java(script)?/); => ["javascript", "script"]
4. /(tiffany|milly)likejava(script)?/
var text = 'tiffanylikejava'; text.match(/(tiffany|milly)likejava(script)?/); => ["tiffanylikejava", "tiffany", undefined]

4、 Problems encountered

1. If a JS string contains an escape character (\), the escape sequence is processed when the string itself is parsed

2. A backslash escapes only the character immediately following it: \\| means \|, not | (Figure 2 contrasts this with the result in C, which does not escape the string.)
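Both problems show up when a pattern is built from a string with the RegExp constructor. The following is a minimal sketch, assuming that is the situation the notes describe; the variable names are illustrative.

```javascript
// The string literal consumes one level of backslashes before RegExp sees the text.
const s1 = '\d+';   // string content is "d+": the backslash is dropped
const s2 = '\\d+';  // string content is "\d+": the doubled backslash survives as one

const lost = new RegExp(s1).test('7076');  // false: the pattern is /d+/
const kept = new RegExp(s2).test('7076');  // true: the pattern is /\d+/

// "\\|" in a string becomes \| in the pattern, i.e. a literal "|" character
const literalPipe = new RegExp('a\\|b');   // same as /a\|b/
const alternation = new RegExp('a|b');     // same as /a|b/

const r1 = literalPipe.test('a|b');  // true
const r2 = literalPipe.test('a');    // false: the literal text "a|b" is required
const r3 = alternation.test('a');    // true: either "a" or "b" is enough

console.log(s1.length, s2.length, lost, kept, r1, r2, r3);
```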
