A simple introduction to regular expressions & Practice

Time:2020-10-1

Preface

Regular expressions turn up in many places in our daily work, and they are very powerful. Take the jQuery selector we use so often: in the jQuery source code, the selector part relies on a large number of regular expressions. Yet when we need a regular expression ourselves, many of us (me included = =) just open Baidu, copy, paste, try it, see no problem, and move on. When a special requirement comes along, copy and paste stops working, so it is worth mastering regular expressions properly. Below is a brief introduction based on my own understanding. If there are mistakes, please point them out.

Appetizer

Suppose you are asked:

Please write a regular expression that validates an email address.

Suppose you write:

    /@/

Give it a try:

var reg = /@/;
reg.test("[email protected]");//true

[Figure: visualization of the regular expression above]

No problem!
Congratulations, you can write regular expressions now ~ (funny face)

In fact, this is a technical interview question from Alibaba. The answer can be very simple or very complicated; which one you write depends on your ability.

Let's keep going. First, a brief analysis of the email format:

An email address is generally made up of three parts:

  1. User name (the part before the @)

  2. @

  3. Domain name (the part after the @)

The user name may contain the following:

  • English letters

  • Digits

  • Underscores

  • Hyphens

  • Dots

A few examples:

  1. [email protected]

  2. [email protected]

  3. [email protected]

  4. [email protected]

According to the above rules, let's write the first part (everything before the @) first:

var reg = /^\w+([-+_.]\w+)*$/;

    reg.test("123456"); //true
    reg.test("blue.sky"); //true
    reg.test("123bird"); //true
    reg.test("ex-xxx001"); //true

[Figure: visualization of the regular expression above]

No problem. Keep going.

The @ itself is matched literally with @.

The domain name part may contain the following:

  • English letters

  • Digits

  • Hyphens

  • Dots

The first character after the @ cannot be a hyphen or a dot.

var reg = /^@\w+([-.]\w+)*\.\w+([-.]\w+)*$/;
reg.test("@123-abc.com.cn"); // true

The final regular expression can look like this:


var reg = /^\w+([-+_.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$/;
reg.test("[email protected]"); //true

[Figure: visualization of the regular expression above]

At this point, a simple email regular expression is written (sorry, that was rather long-winded).
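To see what the final pattern accepts and rejects, here is a small sketch. The sample addresses are made up for illustration only; they are not from any real data set.

```javascript
// The final pattern from the walk-through above.
var emailReg = /^\w+([-+_.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$/;

// Hypothetical sample addresses, chosen only to exercise each rule.
var samples = [
    "blue.sky@example.com",      // dot inside the part before the @
    "ex-xxx001@mail.example.cn", // hyphen before the @, multi-level domain
    "no-at-sign.example.com",    // no @ at all
    ".leading@example.com",      // starts with a dot
    "user@-example.com"          // first character after the @ is a hyphen
];

var results = samples.map(function (address) {
    return emailReg.test(address);
});

console.log(results); // [true, true, false, false, false]
```

Note that \w also allows an underscore, so this pattern is a little more permissive for the domain part than the rules listed above.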

Below is a brief summary.

Formal introduction

Character classes

Let's start with the most common and basic character classes.

Code | Explanation | Notes
. | Matches any single character other than a newline | For example, in the sentence 'nay, an apple is on the tree', .n matches "an" and "on" but not "nay". Note that an unescaped . matches any character except a line terminator.
^ | Matches the beginning of the string |
$ | Matches the end of the string | An unescaped $ matches the end of the text. When the m flag is specified, it also matches at a line terminator.
\ | Escape character | Cancels the special meaning of certain characters; for example, use \. or \* to match a literal . or *. In addition, \1 is a reference to the text captured by group 1, so that text can be matched again.
\b | Matches a word boundary | A word boundary is a position where a word character is not preceded or not followed by another word character. The boundary itself is not included in the match; in other words, its length is zero.
\s | Matches a single whitespace character | Includes space, tab, form feed, and line feed.
\S | Matches a single character other than whitespace |
\w | Matches any alphanumeric character, including the underscore | Equivalent to [A-Za-z0-9_]
\W | Matches any non-word character | Equivalent to [^A-Za-z0-9_]
\d | Matches a digit | Equivalent to [0-9]
\D | Matches a non-digit character | Equivalent to [^0-9]

With this table at hand, the email regular expression above is easy to read.

Just a few simple examples

console.log(/^\w+$/.test("abc"));  //true
console.log(/^\d+$/.test("12345")); //true
console.log(/^\D+$/.test(" "));  //true
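A few more examples for the entries that are harder to picture, namely ., \b, \s and the \1 back reference. This is only a sketch; the variable names are my own.

```javascript
var sentence = "nay, an apple is on the tree";

// . needs one character in front of the n, so "nay" contributes nothing.
var dotN = sentence.match(/.n/g);
console.log(dotN); // ["an", "on"]

// \b is zero-width: it pins "an" down as a whole word but adds nothing to the match.
var wholeWord = sentence.match(/\ban\b/g);
console.log(wholeWord); // ["an"]

// \s matches space, tab and newline alike, so this splits on any run of whitespace.
var pieces = "a b\tc\nd".split(/\s+/);
console.log(pieces); // ["a", "b", "c", "d"]

// \1 matches the same text that group 1 captured, here a repeated word.
var repeated = /(\w+) \1/.test("hello hello");
var notRepeated = /(\w+) \1/.test("hello world");
console.log(repeated, notRepeated); // true false

// An escaped dot matches only a literal dot.
var literalDot = /^a\.b$/.test("a.b");
var anyChar = /^a\.b$/.test("axb");
console.log(literalDot, anyChar); // true false
```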

Flags

Code | Meaning | Usage
g | Global search | If the g flag is used, the global property is true
i | Ignore case | If the i flag is used, the ignoreCase property is true
m | Multiline search (^ and $ can match at line terminators) | If the m flag is used, the multiline property is true
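The effect of each flag is easiest to see side by side. A minimal sketch, using a made-up three-line string:

```javascript
var text = "Apple\napple\nAPPLE";

// g: collect every match instead of stopping at the first one.
var lowerOnly = text.match(/apple/g);
console.log(lowerOnly); // ["apple"]

// i: ignore case; together with g all three spellings are found.
var anyCase = text.match(/apple/gi);
console.log(anyCase); // ["Apple", "apple", "APPLE"]

// m: ^ and $ also match at the start and end of every line.
var perLine = text.match(/^a\w+$/gim);
console.log(perLine); // ["Apple", "apple", "APPLE"]

// Without m, ^ and $ only anchor the whole string, so nothing matches.
var wholeString = text.match(/^a\w+$/gi);
console.log(wholeString); // null

// The flags are also visible as properties of the RegExp object.
var flagged = /apple/gi;
console.log(flagged.global, flagged.ignoreCase, flagged.multiline); // true true false
```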

Quantifiers

Quantifiers specify how many times something repeats. The commonly used ones are as follows:

Code | Explanation
* | Repeat zero or more times
+ | Repeat one or more times
? | Repeat zero or one time
{n} | Repeat exactly n times
{n,} | Repeat n or more times
{n,m} | Repeat n to m times

Just a few simple examples

console.log(/^\w{3}$/.test("abc"));  //true
console.log(/^\d{1,5}$/.test("12345")); //true
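The difference between *, + and ? shows up most clearly at the edges, so here is a short sketch with a few boundary cases (the test strings are arbitrary):

```javascript
var checks = [
    /^\d*$/.test(""),             // * allows zero digits            -> true
    /^\d+$/.test(""),             // + needs at least one digit      -> false
    /^colou?r$/.test("color"),    // ? allows the u to be absent     -> true
    /^colou?r$/.test("colour"),   // ? allows the u once             -> true
    /^colou?r$/.test("colouur"),  // but not twice                   -> false
    /^\d{3,}$/.test("12"),        // {3,} needs three or more digits -> false
    /^\d{3,}$/.test("12345"),     //                                 -> true
    /^\d{1,5}$/.test("123456")    // {1,5} allows at most five       -> false
];

console.log(checks);
```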

Branching condition

A branching condition in a regular expression offers several rules; if any one of them is satisfied, the whole thing counts as a match.
The rules are separated with |.

Let’s take a specific example:

var reg =  /0\d{2}-\d{8}|0\d{3}-\d{7}/;

This expression matches two formats of hyphen-separated phone numbers:

One has a 3-digit area code and an 8-digit local number (such as 010-12345678),
the other has a 4-digit area code and a 7-digit local number (such as 0755-1234567).

Or a simple regular expression to match an 11-digit mobile phone number:

var reg =  /^1(3|4|5|7|8)[0-9]\d{8}$/;
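One thing to watch with |: the landline pattern above has no ^ or $, so it also matches inside a longer string. The sketch below compares it with an anchored variant. The anchored version and the names landline, strictLandline and mobile are my own additions for illustration.

```javascript
// The two patterns from the text.
var landline = /0\d{2}-\d{8}|0\d{3}-\d{7}/;
var mobile = /^1(3|4|5|7|8)[0-9]\d{8}$/;

// An anchored variant (an assumption, not from the original article).
// The alternatives are wrapped in (?:...) so that ^ and $ apply to both of them.
var strictLandline = /^(?:0\d{2}-\d{8}|0\d{3}-\d{7})$/;

var landlineResults = [
    landline.test("010-12345678"),         // true
    landline.test("0755-1234567"),         // true
    landline.test("x010-123456789"),       // true: a valid number is found inside
    strictLandline.test("x010-123456789"), // false: the whole string must match
    strictLandline.test("0755-1234567")    // true
];

var mobileResults = [
    mobile.test("13812345678"), // true
    mobile.test("12012345678"), // false: second digit is not 3, 4, 5, 7 or 8
    mobile.test("1381234567")   // false: only 10 digits
];

console.log(landlineResults, mobileResults);
```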

Grouping

We already know how to repeat a single character, but what if you want to repeat multiple characters?
At this point, you can use grouping.

There are four types of grouping:

Type | Syntax | Description
Capturing | (...) | A capturing group is a part of a regular expression enclosed in parentheses; any text that matches the group is captured. Each capturing group is assigned a number: the first capturing ( in the regular expression is group 1, the second capturing ( is group 2.
Non-capturing | (?:...) | A non-capturing group does simple matching and does not capture the matched text.
Positive lookahead | (?=...) | A positive lookahead is similar to a non-capturing group, but after the group matches, the position rewinds to where the group started, so it does not actually consume anything.
Negative lookahead | (?!...) | A negative lookahead is similar to a positive lookahead, but matching continues only when the group fails to match.
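The four kinds of group behave quite differently in practice. A minimal sketch, with test strings invented for the purpose:

```javascript
// Capturing: each (...) lands in the result array under its group number.
var captured = /(\d{4})-(\d{2})/.exec("2020-10");
console.log(captured[0], captured[1], captured[2]); // 2020-10 2020 10

// Non-capturing: (?:...) groups without taking a number, so the month is group 1.
var nonCaptured = /(?:\d{4})-(\d{2})/.exec("2020-10");
console.log(nonCaptured[1]); // 10

// Grouping lets a quantifier repeat several characters at once.
var repeatedPair = /^(?:ab)+$/.test("ababab");
console.log(repeatedPair); // true

// Positive lookahead: digits must be followed by "px", but "px" is not consumed.
var pixels = "12px 34em".match(/\d+(?=px)/g);
console.log(pixels); // ["12"]

// Negative lookahead: the string must not start with "admin".
var guestOk = /^(?!admin)\w+$/.test("guest");
var adminOk = /^(?!admin)\w+$/.test("admin1");
console.log(guestOk, adminOk); // true false
```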

A simple example

A regular expression that validates an IPv4 address:

var reg = /^((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?)$/;

reg.test("127.0.0.1"); // true
reg.test("255.255.888.888"); // false

Combined with the figure, the structure is clear:

[Figure: visualization of the regular expression above]

The above covers only a very small part of regular expressions. Space is limited, so the rest will be introduced in the next article.
Now, building on what has appeared so far, let's consolidate with a more complex example, excerpted from "JavaScript: The Good Parts".

A complex example

var parse_url = /^(?:([A-Za-z]+):)?(\/{0,3})([0-9.\-A-Za-z]+)(?::(\d+))?(?:\/([^?#]*))?(?:\?([^#]*))?(?:#(.*))?$/;

var url = "http://www.ora.com:80/goodparts?q#fragement";
var result = parse_url.exec(url);

// Labels for result[0] (the whole match) through result[7]
var names = ["url", "scheme", "slash", "host", "port", "path", "query", "hash"];

names.forEach(function (item, index) {
    console.log(item + ": " + result[index]);
});
// url: http://www.ora.com:80/goodparts?q#fragement
// scheme: http
// slash: //
// host: www.ora.com
// port: 80
// path: goodparts
// query: q
// hash: fragement

console.log(result);
 

Cute, isn't it? (funny face)

Look at the picture:

[Figure: visualization of parse_url]

Let's decompose parse_url to see how it works.

First, ^ represents the beginning of the string. It is an anchor that tells exec not to skip over a prefix that does not look like a URL, and to match only strings that look like a URL from the very start.

scheme


var reg = /(?:([A-Za-z]+):)?/;

This factor matches a scheme (protocol) name, but only if it is followed by a : (colon).

(?:...) denotes a non-capturing group.
The ? suffix indicates that the group is optional.

The inner (...) is a capturing group. It is the first capturing group, so its number is 1 and the text it matches appears in result[1].

slash


var reg = /(\/{0,3})/;

The next factor is capturing group 2. The / must be escaped as \/ so that it is not interpreted as the end of the regular expression literal.
{0,3} indicates that the slash / is matched 0 to 3 times.

host


var reg = /([0-9.\-A-Za-z]+)/;

The third capturing group is the host name, made up of one or more digits, letters, . or - characters.
The - is escaped as \- here so that it is not mistaken for a range hyphen.

port


var reg = /(?::(\d+))?/;

This optional factor matches the port number: a : followed by a sequence of one or more digits. The digits form capturing group 4.

path


var reg = /(?:\/([^?#]*))?/;

This is another optional group, and it starts with a /. The character class [^?#] that follows begins with a ^, which means the class contains every character except ? and #. The * means the class is matched 0 or more times.

query


var reg = /(?:\?([^#]*))?/;

This is also an optional group, and it starts with a ?. It contains a capturing group that holds 0 or more characters other than #.

hash


var reg = /(?:#(.*))?/;

This is the last optional group, and it starts with a #. The . matches any character except a line terminator.

Finally, $ represents the end of the string. It guarantees that nothing else follows the end of the URL.

Those are all the factors of parse_url.
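Since most of the factors are optional, it is worth seeing what exec returns when some of them are missing. The sketch below wraps the pattern in a small helper; parseUrl is a hypothetical name of my own, and the second and third inputs are made up.

```javascript
var parse_url = /^(?:([A-Za-z]+):)?(\/{0,3})([0-9.\-A-Za-z]+)(?::(\d+))?(?:\/([^?#]*))?(?:\?([^#]*))?(?:#(.*))?$/;
var names = ["url", "scheme", "slash", "host", "port", "path", "query", "hash"];

// Hypothetical helper: turns the numbered groups into named properties.
// Returns null when the string does not match at all.
function parseUrl(url) {
    var result = parse_url.exec(url);
    if (result === null) {
        return null;
    }
    var parts = {};
    names.forEach(function (name, index) {
        parts[name] = result[index];
    });
    return parts;
}

var full = parseUrl("http://www.ora.com:80/goodparts?q#fragement");
console.log(full.scheme, full.host, full.port); // http www.ora.com 80

// No scheme, port, query or hash: the skipped optional groups come back as undefined,
// while slash is an empty string because {0,3} matched zero slashes.
var bare = parseUrl("www.ora.com/goodparts");
console.log(bare.scheme, bare.slash === "", bare.host, bare.port, bare.path);
// undefined true www.ora.com undefined goodparts

// A string that cannot start a host name does not match.
var invalid = parseUrl("???");
console.log(invalid); // null
```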

This regular expression could be made more compact and more complicated, but that is generally not recommended.

Another example:

var url = "http://www.ora.com:80/goodparts?q#fragement";
var result = url.split(/^(([^:\/?#]+):)?((\/\/)?([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/);
console.log(result);
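The same pattern can also be used with exec, which returns the groups by number just like parse_url did. This is only a sketch of an alternative to split; the names generic and pieces, and the labels scheme, authority, path, query and hash, are my own.

```javascript
var url = "http://www.ora.com:80/goodparts?q#fragement";
var generic = /^(([^:\/?#]+):)?((\/\/)?([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

var match = generic.exec(url);

// Groups 1, 3, 4, 7 and 9 include the delimiters (such as "http:" or "?q"),
// so the inner groups 2, 5, 6, 8 and 10 are the ones picked out here.
var pieces = {
    scheme: match[2],
    authority: match[5],
    path: match[6],
    query: match[8],
    hash: match[10]
};

console.log(pieces);
// scheme: "http", authority: "www.ora.com:80", path: "/goodparts",
// query: "q", hash: "fragement"
```

Unlike parse_url, this pattern does not separate the host from the port, and the path keeps its leading slash.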

Short, sharp regular expressions are the best.

As a basic introduction, this is about enough. Follow-up content will be published later.

Yours truly has to pack up and head home tomorrow. Happy New Year, everyone.

Visualization tools in the attachment
