Parsing URLs with JavaScript regular expressions

Time:2020-1-6

A regular expression is an object that describes character patterns.

This article does not simply hand you a regular expression for URLs and show how to apply it; that is easy to find online. Its purpose is to explain how the regular expression for a URL works, so that you understand regular expressions and can write relatively simple ones in your own work. Let's start with the following example:


var parse_url = /^(?:([A-Za-z]+):)?(\/{0,3})([0-9.\-A-Za-z]+)(?::(\d+))?(?:\/([^?#]*))?(?:\?([^#]*))?(?:#(.*))?$/;
var url = "http://qiji123.kerlai.net:81/GoodsBasic/Operate/12678?q#simen";
var result = parse_url.exec(url);
var names = ["url", "scheme", "slash", "host", "port", "path", "query", "hash"];
// Print each name next to the matching element of the result array
for (var i = 0; i < names.length; i++) {
  console.log(names[i] + ":" + result[i]);
}
// Output
/*
url:http://qiji123.kerlai.net:81/GoodsBasic/Operate/12678?q#simen
scheme:http
slash://
host:qiji123.kerlai.net
port:81
path:GoodsBasic/Operate/12678
query:q
hash:simen
*/

Let’s first look at the results:

url:http://qiji123.kerlai.net:81/GoodsBasic/Operate/12678?q#simen
scheme:http
slash://
host:qiji123.kerlai.net
port:81
path:GoodsBasic/Operate/12678
query:q
hash:simen

The result array in the code is ['http://qiji123.kerlai.net:81/GoodsBasic/Operate/12678?q#simen', 'http', '//', 'qiji123.kerlai.net', '81', 'GoodsBasic/Operate/12678', 'q', 'simen']

Now join the elements of the result array from the second to the last. The result is "http//qiji123.kerlai.net81GoodsBasic/Operate/12678qsimen". Compared with the original URL, the connecting characters ":", "/", "?" and "#" are missing (the "//" survives only because it is captured as the slash element). Why is this? To answer, we need one concept: grouping. A group is a part of the pattern enclosed in (). There are four kinds of groups: capturing, non-capturing, positive lookahead and negative lookahead. Here I focus on the first two; the other two are left for you to explore. A capturing group occupies a position in the result array; a non-capturing group does not. Likewise, characters matched outside any capturing group appear only in element 0 of the array returned by exec() (the whole match), never as an element of their own.

1. Capturing group: (…)

2. Non-capturing group: (?:…)

3. Positive lookahead: (?=…)

4. Negative lookahead: (?!…)
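As a small illustration of the four group types, the sketch below runs each kind against short sample strings. The strings and variable names are made up for this example and are not part of the URL parser.

```javascript
// Capturing vs. non-capturing: same text, different result arrays.
var capturing = /(\d+):(\d+)/.exec("81:8080");
var nonCapturing = /(?:\d+):(\d+)/.exec("81:8080");

console.log(capturing.length);    // 3: whole match plus two captured groups
console.log(nonCapturing.length); // 2: whole match plus one captured group
console.log(nonCapturing[1]);     // "8080"

// Positive lookahead: require that ":" follows, without consuming it.
var lookahead = /\d+(?=:)/.exec("81:8080");
console.log(lookahead[0]);        // "81"

// Negative lookahead: "q" must NOT be followed by "u".
console.log(/q(?!u)/.test("qa")); // true
console.log(/q(?!u)/.test("qu")); // false
```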

Next, we break the parse_url regular expression down factor by factor. The first factor is ^(?:([A-Za-z]+):)?

As a whole, this factor matches the protocol name: http

1. ^ indicates the beginning of the string.

2. (?:…) is a non-capturing group: characters matched inside it, but outside its inner capturing parentheses, are not placed in the result array. Here that is the ":" after the protocol name.

3. (…) is a capturing group: the characters matched inside the parentheses are placed in the result array. For this URL that is "http".

4. […] is a character class: it matches any single character listed inside the brackets.

5. A-Za-z means A to Z and a to z, so [A-Za-z] matches any one uppercase or lowercase letter.

6. + means match one or more times.

7. The trailing ? indicates that the whole group is optional.
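To see this factor in isolation, the following sketch applies only the first part of the pattern to two inputs. The variable names are chosen for this example.

```javascript
// First factor only: optional scheme followed by ":".
var schemeFactor = /^(?:([A-Za-z]+):)?/;

var withScheme = schemeFactor.exec("http://qiji123.kerlai.net");
console.log(withScheme[0]); // "http:"  (whole match includes the colon)
console.log(withScheme[1]); // "http"   (only the capturing group is stored)

// The group is optional, so a URL without a scheme still matches,
// and the unused capturing group is undefined.
var withoutScheme = schemeFactor.exec("//qiji123.kerlai.net");
console.log(withoutScheme[0]); // ""
console.log(withoutScheme[1]); // undefined
```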

Second factor: (\/{0,3}) matches //

A capturing group. \/ stands for a literal /, and {0,3} means that / is matched between 0 and 3 times.
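A quick sketch of the {0,3} quantifier on its own, using made-up inputs:

```javascript
// Second factor only: between zero and three slashes, captured.
var slashFactor = /^(\/{0,3})/;

console.log(slashFactor.exec("//host")[1]);    // "//"
console.log(slashFactor.exec("host")[1]);      // ""    (zero slashes is still a match)
console.log(slashFactor.exec("/////host")[1]); // "///" (never more than three)
```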

([0-9.\-A-Za-z]+) matches qiji123.kerlai.net

A capturing group consisting of one or more digits, ".", "-" (written \- to escape it) and the letters A to Z and a to z.

(?::(\d+))? matches :81

The leading ":" sits in the non-capturing group, so it does not appear in the result array. \d matches a digit, so the whole factor matches a ":" followed by one or more digits, and only the digits are captured. The trailing ? makes the factor optional.
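The host and port factors can be tried together, as in this sketch. The second input, without a port, is an assumption added to show the optional case.

```javascript
// Host factor followed by the optional port factor.
var hostPort = /^([0-9.\-A-Za-z]+)(?::(\d+))?/;

var withPort = hostPort.exec("qiji123.kerlai.net:81/GoodsBasic");
console.log(withPort[1]); // "qiji123.kerlai.net"
console.log(withPort[2]); // "81" (the ":" is matched but not captured)

var withoutPort = hostPort.exec("qiji123.kerlai.net/GoodsBasic");
console.log(withoutPort[1]); // "qiji123.kerlai.net"
console.log(withoutPort[2]); // undefined
```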

(?:\/([^?#]*))? matches /GoodsBasic/Operate/12678

This group starts with /. Inside a character class, a leading ^ means "not", so [^?#] matches any character except "?" and "#", and * repeats it zero or more times. The final ? indicates that this factor is optional.

(?:\?([^#]*))? matches ?q

This group starts with a literal "?" (written \?) followed by zero or more characters other than "#".

(?:#(.*))? matches #simen

This group starts with "#", and (.*) matches all remaining characters except line terminators.

$ indicates the end of the string.
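Putting the factors back together, the sketch below wraps the pattern in a small helper. The function name parseUrl and the second and third inputs are assumptions made for illustration; they show what happens when optional parts are missing and when nothing matches.

```javascript
var parse_url = /^(?:([A-Za-z]+):)?(\/{0,3})([0-9.\-A-Za-z]+)(?::(\d+))?(?:\/([^?#]*))?(?:\?([^#]*))?(?:#(.*))?$/;
var names = ["url", "scheme", "slash", "host", "port", "path", "query", "hash"];

// Hypothetical helper: returns an object keyed by part name, or null
// when the pattern does not match at all.
function parseUrl(url) {
  var result = parse_url.exec(url);
  if (result === null) {
    return null;
  }
  var parts = {};
  for (var i = 0; i < names.length; i++) {
    parts[names[i]] = result[i];
  }
  return parts;
}

var full = parseUrl("http://qiji123.kerlai.net:81/GoodsBasic/Operate/12678?q#simen");
console.log(full.port, full.path, full.query, full.hash);

// Optional factors that did not take part in the match are undefined.
var partial = parseUrl("http://qiji123.kerlai.net/GoodsBasic/Operate/12678");
console.log(partial.port);  // undefined
console.log(partial.query); // undefined
console.log(partial.hash);  // undefined

// The host factor needs at least one character, so this input fails.
console.log(parseUrl("http://")); // null
```

Checking for null before indexing matters because exec() returns null, not an empty array, when the pattern fails.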

We have now analyzed every group in the URL expression. As an exercise, try writing a regular expression for phone numbers that matches both fixed-line and mobile numbers (this needs one new character: |). The table below lists the special characters and their meanings.
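One possible answer to the exercise is sketched below. The number formats are assumptions made for this example only (an 11-digit mobile number starting with 1, and a 7 or 8 digit fixed-line number with an optional area code); real numbering plans differ and need their own rules.

```javascript
// Assumed formats, for illustration only:
//   mobile: 11 digits starting with 1, e.g. 13812345678
//   fixed:  optional area code (0 plus 2-3 digits, optional "-"),
//           then 7 or 8 digits, e.g. 010-12345678
// The | separates the two alternatives inside a non-capturing group.
var phone = /^(?:1\d{10}|(?:0\d{2,3}-?)?\d{7,8})$/;

console.log(phone.test("13812345678"));  // true  (mobile alternative)
console.log(phone.test("010-12345678")); // true  (fixed line with area code)
console.log(phone.test("12345678"));     // true  (fixed line without area code)
console.log(phone.test("1234"));         // false (too short for either)
```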

Character  Meaning
\  Escape character. It has two uses. (1) The character after "\" is not interpreted literally: /b/ matches the character "b", but /\b/ matches a word boundary. (2) It restores the literal meaning of a special character: "*" normally matches the preceding item 0 or more times, so /a*/ matches a, aa, aaa, while /a\*/ matches only "a*".
^  Matches the beginning of the input (or of a line, with the m flag); /^a/ matches the "a" in "an A", but nothing in "An a"
$  Matches the end of the input (or of a line, with the m flag); /a$/ matches the "a" in "An a", but nothing in "an A"
*  Matches the preceding item 0 or more times; /ba*/ matches b, ba, baa, baaa
+  Matches the preceding item 1 or more times; /ba+/ matches ba, baa, baaa
?  Matches the preceding item 0 or 1 times; /ba?/ matches b, ba
(x)  Matches x and saves the match in $1 … $9
x|y  Matches x or y
{n}  Matches exactly n times
{n,}  Matches n or more times
{n,m}  Matches between n and m times
[xyz]  Character set; matches any one character in the set
[^xyz]  Negated set; matches any one character not in the set
[\b]  Matches a backspace
\b  Matches a word boundary
\B  Matches a non-boundary of a word
\cX  X is a control character; /\cM/ matches Ctrl-M
\d  Matches a digit; /\d/ = /[0-9]/
\D  Matches a non-digit character; /\D/ = /[^0-9]/
\n  Matches a line feed
\r  Matches a carriage return
\s  Matches a whitespace character, including space, \n, \r, \f, \t, \v, etc.
\S  Matches a non-whitespace character; roughly equal to /[^ \n\f\r\t\v]/
\t  Matches a tab
\v  Matches a vertical tab
\w  Matches a character that can make up a word (letters, digits and the underscore); /\w/ matches the 5 in "$5.98"; equal to [a-zA-Z0-9_]
\W  Matches a character that cannot make up a word; /\W/ matches the $ in "$5.98"; equal to [^a-zA-Z0-9_]
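A few of the table entries can be checked directly. The sample strings below are chosen for this sketch; the "$5.98" string comes from the table.

```javascript
// Quantifiers: *, + and ? applied to the "a" in "ba".
console.log("b ba baa".match(/ba*/g)); // ["b", "ba", "baa"]
console.log("b ba baa".match(/ba+/g)); // ["ba", "baa"]
console.log("b ba baa".match(/ba?/g)); // ["b", "ba", "ba"]

// Anchors are case-sensitive like everything else.
console.log(/^a/.test("an A")); // true
console.log(/^a/.test("An a")); // false
console.log(/a$/.test("An a")); // true
console.log(/a$/.test("an A")); // false

// Character classes on the "$5.98" example from the table.
console.log("$5.98".match(/\w/)[0]); // "5"
console.log("$5.98".match(/\W/)[0]); // "$"
console.log("$5.98".match(/\d+/g));  // ["5", "98"]

// Backslash: \b is a word boundary, \* is a literal asterisk.
console.log(/\bcat\b/.test("a cat sat"));   // true
console.log(/\bcat\b/.test("concatenate")); // false
console.log(/a\*/.test("a*"));              // true
console.log(/a\*/.test("aa"));              // false
```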

A regular expression can also be created with the constructor: re = new RegExp("pattern"[, "flags"]). pattern is the regular expression as a string; flags is an optional combination of g (global: find all matches in the text), i (ignore case) and m (multi-line search).
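The flags behave the same whether they are given to the constructor or written after a literal. The sample text in this sketch is made up for illustration.

```javascript
var text = "Http and http\nhttp again";

// Literal and constructor forms are equivalent.
var literal = /http/gi;
var built = new RegExp("http", "gi");
console.log(text.match(literal).length); // 3
console.log(text.match(built).length);   // 3

// Without m, ^ only matches at the very start of the input;
// with m, it also matches at the start of each line.
console.log(text.match(/^http/g));   // null (the text starts with "Http")
console.log(text.match(/^http/gm));  // ["http"]
console.log(text.match(/^http/gim)); // ["Http", "http"]
```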

JavaScript dynamic regular expression problem

Can regular expressions be generated dynamically? For example, in JavaScript: given var str = "strTemp"; we want to generate var re = /strTemp/;. Plain string concatenation, var re = "/" + str + "/", only produces a string.
Can an actual regular expression be generated from the variable? How?
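One common approach is to pass the string to the RegExp constructor, as in this minimal sketch. The helper escapeRegExp is a hypothetical name introduced here; it is only needed when the string should be matched literally rather than interpreted as a pattern.

```javascript
var str = "strTemp";

var asString = "/" + str + "/"; // just the string "/strTemp/"
var re = new RegExp(str);       // behaves like the literal /strTemp/

console.log(typeof asString);          // "string"
console.log(re instanceof RegExp);     // true
console.log(re.test("xx strTemp yy")); // true

// Hypothetical helper: escape characters that are special in a pattern,
// so that user-supplied text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

console.log(new RegExp("a.b").test("axb"));               // true  ("." matches any character)
console.log(new RegExp(escapeRegExp("a.b")).test("axb")); // false
console.log(new RegExp(escapeRegExp("a.b")).test("a.b")); // true

// Inside a string, each backslash of the pattern must be doubled.
console.log(new RegExp("^\\d+$").test("12678")); // true
```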