Reverse pre search in regular expressions (2)

Time:2021-5-6

The code is:

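// The purpose of the program is to remove the domain name from the image path
var str = '<img src="https://imgs.developpaper.com/imgs/logo.gif">';
var reg1 = /(\<img)(.*(?=(http|https)\:\/\/))((http|https)\:\/\/[^\/]*)/gim;
// match() runs the regex; RegExp.$4 (a deprecated legacy property) then holds the
// fourth capture group of the last successful match: the scheme plus the host
str.match(reg1);
alert(str.replace(RegExp.$4, '')); // shows <img src="/imgs/logo.gif">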
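
The same idea can be written without the deprecated RegExp.$4 property by reading the capture group straight from the match result. This is a minimal sketch, assuming a JavaScript runtime such as Node.js or a browser console; the function name stripImgDomain is hypothetical. It keeps the article's greedy pattern but drops the g and m flags, so exec() returns the groups of the first match.

```javascript
// Hypothetical helper: remove the domain from an <img> src using the article's
// first (greedy) pattern, reading group 4 from exec() instead of RegExp.$4.
function stripImgDomain(html) {
  // Same groups as reg1, without the g and m flags.
  const re = /(\<img)(.*(?=(http|https)\:\/\/))((http|https)\:\/\/[^\/]*)/i;
  const m = re.exec(html);
  if (m === null) {
    return html; // no "<img ... http(s)://host" found: leave the string unchanged
  }
  // m[4] is what RegExp.$4 would hold: scheme plus host, e.g. "https://example.com"
  return html.replace(m[4], "");
}

const single = '<img src="https://imgs.developpaper.com/imgs/logo.gif">';
console.log(stripImgDomain(single)); // <img src="/imgs/logo.gif">
```

Because exec() returns null when nothing matches, the helper can check for failure explicitly, whereas RegExp.$4 would silently keep the value from some earlier match.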

This approach works when the string contains only one URL. If the string contains several domain names, however, for example:

var str = '<img src="https://imgs.developpaper.com/imgs/logo.gif">Home page of developer <a href="https://www.jb51.net/">link</a>';

After the program runs, the domain that gets removed is the second one, https://www.jb51.net, not the one in the image path. Why?
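
Running the first pattern against such a string makes the problem visible. This is a minimal sketch, assuming the second string is an <img> tag followed by a link whose href is "https://www.jb51.net/" (the trailing slash is what lets [^\/]* stop right after the host); the variable names are hypothetical, and exec() without the g flag is used in place of RegExp.$4.

```javascript
// Reproduce the problem: with the greedy ".*", group 4 captures the LAST
// "http(s)://host" on the line, not the one inside the <img> tag.
const greedy = /(\<img)(.*(?=(http|https)\:\/\/))((http|https)\:\/\/[^\/]*)/i;

const multi =
  '<img src="https://imgs.developpaper.com/imgs/logo.gif">' +
  'Home page of developer <a href="https://www.jb51.net/">link</a>';

const greedyMatch = greedy.exec(multi);
console.log(greedyMatch[4]); // https://www.jb51.net

// The image src keeps its domain; the link's host is removed instead.
console.log(multi.replace(greedyMatch[4], ""));
```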

A closer look at the regular expression explains it. After “(\<img)” matches “<img”, “.*” matches every character up to “http://” or “https://”, and it is this “.*” that causes the problem. “.*” is greedy: it first consumes as many characters as possible and then backtracks only as far as the rest of the pattern requires, so the lookahead ends up at the last “http://” or “https://” on the line instead of the first one. The natural fix is non-greedy (lazy) matching. Change the expression to:

// The only difference from the greedy version is the added question mark: greedy ".*" vs. non-greedy ".*?"
var reg1 = /(\<img)(.*?(?=(http|https)\:\/\/))((http|https)\:\/\/[^\/]*)/gim;
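
With the lazy pattern, the domain can also be stripped in a single replace() call instead of match() followed by RegExp.$4. This is a minimal sketch, assuming the same image-plus-link string as above; the variable names are hypothetical. The replacement "$1$2" keeps groups 1 and 2 and drops group 4 (the scheme and host).

```javascript
// Non-greedy version: ".*?" stops at the FIRST "http(s)://" after "<img".
const lazy = /(\<img)(.*?(?=(http|https)\:\/\/))((http|https)\:\/\/[^\/]*)/gi;

const page =
  '<img src="https://imgs.developpaper.com/imgs/logo.gif">' +
  'Home page of developer <a href="https://www.jb51.net/">link</a>';

// Replace each whole match with groups 1 and 2 only, which removes group 4
// (scheme plus host) and leaves the rest of the string as it was.
const cleaned = page.replace(lazy, "$1$2");
console.log(cleaned);
// <img src="/imgs/logo.gif">Home page of developer <a href="https://www.jb51.net/">link</a>
```

One caveat: if an <img> tag has a relative src, “.*?” can still run past the end of that tag to a later URL on the same line; writing “[^>]*?” instead would keep the search inside the tag.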

The fix itself is very simple, but the bug points to a common problem in everyday work: inadequate testing. The first version was only ever tried on a string containing a single URL.