Reverse pre search in regular expressions (2)


The code is:

// The purpose of the program is to remove the domain name from the image path
var str = '<img src="">';
var reg1 = /(\<img)(.*(?=(http|https)\:\/\/))((http|https)\:\/\/[^\/]*)/gim;
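As a minimal sketch of how this expression might be applied (the original shows only the string and the pattern), the snippet below passes it to `String.prototype.replace` and keeps just the first two capture groups. The `stripDomain` helper name and the `example.com` URL are assumptions for illustration, not part of the original program.

```javascript
// Hypothetical helper: drops the scheme and host from the URL in an <img> tag.
function stripDomain(html) {
  // Same pattern as reg1 above.
  var reg1 = /(\<img)(.*(?=(http|https)\:\/\/))((http|https)\:\/\/[^\/]*)/gim;
  // $1 is "<img"; $2 is everything between "<img" and the URL.
  // Group 4 (scheme plus host) is matched but left out of the replacement.
  return html.replace(reg1, '$1$2');
}

// Placeholder URL, assumed for this example.
var str = '<img src="http://example.com/images/logo.png">';
console.log(stripDomain(str)); // <img src="/images/logo.png">
```

The lookahead `(?=(http|https)\:\/\/)` consumes no characters; it only marks where `.*` must stop so that group 4 can capture the scheme and host.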

This usage works when the string contains only one URL. But suppose the string contains multiple domain names, for example:

var str = '<img src=" "> on the home page of developer "> link </a>';

After the program runs, the domain name that gets removed is the second one. Why?

A closer look at the regular expression shows that after "(\<img)" matches "<img", ".*" matches every character up to "http://" or "https://". It is this ".*" that causes the problem: ".*" matches as many characters as possible, so it stops only at the last place where the rest of the pattern can still match. This is known as greedy matching. The natural solution is non-greedy matching. Change the expression to:

// The only difference from greedy matching is the added question mark: greedy ".*", non-greedy ".*?"
var reg1 = /(\<img)(.*?(?=(http|https)\:\/\/))((http|https)\:\/\/[^\/]*)/gim;
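To make the difference concrete, the following sketch runs both expressions over a string containing two URLs. The input string and its `example.com` hosts are placeholders assumed for this example; only the two patterns come from the text above.

```javascript
// Placeholder input with two URLs (assumed for illustration).
var str = '<img src="http://img.example.com/logo.png"> ' +
          '<a href="http://www.example.com/home">link</a>';

// Greedy: ".*" runs to the end of the line, then backtracks to the LAST "http://".
var greedy = /(\<img)(.*(?=(http|https)\:\/\/))((http|https)\:\/\/[^\/]*)/gim;
// Non-greedy: ".*?" stops at the FIRST position followed by "http://".
var lazy = /(\<img)(.*?(?=(http|https)\:\/\/))((http|https)\:\/\/[^\/]*)/gim;

var greedyResult = str.replace(greedy, '$1$2');
var lazyResult = str.replace(lazy, '$1$2');

console.log(greedyResult);
// <img src="http://img.example.com/logo.png"> <a href="/home">link</a>
console.log(lazyResult);
// <img src="/logo.png"> <a href="http://www.example.com/home">link</a>
```

Note that `.*?` can still run past the end of the `<img>` tag when the image path has no domain but a later link does. Restricting the gap to `[^>]*?` is one possible way to keep the match inside the tag, offered here as a suggestion rather than as part of the original fix.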

The fix is very simple, but it also exposes a common problem in daily work: inadequate program testing.