Regular expressions: matching tags that do not contain an attribute

Time:2021-9-7

Find all img tags that lack the alt (description) attribute:

Regular: <img (?![^<>]*?alt[^<>]*?>).*?>
Example: <img src="" alt=""> <img src=""> <img src="" title=""> <img src="" id=""> <img src="" title="" alt="">
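The img pattern can be tried in Python's re module, which is assumed here to handle this lookahead the same way as the author's tester. On the sample above it returns the second, third and fourth tags. The lookahead only looks for the letters alt somewhere before the closing >, so an attribute value such as salt.png also counts as an alt attribute.

```python
import re

# Pattern from the text: "<img " not followed, before the tag closes, by "alt".
IMG_WITHOUT_ALT = re.compile(r'<img (?![^<>]*?alt[^<>]*?>).*?>')

sample = ('<img src="" alt=""> <img src=""> <img src="" title=""> '
          '<img src="" id=""> <img src="" title="" alt="">')

matches = IMG_WITHOUT_ALT.findall(sample)
print(matches)  # ['<img src="">', '<img src="" title="">', '<img src="" id="">']

# The check is a plain substring test: "alt" inside "salt.png" counts too.
salt = IMG_WITHOUT_ALT.findall('<img src="salt.png">')
print(salt)  # []
```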

By extension, to find a tags without a title attribute, use:

Regular: <a (?![^<>]*?title[^<>]*?>).*?>
Example: <a src="" alt=""> <a src=""> <a src="" title=""> <a src="" id=""> <a src="" title="" alt="">
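The same idea works for any tag and attribute. tags_without below is a hypothetical helper, not part of the original. It escapes both names and builds the pattern from the text. On the a sample it returns the first, second and fourth tags. As before, Python's re is used as an assumed stand-in for the author's tester.

```python
import re

def tags_without(tag, attr):
    """Hypothetical helper: compile the pattern from the text for any tag/attribute pair."""
    t, a = re.escape(tag), re.escape(attr)
    # "<tag " not followed, before the tag closes, by the attribute name.
    return re.compile(rf'<{t} (?![^<>]*?{a}[^<>]*?>).*?>')

sample = ('<a src="" alt=""> <a src=""> <a src="" title=""> '
          '<a src="" id=""> <a src="" title="" alt="">')

no_title = tags_without('a', 'title').findall(sample)
print(no_title)  # ['<a src="" alt="">', '<a src="">', '<a src="" id="">']
```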

Using regular expressions to find words that do not contain the contiguous string abc

[^abc] matches any single character other than a, b or c. What I want is to exclude the string abc as a whole. How should the expression be written?
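Here is a quick Python illustration of the problem. The sample words are made up. A pattern built from [^abc] rejects any word that contains a, b or c. That includes words such as bad and cab, which do not contain the string abc.

```python
import re

# [^abc] excludes the single characters a, b and c, not the string "abc".
words = ['xyz', 'bad', 'cab', 'abc']
kept = [w for w in words if re.fullmatch(r'[^abc]*', w)]
print(kept)  # ['xyz']: "bad" and "cab" are dropped although neither contains "abc"
```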

In my view, the simplest solution is to let a programming language help: find the strings that contain abc, and whatever remains is the answer. That is the lazy approach. But I am writing a tutorial, and readers may have no programming background; some just use tools to extract information from TXT documents. So the answer must be done entirely with a regular expression.

So I opened regextester and started experimenting. First I tried ((?'test'abc)|.)*(?(test)(?!)). The idea: match either abc or any single character; whenever abc is matched, store it in the group named test; at the end, check whether test captured anything and, if it did, make the match fail (see the tutorial for the relevant syntax). The result was that "abc", "aabc", "abcd" and "aa" all passed. It seems that when the final check fails, the engine backtracks and matches those characters with . instead, so the test group ends up empty. This approach does not work.

Then I tried (.(?!abc))*, that is, match every character that is not followed by abc. Result: "abc" and "abcd" passed, and for "aabc" only the trailing "abc" was matched. Clearly this does not work either.
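The second attempt can be reproduced in Python's re module. This assumes its lookahead behaves like the .NET engine the author tested with. Here fullmatch stands in for "the whole string passes". The lookahead runs after each character has been consumed, so it only asks whether abc comes next. The a that itself begins abc is never rejected, which is why "abc" and "abcd" pass.

```python
import re

# Second attempt from the text: repeat "any character not followed by abc".
# (The group is made non-capturing here; the matching logic is the same.)
attempt = re.compile(r'(?:.(?!abc))*')

# fullmatch asks whether the pattern can cover the entire string.
results = {s: bool(attempt.fullmatch(s)) for s in ['abc', 'abcd', 'aabc', 'aa']}
print(results)  # {'abc': True, 'abcd': True, 'aabc': False, 'aa': True}
```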

Next I tightened the condition: ((?<!abc).(?!abc))*, that is, match every character that is neither preceded nor followed by abc. Result: for every string containing abc, only the "abc" inside was matched, while strings without abc matched in full.

That is a bit amusing, but how do we filter out the strings that contain abc? In other words, how do we make the pattern match the whole thing rather than a part of it? At this point we need to pin down what the user actually wants: to find words, add \b at both ends of the expression; to find lines, add ^ and $. Since the question does not say which, I'll assume words.

That gives \b((?<!abc).(?!abc))*\b. Testing shows that this expression matches every word that does not contain abc, but it also matches the word abc itself.
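The word-bounded version can be checked the same way in Python's re. This treats each sample as a single word; both the engine and the sample words are assumptions. The lookbehind now rejects the d in abcd, and the lookahead rejects the first a in aabc. Nothing rejects the word abc itself, though: its a is followed by bc rather than abc, and its c is preceded only by ab.

```python
import re

# Third attempt wrapped in \b: each character must be neither preceded
# nor followed by "abc".
bounded = re.compile(r'\b(?:(?<!abc).(?!abc))*\b')

words = ['aa', 'xyz', 'aabc', 'abcd', 'abc']
results = {w: bool(bounded.fullmatch(w)) for w in words}
print(results)
# {'aa': True, 'xyz': True, 'aabc': False, 'abcd': False, 'abc': True}
```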

How do we exclude the word abc? After some thought, I decided the most convenient approach is to check whether the word begins with a: \b(a(?!bc)|[^a](?!abc))((?<!abc).(?!abc))*\b (the word starts either with an a that is not followed by bc, or with a character other than a that is not followed by abc; every character after the first must be neither preceded nor followed by abc). In my tests it met all the requirements, bingo!

So, to find words that do not contain the contiguous string abc with a regular expression, the final result is: \b(a(?!bc)|[^a](?!abc))((?<!abc).(?!abc))*\b
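Here is a quick check of the final pattern in Python's re, again treating each sample as one word; both are assumptions. It behaves as described for abc, abcd and xabc. Under these assumptions two gaps show up. First, aabc is still accepted: the leading a(?!bc) lets the first a through, and the second a is neither preceded nor followed by abc. Second, . also matches spaces, so in running text one match can run across two words.

```python
import re

# Final pattern from the text; groups made non-capturing, logic unchanged.
final = re.compile(r'\b(?:a(?!bc)|[^a](?!abc))(?:(?<!abc).(?!abc))*\b')

words = ['aa', 'xyz', 'ab', 'abc', 'abcd', 'xabc', 'aabc']
results = {w: bool(final.fullmatch(w)) for w in words}
print(results)
# {'aa': True, 'xyz': True, 'ab': True, 'abc': False,
#  'abcd': False, 'xabc': False, 'aabc': True}

# "." also matches a space, so a single match can cover two words.
spanned = final.search('xyz aa').group()
print(spanned)  # xyz aa
```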
—————-
Update: following a comment from maple, a more concise approach is: \b((?!abc)\w)+\b
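The concise version can be checked on running text, again with Python's re as an assumed stand-in for the author's tester. The sample sentence is made up. (?!abc) is tested before each \w is consumed, so every start position inside the word is checked. Using \w instead of . also keeps a match from crossing a space.

```python
import re

# Concise version from the update: one or more word characters,
# none of which starts the sequence "abc".
concise = re.compile(r'\b(?:(?!abc)\w)+\b')

text = 'aa xyz ab abc abcd xabc aabc'
found = concise.findall(text)
print(found)  # ['aa', 'xyz', 'ab']
```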

Regular expressions: matching text that does not contain a substring

When using regular expressions, you often need to match text that does not contain a given substring. For example, suppose I want the part of "eabcdfgh" that comes before "cd". Some people may write:

([^cd]*)

This is completely wrong, because [] defines a character set: [^cd] matches any single character other than c or d, not "anything except the string cd". In the following program the input contains no "cd", yet only "eab" is matched, because the match stops at the first c.

The code is as follows:
using System;
using System.Text.RegularExpressions;

string s = "([^cd]*)";
Match m = Regex.Match("eabcfgh", s);
Console.WriteLine(m.Value);           // eab
Console.WriteLine(m.Groups[1].Value); // eab

The approach above is badly wrong, and most people know to avoid it. In a special case, however, the expression can be written like this, and it is relatively efficient:

([\s\S]*cd)

First of all, [\s\S] matches any character, including newlines. The special case is that I already know the string contains cd; note that this match also includes cd itself. If the requirement is to match the part that does not contain cd (for simplicity, only the part before cd is matched), then when cd does not occur the whole string should be returned.
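Here is a small Python sketch of the special case. It uses re as a stand-in for .NET, and the sample strings are made up. [\s\S] matches any character, including a newline, where . does not by default. The pattern fails outright when cd is absent. The greedy * runs to the last cd rather than the first.

```python
import re

# Special-case pattern from the text: only works when "cd" is known to be present.
special = re.compile(r'([\s\S]*cd)')

first = special.match('eabcdfgh').group(1)
print(first)         # eabcd: the match includes "cd" itself
missing = special.match('eabcfgh')
print(missing)       # None: without "cd" there is no match at all
greedy = special.match('xcdycd').group(1)
print(greedy)        # xcdycd: the greedy * backs off only as far as the last "cd"

# [\s\S] crosses a newline; "." (without re.DOTALL) does not.
across_lines = special.match('ab\ncd').group(1)
dot_version = re.match(r'(.*cd)', 'ab\ncd')
print(repr(across_lines), dot_version)  # 'ab\ncd' None
```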

The code is as follows:
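using System;
using System.Text.RegularExpressions;

string s = "((.(?!cd))*.)";
//string s = @"([\s\S]*cd)";  // special-case version: needs "cd", and the match includes it
Match m = Regex.Match("eabcdfgh", s);
Console.WriteLine(m.Value);           // eab
Console.WriteLine(m.Groups[1].Value); // eab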

This pattern finally meets the requirement. Note, however, that it is less efficient than the previous one, because it runs a lookahead assertion at every character.
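The general pattern can be checked the same way in Python's re. This is an assumed stand-in for .NET, and the sample strings are made up. It returns the part before cd when cd occurs and the whole string otherwise. One edge case is worth knowing: a cd at the very start is not caught, because each lookahead only inspects what follows the character just consumed.

```python
import re

# General pattern from the text: characters not followed by "cd", then one more.
# Group 1 is the outer group, as in the C# Groups[1].
upto_cd = re.compile(r'((.(?!cd))*.)')

print(upto_cd.match('eabcdfgh').group(1))  # eab: stops just before "cd"
print(upto_cd.match('eabcfgh').group(1))   # eabcfgh: no "cd", whole string
print(upto_cd.match('cdxy').group(1))      # cdxy: a leading "cd" is not caught
```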

A review of the relevant syntax:
(?:subexpression)    Defines a non-capturing group.

The code is as follows:
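using System;
using System.Text.RegularExpressions;

// Non-capturing group: (?:ab) must match, but it gets no group number
string s = "e(?:ab)(.*)";
Match m = Regex.Match("eabcd", s);
Console.WriteLine(m.Value);           // eabcd
Console.WriteLine(m.Groups[1].Value); // cd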

ab is matched, but it is not captured as a group, so Groups[1] is cd.

(?=subexpression)    Zero-width positive lookahead assertion.

The code is as follows:
using System;
using System.Text.RegularExpressions;

// Zero-width positive lookahead assertion
//string s = "b(cd|de)(.*)";   // with this version, Groups[1] is "cd"
string s = "b(?=cd|de)(.*)";
Match m = Regex.Match("eabcdfg", s);
Console.WriteLine(m.Value);           // bcdfg
Console.WriteLine(m.Groups[1].Value); // cdfg

This differs from the commented-out version in being zero-width: the lookahead checks that cd or de follows without consuming it, and it is not captured, so it does not occupy a group. As a result, (.*) starts at the c and Groups[1] is cdfg rather than cd.

(?!subexpression)    Zero-width negative lookahead assertion.

Here ! means "not": the current position must not be followed by the subexpression. Like (?=...), it is zero-width and is not captured.
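Here is a short Python illustration of (?!...). Python's re stands in for .NET, and the sample text is made up. The b that is followed by cd is skipped, and only the b in ebde is matched. The assertion itself consumes no characters.

```python
import re

# b(?!cd): a "b" that is not followed by "cd"; the assertion consumes nothing.
text = 'eabcdfg ebde'
positions = [m.start() for m in re.finditer(r'b(?!cd)', text)]
print(positions)  # [9]: only the "b" in "ebde"
```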

(?<=subexpression)    Zero-width positive lookbehind assertion.

Example: (?<=19)\d{2}\b

Matches "99", "50" and "05" in "1851 1999 1950 1905 2003".

(?<!subexpression)    Zero-width negative lookbehind assertion.

Example: (?<!19)\d{2}\b

Matches "51" and "03" in "1851 1999 1950 1905 2003".
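The two lookbehind examples can be checked with Python's re.findall. Like .NET, it supports fixed-width lookbehind. This is a sketch, not the author's code.

```python
import re

years = '1851 1999 1950 1905 2003'

# Positive lookbehind: two digits ending a word, preceded by "19".
after_19 = re.findall(r'(?<=19)\d{2}\b', years)
print(after_19)      # ['99', '50', '05']

# Negative lookbehind: two digits ending a word, not preceded by "19".
not_after_19 = re.findall(r'(?<!19)\d{2}\b', years)
print(not_after_19)  # ['51', '03']
```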
