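(defun text-match (source target)
  ;; Return t if SOURCE begins with TARGET, nil otherwise.
  (let ((n (length target)))   ; let keeps n local instead of setting a global
    (if (< (length source) n)
        nil
      (string= (substring source 0 n) target))))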
In fact, elisp provides a more powerful text matching function. How powerful? Powerful enough to support regular expression matching.
Regular expressions are like the portraits on the wanted notices that governments in ancient times posted beside the city gate when hunting bandits. The more distinctive a criminal's appearance, the more useful the portrait. I suspect that when modern machine learning programs recognize faces in photos, the principle is much the same as posting wanted notices beside the city gate.
How do we draw a portrait of text? Specifically, how do we draw a portrait of text that begins with \`\`\`? It is very simple. Just draw it like this: ^\`\`\`. Here ^ means "at the beginning". Followed by \`\`\`, it means "the beginning is \`\`\`".
The string-match function can use a string containing a regular expression to match another string, for example:
(string-match "^```" "```lisp")
Note that, for brevity, from now on string objects (or instances of the string type) and list objects (or instances of the list type) are simply called strings and lists unless stated otherwise. This should cause no misunderstanding.
In the above example, because the string "\`\`\`lisp" begins with \`\`\`, the evaluation result of string-match is not nil; otherwise it would be nil. For the elisp interpreter, non-nil means true. That is to say, if a value is not nil or '(), then no matter what it is, elisp treats it as equivalent to t. Remember what I said before: nil is equivalent to '(). Keep that in mind. In fact, the evaluation result of the above example is 0, but 0 is not nil, so it counts as true.
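The point about truth values can be checked with a minimal sketch. The helper name describe-match is made up for illustration and is not part of the original program; it only relies on the built-in string-match and if. It returns the raw result together with the branch that if takes, so a result of 0 can be seen taking the true branch.

```elisp
;; Hypothetical helper for illustration only.
(defun describe-match (regexp string)
  "Return a list (RESULT BRANCH) for matching REGEXP against STRING.
RESULT is the raw value of string-match; BRANCH is the symbol chosen by if."
  (let ((result (string-match regexp string)))
    ;; Any non-nil RESULT, including 0, selects the first branch.
    (list result (if result 'true-branch 'false-branch))))

(describe-match "^foo" "foobar")   ; => (0 true-branch)
(describe-match "^bar" "foobar")   ; => (nil false-branch)
```

In other words, code that tests the result of string-match should compare against nil, never against 0.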
Why is the evaluation result of the above example 0? Because string-match found the part that matches the regular expression at the beginning of the string. The beginning of a string is the index (or subscript) of its first character, whose value is 0. Let's take another example:
(setq r "```")
(setq x "foo```bar")
(string-match r x)
Here string-match determines whether the string x contains text that matches the regular expression r. The evaluation result is the index of the first character of the matched text. Because the index of the first character of \`\`\` in x is 3, the result of string-match in the above example is 3. This result means that text matching the regular expression r first appears at the fourth character of x.
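Besides returning the start index, string-match also records where the match ended. As a small sketch (the helper name match-span is hypothetical), the built-in functions match-beginning and match-end can read both positions back after a successful match:

```elisp
;; Hypothetical helper for illustration only.
(defun match-span (regexp string)
  "Return (START . END) of the first match of REGEXP in STRING, or nil."
  (when (string-match regexp string)
    ;; Group 0 is the whole match; END is one past the last matched character.
    (cons (match-beginning 0) (match-end 0))))

(match-span "bar" "foobarbaz")   ; => (3 . 6)
(match-span "qux" "foobarbaz")   ; => nil
```

The start position is the same number that string-match itself returns.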
Here is another example:
(setq r "```$")
(setq x "foo```")
(string-match r x)
It determines whether x ends with \`\`\`. In regular expressions, $ represents the end of the text.
What does ^\`\`\`$ mean? Take a guess. There is no prize, but a correct guess at least proves you are not stupid.
Now we can replace the text-match function in the Chapter 2 parser with string-match. With that, the knowledge related to the parser has been covered. The problem it solved is no longer a problem, so I need to find new ones.
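One way the replacement might look is sketched below. The name text-match-re is an assumption of mine, and the Chapter 2 parser is not reproduced here, so treat this only as an illustration. It uses the built-in regexp-quote so that the target is matched literally, and the anchor \` (start of the whole string) rather than ^, which also matches just after a newline. If every source string is a single line, ^ would behave the same.

```elisp
;; Hypothetical replacement; the name text-match-re is not from the original.
(defun text-match-re (source target)
  "Return non-nil if SOURCE begins with the literal text TARGET."
  ;; regexp-quote escapes characters such as . * + [ that are special in
  ;; regular expressions, so TARGET is matched literally.
  ;; "\\`" anchors at the start of the whole string.
  (string-match (concat "\\`" (regexp-quote target)) source))

(text-match-re "foo.bar" "foo.")   ; => 0
(text-match-re "fooxbar" "foo.")   ; => nil
(text-match-re "a\nfoo" "foo")     ; => nil
```

Unlike the hand-written text-match, this version returns 0 rather than t on success, which makes no difference inside if.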
The new problem is still in the foo.md file. Only part of it is given below:
# Hello world!
The following is the content of the C language Hello world program source file hello.c:
```
#include <stdio.h>
... ... ...
```
... ... ...
# Hello world! is the title of a document section. The regular expression ^# can match it, but lines of text beginning with # can also appear inside a verbatim environment, such as the #include line above. Now do you understand why, from the second chapter until now, I have been so obsessed with lines of text that begin with \`\`\`? Only by identifying verbatim environments and ignoring them can we reliably match the section titles of a document. As for how to ignore the verbatim environments in the text, let us put that aside for now. Just remember that there is a new problem, and I don't know how many chapters it will take to solve.
Assuming verbatim environments are ignored, ^# can match document section titles, but it is too coarse, because a section title can actually look like any of the following:
#Title #Title #Title
There should be at least one space between # and the title name. In addition, spaces are allowed after the title name, for example when they are introduced by accident while typing the title. Therefore, a more precise regular expression for matching document section titles is
^#[[:blank:]]+.+[[:blank:]]*$
[[:blank:]] matches a blank character, such as a space or a tab. + means that whatever precedes it occurs one or more times. * means that whatever precedes it occurs zero or more times. The . character matches any character except a newline. Therefore [[:blank:]]+ matches one or more blanks, .+ matches one or more characters, and [[:blank:]]* matches zero or more blanks. Using this regular expression, you can match document section titles more accurately. For example:
(setq x "# Hello world! ")
(setq r "^#[[:blank:]]+.+[[:blank:]]*$")
(string-match r x)
The evaluation result of string-match is 0, which is correct. It now occurs to me that if I had to define a text matching function with similar capabilities myself, I could not even estimate the workload, given my current elisp programming skills and my understanding of NFA.
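To see what this regular expression accepts and rejects, here is a small sketch. The names heading-regexp and heading-line-p are hypothetical. The helper uses the built-in string-match-p, which behaves like string-match but leaves the saved match data untouched, which is convenient for a pure yes-or-no test.

```elisp
;; The regular expression from the text; both names are hypothetical.
(defvar heading-regexp "^#[[:blank:]]+.+[[:blank:]]*$")

(defun heading-line-p (line)
  "Return non-nil if LINE looks like a section title."
  (string-match-p heading-regexp line))

(heading-line-p "# Hello world!")       ; => 0
(heading-line-p "#   Hello world!  ")   ; => 0
(heading-line-p "#Hello world!")        ; => nil, no blank after #
(heading-line-p "Hello # world!")       ; => nil, # is not at the start
```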
Regular expressions are used not only for matching but also for capturing text. For example, to capture the section title name Hello world! from the string x in the example above, the regular expression should be written as
(setq r "^#[[:blank:]]+\\(.+\\)[[:blank:]]*$")
That is, in the regular expression, \\( and \\) enclose the segment corresponding to the text to be captured. When string-match uses this regular expression to match text, it saves the text segment matched by .+, which can then be extracted with (match-string 1 x). For example:
(setq x "# Hello world! ")
(setq r "^#[[:blank:]]+\\(.+\\)[[:blank:]]*$")
(string-match r x)
(princ\' (match-string 1 x))
The above program outputs
Hello world!
The first argument of match-string is the serial number of a \\(...\\) group in the regular expression. Because a regular expression can contain several such groups, this number tells match-string which one to take the text from.
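One detail is easy to miss when capturing with .+: it is greedy, so it swallows trailing blanks before [[:blank:]]* gets a chance at them, and the captured title keeps its trailing space. The sketch below compares it with the non-greedy .+? that Emacs regular expressions also support. Both helper names are hypothetical, and the non-greedy variant is my own suggestion rather than something the original program uses.

```elisp
;; Both helper names are hypothetical.
(defun capture-title-greedy (line)
  "Capture the title of LINE with the greedy .+ used in the text."
  (when (string-match "^#[[:blank:]]+\\(.+\\)[[:blank:]]*$" line)
    (match-string 1 line)))

(defun capture-title-lazy (line)
  "Capture the title of LINE with the non-greedy .+? instead."
  (when (string-match "^#[[:blank:]]+\\(.+?\\)[[:blank:]]*$" line)
    (match-string 1 line)))

(capture-title-greedy "# Hello world! ")   ; => "Hello world! "
(capture-title-lazy "# Hello world! ")     ; => "Hello world!"
```

When the result is only printed, the difference is invisible, but it matters as soon as the title is compared with another string.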
The following program uses two regular expression captures:
(setq x "############ Hello world! ")
(setq r "^\\(#+\\)[[:blank:]]+\\(.+\\)[[:blank:]]*$")
(string-match r x)
(princ\' (match-string 1 x))
(princ\' (match-string 2 x))
Its output is
############
Hello world!
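Two captures are already enough to do something useful. As a rough sketch, the length of the first capture can serve as the nesting level of the section. The helper name parse-heading and its (LEVEL . TITLE) return shape are assumptions of mine, not part of the original program.

```elisp
;; Hypothetical helper; the (LEVEL . TITLE) return shape is an assumption.
(defun parse-heading (line)
  "Return (LEVEL . TITLE) if LINE is a section title, nil otherwise."
  (when (string-match "^\\(#+\\)[[:blank:]]+\\(.+\\)[[:blank:]]*$" line)
    ;; Group 1 is the run of # characters, group 2 is the title text.
    (cons (length (match-string 1 line))
          (match-string 2 line))))

(parse-heading "## Install")   ; => (2 . "Install")
(parse-heading "no heading")   ; => nil
```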
The above is only some basic knowledge of regular expressions, because the main concern here is how to use regular expressions to match text in elisp programs. As for further knowledge of regular expressions themselves, we can cram at the last minute when we run into practical problems [1].
Next chapter: Buffer transformation