Worked examples of using REXML to parse XML data in Ruby programs

Time:2021-12-31

REXML is a library written by Sean Russell. It is not the only XML library for Ruby, but it is very popular and is written in pure Ruby (NQXML is also written in Ruby, but XMLParser wraps the Jade library written in C). In his REXML overview, Russell commented:
I have this problem: I don't like confusing APIs. There are several XML parser APIs for Java implementations. Most of them follow DOM or SAX and are very similar in principle to the many emerging Java APIs. That is, they look as if they were designed by theorists who have never used their own APIs. Often, existing XML APIs are annoying. They take a markup language that is clearly designed to be very simple, first-class and powerful, and then wrap it in annoying, excessive and large APIs. Even for the most basic XML tree operation, I always have to refer to the API documents; nothing is intuitive, and almost every operation is complex.
Although I don't find these APIs quite that annoying, I agree with Russell that XML APIs undoubtedly create too much work for most people who use them.

Example
Consider the following file, books.xml:

<library shelf="Recent Acquisitions">
 <section name="Ruby">
  <book isbn="0672328844">
   <title>The Ruby Way</title>
   <author>Hal Fulton</author>
   <description>
    Second edition. The book you are now reading.
    Ain't recursion grand?
   </description>
  </book>
 </section>
 <section name="Space">
  <book isbn="0684835509">
   <title>The Case for Mars</title>
   <author>Robert Zubrin</author>
   <description>Pushing toward a second home for the human
    race.
   </description>
  </book>
  <book isbn="074325631X">
   <title>First Man: The Life of Neil A. Armstrong</title>
   <author>James R. Hansen</author>
   <description>Definitive biography of the first man on
    the moon.
   </description>
  </book>
 </section>
</library>

1. Tree parsing (i.e. DOM-like)

We need to require the rexml/document library and include the REXML module:

require 'rexml/document'
include REXML
 
input = File.new("books.xml")
doc = Document.new(input)
 
root = doc.root
puts root.attributes["shelf"]  # Recent Acquisitions
 
doc.elements.each("library/section") { |e| puts e.attributes["name"] }
# Output:
# Ruby
# Space
 
doc.elements.each("*/section/book") { |e| puts e.attributes["isbn"] }
# Output (one ISBN per book in the XML above):
# 0672328844
# 0684835509
# 074325631X
 
sec2 = root.elements[2]
author = sec2.elements[1].elements["author"].text  # Robert Zubrin

Note that the attributes of an XML element are exposed as a hash-like object, so we can look up the value we need with attributes[]. Child elements can be retrieved in a similar way through elements[], using either a path-like string or an integer. If an integer is used, it is 1-based rather than 0-based.
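As a small illustration of the points above, the following sketch parses a trimmed-down inline copy of books.xml (descriptions and authors omitted, so the string here is an assumption rather than the real file) and shows attribute lookup, 1-based integer indexing, and lookup by a path-like string. The variable names are only illustrative.

```ruby
require 'rexml/document'

# Trimmed-down inline copy of books.xml so the sketch is self-contained.
xml = <<~XML
  <library shelf="Recent Acquisitions">
    <section name="Ruby">
      <book isbn="0672328844"><title>The Ruby Way</title></book>
    </section>
    <section name="Space">
      <book isbn="0684835509"><title>The Case for Mars</title></book>
      <book isbn="074325631X"><title>First Man: The Life of Neil A. Armstrong</title></book>
    </section>
  </library>
XML

root = REXML::Document.new(xml).root

# Attributes behave like a hash keyed by attribute name.
shelf = root.attributes["shelf"]

# Integer indexes count child elements only and start at 1.
first_section  = root.elements[1]
second_section = root.elements[2]
last_book      = second_section.elements[2]

# A string index is treated as a path; the first match is returned.
title_by_path = second_section.elements["book/title"].text

puts shelf
puts first_section.attributes["name"]
puts last_book.attributes["isbn"]
puts title_by_path
```

Whitespace between tags does not affect the integer indexes, because only child elements are counted.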

2. Stream parsing (i.e. SAX-like parsing)

The approach here is to define a listener class whose methods are called back by the parser as it reads the document:

require 'rexml/document'
require 'rexml/streamlistener'
include REXML
 
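class MyListener
  include REXML::StreamListener

  # Called when the parser enters a tag; args are the tag name and its attributes.
  def tag_start(*args)
    puts "tag_start: #{args.map { |x| x.inspect }.join(', ')}"
  end

  # Called when the parser reads character data.
  def text(data)
    return if data =~ /\A\s*\z/  # skip whitespace-only text (\s, not \w)
    abbrev = data[0, 40] + (data.length > 40 ? "..." : "")
    puts " text : #{abbrev.inspect}"
  end
end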
 
list = MyListener.new
source = File.new "books.xml"
Document.parse_stream(source, list)


Here we use the StreamListener module, which provides empty callback methods that you can override to implement your own behavior. When the parser enters a tag, the tag_start method is called. The text method is similar, except that it is called back when character data is read. The beginning of the output looks like this:

tag_start: "library", {"shelf"=>"Recent Acquisitions"}
tag_start: "section", {"name"=>"Ruby"}
tag_start: "book", {"isbn"=>"0672328844"}
tag_start: "title", {}
text : "The Ruby Way"
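To show how the callbacks can cooperate, here is a minimal sketch of a listener that collects the text of every title element. It also overrides tag_end, another StreamListener callback. The class name TitleCollector and the inline XML string are assumptions made for illustration and are not part of the original example.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener that gathers the text found inside <title> tags.
class TitleCollector
  include REXML::StreamListener

  attr_reader :titles

  def initialize
    @titles = []
    @in_title = false
  end

  # Start a new, empty title when a <title> tag opens.
  def tag_start(name, _attrs)
    return unless name == "title"
    @in_title = true
    @titles << String.new
  end

  # Text may arrive in pieces, so append to the current title.
  def text(data)
    @titles.last << data if @in_title
  end

  # Stop collecting when the </title> tag closes.
  def tag_end(name)
    @in_title = false if name == "title"
  end
end

xml = <<~XML
  <library shelf="Recent Acquisitions">
    <section name="Ruby">
      <book isbn="0672328844"><title>The Ruby Way</title></book>
    </section>
    <section name="Space">
      <book isbn="0684835509"><title>The Case for Mars</title></book>
    </section>
  </library>
XML

collector = TitleCollector.new
REXML::Document.parse_stream(xml, collector)
p collector.titles
```

Because the listener keeps only the state it needs, the whole document never has to be held in memory as a tree, which is the main reason to choose stream parsing for large files.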


3. XPath

REXML provides XPath support through its XPath class, and it can be used with both the DOM-like and the SAX-like styles. For the XML file above, we can use XPath as follows:

book1 = XPath.first(doc, "//book") # Info for first book found
p book1
 
# Print out all titles
XPath.each(doc, "//title") { |e| puts e.text }
 
# Get an array of all of the "author" elements in the document.
names = XPath.match(doc, "//author").map {|x| x.text }
p names

The output is similar to the following:

<book isbn='0672328844'> ... </>
The Ruby Way
The Case for Mars
First Man: The Life of Neil A. Armstrong
["Hal Fulton", "Robert Zubrin", "James R. Hansen"]
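XPath expressions can also filter on attribute values. The following sketch, which again uses a trimmed-down inline copy of the data rather than the real books.xml, looks up one title by ISBN and lists the titles in one section. The variable names are illustrative only.

```ruby
require 'rexml/document'

# Trimmed-down inline copy of books.xml so the sketch is self-contained.
xml = <<~XML
  <library shelf="Recent Acquisitions">
    <section name="Ruby">
      <book isbn="0672328844"><title>The Ruby Way</title></book>
    </section>
    <section name="Space">
      <book isbn="0684835509"><title>The Case for Mars</title></book>
      <book isbn="074325631X"><title>First Man: The Life of Neil A. Armstrong</title></book>
    </section>
  </library>
XML

doc = REXML::Document.new(xml)

# A predicate in square brackets selects the book with a given isbn attribute.
title = REXML::XPath.first(doc, "//book[@isbn='074325631X']/title").text

# match returns an array of every node that satisfies the expression.
space_titles = REXML::XPath.match(doc, "//section[@name='Space']/book/title").map(&:text)

puts title
p space_titles
```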
