REXML is a library written by Sean Russell. It is not the only XML library for Ruby, but it is very popular and is written in pure Ruby (NQXML is also written in Ruby, but XMLParser wraps the Jade library written in C). In his REXML overview, Russell commented:
"I have this problem: I don't like confusing APIs. There are several XML parser APIs for Java. Most of them follow DOM or SAX and are very similar in philosophy to a growing number of Java APIs. That is, they look as if they were designed by theorists who never had to use their own APIs. In general, existing XML APIs are annoying. They take a markup language that was clearly designed to be very simple, elegant, and powerful, and then wrap it in an annoying, bloated, and large API. Even for the most basic XML tree operations, I always had to refer to the API documentation; nothing was intuitive, and almost every operation was complex."
Although I don't find them quite that annoying, I agree with Russell that XML APIs impose too much work on most of the people who use them.
Example
Consider the following XML file (saved as books.xml for the code below):

    <library shelf="Recent Acquisitions">
      <section name="Ruby">
        <book isbn="0672328844">
          <title>The Ruby Way</title>
          <author>Hal Fulton</author>
          <description>Second edition. The book you are now reading.
            Ain't recursion grand?
          </description>
        </book>
      </section>
      <section name="Space">
        <book isbn="0684835509">
          <title>The Case for Mars</title>
          <author>Robert Zubrin</author>
          <description>Pushing toward a second home for the human
            race.
          </description>
        </book>
        <book isbn="074325631X">
          <title>First Man: The Life of Neil A. Armstrong</title>
          <author>James R. Hansen</author>
          <description>Definitive biography of the first man on
            the moon.
          </description>
        </book>
      </section>
    </library>

1. Tree parsing (i.e. DOM-like)
We first require the rexml/document library and include the REXML module, so that its classes can be used without the REXML:: prefix:

    require 'rexml/document'
    include REXML

    input = File.new("books.xml")
    doc = Document.new(input)

    root = doc.root
    puts root.attributes["shelf"]   # Recent Acquisitions

    doc.elements.each("library/section") { |e| puts e.attributes["name"] }
    # Output:
    #  Ruby
    #  Space

    doc.elements.each("*/section/book") { |e| puts e.attributes["isbn"] }
    # Output:
    #  0672328844
    #  0684835509
    #  074325631X

    sec2 = root.elements[2]
    author = sec2.elements[1].elements["author"].text    # Robert Zubrin

Note that the attributes of an element are exposed as a hash-like object, so we can look up the value we need with attributes[]. Child elements are reached through elements[], which accepts either a path-like string or an integer. If an integer is used, it is 1-based, not 0-based.
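The 1-based indexing is easy to get wrong, so here is a minimal sketch of it. It assumes an inline copy of part of the sample data (the constant INDEX_SAMPLE and the helper title_at are hypothetical names introduced only for this illustration) so that it runs without books.xml.

```ruby
require 'rexml/document'

# Hypothetical inline sample, trimmed from the books.xml data above.
INDEX_SAMPLE = <<~XML
  <library shelf="Recent Acquisitions">
    <section name="Ruby">
      <book isbn="0672328844"><title>The Ruby Way</title></book>
    </section>
    <section name="Space">
      <book isbn="0684835509"><title>The Case for Mars</title></book>
      <book isbn="074325631X"><title>First Man: The Life of Neil A. Armstrong</title></book>
    </section>
  </library>
XML

# Hypothetical helper: title of the book at the given 1-based section and
# book positions, or nil when either position does not exist.
def title_at(doc, section_index, book_index)
  section = doc.root.elements[section_index]   # integer index, starts at 1
  return nil unless section
  book = section.elements[book_index]
  return nil unless book
  book.elements["title"].text                  # string index, a child name
end

doc = REXML::Document.new(INDEX_SAMPLE)
puts title_at(doc, 1, 1)   # The Ruby Way
puts title_at(doc, 2, 2)   # First Man: The Life of Neil A. Armstrong
p title_at(doc, 3, 1)      # nil, there is no third section
```

Only element children are counted by the integer form, so the whitespace between tags does not shift the positions.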
2. Stream parsing (i.e. SAX-like)
The trick here is to define a listener class whose methods are called back as the parser works through the document:

    require 'rexml/document'
    require 'rexml/streamlistener'
    include REXML

    class MyListener
      include REXML::StreamListener

      # Called for every opening tag with the tag name and its attributes.
      def tag_start(*args)
        puts "tag_start: #{args.map {|x| x.inspect}.join(', ')}"
      end

      # Called for every run of character data.
      def text(data)
        return if data =~ /\A\s*\z/   # skip whitespace-only text
        abbrev = data[0, 40] + (data.length > 40 ? "..." : "")
        puts " text : #{abbrev.inspect}"
      end
    end

    list = MyListener.new
    source = File.new "books.xml"
    Document.parse_stream(source, list)

Here we mix in the StreamListener module, which provides empty stubs for the callback methods, so you only override the ones you need. When the parser enters a tag, it calls the tag_start method. The text method is similar: it is called back whenever character data is read. The output begins as follows:

    tag_start: "library", {"shelf"=>"Recent Acquisitions"}
    tag_start: "section", {"name"=>"Ruby"}
    tag_start: "book", {"isbn"=>"0672328844"}
    tag_start: "title", {}
     text : "The Ruby Way"

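Because a stream listener sees events one at a time, it has to keep its own state if it wants to relate text to the tag it belongs to. The following is a rough sketch of that idea, not part of the original example: the class TitleCollector, the helper collect_titles, and the constant STREAM_SAMPLE are hypothetical names, and the sample is an inline excerpt so the code runs without books.xml. It also overrides tag_end, another of the StreamListener callbacks.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical inline sample, trimmed from the books.xml data above.
STREAM_SAMPLE = <<~XML
  <library shelf="Recent Acquisitions">
    <section name="Ruby">
      <book isbn="0672328844">
        <title>The Ruby Way</title>
        <author>Hal Fulton</author>
      </book>
    </section>
    <section name="Space">
      <book isbn="0684835509">
        <title>The Case for Mars</title>
        <author>Robert Zubrin</author>
      </book>
    </section>
  </library>
XML

# Hypothetical listener that remembers whether the parser is currently
# inside a <title> element and collects the text it sees there.
class TitleCollector
  include REXML::StreamListener

  attr_reader :titles

  def initialize
    @titles = []
    @in_title = false
  end

  def tag_start(name, _attrs)
    @in_title = true if name == "title"
  end

  def tag_end(name)
    @in_title = false if name == "title"
  end

  def text(data)
    @titles << data.strip if @in_title
  end
end

# Runs the stream parser over an XML string and returns the collected titles.
def collect_titles(xml)
  listener = TitleCollector.new
  REXML::Document.parse_stream(xml, listener)
  listener.titles
end

p collect_titles(STREAM_SAMPLE)
```

The whole document is never held in memory as a tree here, which is the usual reason to prefer stream parsing for large inputs.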
3. XPath
REXML provides XPath support through the XPath class, in addition to the DOM-like and SAX-like parsing we have already seen. For the same XML file, we can use XPath like this:

    book1 = XPath.first(doc, "//book")   # Info for first book found
    p book1

    # Print out all titles
    XPath.each(doc, "//title") { |e| puts e.text }

    # Get an array of all of the "author" elements in the document.
    names = XPath.match(doc, "//author").map {|x| x.text }
    p names

The output is similar to the following:

    <book isbn='0672328844'> ... </>
    The Ruby Way
    The Case for Mars
    First Man: The Life of Neil A. Armstrong
    ["Hal Fulton", "Robert Zubrin", "James R. Hansen"]

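The paths above select by element name only. XPath predicates can also filter on attributes, which the following sketch illustrates with the same three XPath methods. The helpers title_for_isbn and titles_in_section and the constant XPATH_SAMPLE are hypothetical names used only here, and the values are interpolated into the path on the assumption that they contain no quote characters.

```ruby
require 'rexml/document'

# Hypothetical inline sample, trimmed from the books.xml data above.
XPATH_SAMPLE = <<~XML
  <library shelf="Recent Acquisitions">
    <section name="Ruby">
      <book isbn="0672328844"><title>The Ruby Way</title></book>
    </section>
    <section name="Space">
      <book isbn="0684835509"><title>The Case for Mars</title></book>
      <book isbn="074325631X"><title>First Man: The Life of Neil A. Armstrong</title></book>
    </section>
  </library>
XML

# Hypothetical helper: title of the book with the given ISBN, or nil.
# Assumes isbn contains no quote characters.
def title_for_isbn(doc, isbn)
  node = REXML::XPath.first(doc, "//book[@isbn='#{isbn}']/title")
  node && node.text
end

# Hypothetical helper: all titles inside the section with the given name.
# Assumes section_name contains no quote characters.
def titles_in_section(doc, section_name)
  path = "//section[@name='#{section_name}']/book/title"
  REXML::XPath.match(doc, path).map { |e| e.text }
end

doc = REXML::Document.new(XPATH_SAMPLE)
puts title_for_isbn(doc, "0684835509")   # The Case for Mars
p titles_in_section(doc, "Space")
```

XPath.first returns nil when nothing matches, so the helper guards against that before calling text.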