What are the parsing methods of XML data?



Last time we talked about the four ways of parsing JSON, so this time we’ll take a look at the four ways of parsing XML.

Four parsing methods

  • DOM parsing
  • SAX parsing
  • JDOM parsing
  • dom4j parsing

Practical examples

DOM parsing

DOM stands for Document Object Model. A DOM-based XML parser converts an XML document into a collection of objects, usually called the DOM tree, and the application manipulates the XML data by operating on this object model. XML is itself tree-structured, so DOM represents the document as a tree. The top of the DOM tree is the document node, which represents the whole document and contains exactly one root element.

Note: in DOM, each run of text is also a node, called a text node.
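To make the note about text nodes concrete, here is a minimal sketch using the JDK's built-in DOM parser. The sample document and the class and method names (TextNodeDemo, countTextChildren) are illustrative assumptions, not part of the original article. It counts how many direct children of the root element are text nodes; the whitespace between tags counts as well.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class TextNodeDemo {
    // Hypothetical sample document; the line breaks and indentation are part of the content.
    static final String XML = "<person>\n  <name>Tom</name>\n</person>";

    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    /** Counts the direct children of the root element that are text nodes. */
    static int countTextChildren(Document doc) {
        NodeList children = doc.getDocumentElement().getChildNodes();
        int count = 0;
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.TEXT_NODE) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        System.out.println("text-node children of <person>: " + countTextChildren(doc));
        // The text inside <name> is itself a child node of the <name> element.
        Node name = doc.getElementsByTagName("name").item(0);
        System.out.println("text of <name>: " + name.getFirstChild().getNodeValue());
    }
}
```

This is why code that walks getChildNodes() usually checks the node type before casting a child to Element.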

Core interfaces

DOM parsing has four core interfaces:

Document: represents the entire XML document and is the root of the DOM tree. It provides access to the data in the document; every element in the XML document can be reached through the Document node.

Node: the central interface of the DOM tree. Most of the core DOM interfaces, such as Document and Element, extend Node. Every node in the DOM tree is represented by a Node.

NodeList: represents an ordered collection of nodes, for example the child nodes of a node. A NodeList is live: when the document changes, the change is reflected in the NodeList directly.

NamedNodeMap: represents a collection of nodes that are looked up by their unique names. It is mainly used to represent attribute nodes.
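The following sketch illustrates two of these interfaces with the JDK's DOM implementation. The sample XML, the attribute names, and the class and method names are assumptions made for illustration. It reads attributes through a NamedNodeMap and shows that a NodeList obtained before a change reflects the change afterwards.

```java
import java.io.StringReader;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomInterfacesDemo {
    // Hypothetical sample document without whitespace between the tags.
    static final String XML = "<people><person id=\"1\" role=\"admin\"/></people>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Reads all attributes of an element through its NamedNodeMap. */
    static String describeAttributes(Element element) {
        NamedNodeMap attrs = element.getAttributes();
        // A NamedNodeMap has no guaranteed order, so sort by name for stable output.
        TreeMap<String, String> sorted = new TreeMap<>();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            sorted.put(attr.getNodeName(), attr.getNodeValue());
        }
        return sorted.toString();
    }

    /** Returns the length of the same NodeList before and after appending a child. */
    static int[] childCountsBeforeAndAfterAppend(Document doc) {
        Element root = doc.getDocumentElement();
        NodeList children = root.getChildNodes(); // obtained once, never re-queried
        int before = children.getLength();
        root.appendChild(doc.createElement("person"));
        int after = children.getLength();
        return new int[] {before, after};
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        Element person = (Element) doc.getDocumentElement().getFirstChild();
        System.out.println("attributes: " + describeAttributes(person));
        int[] counts = childCountsBeforeAndAfterAppend(doc);
        System.out.println("children before/after append: " + counts[0] + "/" + counts[1]);
    }
}
```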

DOM parsing process

To parse and read an XML document with DOM, a program follows these steps:

① Create a DocumentBuilderFactory: DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
② Create a DocumentBuilder: DocumentBuilder builder = factory.newDocumentBuilder();
③ Create the Document: Document doc = builder.parse("file path to be parsed");
④ Create a NodeList: NodeList nl = doc.getElementsByTagName("read node");
⑤ Read the XML information from the nodes
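The steps above can be sketched end to end as follows. To keep the example self-contained it parses an in-memory string instead of a file path; the sample data and the class and method names (DomParseDemo, readNames) are assumptions for illustration only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomParseDemo {
    // Hypothetical sample data standing in for an XML file on disk.
    static final String XML =
            "<people>"
          + "<person><name>Tom</name><age>20</age></person>"
          + "<person><name>Ann</name><age>31</age></person>"
          + "</people>";

    /** Returns the text of every <name> element in document order. */
    static List<String> readNames(String xml) throws Exception {
        // Step 1: create the factory
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Step 2: create the builder
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Step 3: parse the input into a Document
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        // Step 4: select the nodes to read
        NodeList nl = doc.getElementsByTagName("name");
        // Step 5: read the information from each node
        List<String> names = new ArrayList<>();
        for (int i = 0; i < nl.getLength(); i++) {
            names.add(nl.item(i).getTextContent());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        for (String name : readNames(XML)) {
            System.out.println("name = " + name);
        }
    }
}
```

For a file on disk, step 3 would instead pass a java.io.File to builder.parse.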

SAX parsing

SAX (Simple API for XML) parses an XML file sequentially, from beginning to end. SAX is not maintained by an official standards body; it does not belong to any standards organization, company, or individual, and is freely available for anyone to use.

Unlike DOM, SAX reads the document in sequential order, which makes it a fast way to read XML data. As the SAX parser scans the document it triggers a series of events: at the start and end of the document, and at the start and end of each element, it calls the corresponding handler methods, and those methods do the processing until the whole document has been scanned.

To use SAX parsing, first build a SAX parser.

//Imports: java.io.File, javax.xml.parsers.SAXParser, javax.xml.parsers.SAXParserFactory
//1. Create parser factory
SAXParserFactory factory = SAXParserFactory.newInstance();
//2. Get the parser
SAXParser parser = factory.newSAXParser();
//3. Parse the file; MySaxHandler is a handler class that extends DefaultHandler
File file = new File("resource/demo01.xml");
parser.parse(file, new MySaxHandler());
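The article does not show MySaxHandler itself, so the following is only one possible sketch of such a handler, not the author's implementation. It records each SAX callback in a list so that the event order is visible. The event labels, the sample XML, and the SaxParseDemo wrapper class are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxParseDemo {
    /** A hypothetical handler that records the callbacks it receives. */
    static class MySaxHandler extends DefaultHandler {
        final List<String> events = new ArrayList<>();
        // characters() may be called several times per text run, so buffer the pieces.
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startDocument() {
            events.add("startDocument");
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            text.setLength(0);
            events.add("start:" + qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            String value = text.toString().trim();
            if (!value.isEmpty()) {
                events.add("text:" + value);
            }
            text.setLength(0);
            events.add("end:" + qName);
        }

        @Override
        public void endDocument() {
            events.add("endDocument");
        }
    }

    static List<String> parse(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        MySaxHandler handler = new MySaxHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample document.
        for (String event : parse("<person><name>Tom</name></person>")) {
            System.out.println(event);
        }
    }
}
```

Because the handler only sees one event at a time, any state that spans events (here, the buffered text) has to be kept in the handler's own fields.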

JDOM parsing

DOM and SAX are the standard XML APIs: DOM is a W3C recommendation and SAX is a de facto standard. From a development perspective, each has its own characteristics. DOM allows the document to be modified but is not suited to reading large files; SAX can read large files but cannot modify them. JDOM is usually summarized as JDOM = DOM (modifiable) + SAX (reads large files). JDOM is a free, open-source component that can be downloaded from www.jdom.org.

Common JDOM classes for working with XML:

Document: represents the whole XML document, which is a tree structure

Element: represents an XML element and provides methods to work with its child elements, text, attributes, and namespaces

Attribute: represents an attribute of an element

Text: represents XML text content

XMLOutputter: the XML output class; underneath it writes through JDK streams

Format: controls the encoding, style, and layout of the XML output

JDOM's output operations are much more convenient and intuitive than those of traditional DOM. JDOM builds a DOM-style tree, but it also supports SAX, so a SAX-based builder can be used for parsing.

//Imports (JDOM 2 assumed): org.jdom2.Document, org.jdom2.Element, org.jdom2.input.SAXBuilder, java.io.File, java.util.List
//Create the SAX-based builder
SAXBuilder builder = new SAXBuilder();
File file = new File("resource/demo01.xml");
//Build the document
Document doc = builder.build(file);
//Get root node
Element root = doc.getRootElement();
//Get all child elements of the root; getChildren(name) returns only the direct children with that tag name
List<Element> list = root.getChildren();
for (int x = 0; x < list.size(); x++) {
    Element e = list.get(x);
    //Get the name of the element and the text inside
    String name = e.getName();
    System.out.println(name + "=" + e.getText());
}

dom4j parsing

dom4j is a simple open-source library for working with XML, XPath, and XSLT. It runs on the Java platform, uses the Java collections framework, and integrates with DOM, SAX, and JAXP. Download path:



Like JDOM, dom4j is a free, open-source XML component. It is widely used in current development frameworks such as Hibernate and Spring. This section is only an introduction, so that you have a basic understanding of the component. Neither library is strictly better than the other: most frameworks use dom4j, while we usually use JDOM. dom4j also offers a number of convenient features, such as well-formatted output.

File file = new File("resource/outputdom4j.xml");
SAXReader reader = new SAXReader();
//Read file as document
Document doc = reader.read(file);
//Get the root element of the document
Element root = doc.getRootElement();
//Iterate over the child elements of the root and take the first one
Iterator<Element> iter = root.elementIterator();
Element name = iter.next();
System.out.println("value = " + name.getText());

Extension: creating XML

DOM creation

To generate an XML file with DOM, create the Document with the newDocument() method.

Writing a DOM document out is more cumbersome, because several objects have to be set up first: a TransformerFactory, a Transformer, a DOMSource, and a StreamResult.

public static void createXml() throws Exception {
    //Get parser factory
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    //Get parser
    DocumentBuilder builder = factory.newDocumentBuilder();
    //Create document
    Document doc = builder.newDocument();
    //Create elements
    Element root = doc.createElement("people");
    Element person = doc.createElement("person");
    Element name = doc.createElement("name");
    Element age = doc.createElement("age");
    //Set relationships: without these calls the document stays empty
    person.appendChild(name);
    person.appendChild(age);
    root.appendChild(person);
    doc.appendChild(root);
    //Write it out
    //Get transformer factory
    TransformerFactory tsf = TransformerFactory.newInstance();
    Transformer ts = tsf.newTransformer();
    //Set encoding
    ts.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    //Wrap the DOM tree as the source of the transformation
    DOMSource source = new DOMSource(doc);
    //Wrap the target file as the result of the transformation
    File file = new File("src/output.xml");
    StreamResult result = new StreamResult(file);
    ts.transform(source, result);
}
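As a variation on the DOM creation code, the following sketch also sets an attribute and text content, and writes to an in-memory string rather than a file so the result can be inspected directly. The attribute, the text value, and the class and method names are illustrative assumptions; the exact formatting of the output may vary between transformer implementations.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomCreateDemo {
    /** Builds a small document and serializes it to a string. */
    static String buildXml() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();

        // Create elements; the attribute and text values are hypothetical.
        Element root = doc.createElement("people");
        Element person = doc.createElement("person");
        Element name = doc.createElement("name");
        person.setAttribute("id", "1");
        name.setTextContent("Tom");

        // Set relationships, finishing with the root so the document is not empty.
        person.appendChild(name);
        root.appendChild(person);
        doc.appendChild(root);

        // Serialize with an identity transformer, leaving out the XML declaration.
        Transformer ts = TransformerFactory.newInstance().newTransformer();
        ts.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        ts.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(buildXml());
    }
}
```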

SAX creation

//Create a SAXTransformerFactory object
SAXTransformerFactory stf = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
try {
    //Create a TransformerHandler object through the SAXTransformerFactory object
    TransformerHandler handler = stf.newTransformerHandler();
    //Get the Transformer object from the TransformerHandler object
    Transformer tf = handler.getTransformer();
    //Set the properties of the Transformer object
    tf.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    tf.setOutputProperty(OutputKeys.INDENT, "yes");
    //Create a result object and associate it with the handler
    File file = new File("src/output.xml");
    Result result = new StreamResult(new FileOutputStream(file));
    handler.setResult(result);
    //Write XML content through the handler
    //Open document
    handler.startDocument();
    AttributesImpl attr = new AttributesImpl();
    //Create root node bookstore
    handler.startElement("", "", "bookstore", attr);
    attr.addAttribute("", "", "id", "", "1");
    handler.startElement("", "", "book", attr);
    //Clear the attributes so that name does not also receive the id attribute
    attr.clear();
    handler.startElement("", "", "name", attr);
    String text = "rehabilitation guidelines for cervical spondylosis";
    handler.characters(text.toCharArray(), 0, text.length());
    //Close each node, innermost first
    handler.endElement("", "", "name");
    handler.endElement("", "", "book");
    handler.endElement("", "", "bookstore");
    //Close document
    handler.endDocument();
} catch (SAXException e) {
    e.printStackTrace();
} catch (FileNotFoundException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
} catch (TransformerConfigurationException e) {
    e.printStackTrace();
}

JDOM creation

//Create nodes
Element person = new Element("person");
Element name = new Element("name");
Element age = new Element("age");
//Create attribute
Attribute id = new Attribute("id", "1");
//Set text, for example with Element.setText(String)
//Set relationships
person.setAttribute(id);
person.addContent(name);
person.addContent(age);
//Create the document and write it to a file
Document doc = new Document(person);
XMLOutputter out = new XMLOutputter();
File file = new File("resource/outputjdom.xml");
out.output(doc, new FileOutputStream(file.getAbsoluteFile()));

dom4j creation

//Use DocumentHelper to create a document object
Document document = DocumentHelper.createDocument();
//Create elements and set relationships
Element person = document.addElement("person");
Element name = person.addElement("name");
Element age = person.addElement("age");
//Set the text
name.setText("lebyte");
//Create a pretty-print output format
OutputFormat of = OutputFormat.createPrettyPrint();
//Output to file
File file = new File("resource/outputdom4j.xml");
XMLWriter writer = new XMLWriter(new FileOutputStream(file), of);
writer.write(document);
writer.close();