XML learning summary


1. Overview

XML stands for Extensible Markup Language.
A brief history of XML:
GML (Generalized Markup Language): a data specification for communication between different machines
SGML (Standard Generalized Markup Language)
HTML (Hypertext Markup Language)

1. Before XML, programs exchanged data as plain strings, but a string is poor at expressing relationships between data items, so the description becomes ambiguous.
2. HTML itself has shortcomings.
    For example, its set of tags is fixed, and it has no real support for internationalization.

XML is a good solution to these problems

2. The use of XML

1. Configuration files: for example, Tomcat's web.xml and server.xml. XML can clearly describe the relationships between the parts of a program.
2. Data exchange: the XML format is universal, which reduces the complexity of exchanging data between programs.
3. A small database: when the data sometimes has to be edited by hand, an XML file can serve as a small database. For small amounts of data, reading an XML file directly can be faster than going through a database.

3. Technical framework

XML documents only organize and store data. Other operations, such as generating, reading, and transmitting that data, are not part of XML itself.
Therefore, to work with XML you need technologies beyond XML:
Defining rules for XML: DTD or Schema is commonly used.
Parsing XML data: DOM or SAX is generally used, and each has its own advantages.
Providing style: XML is generally used to store data, but its designers also wanted to use it to display data, which led to XSLT (Extensible Stylesheet Language Transformations).
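To make the "rules" role concrete, here is a minimal sketch of DTD validation using the parser that ships with the JDK. The class name DtdValidationSketch and the tiny note/to DTD are made up for illustration and do not come from these notes.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class DtdValidationSketch {
    // Hypothetical internal DTD: a <note> must contain exactly one <to>.
    static final String DTD =
        "<!DOCTYPE note [<!ELEMENT note (to)><!ELEMENT to (#PCDATA)>]>";

    /** Returns true if the XML text is well-formed and valid against its DTD. */
    static boolean isValid(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Validity errors are only reported to the handler; rethrow them
            // so that an invalid document makes parse() fail.
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e;
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed, or breaks the DTD rules
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(DTD + "<note><to>Tom</to></note>"));
        System.out.println(isValid(DTD + "<note><from>Tom</from></note>"));
    }
}
```

The first document follows the DTD; the second uses an undeclared element, so the parser reports a validity error.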

4. XML syntax

Document declaration

The XML declaration must be placed on the first line of the XML document.
version: the XML version
encoding: the character encoding of the document
standalone: whether the document is independent; the default is no. If yes, the document is independent and does not rely on an external DTD file; if no, the document may reference an external DTD file.
The attributes must appear in the order version, encoding, standalone; they cannot be rearranged.
<?xml version="1.0" encoding="utf-8" standalone="no"?>
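As a rough check of the declaration, the sketch below parses a document with the JDK's DOM parser and reads the three values back. It also tries a declaration with the attributes swapped. The class name XmlDeclarationSketch and the helper parseOrNull are hypothetical names used only here.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.helpers.DefaultHandler;

class XmlDeclarationSketch {
    /** Parses XML text with the JDK's DOM parser; returns null if it is rejected. */
    static Document parseOrNull(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // keep parser errors off stderr
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            return builder.parse(new ByteArrayInputStream(bytes));
        } catch (Exception e) {
            return null;
        }
    }

    public static void main(String[] args) {
        Document doc = parseOrNull(
            "<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"no\"?><root/>");
        System.out.println("version:    " + doc.getXmlVersion());
        System.out.println("encoding:   " + doc.getXmlEncoding());
        System.out.println("standalone: " + doc.getXmlStandalone());

        // Attributes out of order: the parser refuses the document.
        Document swapped = parseOrNull(
            "<?xml encoding=\"utf-8\" version=\"1.0\"?><root/>");
        System.out.println("swapped order accepted: " + (swapped != null));
    }
}
```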

First, a concept: in XML, "element" and "tag" do not mean the same thing, so do not be confused by the two names. A tag is the markup itself, such as <China> or </China>; an element is the start tag, the end tag, and everything between them.
Points to note about elements:

Spaces and line breaks inside an element are treated as element content.
Every XML document must have exactly one root element.
Every element must be closed.
Element names are case sensitive.
Overlapping (cross) nesting is not allowed.
An element name cannot start with a number.
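The element rules above can be tried out with a small sketch that hands a few strings to the JDK's DOM parser and reports whether each one is accepted. WellFormedSketch, isWellFormed, and rootText are hypothetical names, and the sample documents are invented for the test.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedSketch {
    private static Document parse(String xml) throws SAXException {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // keep parser errors off stderr
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** True if the parser accepts the text as a well-formed document. */
    static boolean isWellFormed(String xml) {
        try {
            parse(xml);
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    /** Text content of the root element, whitespace included. */
    static String rootText(String xml) {
        try {
            return parse(xml).getDocumentElement().getTextContent();
        } catch (SAXException e) {
            throw new IllegalArgumentException("not well-formed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<a><b></b></a>")); // properly nested
        System.out.println(isWellFormed("<a/><b/>"));       // two root elements
        System.out.println(isWellFormed("<a><b></a>"));     // <b> never closed
        System.out.println(isWellFormed("<a></A>"));        // case mismatch
        System.out.println(isWellFormed("<a><b></a></b>")); // cross nesting
        System.out.println(isWellFormed("<1a/>"));          // name starts with a digit
        System.out.println("[" + rootText("<a> x\n</a>") + "]"); // whitespace is kept
    }
}
```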


<China name="China"></China>


When writing an XML file, you may want some content to be treated as raw text rather than parsed by the parsing engine. In that case, put it in a CDATA section:
<![CDATA[ ... content ]]>
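A minimal sketch of what a CDATA section does, assuming the JDK's DOM parser: the markup-like characters inside the section come back as plain text instead of being parsed as tags. The class name CdataSketch and the sample content are invented for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CdataSketch {
    /** Returns the text content of the root element of the given XML text. */
    static String rootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Without CDATA, the '<' and '&' below would have to be escaped.
        String xml = "<code><![CDATA[<b>if (a < b && c) {}</b>]]></code>";
        System.out.println(rootText(xml));
    }
}
```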

Escape character
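XML predefines five escape sequences: &lt; for <, &gt; for >, &amp; for &, &quot; for " and &apos; for '. The sketch below is one hand-written way to apply them and then let the JDK parser turn them back; XmlEscapeSketch, escape, and rootText are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XmlEscapeSketch {
    /** Replaces the five special characters with XML's predefined entities. */
    static String escape(String raw) {
        StringBuilder out = new StringBuilder();
        for (char c : raw.toCharArray()) {
            switch (c) {
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '&':  out.append("&amp;");  break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    /** Parses the XML text and returns the root element's text content. */
    static String rootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String raw = "1 < 2 & 3 > 2";
        String xml = "<m>" + escape(raw) + "</m>";
        System.out.println(xml);           // escaped form, safe inside an element
        System.out.println(rootText(xml)); // the parser restores the original text
    }
}
```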
Processing instruction
A processing instruction (PI) directs the parsing engine in how to handle the content of an XML document.
For example:

In an XML document, the xml-stylesheet instruction tells the XML parsing engine to apply a CSS file when displaying the content of the document.
<?xml-stylesheet type="text/css" href="1.css"?>
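From the Java side, a processing instruction shows up in the DOM tree as a node of its own. The sketch below, which assumes the JDK's DOM parser, looks for the first one at the top level of a document and reads its target and data. ProcessingInstructionSketch and firstPi are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;

class ProcessingInstructionSketch {
    /** Returns the first top-level processing instruction, or null if there is none. */
    static ProcessingInstruction firstPi(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        NodeList children = doc.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof ProcessingInstruction) {
                return (ProcessingInstruction) child;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?>"
                + "<?xml-stylesheet type=\"text/css\" href=\"1.css\"?>"
                + "<root/>";
        ProcessingInstruction pi = firstPi(xml);
        System.out.println(pi.getTarget()); // the instruction's name
        System.out.println(pi.getData());   // everything after the name
    }
}
```

Note that the XML declaration itself is not reported as a processing instruction node, even though it looks similar.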


1. JAXP: mainly responsible for parsing XML
2. JAXB: mainly responsible for mapping XML to Java objects

What is XML parsing

XML only organizes and stores data; generating, reading, and transmitting the data are independent of XML itself. XML parsing is the step in which a program reads that data out of an XML document.

XML parsing operation

1. DOM (Document Object Model) is the way of parsing XML recommended by the W3C.
2. SAX (Simple API for XML) is the de facto standard of the XML community and is supported by almost all XML parsers.

The application does not operate on the XML document directly. Instead, an XML parser analyzes the document, and the application works with the results through the DOM or SAX interface that the parser provides, thus accessing the XML document indirectly.

DOM parsing operation

DOM parsing is an object-based API. It loads the whole XML document into memory and builds a DOM object tree that mirrors the document's structure, so the document can then be operated on as nodes of that tree.
Every element in the DOM tree exists as an object, and we operate on the XML document by operating on these objects.

1. The node directly above a node is its parent.
2. A node directly below a node is its child.
3. Nodes at the same level that share a parent are siblings.
4. The parent, the grandparent, and all nodes further up are the node's ancestors.
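The four relationships can be walked with the standard org.w3c.dom.Node methods. Below is a minimal sketch; the class DomFamilySketch, the helper ancestorNames, and the China/province/city sample document are invented for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomFamilySketch {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    /** Names of all ancestors of a node, nearest first, up to the document itself. */
    static List<String> ancestorNames(Node node) {
        List<String> names = new ArrayList<>();
        for (Node n = node.getParentNode(); n != null; n = n.getParentNode()) {
            names.add(n.getNodeName());
        }
        return names;
    }

    public static void main(String[] args) {
        // No whitespace between tags, so every child here is an element node.
        Document doc = parse("<China><province><city1/><city2/></province></China>");
        Node city1 = doc.getElementsByTagName("city1").item(0);

        Node parent = city1.getParentNode();
        System.out.println("parent:      " + parent.getNodeName());
        System.out.println("first child: " + parent.getFirstChild().getNodeName());
        System.out.println("sibling:     " + city1.getNextSibling().getNodeName());
        System.out.println("ancestors:   " + ancestorNames(city1));
    }
}
```

The last ancestor in the list is the Document node itself, whose node name is #document.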
DOM parsing has several core interfaces:

1. Document: represents the entire XML document; every element in the XML file can be reached through the Document node.
2. Node: plays a role in the DOM interfaces roughly like Object does for ordinary Java classes. Many of the core interfaces extend it, as the following diagram shows.
3. NodeList: represents an ordered collection of nodes, usually the children of a node.
4. NamedNodeMap: represents a mapping between a group of nodes and their unique names. It is mainly used for attribute nodes.
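Here is a small sketch that touches all four interfaces with the JDK's DOM parser: it looks up elements by tag name (NodeList) and reads one attribute from each through its NamedNodeMap. DomInterfacesSketch, attributeValues, and the members/member sample data are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomInterfacesSketch {
    /** Collects the value of one attribute from every element with the given tag. */
    static List<String> attributeValues(String xml, String tag, String attribute) {
        Document doc; // the whole document
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        List<String> values = new ArrayList<>();
        NodeList matches = doc.getElementsByTagName(tag); // a collection of nodes
        for (int i = 0; i < matches.getLength(); i++) {
            Node element = matches.item(i);
            NamedNodeMap attributes = element.getAttributes(); // name -> attribute node
            Node attr = attributes.getNamedItem(attribute);
            if (attr != null) {
                values.add(attr.getNodeValue());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        String xml = "<members>"
                + "<member id=\"1\" name=\"Tom\"/>"
                + "<member id=\"2\" name=\"Ann\"/>"
                + "</members>";
        System.out.println(attributeValues(xml, "member", "name"));
    }
}
```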

SAX parsing operation

SAX uses sequential access, which is a fast way to read XML data. As the SAX parser runs, it triggers a series of events. Parsing an XML document with SAX therefore involves two parts: the parser and the event handler.
SAX is a push mechanism: you create a SAX parser, and the parser tells you whenever it finds content in the XML document. How to handle that content is up to the programmer.
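The parser-plus-handler split can be sketched with the JDK's SAX parser and a DefaultHandler that simply records each event it is pushed. SaxEventsSketch and events are hypothetical names, and the logging format is made up for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxEventsSketch {
    /** Parses the XML text and returns the events in the order the parser pushed them. */
    static List<String> events(String xml) {
        List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            // The parser may deliver one run of text in several chunks,
            // so buffer it and record it at the next tag.
            private final StringBuilder text = new StringBuilder();

            private void flushText() {
                String t = text.toString().trim();
                if (!t.isEmpty()) {
                    log.add("text " + t);
                }
                text.setLength(0);
            }

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                flushText();
                log.add("start " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                flushText();
                log.add("end " + qName);
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return log;
    }

    public static void main(String[] args) {
        for (String event : events("<members><member>Tom</member></members>")) {
            System.out.println(event);
        }
    }
}
```

The handler never sees the whole document at once; it only reacts to each event as it arrives, which is why SAX needs so little memory.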


The difference between DOM and SAX parsing

DOM: builds the whole DOM tree in memory. If the document is too large, the program may run out of memory.
SAX: reads the document piece by piece, so it can handle large files, but it can only parse from beginning to end in order and does not support adding, deleting, or modifying content.
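To illustrate the DOM side of this trade-off, the sketch below deletes one node, adds another, and then queries the result, all on the in-memory tree. A SAX handler has no tree to edit in this way. DomEditSketch, editedNames, and the member names are invented for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomEditSketch {
    /** Removes the first <member>, appends a new one named "Bob", and lists the names. */
    static List<String> editedNames(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        Element root = doc.getDocumentElement();

        // Delete: detach the first <member> from its parent.
        Node first = root.getElementsByTagName("member").item(0);
        first.getParentNode().removeChild(first);

        // Add: create a new element and attach it under the root.
        Element added = doc.createElement("member");
        added.setAttribute("name", "Bob");
        root.appendChild(added);

        // Query: read the name attribute of every remaining <member>.
        List<String> names = new ArrayList<>();
        NodeList members = root.getElementsByTagName("member");
        for (int i = 0; i < members.getLength(); i++) {
            names.add(((Element) members.item(i)).getAttribute("name"));
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<members><member name=\"Tom\"/><member name=\"Ann\"/></members>";
        System.out.println(editedNames(xml));
    }
}
```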


dom4j was designed to overcome the shortcomings of DOM and SAX.

1. Access

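import java.io.InputStream;
import org.dom4j.Document;
import org.dom4j.io.SAXReader;

// Create the parser
SAXReader saxReader = new SAXReader();

// Get an input stream for the XML file on the classpath (DOM4j is assumed to be the enclosing class)
InputStream inputStream = DOM4j.class.getClassLoader().getResourceAsStream("1.xml");

// Read the XML file through the parser (read() throws DocumentException on failure)
Document document = saxReader.read(inputStream);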

2. Get the document object

1. Read the XML file to get the document object
SAXReader reader = new SAXReader();
Document document = reader.read(new File("input.xml"));

2. Parse the XML text to get the document object
String text = "<members></members>";
Document document = DocumentHelper.parseText(text);

3. Create a document object directly
Document document = DocumentHelper.createDocument();

// Create the root node
Element root = document.addElement("members");


It can help us get XML nodes more easily.