A Complete Guide to the DTD (Document Type Definition) in XML

Time:2019-6-1

I. What is DTD
DTD stands for Document Type Definition, a format for defining document structure. It specifies the structure of an XML file and provides the grammar and rules the file must follow. You first define the structure of the XML file in a DTD and then write the XML file according to the DTD's declarations. This is similar to a function declaration in a programming language: when you call a function, the call must match the format of its declaration.

II. DTD Detailed Explanation
1. Detailed examples

  <?xml version="1.0" encoding="utf-8"?>
  <!-- Internal DTD declaration -->
  <!DOCTYPE FilmCatalogue [
    <!-- Child element of the top-level element: Film; "+" means one or more Film child elements -->
    <!ELEMENT FilmCatalogue (Film)+>
    <!-- Child elements of the Film element -->
    <!ELEMENT Film (Title, Star, Director, Introduction)>
    <!-- Attributes of the Film element: Category and Year; CDATA means the value is character data -->
    <!ATTLIST Film Category CDATA "Action" Year CDATA #REQUIRED>
    <!-- Entity declarations holding character data; referenced below as &EntityName; -->
    <!ENTITY AmbushOnAllSides "Heavy snow, three people duel in the snow">
    <!ENTITY HuoYuanjia "National hero, struggle against Western imperialism">
    <!ELEMENT Title (#PCDATA)>
    <!ELEMENT Star (#PCDATA)>
    <!ELEMENT Director (#PCDATA)>
    <!ELEMENT Introduction (#PCDATA)>
  ]>
  <!-- XML written according to the DTD -->
  <FilmCatalogue>
    <Film Category="Swordsman" Year="2008">
      <Title>Ambush on All Sides</Title>
      <Star>Liu Dehua, Jincheng Wu and Zhang Ziyi</Star>
      <Director>Zhang Yimou</Director>
      <Introduction>&AmbushOnAllSides;</Introduction>
    </Film>
    <Film Category="Swordsman" Year="2006">
      <Title>Huo Yuanjia</Title>
      <Star>Jet Li</Star>
      <Director>Yu Ren Tai</Director>
      <Introduction>&HuoYuanjia;</Introduction>
    </Film>
  </FilmCatalogue>
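To see what the internal DTD in this example does at parse time, here is a minimal Java sketch. It assumes the JDK's built-in JAXP parser and uses a shortened catalogue; the class name InternalDtdDemo and the shortened element list are illustrative, not part of the original example. It shows two effects of the DTD: the entity reference is expanded to its declared text, and the default value of the Category attribute is supplied when the attribute is omitted.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class InternalDtdDemo {
    // Shortened film catalogue; element and entity names are illustrative.
    public static final String XML =
        "<?xml version=\"1.0\"?>\n"
      + "<!DOCTYPE FilmCatalogue [\n"
      + "  <!ELEMENT FilmCatalogue (Film)+>\n"
      + "  <!ELEMENT Film (Title, Introduction)>\n"
      + "  <!ATTLIST Film Category CDATA \"Action\" Year CDATA #REQUIRED>\n"
      + "  <!ENTITY HuoYuanjia \"National hero, struggle against Western imperialism\">\n"
      + "  <!ELEMENT Title (#PCDATA)>\n"
      + "  <!ELEMENT Introduction (#PCDATA)>\n"
      + "]>\n"
      + "<FilmCatalogue>\n"
      + "  <Film Year=\"2006\">\n"
      + "    <Title>Huo Yuanjia</Title>\n"
      + "    <Introduction>&HuoYuanjia;</Introduction>\n"
      + "  </Film>\n"
      + "</FilmCatalogue>\n";

    /** Parses xml with DTD validation enabled and appends validity errors to the list. */
    public static Document parse(String xml, final List<String> errors) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setValidating(true);
            DocumentBuilder builder = dbf.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignored in this sketch */ }
                public void error(SAXParseException e) { errors.add(e.getMessage()); }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("document is not well-formed: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        List<String> errors = new ArrayList<>();
        Document doc = parse(XML, errors);
        Element film = (Element) doc.getElementsByTagName("Film").item(0);
        // Category is not written in the document, so the DTD default is used.
        System.out.println("Category     = " + film.getAttribute("Category"));
        // The entity reference has been replaced by the entity's text.
        System.out.println("Introduction = "
            + doc.getElementsByTagName("Introduction").item(0).getTextContent());
        System.out.println("Errors       = " + errors);
    }
}
```

Removing the Year attribute from the Film element should add a validity error to the list, because the DTD declares Year as #REQUIRED.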

1.1 The DOCTYPE declaration

(1) Internal declaration: <!DOCTYPE root_element [ DTD declarations ]>
(2) External declaration: <!DOCTYPE root_element SYSTEM "DTD file name or URL"> or <!DOCTYPE root_element PUBLIC "public identifier" "DTD file name or URL">
External declarations take several forms, which fall into two main kinds: SYSTEM and PUBLIC.
SYSTEM: a private DTD written by an author or organization and shared by many of their XML documents.
PUBLIC: a DTD developed by an authoritative body for a specific industry or for the public.
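The three DOCTYPE forms can be told apart programmatically. The following is a small sketch, assuming the JDK's built-in DOM parser; the class name DoctypeDemo, the public identifier and the example.com URL are made up for illustration. The entity resolver hands the parser an empty DTD so that nothing is fetched from disk or the network.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

public class DoctypeDemo {
    // Hypothetical identifiers, used only for illustration.
    public static final String INTERNAL =
        "<!DOCTYPE FilmCatalogue [ <!ELEMENT FilmCatalogue EMPTY> ]><FilmCatalogue/>";
    public static final String SYSTEM =
        "<!DOCTYPE FilmCatalogue SYSTEM \"http://example.com/film.dtd\"><FilmCatalogue/>";
    public static final String PUBLIC =
        "<!DOCTYPE FilmCatalogue PUBLIC \"-//Example//DTD Film Catalogue 1.0//EN\" "
      + "\"http://example.com/film.dtd\"><FilmCatalogue/>";

    /** Returns "rootName | publicId | systemId" for the DOCTYPE of the given document. */
    public static String describe(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Serve an empty external DTD instead of opening the real URL.
            builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
            DocumentType doctype = builder.parse(new InputSource(new StringReader(xml))).getDoctype();
            return doctype.getName() + " | " + doctype.getPublicId() + " | " + doctype.getSystemId();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(INTERNAL));
        System.out.println(describe(SYSTEM));
        System.out.println(describe(PUBLIC));
    }
}
```

An internal declaration has neither identifier, a SYSTEM declaration has only a system identifier, and a PUBLIC declaration has both.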
1.2 Other Declarations
(1) Elements:

  <!ELEMENT element_name element_definition>

(2) Attribute list:

  <!ATTLIST Element_Name
     Attribute_Name Type [added_declare]
     Attribute_Name Type [added_declare]
     ……
  >

(3) Entities
General entities
Internal: <!ENTITY Entity_Name "Entity_Value">
External: <!ENTITY Entity_Name SYSTEM "Entity_URL">
Parameter entities
Internal: <!ENTITY % Entity_Name "Entity_Value">
External: <!ENTITY % Entity_Name SYSTEM "Entity_URL">

2. Detailed Content
2.1 Element Declarations

In element declarations, pay attention to the special content keywords (EMPTY, ANY, #PCDATA) and to the notation for child elements: how many times they occur (+, *, ?), choice between alternatives (|), and mixed content. These symbols play a role similar to arithmetic and logical operators in programming languages. The following is an example of a DTD that declares several kinds of elements.

  <?xml version="1.0" encoding="utf-8"?>
  <!DOCTYPE FilmCatalogue [
    <!ELEMENT FilmCatalogue (Film, Other, Note)+>   <!-- "+" means the child elements occur at least once -->
    <!ELEMENT Other EMPTY>   <!-- the EMPTY keyword declares an empty element -->
    <!ELEMENT Note ANY>      <!-- the ANY keyword declares an element with any content -->
    <!ELEMENT Film (Title, Star, Director, Introduction)>   <!-- declaration of an element with child elements -->
    <!-- Attribute declarations. An ID attribute may not be #FIXED, so Name is declared as CDATA here. -->
    <!ATTLIST Film
      Name CDATA #FIXED "Ambush on All Sides"
      Category CDATA "Action"
      Year CDATA #REQUIRED
      BoxOffice CDATA #IMPLIED
    >
    <!ENTITY Introduction "Heavy snow, three people duel in the snow">
    <!ELEMENT Title (#PCDATA)>
    <!ELEMENT Star (#PCDATA)>
    <!ELEMENT Director (#PCDATA)>
    <!ELEMENT Introduction (#PCDATA)>
  ]>
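The effect of "+", EMPTY and ANY can be checked with a validating parser. The sketch below assumes the JDK's built-in JAXP parser and a reduced DTD in which Film holds only text; the class name ElementRulesDemo and the helper isValid are illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class ElementRulesDemo {
    // Reduced DTD for illustration: Film holds only text here.
    static final String DTD =
        "<!DOCTYPE FilmCatalogue [\n"
      + "  <!ELEMENT FilmCatalogue (Film, Other, Note)+>\n"
      + "  <!ELEMENT Film (#PCDATA)>\n"
      + "  <!ELEMENT Other EMPTY>\n"
      + "  <!ELEMENT Note ANY>\n"
      + "]>\n";

    /** Returns true if the document body is valid against the DTD above. */
    public static boolean isValid(String body) {
        final boolean[] valid = { true };
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setValidating(true);
            DocumentBuilder builder = dbf.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignored in this sketch */ }
                public void error(SAXParseException e) { valid[0] = false; }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(DTD + body)));
        } catch (Exception e) {
            return false; // not even well-formed
        }
        return valid[0];
    }

    public static void main(String[] args) {
        // One complete (Film, Other, Note) group; Note may hold text and declared elements.
        System.out.println(isValid(
            "<FilmCatalogue><Film>a</Film><Other/><Note>text <Film>b</Film></Note></FilmCatalogue>"));
        // Other is declared EMPTY, so giving it content breaks validity.
        System.out.println(isValid(
            "<FilmCatalogue><Film>a</Film><Other>x</Other><Note/></FilmCatalogue>"));
        // "+" requires at least one group, so an empty catalogue is invalid.
        System.out.println(isValid("<FilmCatalogue></FilmCatalogue>"));
    }
}
```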

2.2 Naming conflicts

Namespaces and prefix identifiers were introduced to avoid clashes between elements that share the same name in a complex XML document.
2.2.1 Namespace
A namespace is introduced with the xmlns attribute, which tells the reader which namespace an element belongs to. It works much like namespaces in programming languages: it keeps element names unique and avoids conflicts.

  <?xml version="1.0" encoding="utf-8"?>
  <Film xmlns:h="http://www.abc.edu" xmlns:c="http://www.123.edu">   <!-- xmlns:prefix binds a prefix to a namespace -->
    <db>
      <h:table>werer</h:table>       <!-- this table belongs to the namespace http://www.abc.edu -->
      <c:table>fdfdsfsdf</c:table>   <!-- this table belongs to the namespace http://www.123.edu -->
    </db>
  </Film>

 

Role: to qualify elements and attributes with a unique marker, so that element names do not conflict and their origin is clear.
2.2.2 Prefix Identification
A prefix is added before an element or attribute name to show unambiguously which namespace the element or attribute comes from. It is used together with namespaces, as with <h:table> and <c:table> in the example above.
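A namespace-aware parser resolves each prefix to its namespace. Here is a minimal sketch using the document from the example above, assuming the JDK's DOM parser; the class name NamespaceDemo and the helper namespaceOf are illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NamespaceDemo {
    public static final String XML =
        "<Film xmlns:h=\"http://www.abc.edu\" xmlns:c=\"http://www.123.edu\">"
      + "<db><h:table>werer</h:table><c:table>fdfdsfsdf</c:table></db>"
      + "</Film>";

    /** Parses the example with namespace processing switched on. */
    public static Document parse() {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // off by default in JAXP
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns the namespace URI of the first element with the given qualified name. */
    public static String namespaceOf(String qualifiedName) {
        return parse().getElementsByTagName(qualifiedName).item(0).getNamespaceURI();
    }

    public static void main(String[] args) {
        // "*" matches any namespace; "table" is the local name without prefix.
        NodeList tables = parse().getElementsByTagNameNS("*", "table");
        for (int i = 0; i < tables.getLength(); i++) {
            Node table = tables.item(i);
            System.out.println(table.getPrefix() + ":" + table.getLocalName()
                + " -> " + table.getNamespaceURI());
        }
    }
}
```

Both elements share the local name table, but their namespace URIs differ, which is what keeps them apart.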

III. Entity Explanation

Why introduce entities when we already have elements? To tell the two apart, first look at what entities are for. The entity mechanism is a time-saving tool for bringing different kinds of data into an XML document. It resembles an abstract class in object-oriented programming: frequently used content is abstracted into an entity that can be referenced wherever it is needed, which avoids duplication.
In detail, entities are used to:
(1) Stand in for characters that cannot be typed. A keyboard has only 26 letters and some simple punctuation marks, while the character set contains many more symbols that cannot be entered from the keyboard.
(2) Stand in for content that conflicts with characters reserved by the XML specification, such as < and >.
(3) Replace large blocks of repeated text.
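Uses (1) and (2) need no DTD at all, because XML predefines a few entities (such as &lt;, &gt; and &amp;) and supports numeric character references. A short sketch, assuming the JDK's DOM parser; the class name CharacterEntityDemo and the helper decode are illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class CharacterEntityDemo {
    /** Wraps the fragment in a dummy element, parses it, and returns the decoded text. */
    public static String decode(String fragment) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new InputSource(new StringReader("<t>" + fragment + "</t>")));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("fragment is not well-formed", e);
        }
    }

    public static void main(String[] args) {
        // &lt; &gt; &amp; replace reserved characters; &#169; is the copyright sign.
        System.out.println(decode("&lt;Film&gt; &amp; &#169;"));
    }
}
```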
Entities are divided into internal and external entities according to where their content is stored, and into general and parameter entities according to where they may be referenced. Let's look at an example of an external entity reference:

Listing 1: Declarations in "2.dtd"

  <!-- External DTD, saved as 2.dtd -->

  <!ELEMENT FilmCatalogue (Film)+>
  <!ELEMENT Film (Title, Star, Director, Introduction)>
  <!ATTLIST Film Category CDATA "Action" Year CDATA #REQUIRED>
  <!ENTITY AmbushOnAllSides "Heavy snow, three people duel in the snow">
  <!ENTITY HuoYuanjia "National hero, struggle against Western imperialism">
  <!ELEMENT Title (#PCDATA)>
  <!ELEMENT Star (#PCDATA)>
  <!ELEMENT Director (#PCDATA)>
  <!ELEMENT Introduction (#PCDATA)>
  <!ENTITY FilmComment SYSTEM "MovieReview.xml">   <!-- external general entity whose content is the file "MovieReview.xml" -->

Listing 2: Contents of MovieReview.xml

  <?xml version="1.0" encoding="utf-8"?>
  <FilmReview>
  These reviews are all produced by XXX Company. They are worth watching!
  </FilmReview>

Listing 3: The XML file that uses the DTD

  <?xml version="1.0" encoding="utf-8"?>
  <!DOCTYPE FilmCatalogue SYSTEM "./2.dtd">
  <FilmCatalogue>
      <Film Category="Swordsman" Year="2008">
          <Title>Ambush on All Sides</Title>
          <Star>Liu Dehua, Jincheng Wu and Zhang Ziyi</Star>
          <Director>Zhang Yimou</Director>
          <Introduction>&AmbushOnAllSides;</Introduction>
      </Film>
      <Film Category="Swordsman" Year="2006">
          <Title>Huo Yuanjia</Title>
          <Star>Jet Li</Star>
          <Director>Yu Ren Tai</Director>
          <Introduction>&HuoYuanjia;</Introduction>
      </Film>
      &FilmComment;
  </FilmCatalogue>

     
Listing 4: Listing 3 as displayed when opened in IE8

  <?xml version="1.0" encoding="utf-8" ?>
  <!DOCTYPE FilmCatalogue (View Source for full doctype...)>
- <FilmCatalogue>
  - <Film Category="Swordsman" Year="2008">
      <Title>Ambush on All Sides</Title>
      <Star>Liu Dehua, Jincheng Wu and Zhang Ziyi</Star>
      <Director>Zhang Yimou</Director>
      <Introduction>Heavy snow, three people duel in the snow</Introduction>
    </Film>
  - <Film Category="Swordsman" Year="2006">
      <Title>Huo Yuanjia</Title>
      <Star>Jet Li</Star>
      <Director>Yu Ren Tai</Director>
      <Introduction>National hero, struggle against Western imperialism</Introduction>
    </Film>
    <FilmReview>These reviews are all produced by XXX Company. They are worth watching!</FilmReview>
  </FilmCatalogue>
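The same expansion can be reproduced in Java without any files on disk. The sketch below is an assumption-laden illustration rather than the setup used above: it relies on the JDK's DOM parser with default settings, keeps the DTD and the review text in memory, and uses an EntityResolver to serve them when the parser asks for "2.dtd" and "MovieReview.xml". The class name ExternalEntityDemo and the shortened documents are illustrative. Like the browser, the parser here does not validate; it only expands entities.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class ExternalEntityDemo {
    // In-memory stand-ins for 2.dtd and MovieReview.xml (shortened for illustration).
    static final String DTD =
        "<!ENTITY HuoYuanjia \"National hero, struggle against Western imperialism\">\n"
      + "<!ENTITY FilmComment SYSTEM \"MovieReview.xml\">\n";
    static final String REVIEW =
        "<FilmReview>These reviews are all produced by XXX Company.</FilmReview>";
    static final String XML =
        "<!DOCTYPE FilmCatalogue SYSTEM \"2.dtd\">\n"
      + "<FilmCatalogue>"
      + "<Introduction>&HuoYuanjia;</Introduction>"
      + "&FilmComment;"
      + "</FilmCatalogue>";

    /** Parses XML, serving the external DTD and the external entity from memory. */
    public static Document parse() {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setEntityResolver((publicId, systemId) -> {
                String content;
                if (systemId != null && systemId.endsWith("2.dtd")) {
                    content = DTD;
                } else if (systemId != null && systemId.endsWith("MovieReview.xml")) {
                    content = REVIEW;
                } else {
                    throw new SAXException("unexpected external entity: " + systemId);
                }
                InputSource source = new InputSource(new StringReader(content));
                source.setSystemId(systemId); // keep the identifier the parser asked for
                return source;
            });
            return builder.parse(new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse();
        // Internal general entity, declared in the external DTD.
        System.out.println(doc.getElementsByTagName("Introduction").item(0).getTextContent());
        // External general entity: its element has been spliced into the document.
        System.out.println(doc.getElementsByTagName("FilmReview").item(0).getTextContent());
    }
}
```

Resolving entities from memory also avoids letting a document pull in arbitrary files, which is a reason to prefer an explicit resolver when parsing untrusted input.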

The distinction between internal and external entities is easy to understand, so the rest of this section focuses on the difference between general and parameter entities.

1. Parameter entities

Listing 1: test.dtd. This content is kept in a separate DTD file because, in the internal DTD subset, parameter entity references cannot occur inside a markup declaration; they can occur only where a markup declaration is allowed to occur. The external DTD subset has no such restriction.

  <!-- External DTD, saved as test.dtd -->
  <!-- PersonalInfo is declared as a parameter entity so that several elements can share it -->
  <!ENTITY % PersonalInfo "(Name, Gender, DateOfBirth)">
  <!ELEMENT StudentInfo %PersonalInfo;>
  <!ELEMENT TeacherInfo %PersonalInfo;>
  <!ELEMENT EmployeeInfo %PersonalInfo;>
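The restriction on the internal subset can be observed directly. The following sketch assumes the JDK's validating JAXP parser; the class name ParameterEntityDemo, the helper accepted, and the reduced one-element DTDs are illustrative. It tries a parameter entity reference inside a declaration in the internal subset, between declarations in the internal subset, and inside a declaration in an external DTD that is served from memory.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class ParameterEntityDemo {
    // Hypothetical, reduced stand-in for test.dtd; served from memory by the resolver.
    static final String EXTERNAL_DTD =
        "<!ENTITY % PersonalInfo \"(#PCDATA)\">\n"
      + "<!ELEMENT StudentInfo %PersonalInfo;>\n";

    // Reference inside a declaration, internal subset: not allowed.
    public static final String INSIDE_INTERNAL =
        "<!DOCTYPE StudentInfo [\n"
      + "  <!ENTITY % PersonalInfo \"(#PCDATA)\">\n"
      + "  <!ELEMENT StudentInfo %PersonalInfo;>\n"
      + "]>\n<StudentInfo>Zhang San</StudentInfo>";

    // Reference between declarations, internal subset: allowed.
    public static final String BETWEEN_INTERNAL =
        "<!DOCTYPE StudentInfo [\n"
      + "  <!ENTITY % StudentDecl \"<!ELEMENT StudentInfo (#PCDATA)>\">\n"
      + "  %StudentDecl;\n"
      + "]>\n<StudentInfo>Zhang San</StudentInfo>";

    // Reference inside a declaration, external DTD: allowed.
    public static final String INSIDE_EXTERNAL =
        "<!DOCTYPE StudentInfo SYSTEM \"test.dtd\">\n<StudentInfo>Zhang San</StudentInfo>";

    /** Returns true if the document parses and validates without any error. */
    public static boolean accepted(String xml) {
        final boolean[] ok = { true };
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setValidating(true);
            DocumentBuilder builder = dbf.newDocumentBuilder();
            builder.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader(EXTERNAL_DTD)));
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignored in this sketch */ }
                public void error(SAXParseException e) { ok[0] = false; }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            return false; // fatal (well-formedness) error
        }
        return ok[0];
    }

    public static void main(String[] args) {
        System.out.println("inside declaration, internal subset:    " + accepted(INSIDE_INTERNAL));
        System.out.println("between declarations, internal subset:  " + accepted(BETWEEN_INTERNAL));
        System.out.println("inside declaration, external DTD:       " + accepted(INSIDE_EXTERNAL));
    }
}
```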

Listing 2: The SchoolInfo.xml file, which references the external test.dtd file

  <?xml version="1.0" encoding="utf-8"?>
  <!-- SchoolInfo.xml file -->
  <!-- Reference to the external DTD -->
  <!DOCTYPE SchoolInfo SYSTEM "./test.dtd">
  <!-- XML written according to the DTD -->
  <SchoolInfo>
   <StudentInfo>
    <Name>Zhang San</Name>
    <Gender>male</Gender>
    <DateOfBirth>2013-10-12</DateOfBirth>
   </StudentInfo>
   <TeacherInfo>
    <Name>Zhang San</Name>
    <Gender>male</Gender>
    <DateOfBirth>2013-10-12</DateOfBirth>
   </TeacherInfo>
   <EmployeeInfo>
    <Name>Zhang San</Name>
    <Gender>male</Gender>
    <DateOfBirth>2013-10-12</DateOfBirth>
   </EmployeeInfo>
  </SchoolInfo>

       
Listing 3: Listing 2 as displayed when opened in IE8

  <?xml version="1.0" encoding="utf-8" ?>
- <!--
  Declare internal DTD
  -->
  <!DOCTYPE SchoolInfo (View Source for full doctype...)>
- <!--
  XML obtained by DTD
  -->
- <SchoolInfo>
  - <StudentInfo>
      <Name>Zhang San</Name>
      <Gender>male</Gender>
      <DateOfBirth>2013-10-12</DateOfBirth>
    </StudentInfo>
  - <TeacherInfo>
      <Name>Zhang San</Name>
      <Gender>male</Gender>
      <DateOfBirth>2013-10-12</DateOfBirth>
    </TeacherInfo>
  - <EmployeeInfo>
      <Name>Zhang San</Name>
      <Gender>male</Gender>
      <DateOfBirth>2013-10-12</DateOfBirth>
    </EmployeeInfo>
  </SchoolInfo>

 
2. General entities

General entities can be referenced both in XML document content and in DTDs, whereas parameter entities can be referenced only in DTDs, and usually only in external DTD documents.

3. Comparison

The differences between parameter entities and general entities are as follows:

(1) When a parameter entity is declared, a "%" sign must be placed before the entity name.

(2) A parameter entity reference starts with "%", not with the "&" used by general entity references.

(3) The content of a parameter entity can contain not only text but also markup.

(4) Parameter entities can be used only in the DTD, not in the document body. That is, they can make up DTD content but not document content.

(5) Inside a markup declaration, parameter entities can be referenced only in external DTD documents; in the internal DTD subset they may appear only between declarations.

The differences between external parameter entities and external general entities are as follows:

(1) External parameter entities are used in standalone DTD documents, while external general entities are used in XML documents.

(2) External parameter entities combine several independent DTD documents into one large DTD document, while external general entities combine several independent XML documents into one large XML document.

IV. Verifying the Validity of XML Documents

A DTD defines the format that XML documents must use: it constrains their structure and form. By referencing a DTD, XML documents can be written in a unified, standardized way, and entities can simplify the content of both the DTD and the XML documents. An XML document that passes DTD validation is called a valid document. So how do we verify that an XML document conforms to its DTD? With the following code:

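  import javax.xml.parsers.DocumentBuilder;
  import javax.xml.parsers.DocumentBuilderFactory;

  import org.xml.sax.ErrorHandler;
  import org.xml.sax.InputSource;
  import org.xml.sax.SAXParseException;

  public class ValidateDTD
  {
      public static void main(String[] args) {

          // The XML file to validate and the DTD it references must be added to the project before running
          try {
              DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();    // create the document builder factory
              dbf.setValidating(true);    // turn on DTD validation
              DocumentBuilder builder = dbf.newDocumentBuilder();
              // Without an ErrorHandler, validity errors are only printed and parsing continues,
              // so rethrow them to make an invalid document fail
              builder.setErrorHandler(new ErrorHandler() {
                  public void warning(SAXParseException e) { System.err.println("Warning: " + e.getMessage()); }
                  public void error(SAXParseException e) throws SAXParseException { throw e; }
                  public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
              });
              builder.parse(new InputSource("xml-2-2.xml"));    // name of the XML file to validate
              System.out.println("The document is valid.");
          } catch (Exception e) {
              e.printStackTrace();
          }
      }
  }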

The code above parses the XML document and, while parsing, checks whether it conforms to the DTD it references. Before running it, add the XML document to be validated and its DTD to the current ValidateDTD project. When the code runs, the parser locates the DTD named in the document's DOCTYPE declaration and validates the XML file against it.

V. Conclusion
This completes our walk through the document type definition format, from basic element declarations to the more varied entity types. A DTD sets a unified standard for how an XML document is used: a rule laid down by the provider and followed by the user. We also looked at how to verify that an XML document is valid against the DTD it references. DTD is not the only way to define the structure of XML documents; it is an early definition format with several shortcomings, such as no support for data types and poor extensibility. To address these shortcomings, Schema was introduced later as the successor to DTD. The next blog post will focus on Schema.