XML stands for Extensible Markup Language. It is a text-based markup language derived from the Standard Generalized Markup Language (SGML).
XML tags identify data and are used to store and organize it. They do not specify how the data should be displayed, which is what HTML tags do. XML will not replace HTML in the near future, but it opens up new possibilities by adopting many of HTML's successful features.
There are three important features of XML that make it useful for a variety of systems and solutions:
XML is extensible: XML lets us create our own self-describing tags, or even whole languages, to suit an application.
XML carries data but does not present it: XML lets us store data independently of how it will be presented.
XML is a public standard: XML was developed by the World Wide Web Consortium (W3C) and is available as an open standard.
Uses of XML
The following short list illustrates what XML can do:
XML can work behind the scenes to simplify creating HTML documents for large websites.
XML can be used to exchange information between organizations and systems.
XML can be used to offload and reload databases.
XML can be used to store and arrange data in a way that can be customized to your data-handling needs.
XML can easily be combined with style sheets to create almost any desired output.
Virtually any type of data can be expressed as an XML document.
What is markup?
XML is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. So what exactly is a markup language? Markup is information added to a document that enhances its meaning by identifying its parts and how they relate to one another. More specifically, a markup language is a set of symbols that can be placed in the text of a document to demarcate and label its parts.
The following example shows what an XML tag embedded in a piece of text looks like:
<message><text>Hello, world!</text></message>
This fragment contains markup symbols, or tags, such as <message>...</message> and <text>...</text>. The tags <message> and </message> mark the beginning and end of the XML fragment, and the tags <text> and </text> surround the text Hello, world!
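To make the split between tags and the text they wrap concrete, here is a minimal sketch that parses the fragment with Python's standard-library xml.etree.ElementTree module. The function name read_message_text is a hypothetical helper chosen for illustration, not part of any API.

```python
import xml.etree.ElementTree as ET

# The <message>/<text> fragment discussed above.
SNIPPET = "<message><text>Hello, world!</text></message>"


def read_message_text(xml_source: str) -> str:
    """Return the character data wrapped by the <text> child of the root element."""
    root = ET.fromstring(xml_source)  # root is the <message> element
    text_elem = root.find("text")     # first direct <text> child, or None
    if text_elem is None or text_elem.text is None:
        raise ValueError("expected a <text> element with content")
    return text_elem.text


if __name__ == "__main__":
    print(ET.fromstring(SNIPPET).tag)   # message
    print(read_message_text(SNIPPET))   # Hello, world!
```

The tags themselves become element names (tag), while the text between them becomes the element's character data (text).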
Is XML a programming language?
A programming language consists of grammar rules and its own vocabulary, which are used to create computer programs. These programs instruct the computer to perform specific tasks. XML does not qualify as a programming language because it does not perform any computation or run any algorithm. It is usually stored in plain text files and processed by special software that can interpret XML.
The following is a complete XML document:
<?xml version="1.0"?>
<contact-info>
   <name>Tanmay Patil</name>
   <company>TutorialsPoint</company>
   <phone>(011) 123-4567</phone>
</contact-info>
Notice that the example above contains two kinds of information:
Markup, such as <contact-info>.
Text, or character data, such as TutorialsPoint and (011) 123-4567.
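The same distinction can be drawn programmatically. This sketch assumes a version of the contact-info document that includes a <company> element, and uses a hypothetical helper, split_markup_and_text, built on Python's standard xml.etree.ElementTree, to collect element names (markup) separately from the text they contain (character data).

```python
import xml.etree.ElementTree as ET

# Assumed version of the contact-info document, including a <company> element.
DOC = """<?xml version="1.0"?>
<contact-info>
   <name>Tanmay Patil</name>
   <company>TutorialsPoint</company>
   <phone>(011) 123-4567</phone>
</contact-info>"""


def split_markup_and_text(xml_source):
    """Return (element names, non-blank character data) in document order."""
    root = ET.fromstring(xml_source)
    tag_names = []
    text_data = []
    for elem in root.iter():          # visits the root first, then its descendants
        tag_names.append(elem.tag)    # markup: the element's name
        if elem.text and elem.text.strip():
            # character data directly inside the element (blank runs skipped)
            text_data.append(elem.text.strip())
    return tag_names, text_data


if __name__ == "__main__":
    tags, texts = split_markup_and_text(DOC)
    print(tags)   # ['contact-info', 'name', 'company', 'phone']
    print(texts)  # ['Tanmay Patil', 'TutorialsPoint', '(011) 123-4567']
```

The sketch reads only each element's leading text (elem.text), which is enough for this document; documents that mix text and child elements would also need each element's tail.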
The following figure describes the syntax rules for writing the different types of markup and text in an XML document:
Let’s take a detailed look at each of the components in the figure above:
XML declaration
An XML document can start with an optional XML declaration, written in the following form:
<?xml version="1.0" encoding="UTF-8"?>
Here, version is the XML version, and encoding specifies the character encoding used in the document.
Syntax rules for XML declarations
The XML declaration is case-sensitive and must begin with <?xml, where xml is written in lowercase.
If a document contains an XML declaration, it must be the very first statement in the document; nothing, not even whitespace, may come before it.
The HTTP protocol can override the encoding value specified in the XML declaration, for example through the charset parameter of the Content-Type header.
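One way to check these declaration rules is to hand sample documents to a conforming parser. The sketch below assumes Python's standard xml.etree.ElementTree (which is backed by the expat parser) and a hypothetical helper, is_well_formed; the sample strings are illustrative. It does not cover the HTTP encoding override, which depends on how the document is transferred.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_source: str) -> bool:
    """Return True if Python's expat-based parser accepts xml_source."""
    try:
        ET.fromstring(xml_source)
    except ET.ParseError:
        return False
    return True


# Illustrative samples: (label, document, expected result).
DECLARATION_SAMPLES = [
    ("lowercase declaration first", '<?xml version="1.0" encoding="UTF-8"?><greeting>Hi</greeting>', True),
    # A newline before the declaration means it is no longer the first statement.
    ("declaration not first", '\n<?xml version="1.0"?><greeting>Hi</greeting>', False),
    # The "xml" in the declaration must be lowercase.
    ("uppercase XML", '<?XML version="1.0"?><greeting>Hi</greeting>', False),
]

if __name__ == "__main__":
    for label, source, expected in DECLARATION_SAMPLES:
        print(f"{label}: accepted={is_well_formed(source)} expected={expected}")
```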
Tags and elements
An XML file is built from several XML elements, also called XML nodes or XML tags. The name of an XML element is enclosed in angle brackets < >, as shown below:
<element>
Syntax rules for tags and elements
Element syntax: Each XML element must be closed, either with a start tag and an end tag, as follows:
<element>....</element>
or in the abbreviated, self-closing form, as follows:
<element/>
Element nesting: An XML element can contain multiple XML elements as its children, but the children must not overlap. In other words, an end tag must always have the same name as the most recent unmatched start tag.
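The following example shows incorrectly nested tags:
<?xml version="1.0"?>
<contact-info><company>TutorialsPoint</contact-info></company>
The following example shows correctly nested tags:
<?xml version="1.0"?>
<contact-info><company>TutorialsPoint</company></contact-info>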
Root element: An XML document can have only one root element. For example, the following is not a correct XML document, because the x and y elements occur at the top level without a root element:
<x>...</x><y>...</y>
The following example shows a correctly formed XML document:
<root><x>...</x><y>...</y></root>
Case sensitivity: The names of XML elements are case-sensitive, which means the start and end tags of an element must be written in the same case.
For example, <contact-info> and <Contact-Info> are different.
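The element rules above can be tested the same way as the declaration rules. This sketch defines a hypothetical is_well_formed helper on top of Python's standard xml.etree.ElementTree and runs it over small illustrative documents, one or two per rule.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_source: str) -> bool:
    """Return True if Python's expat-based parser accepts xml_source."""
    try:
        ET.fromstring(xml_source)
    except ET.ParseError:
        return False
    return True


# Illustrative samples: (rule being exercised, document, expected result).
ELEMENT_SAMPLES = [
    ("start and end tags", "<element>text</element>", True),
    ("self-closing form", "<element/>", True),
    # </contact-info> closes before the still-open <company>: overlapping tags.
    ("overlapping tags", "<contact-info><company>TutorialsPoint</contact-info></company>", False),
    ("proper nesting", "<contact-info><company>TutorialsPoint</company></contact-info>", True),
    # Two top-level elements, so there is no single root element.
    ("no root element", "<x>...</x><y>...</y>", False),
    ("single root element", "<root><x>...</x><y>...</y></root>", True),
    # Names are case-sensitive, so these tags do not match.
    ("case mismatch", "<contact-info>...</Contact-Info>", False),
]

if __name__ == "__main__":
    for rule, source, expected in ELEMENT_SAMPLES:
        print(f"{rule}: accepted={is_well_formed(source)} expected={expected}")
```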
Attributes
An attribute specifies a single property of an element using a name/value pair. An XML element can have one or more attributes. For example:
<a href="http://www.tutorialspoint.com/">Tutorialspoint!</a>
Here href is the attribute name, and http://www.tutorialspoint.com/ is the attribute value.
Syntax rules for XML attributes
XML attribute names are case-sensitive (unlike in HTML). That is, HREF and href are considered two different XML attributes.
The same attribute cannot appear twice in one element. The following example shows incorrect syntax because attribute b is specified twice:
<a b="x" c="y" b="z">....</a>
Attribute names are written without quotation marks, whereas attribute values must always be enclosed in quotation marks. The following example shows incorrect XML syntax:
<a b=x>....</a>
This syntax is incorrect because the attribute value is not enclosed in quotation marks.
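The attribute rules can be observed with Python's standard xml.etree.ElementTree. In this sketch, Element.get and Element.attrib are real ElementTree APIs, while parse_error_message is a hypothetical helper that returns the parser's error text; the exact wording of those messages depends on the parser, so none is quoted here.

```python
import xml.etree.ElementTree as ET


def parse_error_message(xml_source):
    """Return the parser's error message, or None if xml_source is well-formed."""
    try:
        ET.fromstring(xml_source)
    except ET.ParseError as err:
        return str(err)
    return None


LINK = '<a href="http://www.tutorialspoint.com/">Tutorialspoint!</a>'

if __name__ == "__main__":
    link = ET.fromstring(LINK)
    print(link.get("href"))   # http://www.tutorialspoint.com/
    print(link.get("HREF"))   # None: attribute names are case-sensitive

    # href and HREF are two distinct attributes, so both may appear together.
    print(ET.fromstring('<a href="x" HREF="y"/>').attrib)

    # The two incorrect examples above are rejected (message wording varies).
    print(parse_error_message('<a b="x" c="y" b="z">....</a>'))  # duplicate attribute b
    print(parse_error_message("<a b=x>....</a>"))                # unquoted value
    # Double or single quotes are both accepted around a value.
    print(parse_error_message("<a b='x'>....</a>"))              # None
```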
XML references
References allow us to add or include additional text or markup in an XML document. A reference always begins with the symbol &, which is a reserved character, and ends with the symbol ;. XML has two types of references:
Entity references: An entity reference contains a name between the start delimiter & and the end delimiter ;. For example, in &amp; the name is amp. The name refers to a predefined string of text or markup.
Character references: These contain a hash mark (#) followed by a number, as in &#65;. The number always refers to the Unicode code point of a character. Here, 65 refers to the letter A.
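A parser replaces references with the characters they stand for. The sketch below, assuming Python's standard library, parses text containing an entity reference and two character references (&#65; in decimal and its hexadecimal equivalent &#x41;), then uses xml.sax.saxutils.escape to go the other way. The helper name resolved_text is hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# One entity reference (&amp;) and two character references for "A".
SOURCE = "<p>Fish &amp; Chips: &#65;&#x41;</p>"


def resolved_text(xml_source):
    """Parse a single element and return its text with references replaced."""
    return ET.fromstring(xml_source).text


if __name__ == "__main__":
    print(resolved_text(SOURCE))   # Fish & Chips: AA

    raw = "a < b & c"
    escaped = escape(raw)          # replaces &, < and > with entity references
    print(escaped)                 # a &lt; b &amp; c
    # Parsing the escaped text gives back the original characters.
    print(resolved_text("<t>" + escaped + "</t>") == raw)  # True
```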
XML text
The names of XML elements and XML attributes are case-sensitive, which means the start and end tags of an element must be written in the same case.
To avoid character encoding problems, all XML files should be saved as Unicode UTF-8 or UTF-16 files.
Whitespace characters such as spaces, tabs, and line breaks between XML attributes are ignored. Whitespace between XML elements is normally ignored by applications as well, although a parser still passes it along as character data.
Some characters are reserved by XML syntax itself and therefore cannot be used directly. To use them, you must write them as replacement entities instead. Here are some:
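Following the UTF-8 advice above, this sketch serializes a small element containing non-ASCII text as UTF-8 with an explicit XML declaration, using ElementTree.write from Python's standard library, and then parses the bytes back. The helpers to_utf8_document and make_city are hypothetical names for illustration.

```python
import io
import xml.etree.ElementTree as ET


def make_city():
    """Build a small element whose text contains a non-ASCII character."""
    city = ET.Element("city")
    city.text = "Zürich"
    return city


def to_utf8_document(root):
    """Serialize an element as UTF-8 bytes, preceded by an XML declaration."""
    buffer = io.BytesIO()
    ET.ElementTree(root).write(buffer, encoding="UTF-8", xml_declaration=True)
    return buffer.getvalue()


if __name__ == "__main__":
    data = to_utf8_document(make_city())
    print(data)                      # raw UTF-8 bytes, declaration first
    print(ET.fromstring(data).text)  # Zürich
```

Because the declaration names the encoding, a parser reading the bytes knows how to decode the non-ASCII text.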