XML Advancement: An Introduction to Level 1 – XML

Time:2019-8-13

By Rob Sheldon, 2014/03/26 (first published in: 2012/09/20)

About Series

This article belongs to the Advanced Series: XML Advancement

Since 2003, XML has been part of the SQL standard, and many dynamic management views return XML data, so it is also critical for every database administrator. Now, the industry is more used for data defined by document tags, and database developers and database administrators are more familiar with XML technology and the way it is used than ever before. In this series of articles, Robert Sheldon will try to simplify complex things. Note: This series of articles is available as e-books.


Extensible Markup Language (XML) first provides support in the SQL Server 2000 release of SQL Server. However, XML-related functions are limited to data management functions that focus on mapping relational data and XML data. For example, the FOR XML clause has been added to SQL Server 2000 to return relational query results as XML.

But by the time SQL Server 2005 is released with the addition of XML data types, it makes sense to support XML. The XML data type allows the XML document itself to be stored in columns and variables that use this type of configuration. Data types also support a set of methods that can be used to retrieve and modify specific components in an XML document.

To take full advantage of the XML-related capabilities supported by SQL Server, you may find it useful to have a basic understanding of XML itself. To this end, the first part of the XML Advanced Series will introduce what XML is and the various components that make up an XML document.

Overview of XML

Similar to Hypertext Markup Language (HTML), XML is also a markup language that uses tags to describe the nature of the data associated with these tags. What makes XML extensible is its self-describing nature, which is to create tags specific to the data values contained in an XML document. In HTML, these tags are predefined. (With XML components, the extensibility of XML will become clearer.)

Despite its scalability, XML is still a standardized language and must conform to a specific set of format rules defined by the World Wide Web Consortium (W3C). Because this standardization is different from HTML for displaying data, this language is widely used for transmitting and storing data.

XML can easily share data between heterogeneous systems, regardless of hardware, operating system or application type, the universal use of XML means that data can be processed with less human intervention. At the same time, it can control the description, arrangement and display of data.

XML Components

The main components that make up an XML document and the rules governing their use are usually very simple, but in order for an XML document to be properly processed by an SQL Server XML parser, you must strictly follow some rules.

There are two main types of information in XML documents: the data to be stored and the label describing the data. Labels consist of a set of parentheses (<>), which contain descriptive words or compounds (no spaces) describing the data associated with the label. Because of the self-describing nature of these tags, XML is often considered a meta-language.

Each discrete piece of stored data is encapsulated in the start tag and the end tag, as shown in the following example:

<Person>John Doe</Person>

In this case, the start tag is<Person>The closing tag is</ Person>。 Note that the forward slash precedes the tag description in the end tag. The forward slash must precede all end tags, but the language of the tag must be the same as the beginning tag. The example above is Person. I can choose a name other than Person, including one unrelated to people, but it’s a good idea to always provide the tag name that best describes the data contained in the open and close tags. In this case, the tag describes a person’s name, John Doe, so the tag name is<Person>

Similarly, labels and enclosed data represent a single element. However, elements do not always contain data. An empty element can be presented in one of two ways. The first is to open and close labels by specifying, but excluding data, as shown in the following example:

<Person></Person> 

Another way to represent a blank element is to use only a set of parentheses, but still include forward slashes:

<Person />

Furthermore, this method can only be used if the element does not contain a value. Later in the XML Advanced Series, you may see that schemas need an element with no value. In this case, the abbreviation format can be used to represent the two labels of the element.

Whether the element contains a value or not, the start and close tags must match exactly to case (except for the forward slash in the end tag) as long as two tags are used. For example, the following elements generate errors in the SQL Server XML parser because the two tags do not match:

<person> John Doe </ Person>

Descriptive words in the start tag are all lowercase; however, descriptive words in the end tag should begin with capital letters. Start and end tags must match to be considered appropriate or well-formed XML.

But you can embed elements into each other. In the following example, I will<Person>Two instances of the element are embedded in<People>In the element:

<People>
<Person>John Doe</Person>
<Person>Jane Doe</Person>
</People>

Please note that each<Person>The elements themselves are complete. It includes opening and closing tags and their accompanying data. Elements embedded in other elements are called child, or in some cases subelements. External elements (in this case<People>) It’s the parent element. The parent element at the highest level of an XML document is considered the root element. All XML documents must have and have only one root element. Therefore, in the example above<People>The elements are two.<Person>The parent element of an element, which is the root element of an XML document.

SQL Server also allows XML fragments to be stored in XML columns or variables. Fragments are XML code blocks without root elements, such as the two elements shown in the following example:

<Person>John Doe</Person>
<Place>Seattle, WA</Place>

Elements must still be well-formed XML, that is, with matching tags containing data, but they are not necessarily XML documents. As you will see later in the XML Advanced Series, you can specify that only XML documents are allowed in XML columns or variables, but now you just need to know that SQL Server can distinguish and store XML documents and fragments.

When embedding elements in other elements, you must ensure that the child element is completed before the parent element is completed. For example, in the following example, I am<Person>It’s over before the element<People>Element, which results in an error in the generation of the SQL Server XML parser:

<People><Person>John Doe</People></Person>

You have to make sure that no matter how many levels contain embedded elements, your child elements are complete. In the following example,<FirstName>and<LastName>Elements are embedded in each<Person>In the element,<Person>Elements are embedded in<People>In the element:

<People>
<Person>
<FirstName>John</FirstName>
<LastName>Doe</LastName>
</Person>
<Person>
<FirstName>Jane</FirstName>
<LastName>Doe</LastName>
</Person>
</People>

Under these circumstances,<Person>Elements act as child and parent elements. Note, however, that each embedded element, regardless of its level, must fall within the start and end tags of the parent element. For example,<FirstName>and<LastName>The first instance of the element falls entirely on<Person>In the first instance of the element, and<Person>Two instances of the element fall entirely on<People>In the element, that is, the root element of the document.

Elements can also have attributes associated with them. Attributes are attributes that can be assigned. This property is defined as part of the element’s start tag. In the following example, I have added the ID attribute<Person>In each instance of the element:

<People>
<Person id="1234">
<FirstName>John</FirstName>
<LastName>Doe</LastName>
</Person>
<Person id="5678">
<FirstName>Jane</FirstName>
<LastName>Doe</LastName>
</Person>
</People>

As shown in the example, attributes are named by attributes (id in this case), followed by equals, and attribute values are enclosed by double quotes. Therefore,<Person>The ID attribute of the first instance of the element has a value of 1234, while the ID attribute of the element has a value of 1234.<Person>The ID attribute of the second instance of the element has a value of 5678.

Another component that many XML documents contain is a declaration that specifies at least the version of the XML standard that the document conforms to. So far, there are only two versions: 1.0 and 1.1. If you use XML 1.0, declarations are unnecessary; however, XML 1.1 requires declarations. Therefore, you should know how to include declarations in an XML document.

If you include a declaration, you must put it at the beginning of the document.<?Open the label and use it?>Close the label to end. In addition, you must include XML keywords (lowercase) and version attributes (also lowercase). Another property that is usually included (although optional) is encoding, which specifies the character encoding for XML documents. In the following example, I include a declaration that specifies the encoding of versions 1.0 and UTF-8, which means that the data is stored as an 8-bit Unicode character sequence:

<?xml version="1.0" encoding="UTF-8"?>
<People>
<Person id="1234">
<FirstName>John</FirstName>
<LastName>Doe</LastName>
</Person>
<Person id="5678">
<FirstName>Jane</FirstName>
<LastName>Doe</LastName>
</Person>
</People>

You can also add comments to an XML document. To do this, you just need to<! -Note before label and end it with - >Labels, as I did in the following example:

<?xml version="1.0" encoding="UTF-8"?>
<!-- A list of current clients -->
<People>
<Person id="1234">
<FirstName>John</FirstName>
<LastName>Doe</LastName>
</Person>
<Person id="5678">
<FirstName>Jane</FirstName>
<LastName>Doe</LastName>
</Person>
</People>

As you can see, I have added a list of reviews for the current client and attached it to the comment tag. The SQL Server XML parser will ignore anything in the tag, so you can not only use the annotation function to provide information about the XML document and its data, but also suspend the part of the XML code that you do not want to process as a document.

Another consideration when using XML is that some characters will not be parsed when element values appear. For example, you cannot include an ampersand (&) in the value of an element, as I did in the following example<FavoriteBook>What is done in the child element:

<?xml version="1.0" encoding="UTF-8"?>
<!-- A list of current clients -->
<People>
<Person id="1234">
<FirstName>John</FirstName>
<LastName>Doe</LastName>
</Person>
<Person id="5678">
<FirstName>Jane</FirstName>
<LastName>Doe</LastName>
<FavoriteBook>Crime & Punishment</FavoriteBook>
</Person>
</People>

If I try to assign this XML document to an XML column or variable,<FavoriteBook>The child element will cause the parser to generate errors because the “Crime & Punishment” value contains the & symbol. You must replace this type of character with an entity reference that tells the parser to retain the original intended character. Entity references begin with the ampersand end with semicolons, which contain multi-character codes representing the original values. For the amp symbol, the entity reference should be amp; I will use it in the following examples:

<?xml version="1.0" encoding="UTF-8"?>
<!-- A list of current clients -->
<People>
<Person id="1234">
<FirstName>John</FirstName>
<LastName>Doe</LastName>
</Person>
<Person id="5678">
<FirstName>Jane</FirstName>
<LastName>Doe</LastName>
<FavoriteBook>Crime &amp; Punishment</FavoriteBook>
</Person>
</People>

Please note that I have used amp; Entity Reference. Now the XML parser processes<FavoriteBook>Elements will have no problems. Note, however, that the ampersand is not the only character that produces errors. The XML standard identifies five characters that should be replaced with entity references, as I did in the example above:

  • Less than (<): Replace with&lt;
  • Greater than (>): replaced by & gt;
  • Sum sign (&): Replace amp;
  • Apostrophe (‘): Replace with’
  • Quotation mark (“): Replaced with”

Another problem raised by this example is that child elements do not necessarily have to be the same from one parent instance to another. As you can see,<Person>The first instance of an element contains only<FirstName>and<LastName>Subelements, but<Person>The second instance of the element contains<FirstName>and<LastName>Subelements and<FavoriteBook>Elements. As long as your child elements are well formed, you can include any elements needed to describe and define data.

summary

At this level, we have seen the main components needed to make up an XML document. Elements are the basic building blocks of all XML documents. Each element is defined by the start tag and the end tag, and the data value itself is contained between the two tags. Elements can be embedded with each other, but an element – root – must be the parent of all other elements in an XML document. Elements can also include attributes that are defined as part of the element’s start tag.

We’ve learned as easily as possible how to organize an XML document, but the goal of this course is not to cultivate how to create these types of documents, but to provide an introduction to XML so that you can use XML more effectively in SQL Server. In the next section, we will show you how to implement XML data types in SQL Server and how to assign them to columns and variables to store XML documents and fragments.


original text

Stairway to XML: Level 1 – Introduction to XML

By Rob Sheldon, 2014/03/26 (first published: 2012/09/20)

The Series

This article is part of the Stairway Series: Stairway to >XML

XML has been part of the SQL Standard since 2003, and it is also essential for any DBA because so many of the dynamic management views return XML data. Now that the industry is more used to data defined by document markup, it is becoming more important than ever for Database Developers and DBAs to understand the technology and to know where it makes sense to use XML. In this series of articles, Robert Sheldon flexes his talent to make the complicated seem simple. Note: This series of articles is now available as an eBook.


Support for the eXtensible Markup Language (XML) was first introduced in SQL Server with the release of SQL Server 2000. However, XML-related features were limited to data management capabilities that focused on mapping relational and XML data. For example, SQL Server 2000 added the FOR XML clause, which lets you return relational query results as XML.

However, it wasn’t until the release of SQL Server 2005—when the XML data type was added—that support for XML got interesting. The XML data type lets you natively store XML documents in columns and variables configured with that type. The data type also supports a set of methods you can use to retrieve and modify specific components within the XML document.

To take full advantage of the XML-related features supported in SQL Server, you might find it useful to have a fundamental understanding of XML itself. To that end, this first Level of the Stairway to XML series explains what XML is and describes the various components that make up an XML document.

An Overview of XML

Similar to the HyperText Markup Language (HTML), XML is a markup language that uses tags to delineate and describe the nature of the data associated with those tags. What makes XML extensible is its self-describing nature, that is, you create tags that are specific to the data values contained in the XML document. In HTML, those tags are pre-defined. (XML’s extensible nature will become clearer as we work through the XML components.)

Despite its extensibility, XML is still a standardized language that must conform to a specific set of formatting rules, as defined by the World Wide Web Consortium (W3C). Because of this standardization, the language has been widely adopted in order to transport and store data, unlike HTML, which is used to display data. XML makes it possible to easily share data among heterogeneous systems, regardless of hardware, operating system, or application type, and XML’s universal adoption means that data can be processed with little human intervention. At the same time, you can control how the data is described, while also controlling how the data is ordered and displayed.

XML Components

The primary components that make up an XML document—and the rules that govern the use of >those components—are generally very straightforward, but you must adhere strictly to these rules in order for an XML document to be properly processed by the SQL Server XML parser.

There are primarily two types of information included in an XML document: the data to be stored and the tags that describe the data. A tag is made up of a set of angle brackets (< >) that enclose a descriptive word or compound word (no spaces) that describes the data associated with the tag. It’s because of the self-describing nature of these tags that XML is often considered a meta-language.

Each discrete piece of stored data is enclosed in an opening tag and a closing tag, as shown in the following example:

<Person>John Doe</Person>

In this case, the opening tag is <Person>, and the closing tag is </Person>. Notice that a forward slash precedes the tag description in the end tag. A forward slash must precede all end tags, but the language of the tag must be the same as the opening tag, which in the example above is Person. I could have chosen a name other than Person, including a name that has nothing to do with people, but a good practice is to always provide tag names that best describe the data enclosed in the opening and closing tags. In this case, the tags are describing the name of a person, John Doe, thus the tag name <Person>.

Together, the tags and enclosed data represent a single element. However, an element does not always have to contain data. An empty element can be rendered in one of two ways. The first is by specifying the opening and closing tags, but including no data, as I do in the following example:

<Person></Person> 

Another way to represent an empty element is use only one set of brackets, but still include the forward slash:

<Person />

Again, this method can be used only when an element contains no value. As you’ll see later in the Stairway to XML series, a schema might require an element for which there is no value. In that case, you can use the shortened format to represent the both tags of the element.

Whether or not an element contains a value, whenever both tags are used, the opening and closing tags must match exactly, down to the capitalization (except for the forward slash in the closing tag). For instance, the following element generates an error in the SQL Server XML parser because the case is different between the two tags:

<person>John Doe</Person>  

The descriptive word in the opening tag is all lowercase; however, the descriptive word in the closing tag begins with a capital letter. The opening and closing tags must match to be considered proper, or well formed, XML.

You can, however, embed elements within each other. In the following example, I embed two instances of the <Person> element within the <People> element:

<People>
<Person>John Doe</Person>
<Person>Jane Doe</Person>
</People>  

Notice that each <Person> element is complete in itself. It includes the opening and closing tags and the data they enclose. Elements embedded in other elements are referred to as child elements or, in some cases, subelements. The outer element, in this case, <People>, is the parent element. The parent element at the highest level of an XML document is considered the root element. All XML documents must have one, and only one, root element. So the <People> element in the example above is the parent element to the two <Person> elements, and it is the root element for the XML document.

SQL Server also permits you to store XML fragments in an XML column or variable. A fragment is a chunk of XML code without a root element, such as the two elements shown in the following example:

<Person>John Doe</Person>
<Place>Seattle, WA</Place>  

The elements must still be well formed XML, that is, have matching tags that enclose the data, but they don’t have to be an XML document. As you’ll see later in the Stairway to XML series, you can specify that only XML documents be permitted in an XML column or variable, but for now just know that SQL Server distinguishes between XML documents and fragments and can store both.

When you embed elements within other elements, you must ensure that the child elements are complete before you end the parent element. For instance, in the following example, I end the <People> element before the <Person> element, which causes the SQL Server XML parser to generate an error:

<People><Person>John Doe</People></Person>  

You must ensure that your child elements are complete no matter how many levels contain embedded elements. In this following example, the <FirstName> and <LastName> elements are embedded in each <Person> element, and the <Person> elements are embedded in the <People> element:

<People>
<Person>
<FirstName>John</FirstName>
<LastName>Doe</LastName>
</Person>
<Person>
<FirstName>Jane</FirstName>
<LastName>Doe</LastName>
</Person>
</People>  

In this case, the <Person> elements act as both child and parent elements. Notice, however, that each embedded element, regardless of the level, falls completely within the opening and closing tags of the parent element. For example, the first instances of the <FirstName> and <LastName> elements fall completely within the first instance of the <Person> element, and the two instances of the <Person> elements fall completely within the <People> element, which is the document’s root element.

Elements can also have attributes associated with them. An attribute is a property to which you can assign a value. The attribute is defined as part of the element’s opening tag. In the following example, I’ve added the id attribute to each instance of the <Person> element:

<People>
<Person id="1234">
<FirstName>John</FirstName>
<LastName>Doe</LastName>
</Person>
<Person id="5678">
<FirstName>Jane</FirstName>
<LastName>Doe</LastName>
</Person>
</People> 

As the example demonstrates, an attribute consists of the attribute name (in this case, id), followed by an equal sign and the attribute value, enclosed in double quotes. So the id attribute for the first instance of the <Person> element has a value of 1234, and the id attribute for the second instance of the <Person> element has a value of 5678.

Another component contained in many XML documents is the declaration, which at a minimum specifies the version of the XML standard that the document conforms to. To date, there are only two versions: 1.0 and 1.1. If using XML 1.0, the declaration is not necessary; however, XML 1.1 requires one. For that reason, you should be aware of how to include a declaration in your XML document.

If you include a declaration, you must place it at the beginning of the document, start the declaration with the <? opening tag, and end it with the ?> closing tag. In addition, you must include the xml keyword (lowercase) and the version attribute (also lowercase). Another attribute commonly included, although optional, is encoding, which specifies the character encoding used for the XML document. In the following example, I include a declaration that specifies version
1.0 and an encoding of UTF-8, which means the data is stored as a sequence of 8-bit Unicode characters:

<?xml version="1.0" encoding="UTF-8"?>
<People>
<Person id="1234">
<FirstName>John</FirstName>
<LastName>Doe</LastName>
</Person>
<Person id="5678">
<FirstName>Jane</FirstName>
<LastName>Doe</LastName>
</Person>
</People>  

You can also add comments to your XML documents. To do so, simply precede the comment with the <!— tag and end it with the —> tag, as I’ve done in the following example:

<?xml version="1.0" encoding="UTF-8"?>
<!-- A list of current clients -->
<People>
<Person id="1234">
<FirstName>John</FirstName>
<LastName>Doe</LastName>
</Person>
<Person id="5678">
<FirstName>Jane</FirstName>
<LastName>Doe</LastName>
</Person>
</People>  

As you can see, I’ve added the comment A list of current clients, which I’ve enclosed in the comment tags. The SQL Server XML parser will ignore anything within the tags, so you can use the commenting feature not only to provide information about the XML document and its data, but also to preserve parts of the XML code that you want to hang on to but you don’t want to have processed as part of the document.

Another consideration when working with XML is that certain characters cannot be parsed when they appear in element values. For example, you cannot include an ampersand (&) in an element’s value, as I’ve done in the <FavoriteBook> child element in the following example:

<?xml version="1.0" encoding="UTF-8"?>
<!-- A list of current clients -->
<People>
<Person id="1234">
<FirstName>John</FirstName>
<LastName>Doe</LastName>
</Person>
<Person id="5678">
<FirstName>Jane</FirstName>
<LastName>Doe</LastName>
<FavoriteBook>Crime & Punishment</FavoriteBook>
</Person>
</People>  

If I try to assign this XML document to an XML column or variable, the <FavoriteBook> child element will cause the parser to generate an error because the value Crime & Punishment includes the ampersand. You must replace this type of character with an entity reference that tells the parser to preserve the character as it is originally intended. An entity reference begins with an ampersand and ends with a semi-colon and in between includes a multi-character code that represents the original value. For an ampersand, the entity reference should be &amp;, which I use in the following example:

<?xml version="1.0" encoding="UTF-8"?>
<!-- A list of current clients -->
<People>
<Person id="1234">
<FirstName>John</FirstName>
<LastName>Doe</LastName>
</Person>
<Person id="5678">
<FirstName>Jane</FirstName>
<LastName>Doe</LastName>
<FavoriteBook>Crime &amp; Punishment</FavoriteBook>
</Person>
</People>

Notice that I’ve replaced the ampersand with the &amp; entity reference. Now the XML parser will handle the <FavoriteBook> element with no problem. But note that the ampersand is not the only character that will generate an error. The XML standard identifies five characters that should be replaced with entity references, as I’ve done in the above example:

  • Less than (<): replace with <
  • Greater than (>): replace with >
  • Ampersand (&): replace with &
  • Apostrophe (‘): replace with ‘
  • Quotation mark (“): replace with “

Another issue that the example raises is the fact that the child elements do not have to be the same from one parent instance to the next. As you can see, the first instance of the <Person> element contains only the <FirstName> and <LastName> child elements, but the second instance of the <Person> element contains the <FirstName> and <LastName> child elements, as well as the <FavoriteBook> element. As long as your child elements are well formed, you can include whatever elements necessary to delineate and define your data.

Summary

In this Level, we’ve looked at the primary components that make up an XML document. Elements serve as the basic building blocks for all XML documents, with each element being delineated by an opening tag and a closing tag and the data value itself being enclosed between those two tags. Elements can be embedded within each other, but one element—the root—must act as the parent to all other elements in an XML document. An element can also include attributes, which are defined as part of an element’s opening tag.

As handy as it might be to know how to put together an XML document, the purpose of this Level has not been to train you in how to create these types of documents, but rather to provide an introduction to XML so you can more effectively work with XML in SQL Server. In the next Level, we’ll look at how the XML data type is implemented in SQL Server and how it can be assigned to columns and variables in order to store both XML documents and XML fragments.


Recommended Today

Write the correct posture of chameleon cross-end components (Part I)

In the chameleon project, there are two ways to implement a cross-end component: using third-party component encapsulation and unified implementation based on chameleon grammar. This article is the first in a series of articles on the correct posture of chameleon cross-end components. Taking encapsulating a cross-end indexList component as an example, it first introduces how […]