XML has been widely adopted as a platform-independent mechanism for representing data, and is also commonly used to exchange data between disparate and loosely-coupled systems, such as B2B applications and workflow solutions. More recently, XML has been used to represent semi-structured (as well as unstructured) data such as documents and emails. If information in these models has to be queried, then XML is probably the simplest way to represent such information.
For example, if documents are represented in XML, it is very easy to write a DOM or XPATH query to extract the contents of a section titled, for example, "Recipe". Many applications also require other pieces of information that typically reside in a database. For example, consider an application that allows a user to personalize the look and feel of an application. It is not uncommon to store the preferences of the user against the user record in the database. XML lends itself well to storing such unstructured information to cope with an ever expanding list of user preferences.
When we store this kind of semi-structured or unstructured information as XML in the database, it is not always feasible to extract the content to the application tier and then process the XML using XML parsers. If the database platform is able to provide native XML processing capabilities then that is a massive advantage. In that way we can also take advantage of other database capabilities such as query optimization, indexing, backups and, most importantly of all, interoperability with relational data.
The XML Data Type
The XML data type is a built-in data type similar to other built-in types in SQL Server 2005. You can define XML columns, XML variables, XML parameters for procedures and functions, and can also return XML as return values from functions. Internally, the XML data type is stored in a proprietary binary format. The maximum permissible size of the XML content is 2GB.
Defining XML Variables and Columns
To declare a variable of type XML is as simple as this:
DECLARE @xmlDoc XML
In order to create an XML column (using untyped XML – more on this later), we could use the following:
CREATE TABLE rssFeeds ( feedName VARCHAR(20) NOT NULL PRIMARY KEY, generatedDate DATETIME DEFAULT GETDATE(), feedXML XML )
Loading Data into XML Columns and Variables
Having defined an XML variable or column, we can store information in it via the following methods:
- Assigning a string to the XML variable / column
- Using the output of the SELECT…FOR XML clause
- Using bulkload
Simple String Assignment
We can load a string into our @xmlDoc variable as follows:
SET @xmlDoc = '<contact><name>Bob Smith</name></contact>' SELECT @xmlDoc
Note that string content is implicitly converted to the XML type. You can also explicitly convert the string to XML as shown:
SET @xmlDoc = CAST ('<contact><name>Bob Smith</name></contact>' AS XML) SELECT @xmlDoc
In each example, we then select this XML content. If you run this example in SQL Server Management Studio with the "results to grid" option enabled, the XML output will be as shown below:
Notice that the XML content is hyperlinked and if you click this link, a new window opens and shows the entire XML document. This is a great improvement over SQL Server 2000 Query Analyzer, which would show the results of XML queries as a single line of output and also truncate the results if the result size is greater than 8192 bytes.
We can insert regular string content into the feedXML column of our rssFeeds table using the following method:
INSERT INTO rssFeeds (feedName, feedXML) VALUES (‘MSDN’, ‘<feedXML/>’)
Using FOR XML
In the following example, we initialize the @xmlDOC variable using a SELECT…FOR XML statement:
USE AdventureWorks GO DECLARE @xmlDoc XML SET @xmlDoc = ( SELECT name, salesYTD FROM sales.salesterritory FOR XML AUTO, ELEMENTS ) SELECT @xmlDoc
The output of the above command is as follows (only partial results are shown. To see the full results, you need to click on the hyperlink in Management Studio):
Streaming results from FOR XML statements directly into XML variables is a great improvement over SQL Server 2000 (which did not allow this). Also note that the above example returns an XML fragment and not an XML document. If you need to return an XML document, you can use the new ROOT directive in the FOR XML command as shown:
USE AdventureWorks GO DECLARE @xmlDoc XML SET @xmlDoc = ( SELECT name, salesYTD FROM sales.salesterritory FOR XML AUTO, ROOT ('territories'), ELEMENTS ) SELECT @xmlDoc
Executing the above command produces the following output (again, only partial results are shown):
Bulkloading XML data using the OPENROWSET Function
SQL Server 2005 provides extensions to the OPENROWSET function to bulk load an XML document from the file system. The following example shows how to bulkload our @xmlDoc variable using this technique:
DECLARE @xmlDoc XML SET @xmlDoc = ( SELECT * FROM OPENROWSET ( BULK 'C:\Temp\rss.xml', SINGLE_CLOB ) AS xmlData ) SELECT @xmlDoc
The RSS.XML file that is referred to in the above query is a sample RSS XML that I generated from the MSDN site. To do this, you can query the MSDN RSS site at: http://msdn.microsoft.com/rss.xml and then save results as XML from Internet Explorer.
The output of the above command is as follows:
Similarly, the following code uses the OPENROWSET command to load an XML document into our feedXMLcolumn:
GO INSERT INTO rssFeeds (feedName, feedXML) SELECT 'MSDNRSS', xmlData FROM ( SELECT * FROM OPENROWSET (BULK 'C:\Temp\rss.xml', SINGLE_CLOB) AS xmlData ) AS feed (xmlData) GO SELECT * FROM rssFeeds
The following output is generated:
Typing XML
Up to this point, the XML content that we have used is referred to as untyped XML. An untyped XML variable/column is one that is not associated with a schema. If we associate a schema with an XML type, it is then referred to astyped XML. Once an XML variable/column is typed using a schema, any data that is inserted into it is automatically validated against the schema, and SQL Server 2005 raises an error if the data does not conform to the defined schema rules.
It is usually recommended to associate schemas to XML content in SQL Server, since query optimization can use the information and also it provides better operational semantics. For example, we can perform arithmetic operations on numbers, but not strings. If no schema is attached to an XML, then all values are considered to be strings. MSDN provides further information about schemas and their maintenance.
Creating typed XML involves two basic steps: 1) Create an XML schema and 2) Register it in the database.
Creating a Schema
Using tools such as Visual Studio 2005 and XML Spy, creating a schema is pretty simple. The following link contains some excellent tutorials on this topic: Creating Schemas in Visual Studio. Using the tool of your choice, create the following schema definition:
<?xml version="1.0" encoding="utf-8"?> <xs:schema id="Contacts" targetNamespace="http://www.srisamp.com/contacts.xsd" elementFormDefault="qualified" xmlns="http://www.srisamp.com/contacts.xsd" xmlns:mstns="http://www.srisamp.com/contacts.xsd" xmlns:xs="http://www.w3.org/2001/XMLSchema"> <xs:complexType name="contact"> <xs:sequence> <xs:element name="name" type="xs:string" /> </xs:sequence> </xs:complexType> <xs:element name="contacts"> <xs:complexType> <xs:sequence> <xs:element name="contact" type="contact" /> </xs:sequence> </xs:complexType> </xs:element> </xs:schema>
This schema defines an XML document of the form:
<contacts><contact><name></name></contact></contacts>
Save this schema in a file called ContactsSchema.xsd in the C:\Temp folder.
Registering a Schema in SQL Server 2005
You register a schema in SQL Server 2005 using the CREATE XML SCHEMA COLLECTION command. The following script loads our XML schema into the database.
IF EXISTS (SELECT * FROM sys.xml_schema_collections WHERE [name] = 'MyContacts') DROP XML SCHEMA COLLECTION MyContacts GO DECLARE @contactsSchema XML SET @contactsSchema = ( SELECT * FROM OPENROWSET (BULK 'C:\Temp\ContactsSchema.xsd', SINGLE_CLOB) AS xmlData ) CREATE XML SCHEMA COLLECTION MyContacts AS @contactsSchema GO
In the following code, we type an XML variable to the MyContacts schema and then assign both valid and invalid content to the XML (by invalid, we mean that the content does not adhere to the schema):
DECLARE @xmlDoc XML (MyContacts) BEGIN TRY -- This entry conforms to our schema rules SET @xmlDoc = ' <contacts xmlns="http://www.srisamp.com/contacts.xsd"> <contact> <name>Bob Smith</name> </contact> </contacts> ' SELECT @xmlDoc -- This one does not conform to the schema SET @xmlDoc = ' <contacts xmlns="http://www.srisamp.com/contacts.xsd"> <contact> <names>Bob Smith</names> </contact> </contacts> ' SELECT @xmlDoc END TRY BEGIN CATCH PRINT 'Error at line '+CAST(error_line() AS VARCHAR)+'. Message: '+error_message() END CATCH
Note how we have declared the XML variable. The schema reference is specified in parentheses after the variable. We have also used the new TRY…CATCH error handling syntax of SQL Server 2005 to capture the error and display the corresponding message.