XML in SQL Server

This is my second post on XML after 10 years. I was contented with the earlier post however I experienced more over these years and learned many things which I wanted to share with the current and future generations.

Let me begin with the basics.

XML is a file extension for an Extensible Markup Language (XML) represents information in a hierarchical (tree) structure in a simple text format. An XML document consists of XML elements and all elements in an XML document can contain sub-elements, text, or attributes. The hierarchical structure represented by an XML document starts at the root element and branches to the lowest level of elements.

There are several modes that change the shape of XML format while generating it through SQL Server using FOR XML clause.

Instead of reinventing the wheel, I would quote the MSDN information here for the descriptions with the examples I prepared.

Here is the data for the tryout.

VideoID VideoDesc vcVideoLocation
842735 SQL D:\Partition23\20200101\73899mpart
842736 Java   D:\Partition23\20200101\74900mpart
842738 XML D:\Partition23\20200101\74901mpart
842739 Hive D:\Partition23\20200101\74901mpart
842740 Sqoop D:\Partition23\20200101\74900mpart
842742 Impala D:\Partition23\20200101\73899mpart
842743 SQL D:\Partition23\20200101\74900mpart
842744 Hadoop D:\Partition23\20200104\74791mpart
842745 SQL D:\Partition23\20200101\74901mpart
842746 Hive D:\Partition23\20200101\73899mpart
842747 Sqoop D:\Partition23\20200104\74791mpart
842748 Impala D:\Partition23\20200101\74900mpart
842749 Spark D:\Partition23\20200101\74900mpart
842750 HBase D:\Partition23\20200101\73899mpart
842751 Scala D:\Partition23\20200104\74791mpart
842752 Hive D:\Partition23\20200104\74791mpart
842753 SQL D:\Partition23\20200101\74901mpart
842756 Impala D:\Partition23\20200101\74901mpart

SQL Code:
CREATE TABLE VideoData(
VideoID BIGINT,
VideoDesc
VARCHAR(10),
vcVideoLocation
NVARCHAR(MAX))

INSERT INTO VideoData(VideoID, VideoDesc, vcVideoLocation) VALUES
(842735,‘SQL’,‘D:\Partition23\20200101\73899mpart’),
(842736,‘Java’,‘D:\Partition23\20200101\74900mpart’),(842738,‘XML’,‘D:\Partition23\20200101\74901mpart’),(842739,‘Hive’,‘D:\Partition23\20200101\74901mpart’),(842740,‘Sqoop’,‘D:\Partition23\20200101\74900mpart’),(842742,‘Impala’,‘D:\Partition23\20200101\73899mpart’),(842743,‘SQL’,‘D:\Partition23\20200101\74900mpart’),(842744,‘Hadoop’,‘D:\Partition23\20200104\74791mpart’),(842745,‘SQL’,‘D:\Partition23\20200101\74901mpart’),(842746,‘Hive’,‘D:\Partition23\20200101\73899mpart’),(842747,‘Sqoop’,‘D:\Partition23\20200104\74791mpart’),(842748,‘Impala’,‘D:\Partition23\20200101\74900mpart’),(842749,‘Spark’,‘D:\Partition23\20200101\74900mpart’),(842750,‘HBase’,‘D:\Partition23\20200101\73899mpart’),(842751,‘Scala’,‘D:\Partition23\20200104\74791mpart’),(842752,‘Hive’,‘D:\Partition23\20200104\74791mpart’),(842753,‘SQL’,‘D:\Partition23\20200101\74901mpart’),(842756,‘Impala’,‘D:\Partition23\20200101\74901mpart’)

XML RAW:
Each row in the result set is taken as one element with your columns being the attributes.

SELECT VideoID, VideoDesc, vcVideoLocation
FROM VideoData
FORXML RAW;
XML_forXMLRaw

If you look at the output, each row from the table is represented as a single element in XML and is also represented by the keyword . As stated in the beginning, the elements can contain sub-elements, text or attributes; here the table’s columns are being treated as attributes. What if the keyword ELEMENTS mentioned after FOR XML RAW? Let’s check it out. 

SELECT VideoID, VideoDesc, vcVideoLocation
FROM VideoData
FORXML RAW, ELEMENTS;

XML_forXMLPathElements
Now, each row element has column names as sub-elements. 

FOR XML RAW and FOR XML RAW, ELEMENTS are returning a shape that is slightly different.

XML AUTO:
Returns query results in a simple, nested XML tree. Each table in the FROM clause for which at least one column is listed in the SELECT clause is represented as an XML element.

Each row of the table will be represented as an element by table name.

SELECT
VideoID, VideoDesc, vcVideoLocation
FROM VideoData
FORXML AUTO;

XML_forXMLAuto
Let’s add “ELEMENTS” to it and see how the output changes.

SELECT VideoID, VideoDesc, vcVideoLocation
FROM VideoData
FORXML AUTO, ELEMENTS;

XML_forXMLAutoElements
Table name as an element for each row and each element has column names as sub-elements

XML PATH:
Provides a simpler way to mix elements and attributes, and to introduce additional nesting for representing complex properties. Table columns are passed as child elements.

SELECT VideoID, VideoDesc, vcVideoLocation
FROM VideoData
FOR XML PATH;

XML_forXMLPath
If you look at the outcome of “FOR XML RAW, ELEMENTS” and “FOR XML PATH” there is no difference at all. Both are returning the same elements and sub-elements. 

SELECT VideoID, VideoDesc, vcVideoLocation
FROM VideoData
FOR XML PATH, ELEMENTS; 

Again, the output is as same as “FOR XML RAW, ELEMENTS” and “FOR XML PATH”.

XML_forXMLPathElements

Look at the below example –

SELECT VideoID, VideoDesc, vcVideoLocation
FROM VideoData
FOR XML PATH(‘VideoInfo’);

XML_forXMLPath_VideoInfo
User can replace the default keyword “row” with any desired information. In the above example, since the data is related to Videos, I mentioned “VideoInfo” for the readability. It solves no other purpose.

Synopsis: RAW will return the table’s row as an element and column names being the attributes. AUTO will return the table’s name as an element and the element contains the entire row and the column names are the attributes. PATH returns the row as element and columns as child or sub-elements.

Now let’s go back to the data; the table contains video information such as on which topic the video is created and where the video is located. There is more than one video on the same subject and there are several videos on the same location. 

We can create different shapes apart from the above by using the existing modes, either clustered/grouped by topic of the video or by the location, similar to bitmap fashion. Look at the below to understand easily.

SELECT vcVideoLocation AS ‘@path’, (
              SELECT VideoID AS [index], VideoDesc FROM VideoData t1
              WHERE t1.vcVideoLocation = t2.vcVideoLocation
              FOR XML PATH(‘VideoInfo’),TYPE
                                                          )
              FROM (
                      SELECTDISTINCT vcVideoLocation FROM VideoData) t2
                      FOR XML PATH(‘vLocation’), ROOT(‘data’) 

“data” is the main element for the whole table data, the location is sub-element; and in each location, there are several videos that are being treated as attributes within the user-defined sub-element “VideoInfo”. Look at the results below.
XMLData_Basedon_VideoLocationn
XMLData_Basedon_VideoLocation2
XMLData_Basedon_VideoLocation3

In the above example, the data is grouped by “location”. Similarly, we can shape it by grouping by “Subject” (i.e. Video Description).

SELECT VideoDesc AS ‘@Desc’, (
              SELECT VideoID AS [index], vcVideoLocation FROM VideoData t1
              WHERE t1.VideoDesc = t2.VideoDesc
              FOR XML PATH(‘VideoInfo’),TYPE
                                                  )
              FROM (
                      SELECTDISTINCT VideoDesc FROM VideoData) t2
                      FOR XML PATH(‘VideoType’), ROOT(‘data’)

Output:
XMLData_Basedon_VideoType1XMLData_Basedon_VideoType2XMLData_Basedon_VideoType3

Hope you find this article helpful.

2 comments

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s