XQuery vs OpenXML in SQL Server - sql-server

I have this XML in a SQL Server table:
<root>
<meetings>
<meeting>
<id>111</id>
<participants>
<participant><name>Smith</name></participant>
<participant><name>Jones</name></participant>
<participant><name>Brown</name></participant>
</participants>
</meeting>
<meeting>
<id>222</id>
<participants>
<participant><name>White</name></participant>
<participant><name>Bloggs</name></participant>
<participant><name>McDonald</name></participant>
</participants>
</meeting>
</meetings>
</root>
And want a result set like this:
MeetingID Name
111 Smith
111 Jones
111 Brown
222 White
222 Bloggs
222 McDonald
This is easy using select from openxml but I failed using XQuery. Can someone help me there, and maybe also give pros and cons for either method?

Once you've fixed your invalid XML (the <name> elements need to be ended with a </name> end tag), you should be able to use this:
SELECT
Meetings.List.value('(id)[1]', 'int') AS 'Meeting ID',
Meeting.Participant.value('(name)[1]', 'varchar(50)') AS 'Name'
FROM
@input.nodes('/root/meetings/meeting') AS Meetings(List)
CROSS APPLY
Meetings.List.nodes('participants/participant') AS Meeting(Participant)
Basically, the first call to .nodes() gives you a pseudo-table of all <meeting> nodes, from which I extract the meeting ID.
The second .nodes() call on that <meeting> tag digs deeper into the <participants>/<participant> list of subnodes and extracts the name from those nodes.
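For a quick test, the two-level .nodes() pattern described above can be run against an inline copy of the sample XML. This is only a minimal sketch: it assumes the XML sits in a variable called @input (for a table column you would CROSS APPLY the column's .nodes() against the table instead), and it relies on DECLARE with an initializer, which needs SQL Server 2008 or later.

```sql
-- Assumption: the XML is held in a variable; the sample data is trimmed.
DECLARE @input xml = N'
<root>
  <meetings>
    <meeting>
      <id>111</id>
      <participants>
        <participant><name>Smith</name></participant>
        <participant><name>Jones</name></participant>
      </participants>
    </meeting>
    <meeting>
      <id>222</id>
      <participants>
        <participant><name>White</name></participant>
      </participants>
    </meeting>
  </meetings>
</root>';

SELECT
    -- read from the outer <meeting> node
    Meetings.List.value('(id/text())[1]', 'int')                 AS MeetingID,
    -- read from the inner <participant> node
    Meeting.Participant.value('(name/text())[1]', 'varchar(50)') AS Name
FROM @input.nodes('/root/meetings/meeting') AS Meetings(List)
CROSS APPLY Meetings.List.nodes('participants/participant') AS Meeting(Participant);
```

This should return one row per participant, each carrying the id of its parent meeting.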

This may give you table-style (HTML) output using plain XQuery.
(: Assume $x is bound to your XML content. :)
declare variable $x external;
let $d1 := <table border="1"><tr><td>Meeting</td><td>Participant</td></tr>
{ for $p in $x//meeting/participants/participant
return element {'tr'} {
element {'td'} {$p/parent::*/parent::*/id/text()},
element {'td'} {data($p)}
}
}
</table>
return $d1

Shred XML file into SQL Server table

I have researched the best way to shred this xml file extensively and have come close but not all the way to what I want.
I am using SQL Server 2012 and have Visual Studio 2012 as well though I prefer to use SQL Server.
Here is a snippet of the type of XML data I am working with. I cannot control how the XML is built, as it comes from a third party. In reality, below the Response node there are about 450 response fields such as ResponseID, Name, Status, etc. I only show about ten.
<xml>
<Response>
<ResponseID>R_a4yThVvKXzVyftz</ResponseID>
<ResponseSet>Default Response Set</ResponseSet>
<Name>Doe, John</Name>
<ExternalDataReference>0</ExternalDataReference>
<EmailAddress>jdoe@gmail.com</EmailAddress>
<IPAddress>140.123.12.123</IPAddress>
<Status>0</Status>
<StartDate>2014-09-18 09:21:11</StartDate>
<EndDate>2014-09-23 16:09:58</EndDate>
<Finished>1</Finished>
</Response>
</xml>
I've tried the OPENROWSET Method shown on this site
http://blogs.msdn.com/b/simonince/archive/2009/04/24/flattening-xml-data-in-sql-server.aspx
Using a query like this:
SELECT
a1.value('(RESPONSEID/text())[1]', 'varchar(50)') as RESPONSEID,
a2.value('(RESPONSESET/text())[1]', 'varchar(50)') as RESPONSESET,
a3.value('(NAME/text())[1]', 'varchar(50)') as NAME
FROM XmlSourceTable
CROSS APPLY XmlData.nodes('//Response') AS RESPONSEID(a1)
CROSS APPLY XmlData.nodes('//Response') AS RESPONSESET(a2)
CROSS APPLY XmlData.nodes('//Response') AS NAME(a3)
I got this to work once, but the shredded output repeated values and did not appear in the table form I want, which is like the output below. Note that in reality the table is very wide, at least 450 columns in all. Another issue is that, because the width is greater than 255 columns, I can't convert this to .txt and import it. In any case, I'd strongly prefer to consume and shred the native XML so this process can be automated:
RESPONSEID | RESPONSESET | NAME | EXTERNALDATAREFERENCE | EMAILADDRESS | IPADDRESS | STATUS | STARTDATE | ENDDATE
R_a4yThVvKXzVyftz | Default Response Set | Doe, John | 1/1/2014 | doej@gmail.com | 123.12.123 | 0 | 9/18/2014 9:21 | 9/23/2014 16:09
R_06znwEis73yLsnX | NonDefault Response Set | Doe, Jane | 1/1/2014 | doeja@gmail.com | 123.12.123 | 0 | 9/18/2014 5:29 | 9/29/2014 9:42
R_50HuB0jDFfI6hmZ | Response Set 1 | Doe, Cindy | 1/1/2014 | doec@gmail.com | 123.12.123 | 0 | 9/18/2014 17:21 | 10/1/2014 11:45
I did find this application
https://www.novixys.com/ExultSQLServer/
to shred XML files. It created a single table for the Response node; however, in addition to the response table it creates a table for each and every response node, which results in about 500 additional tables. Also, the application costs $250.
You don't need to add a cross apply for each value you want to extract. One is enough.
SELECT
R.X.value('(ResponseID/text())[1]', 'varchar(50)') as RESPONSEID,
R.X.value('(ResponseSet/text())[1]', 'varchar(50)') as RESPONSESET,
R.X.value('(Name/text())[1]', 'varchar(50)') as NAME
FROM XmlSourceTable
CROSS APPLY XmlData.nodes('//Response') AS R(X)
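To try the single CROSS APPLY on its own, here is a minimal sketch. It assumes a table variable @XmlSourceTable stands in for the real XmlSourceTable, with trimmed sample data. Note that XML element names are case-sensitive, so the paths have to say ResponseID, not RESPONSEID; a path that matches nothing makes .value() return NULL. Each extra CROSS APPLY over the same //Response set multiplies the rows, which would explain repeated values.

```sql
-- Assumption: a table variable stands in for XmlSourceTable; data is trimmed.
DECLARE @XmlSourceTable TABLE (XmlData xml);

INSERT INTO @XmlSourceTable (XmlData)
VALUES (N'<xml>
  <Response>
    <ResponseID>R_a4yThVvKXzVyftz</ResponseID>
    <ResponseSet>Default Response Set</ResponseSet>
    <Name>Doe, John</Name>
  </Response>
  <Response>
    <ResponseID>R_06znwEis73yLsnX</ResponseID>
    <ResponseSet>NonDefault Response Set</ResponseSet>
    <Name>Doe, Jane</Name>
  </Response>
</xml>');

-- One .nodes() call yields one row per <Response>;
-- every column is then read from that same row.
SELECT
    R.X.value('(ResponseID/text())[1]',  'varchar(50)') AS RESPONSEID,
    R.X.value('(ResponseSet/text())[1]', 'varchar(50)') AS RESPONSESET,
    R.X.value('(Name/text())[1]',        'varchar(50)') AS NAME
FROM @XmlSourceTable AS S
CROSS APPLY S.XmlData.nodes('/xml/Response') AS R(X);
```

With 450 fields the SELECT list gets long, but it still needs only the one CROSS APPLY.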

Bulk Import of XML Into Existing Tables

I am new to XML and SQL Server and am trying to import an XML file into SQL Server 2010. I have 14 tables that I would like to parse the data into. All 14 table names are listed in the XML as nodes (I think). I found some example code that worked with a simple example XML, but my XML seems a little more complicated and may not be structured optimally; unfortunately, I can't change that. As a basic attempt, I tried to insert the data into just one field of one existing table (SILVX_SN16000), but the Message pane shows "(0 row(s) affected)". Thanks in advance for looking at this.
USE TEST
Declare @xml XML
Select @xml =
CONVERT(XML,bulkcolumn,2) FROM OPENROWSET(BULK 'C:\Users\Kevin_S\Documents \SilvxInSightImport.xml',SINGLE_BLOB) AS X
SET ARITHABORT ON
Insert into [SILVX_SN16000]
(
md_group
)
Select
P.value('MD_GROUP[1]','NVARCHAR(255)') AS md_group
From @xml.nodes('/TableData/Row') PropertyFeed(P)
Here is a much-shortened (rows removed) version of my XML:
<?xml version="1.0" ?>
<SilvxInSightImport Version="1.0" Host="uslsss17" Date="14-09-14_20-40-02">
<Tables Count="14">
<Table Name="SN16000">
<TableSchema>
<Column><COLUMN_NAME>PARENT_HPKEY</COLUMN_NAME><DATA_TYPE>VARCHAR2</DATA_TYPE></Column>
<Column><COLUMN_NAME>MD_GROUP</COLUMN_NAME><DATA_TYPE>VARCHAR2</DATA_TYPE></Column>
<Column><COLUMN_NAME>PKEY</COLUMN_NAME><DATA_TYPE>NUMBER</DATA_TYPE></Column>
<Column><COLUMN_NAME>S_STATE</COLUMN_NAME><DATA_TYPE>VARCHAR2</DATA_TYPE></Column>
<Column><COLUMN_NAME>NAME</COLUMN_NAME><DATA_TYPE>VARCHAR2</DATA_TYPE></Column>
<Column><COLUMN_NAME>ROUTER_ID</COLUMN_NAME><DATA_TYPE>VARCHAR2</DATA_TYPE></Column>
<Column><COLUMN_NAME>IP_ADDR</COLUMN_NAME><DATA_TYPE>VARCHAR2</DATA_TYPE></Column>
</TableSchema>
<TableData>
<Row><MD_GROUP>100.120.25162</MD_GROUP><PARENT_HPKEY>100</PARENT_HPKEY> <PKEY>161888</PKEY><NAME>UODEDTM010</NAME><ROUTER_ID>10.41.32.129</ROUTER_ID> <IP_ADDR>10.41.32.129</IP_ADDR><S_STATE>IS-NR</S_STATE></Row>
<Row><MD_GROUP>100.120.25162</MD_GROUP><PARENT_HPKEY>100</PARENT_HPKEY> <PKEY>278599</PKEY><NAME>UODEETM010</NAME><ROUTER_ID>10.41.4.129</ROUTER_ID> <IP_ADDR>10.41.4.129</IP_ADDR><S_STATE>IS-NR</S_STATE></Row>
<Row><MD_GROUP>100.120.25162</MD_GROUP><PARENT_HPKEY>100</PARENT_HPKEY> <PKEY>183583</PKEY><NAME>UODEGRM010</NAME><ROUTER_ID>10.41.76.129</ROUTER_ID> <IP_ADDR>10.41.76.129</IP_ADDR><S_STATE>IS-NR</S_STATE></Row>
NT_HPKEY>100</PARENT_HPKEY><PKEY>811003</PKEY><NAME>UODWTIN010</NAME> <ROUTER_ID>10.27.36.130</ROUTER_ID><IP_ADDR>10.27.36.130</IP_ADDR><S_STATE>IS-NR</S_STATE> </Row>
</TableData>
</Table>
</Tables>
</SilvxInSightImport>
The xPath in .nodes() must specify the whole path to the Row nodes so you should start with SilvxInSightImport and work your way down to Row.
/SilvxInSightImport/Tables/Table/TableData/Row
In your case you have multiple table nodes, one for each table and I assume you only need one table at a time. You can use a predicate on the table name in the .nodes() xPath expression.
/SilvxInSightImport/Tables/Table[@Name = "SN16000"]/TableData/Row
Your whole query for SN16000 should look something like this.
select T.X.value('(MD_GROUP/text())[1]', 'varchar(20)') as MD_GROUP,
T.X.value('(PARENT_HPKEY/text())[1]', 'int') as PARENT_HPKEY,
T.X.value('(PKEY/text())[1]', 'int') as PKEY,
T.X.value('(NAME/text())[1]', 'varchar(20)') as NAME,
T.X.value('(ROUTER_ID/text())[1]', 'varchar(20)') as ROUTER_ID,
T.X.value('(IP_ADDR/text())[1]', 'varchar(20)') as IP_ADDR,
T.X.value('(S_STATE/text())[1]', 'varchar(20)') as S_STATE
from @XML.nodes('/SilvxInSightImport/Tables/Table[@Name = "SN16000"]/TableData/Row') as T(X)
You have to sort out the data types used for each column.
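Since the file holds 14 tables, it may help to pass the table name in rather than hard-code it in the path. The sketch below does that with the XQuery extension function sql:variable(). The @TableName variable, the second Table node named OTHER and the trimmed rows are all made up for illustration.

```sql
-- Assumption: @TableName is a hypothetical parameter; the XML is a trimmed sample.
DECLARE @TableName varchar(30) = 'SN16000';

DECLARE @xml xml = N'
<SilvxInSightImport Version="1.0">
  <Tables Count="2">
    <Table Name="SN16000">
      <TableData>
        <Row><MD_GROUP>100.120.25162</MD_GROUP><PKEY>161888</PKEY><NAME>UODEDTM010</NAME></Row>
      </TableData>
    </Table>
    <Table Name="OTHER">
      <TableData>
        <Row><MD_GROUP>ignored</MD_GROUP><PKEY>1</PKEY><NAME>ignored</NAME></Row>
      </TableData>
    </Table>
  </Tables>
</SilvxInSightImport>';

-- The predicate compares the Name attribute with the T-SQL variable,
-- so only rows under the matching <Table> are shredded.
SELECT
    T.X.value('(MD_GROUP/text())[1]', 'varchar(20)') AS MD_GROUP,
    T.X.value('(PKEY/text())[1]',     'int')         AS PKEY,
    T.X.value('(NAME/text())[1]',     'varchar(20)') AS NAME
FROM @xml.nodes('/SilvxInSightImport/Tables/Table[@Name = sql:variable("@TableName")]/TableData/Row') AS T(X);
```

Only the row under the SN16000 table should come back. The same SELECT can feed an INSERT INTO for the target table.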

SQL "for XML" and built-in functions like "comment()"

I'm writing an SQL select statement which returns XML. I wanted to put in some comments and found a post asking how to do this. The answer seemed to be the "comment()" function/keyword. So, my code looks broadly like this:
select ' extracted on tuesday ' as 'comment()',
(select top 5 id from MyTable for xml path(''),type)
for xml path('stuff')
...which returns XML as follows:
<stuff>
<!-- extracted on tuesday -->
<id>0DAD4B42-CED6-4A68-AB7D-0003E4C127CC</id>
<id>24BD0E5F-8B76-43FF-AEEA-0008AA911ADD</id>
<id>AAFF5BB0-BFFB-4584-BACC-0009684A1593</id>
<id>0581AF24-8C30-408C-9A48-000A488133AC</id>
<id>01E2306D-296A-4FF7-9263-000EEFF42230</id>
</stuff>
In the process of trying to find out more about "comment()", I discovered "data()" as well.
select top 5 id as 'data()' from MyTable for xml path('')
Unfortunately, the names make searching for information on these functions very difficult.
Can someone point me at the documentation on their usage, as well as any other similar functions ?
Thanks,
Edit:
Another would appear to be "processing-instruction(blah)".
Example:
select 'type="text/css" href="style.css"' as 'processing-instruction(xml-stylesheet)',
(select top 5 id from MyTable for xml path(''),type)
for xml path('stuff')
Results:
<stuff>
<?xml-stylesheet type="text/css" href="style.css"?>
<id>0DAD4B42-CED6-4A68-AB7D-0003E4C127CC</id>
<id>24BD0E5F-8B76-43FF-AEEA-0008AA911ADD</id>
<id>AAFF5BB0-BFFB-4584-BACC-0009684A1593</id>
<id>0581AF24-8C30-408C-9A48-000A488133AC</id>
<id>01E2306D-296A-4FF7-9263-000EEFF42230</id>
</stuff>
The relevant Books Online (BOL) topic is "Columns with the Name of an XPath Node Test".
It details the functionality you are interested in. (It can indeed be a pain to find.)
Also you can find quick functional examples here
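As a rough illustration of those node-test column names, the sketch below combines several of them in one FOR XML PATH query. The inline VALUES rows and the column aliases are invented for the example, and the VALUES table constructor needs SQL Server 2008 or later. Attribute columns (those starting with @) have to come before other content of the same element.

```sql
-- Assumption: inline VALUES rows stand in for a real table.
SELECT
    v.id                               AS [@id],        -- attribute on <item>
    ' generated row '                  AS [comment()],  -- XML comment
    'type="text/css" href="style.css"' AS [processing-instruction(xml-stylesheet)],
    v.label                            AS [text()]      -- bare text inside <item>
FROM (VALUES (1, 'first'), (2, 'second')) AS v(id, label)
FOR XML PATH('item'), ROOT('items');

-- data() emits atomic values with no element wrapper;
-- adjacent values are separated by a space.
SELECT v.id AS [data()]
FROM (VALUES (1), (2), (3)) AS v(id)
FOR XML PATH('');
```

The second query should come back as a space-separated list of the ids, which is why data() often shows up in string-concatenation tricks.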

Generate XML comments with SQL FOR XML statement

Background: I am generating pieces of a much larger XML document (HL7 CDA documents) using SQL FOR XML queries. Following convention, we need to include section comments before this XML node so that when the nodes are reassembled into the larger document, they are easier to read.
Here is a sample of the expected output:
<!--
********************************************************
Past Medical History section
********************************************************
-->
<component>
<section>
<code code="10153-2" codeSystem="2.16.840.1.113883.6.1" codeSystemName="LOINC"/>
<title>Past Medical History</title>
<text>
<list>
<item>COPD - 1998</item>
<item>Dehydration - 2001</item>
<item>Myocardial infarction - 2003</item>
</list>
</text>
</section>
</component>
Here is the SQL FOR XML statement that I have constructed to render the above XML:
SELECT '10153-2' AS [section/code/@code], '2.16.840.1.113883.6.1' AS [section/code/@codeSystem], 'LOINC' AS [section/code/@codeSystemName],
'Past Medical History' AS [section/title],
(SELECT [Incident] + ' - ' + [IncidentYear] as "item"
FROM [tblSummaryPastMedicalHistory] AS PMH
WHERE ([PMH].[Incident] IS NOT NULL) AND ([PMH].[PatientUnitNumber] = [PatientEncounter].[PatientUnitNumber])
FOR XML PATH('list'), TYPE
) as "section/text"
FROM tblPatientEncounter AS PatientEncounter
WHERE (PatientEncounterNumber = 6)
FOR XML PATH('component'), TYPE
While I can insert the comments from the controlling function that reassembles these XML snippets into the main document, our goal is to have the comments be generated with the output to avoid document construction errors.
I've tried a few things, but am having trouble producing the comments with the SELECT statement. I've tried a simple string, but have not been able to get the syntax for the line breaks. Any suggestions?
Example:
SELECT [EmployeeKey]
,[ParentEmployeeKey]
,[FirstName]
,[LastName]
,[MiddleName]
,[DepartmentName] AS "comment()"
FROM [AdventureWorksDW2008].[dbo].[DimEmployee]
FOR XML PATH('Employee'),ROOT('Employees')
produces:
<Employees>
<Employee>
<EmployeeKey>1</EmployeeKey>
<ParentEmployeeKey>18</ParentEmployeeKey>
<FirstName>Guy</FirstName>
<LastName>Gilbert</LastName>
<MiddleName>R</MiddleName>
<!--Production-->
</Employee>
<Employee>
<EmployeeKey>2</EmployeeKey>
<ParentEmployeeKey>7</ParentEmployeeKey>
<FirstName>Kevin</FirstName>
<LastName>Brown</LastName>
<MiddleName>F</MiddleName>
<!--Marketing-->
</Employee>
</Employees>
Just an alternative that also works:
select cast('<!-- comment -->' as xml)
This may be the only viable approach if you're using FOR XML EXPLICIT, which doesn't support the [comment()] column alias notation of the answer by John Saunders. For example:
select
1 [tag],
null [parent],
(select cast('<!-- test -->' as xml)) [x!1],
2 [x!1!b]
for xml explicit, type
The above produces:
<x b="2"><!-- test --></x>
If the comment is dynamic, just concatenate it like this:
select cast('<!--' + column_name + '-->' as xml) from table_name
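For the multi-line section banner in the question, one possible approach is to build the comment text with explicit line feeds and cast it to xml. This is a sketch only: the @banner variable and the shortened component query are assumptions, and it is worth checking how your client displays the line breaks. The comment text must not contain "--", since that is not allowed inside an XML comment and the cast would fail.

```sql
-- Assumption: @banner is a hypothetical variable; NCHAR(10) is a line feed.
DECLARE @banner nvarchar(200) =
      NCHAR(10) + N'********************************'
    + NCHAR(10) + N'Past Medical History section'
    + NCHAR(10) + N'********************************'
    + NCHAR(10);

SELECT
    -- an unnamed xml column is inlined into the output as-is
    CAST(N'<!--' + @banner + N'-->' AS xml),
    -- shortened stand-in for the real <component> query
    (SELECT N'Past Medical History' AS [section/title]
     FOR XML PATH('component'), TYPE)
FOR XML PATH(''), TYPE;
```

Because both columns are unnamed and the outer PATH('') adds no wrapper, the comment should end up directly in front of the component element.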

Modify XML in SQL server to add a root node

To give some background to this problem first, I am rewriting some code that currently loops through some xml, doing an insert to a table at the end of each loop - replacing with a single sp that takes an xml parameter and does the insert in one go, 'shredding' the xml into a table.
The main shred has been done successfully, but currently one of the columns is used to store the entire node. I have been able to work out the query necessary for this (almost), but it leaves out the root element of the node. I have come to the conclusion that my query is as good as I can get it, and I am looking for a way to then run an update statement to put the root element back in.
So my xml is of the form;
<xml>
<Items>
<Item>
<node1>...</node1><node2>..</node2>.....<noden>...</noden>
</Item>
<Item>
<node1>...</node1><node2>..</node2>.....<noden>...</noden>
</Item>
<Item>
<node1>...</node1><node2>..</node2>.....<noden>...</noden>
</Item>
......
</Items>
</xml>
So the basic shredding puts the value from node1 into column1, node2 into column2 etc. The insert statement looks something like;
INSERT INTO mytable (col1, col2, ...etc....., wholenodecolumn)
Select
doc.col.value('node1[1]', 'int') column1,
doc.col.value('node2[1]', 'varchar(50)') column2,
....etc......,
doc.col.query('*')--this is the query for getting the whole node
FROM @xml.nodes('//Items/Item') doc(col)
The XML that ends up in wholenodecolumn is of the form;
<node1>...</node1><node2>..</node2>.....<noden>...</noden>
but I need it to be of the form
<Item><node1>...</node1><node2>..</node2>.....<noden>...</noden></Item>
There is existing code (a lot of it) that depends on the xml in this column being of the correct form.
So can someone maybe see how to modify the doc.col.query('*') to get the desired result?
Anyway, I gave up on modifying the query, and tried to think of other ways to accomplish the end result. What I am now looking at is an Update after the insert- something like;
update mytable set wholenodecolumn.modify('insert <Item> as first before * ')
If I could do this along with
.modify('insert </Item> as last after * ')
that would be fine, but doing 1 at a time isn't an option as the XML is then invalid
XQuery [mytable.wholenodecolumn.modify()]: Expected end tag 'Item'
and doing both together I don't know if it's possible but I've tried various syntax and can't get to work.
Any other approaches to the problem also gratefully received
I believe you can specify the root node name by using the FOR XML clause.
For example:
select top 1 *
from HumanResources.Department
for XML AUTO, ROOT('RootNodeName')
Take a look at Books Online for more details:
http://msdn.microsoft.com/en-us/library/ms190922.aspx
Answering my own question here! This follows on from the comments on one of the other attempted answers, where I said:
I am currently looking into FLWOR XQuery constructs in the query. col.query('for $item in * return <Item> {$item} </Item>') is almost there, but puts <Item> around each node, rather than around all the nodes.
I was almost there with the syntax; a small tweak has given me what I needed:
doc.col.query('<Item> { for $item in * return $item } </Item>')
Thank you to everyone that helped. I have further related issues now, but I'll post them as separate questions.
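The wrapping can be checked with a small self-contained script. The sketch below uses made-up sample data and column aliases. It also shows query('.') as a possible alternative: it returns the context node itself, so the Item element should come back without being rebuilt.

```sql
-- Assumption: trimmed sample data; aliases are illustrative only.
DECLARE @xml xml = N'
<xml>
  <Items>
    <Item><node1>1</node1><node2>a</node2></Item>
    <Item><node1>2</node1><node2>b</node2></Item>
  </Items>
</xml>';

SELECT
    doc.col.value('(node1/text())[1]', 'int') AS column1,
    -- rebuild the wrapper around all child nodes
    doc.col.query('<Item>{ for $item in * return $item }</Item>') AS rewrapped,
    -- or return the current <Item> node as-is
    doc.col.query('.') AS context_node
FROM @xml.nodes('/xml/Items/Item') AS doc(col);
```

Both XML columns should contain the complete Item element for each row. Note that the constructed wrapper copies only child elements, so any attributes on Item would be preserved only by the query('.') form.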
Couldn't you just add the '<Item>' / '</Item>' as fixed text in your select? Something like:
Select
'<Item>',
doc.col.value('node1[1]', 'int') column1,
doc.col.value('node2[1]', 'varchar(50)') column2,
....etc......,
doc.col.query('*'),
'</Item>' --this is the query for getting the whole node
FROM @xml.nodes('//Items/Item') doc(col)
Marc
