SQL "for XML" and built-in functions like "comment()" - sql-server

I'm writing an SQL select statement which returns XML. I wanted to put in some comments and found a post asking how to do this. The answer seemed to be the "comment()" function/keyword. So, my code looks broadly like this:
select ' extracted on tuesday ' as 'comment()',
(select top 5 id from MyTable for xml path(''),type)
for xml path('stuff')
...which returns XML as follows:
<stuff>
<!-- extracted on tuesday -->
<id>0DAD4B42-CED6-4A68-AB7D-0003E4C127CC</id>
<id>24BD0E5F-8B76-43FF-AEEA-0008AA911ADD</id>
<id>AAFF5BB0-BFFB-4584-BACC-0009684A1593</id>
<id>0581AF24-8C30-408C-9A48-000A488133AC</id>
<id>01E2306D-296A-4FF7-9263-000EEFF42230</id>
</stuff>
In the process of trying to find out more about "comment()", I discovered "data()" as well.
select top 5 id as 'data()' from MyTable for xml path('')
Unfortunately, the names make searching for information on these functions very difficult.
Can someone point me to the documentation on their usage, as well as any other similar functions?
Thanks,
Edit:
Another would appear to be "processing-instruction(blah)".
Example:
select 'type="text/css" href="style.css"' as 'processing-instruction(xml-stylesheet)',
(select top 5 id from MyTable for xml path(''),type)
for xml path('stuff')
Results:
<stuff>
<?xml-stylesheet type="text/css" href="style.css"?>
<id>0DAD4B42-CED6-4A68-AB7D-0003E4C127CC</id>
<id>24BD0E5F-8B76-43FF-AEEA-0008AA911ADD</id>
<id>AAFF5BB0-BFFB-4584-BACC-0009684A1593</id>
<id>0581AF24-8C30-408C-9A48-000A488133AC</id>
<id>01E2306D-296A-4FF7-9263-000EEFF42230</id>
</stuff>

The relevant Books Online (BOL) topic is "Columns with the Name of an XPath Node Test".
It details the functionality you are interested in. (It can indeed be a pain to find.)
Also you can find quick functional examples here
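As a rough illustration of what that topic covers, the sketch below uses each node-test column name once. The literal values, the row element name 'stuff' and the inline VALUES rows are made up for the example, and the VALUES table constructor assumes SQL Server 2008 or later.

```sql
-- Each column alias is an XPath node test rather than an element name.
SELECT
    ' built for illustration ' AS 'comment()',
    'type="text/css" href="style.css"' AS 'processing-instruction(xml-stylesheet)',
    'some inline text' AS 'text()',
    CAST('<inner>1</inner>' AS XML) AS 'node()'
FOR XML PATH('stuff');

-- data() emits the column values as a space-separated list of atomic values.
SELECT v AS 'data()'
FROM (VALUES (1), (2), (3)) AS t(v)   -- hypothetical inline rows
FOR XML PATH('');
```

In short: comment() writes an XML comment, processing-instruction(name) writes a processing instruction, and text() and node() insert the value inline without a wrapping element.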

Related

SQL: Using XML as input to do an inner join

I have XML coming in as the input, but I'm unclear on how I need to setup the data and statement to get the values from it. My XML is as follows:
<Keys>
<key>246</key>
<key>247</key>
<key>248</key>
</Keys>
And I want to do the following (is simplified to get my point across)
Select *
From Transaction as t
Inner Join @InputXml.nodes('Keys') as K(X)
on K.X.value('@Key', 'INT') = t.financial_transaction_grp_key
Can anyone provide how I would do that? What would my 3rd/4th line in the SQL look like?
Thanks!
From your code I assume this is SQL Server, but you added the tag [mysql]...
For your next question, please keep in mind that it is very important to state your tools (vendor and version).
Assuming T-SQL and [sql-server] (according to the provided sample code), you were close:
DECLARE @InputXml XML=
N'<Keys>
<key>246</key>
<key>247</key>
<key>248</key>
</Keys>';
DECLARE @YourTransactionTable TABLE(ID INT IDENTITY,financial_transaction_grp_key INT);
INSERT INTO @YourTransactionTable VALUES (200),(246),(247),(300);
Select t.*
From @YourTransactionTable as t
Inner Join @InputXml.nodes('/Keys/key') as K(X)
on K.X.value('text()[1]', 'INT') = t.financial_transaction_grp_key;
What was wrong:
.nodes() must go down to the repeating element, which is <key>.
In .value() you are using the path @Key, which is wrong in two ways: 1) <key> is an element and not an attribute, and 2) XML is strictly case-sensitive, so Key != key.
An alternative might be this:
WHERE @InputXml.exist('/Keys/key[. cast as xs:int? = sql:column("financial_transaction_grp_key")]')=1;
Which one is faster depends on the count of rows in your source table as well as the count of keys in your XML. Just try it out.
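For completeness, here is the .exist() alternative as a full statement rather than just the WHERE clause. It is a minimal sketch that reuses the sample table variable and keys from the .nodes() version above; with that sample data it should keep only the rows whose key is 246 or 247.

```sql
-- Same sample data as in the .nodes() version.
DECLARE @InputXml XML = N'<Keys><key>246</key><key>247</key><key>248</key></Keys>';
DECLARE @YourTransactionTable TABLE (ID INT IDENTITY, financial_transaction_grp_key INT);
INSERT INTO @YourTransactionTable VALUES (200), (246), (247), (300);

-- For each row, ask the XML whether a <key> with the row's value exists.
SELECT t.*
FROM @YourTransactionTable AS t
WHERE @InputXml.exist('/Keys/key[. cast as xs:int? = sql:column("financial_transaction_grp_key")]') = 1;
```

Unlike the join, this form cannot produce duplicate rows when the same key appears more than once in the XML.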
You probably need to parse the XML to a readable format with regex.
I wrote a similar event to parse the active DB from an xmlpayload that was saved on a table. This may or may not work for you, but you should be able to at least get started.
SELECT SUBSTRING(column FROM IF(locate('<key>',column)=0,0,0+LOCATE('<key>',column))) as KEY FROM table LIMIT 1\G

Parse XML using SQL

I'm using MS SQL Server 2016 and I have an XML file that I need to parse to put various data elements into separate fields. For the most part everything works fine, except I need a little help identifying a particular node value. I have the following (only a snippet of the XML, but it does show the problem):
DECLARE @xmlString xml
SET @xmlString ='<PubmedArticle>
<MedlineCitation Status="PubMed-not-MEDLINE" Owner="NLM">
<PMID Version="1">25685064</PMID>
<Article PubModel="Electronic-eCollection">
<Journal>
<ISSN IssnType="Electronic">1234-5678</ISSN>
<ISSN IssnType="Print">1475-2867</ISSN>
<JournalIssue CitedMedium="Print">
<Volume>15</Volume>
<Issue>1</Issue>
<PubDate>
<Year>2015</Year>
</PubDate>
</JournalIssue>
</Journal>
</Article>
</MedlineCitation>
</PubmedArticle>'
select
nref.value('Article[1]/Journal[1]/ISSN[1]','varchar(max)') ISSN
from @xmlString.nodes ('//MedlineCitation[1]') as R(nref)
I bypass the second ISSNType and read the first value available. I need to pull both values. What do I need to change? Thanks
You can read it as a second column:
SELECT
nref.value('Article[1]/Journal[1]/ISSN[1]','varchar(max)') ISSN,
nref.value('Article[1]/Journal[1]/ISSN[2]','varchar(max)') ISSN2
FROM @xmlString.nodes('//MedlineCitation[1]') as R(nref)
Or
SELECT
nref.value('ISSN[1]','varchar(max)') ISSN,
nref.value('ISSN[2]','varchar(max)') ISSN2
FROM @xmlString.nodes('//MedlineCitation[1]/Article[1]/Journal[1]') as R(nref)
Or as a separate row:
SELECT nref.value('.','varchar(MAX)') ISSN
from @xmlString.nodes('//MedlineCitation[1]/Article[1]/Journal[1]/ISSN') as R(nref)
Update
If the number of ISSNs may vary, I recommend normalizing your result set:
SELECT
nref.value('.','varchar(MAX)') Issn,
nref.value('@IssnType','varchar(MAX)') IssnType
FROM @xmlString.nodes('//MedlineCitation[1]/Article[1]/Journal[1]/ISSN') as R(nref)
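If the order of the ISSN elements is not guaranteed, a possible variation is to select by the IssnType attribute instead of by position. This is only a sketch on a trimmed copy of the sample XML; the column names ElectronicIssn and PrintIssn and the varchar(20) length are assumptions for the example.

```sql
DECLARE @xmlString XML = N'<PubmedArticle>
  <MedlineCitation Status="PubMed-not-MEDLINE" Owner="NLM">
    <Article PubModel="Electronic-eCollection">
      <Journal>
        <ISSN IssnType="Electronic">1234-5678</ISSN>
        <ISSN IssnType="Print">1475-2867</ISSN>
      </Journal>
    </Article>
  </MedlineCitation>
</PubmedArticle>';

-- Pick each ISSN by its attribute value rather than by [1] / [2].
SELECT
    J.n.value('(ISSN[@IssnType="Electronic"]/text())[1]', 'varchar(20)') AS ElectronicIssn,
    J.n.value('(ISSN[@IssnType="Print"]/text())[1]', 'varchar(20)') AS PrintIssn
FROM @xmlString.nodes('/PubmedArticle/MedlineCitation/Article/Journal') AS J(n);
```

A missing ISSN type simply yields NULL in its column.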

Select XML multiple only a few nodes with the same name

I'm trying to construct a soap message, and I was able to construct the entire message using a single select. Except the problem is, on only a few occasions the same node name is repeated twice.
So for example the required output result should be like so, with two separate id root nodes:
<SoapDocument>
<recordTarget>
<patientRole>
<id root="1.2.3.4" extension="1234567" />
<id root="1.2.3.5.6" extension="0123456789" />
</patientRole>
</recordTarget>
</SoapDocument>
I tried to use my sparse knowledge of xpath to construct the node names like so:
select
'1.2.3.4' AS 'recordTarget/patientRole/id[1]/@root',
'1234567' AS 'recordTarget/patientRole/id[1]/@extension',
'1.2.3.5.6' AS 'recordTarget/patientRole/id[2]/@root',
'0123456789' AS 'recordTarget/patientRole/id[2]/@extension'
FOR XML PATH('SoapDocument'),TYPE
Apparently XPath naming can't be applied to column names id[1] and id[2] like that? Am I missing something here, or should the notation be different? What would be the easiest way to construct the desired result?
From your question I assume this is not tabular data but fixed values, and that you are creating a medical document, presumably a CDA.
Try this:
SELECT
(
SELECT
'1.2.3.4' AS 'id/@root',
'1234567' AS 'id/@extension',
'',
'1.2.3.5.6' AS 'id/@root',
'0123456789' AS 'id/@extension'
FOR XML PATH('patientRole'),TYPE
) AS [SoapDocument/recordTarget]
FOR XML PATH('')
The result:
<SoapDocument>
<recordTarget>
<patientRole>
<id root="1.2.3.4" extension="1234567" />
<id root="1.2.3.5.6" extension="0123456789" />
</patientRole>
</recordTarget>
</SoapDocument>
Some explanation: The empty, unnamed column in the middle allows you to place two elements with the same name in one query. There are various ways to get this into your surrounding tags; this is just one possibility.
UPDATE
I'd like to point to BdR's own answer! Great finding and worth an up-vote!
A little more elaboration on the answer from Shnugo, as it got me trying out some things using an "empty column".
If you do not give the empty column a name, it will reset to the XML root node. So the following columns will start from the XML root of the selection you are in at that point. However, if you explicitly name the empty separator column, then the following columns will continue in the hierarchy as set by that column name.
So the selection below will also result in the desired result. It's subtly different, but in my case it allows me to avoid using subselections.
select
'1.2.3.4' AS 'recordTarget/patientRole/id/@root',
'1234567' AS 'recordTarget/patientRole/id/@extension',
'' AS 'recordTarget/patientRole',
'1.2.3.5.6' AS 'recordTarget/patientRole/id/@root',
'0123456789' AS 'recordTarget/patientRole/id/@extension'
FOR XML PATH('SoapDocument'),TYPE
This should do the job:
WITH CTE AS (
SELECT *
FROM (VALUES('1.2.3.4','1234567'),
('1.2.3.5.6','0123456789')) V ([root], [extension]))
SELECT (SELECT (SELECT (SELECT [root] AS [@root],
[extension] AS [@extension]
FROM CTE
FOR XML PATH('id'), TYPE)
FOR XML PATH('patientRole'), TYPE)
FOR XML PATH ('recordTarget'), TYPE)
FOR XML PATH ('SoapDocument');

How to get the partial value of a XML Node value

I'm new to XPath, and this is my XML. I'm trying to get the attribute value @name in the appl/*_job tag and the value 'TESTQUEUE' in the node snmp_notify/message, and I'm taking one step at a time. So far I am able to get the child nodes of all *_job elements, but I can't get the value in the node /snmp_notifylist/snmp_notify/message. This is the SQL; could someone help me identify where I got stuck?
This is the Sample XML Document stored as DEFINITION in the table TAB_AR.
<appl xmlns="http://dto.wa.ca.com/application" name="TEST_NEW_AGENT">
<version>12.0</version>
<comment />
<unix_job name="TEST_JOB">
<dependencies><relcount>0</relcount></dependencies>
<snmp_notifylist>
<snmp_notify>
<returncode>4</returncode>
<monitor_states><monitor_state>FAILED</monitor_state></monitor_states>
<snmpagent />
<message>TICKET TESTQUEUE TSTMSG</message>
</snmp_notify>
</snmp_notifylist>
</unix_job>
<link name="HOLD_LINK">
<dependencies><relcount>0</relcount></dependencies>
<hold>true</hold>
<job_ancestor_wait_default_ignore>true</job_ancestor_wait_default_ignore>
</link>
<sftp_job name="TEST_SFTP1">
<dependencies><relcount>0</relcount></dependencies>
<snmp_notifylist>
<snmp_notify>
<returncode>4</returncode>
<monitor_states>
<monitor_state>FAILED</monitor_state>
</monitor_states>
<snmpagent />
<message>TICKET MFG1AWA TSTMSG</message>
</snmp_notify>
</snmp_notifylist>
</sftp_job>
</appl>
And this is the SQL I wrote,
SELECT
SFTP_Job_name = DEFT1.value('(@name)[1]','nvarchar(max)'),
Server_Address = DEFT1.query('local-name(/*:snmp_notifylist/*:snmp_notify/*:message)')
from (select CAST([DEFINITION] as XML) as DEFT from TAB_AR)TAB
CROSS APPLY TAB.DEFT.nodes('/*:appl/*[fn:contains(local-name(),"_job")]') as XMLTAB1(DEFT1)
You were close...
I'm not sure what you really wanted to get in this line:
DEFT1.query('local-name(/*:snmp_notifylist/*:snmp_notify/*:message)')
With local-name() you can return the name of one specific node. As you are reading from several nodes whose names end in _job, it makes perfect sense to return the name of the element you are reading from.
But you are telling us that you are trying to read the <message> too. Maybe you are mixing two calls in one line?
I slightly modified your code:
SELECT
SFTP_Job_name = DEFT1.value('(@name)[1]','nvarchar(max)')
,NodeName = DEFT1.value('local-name(.)','nvarchar(max)')
,Server_Address = DEFT1.value('(*:snmp_notifylist/*:snmp_notify/*:message)[1]','nvarchar(max)')
from (select CAST([DEFINITION] as XML) as DEFT from TAB_AR)TAB
CROSS APPLY TAB.DEFT.nodes('/*:appl/*[fn:contains(local-name(.),"_job")]') as XMLTAB1(DEFT1);
This returns
SFTP_Job_name  NodeName  Server_Address
TEST_SFTP1     sftp_job  TICKET MFG1AWA TSTMSG
TEST_JOB       unix_job  TICKET TESTQUEUE TSTMSG
As Roger Wolf pointed out, it is better to read with an explicitly declared namespace, like this:
WITH XMLNAMESPACES (default 'http://dto.wa.ca.com/application')
SELECT
SFTP_Job_name = DEFT1.value('(@name)[1]','nvarchar(max)')
,NodeName = DEFT1.value('local-name(.)','nvarchar(max)')
,Server_Address = DEFT1.value('(snmp_notifylist/snmp_notify/message)[1]','nvarchar(max)')
from (select CAST([DEFINITION] as XML) as DEFT from TAB_AR)TAB
CROSS APPLY TAB.DEFT.nodes('/appl/*[fn:contains(local-name(.),"_job")]') as XMLTAB1(DEFT1);
The general rule is: Be as specific as possible!
Hint
If you can change this, you should store your XML in a column of type XML.
This construction, from (select CAST([DEFINITION] as XML) as DEFT from TAB_AR)TAB, should really not be necessary...
Maybe your column actually is XML and you just did not know how to adapt the code you found somewhere to get the right syntax for .nodes()? In this case just try this:
SELECT
SFTP_Job_name = DEFT1.value('(@name)[1]','nvarchar(max)')
,NodeName = DEFT1.value('local-name(.)','nvarchar(max)')
,Server_Address = DEFT1.value('(*:snmp_notifylist/*:snmp_notify/*:message)[1]','nvarchar(max)')
from TAB_AR
CROSS APPLY TAB_AR.[DEFINITION].nodes('/*:appl/*[fn:contains(local-name(.),"_job")]') as XMLTAB1(DEFT1);
This seems to be working:
with xmlnamespaces (default 'http://dto.wa.ca.com/application')
select j.c.value('./@name', 'sysname') as [JobName],
m.c.value('./text()[1]', 'varchar(max)') as [MessageText]
from (
select cast(t.[Definition] as xml) as [Deft] from tab_ar t
) sq
cross apply sq.Deft.nodes('/appl/*[fn:contains(local-name(),"_job")]') j(c)
cross apply j.c.nodes('./snmp_notifylist/snmp_notify/message') m(c);
After that, splitting the string by spaces and taking the middle part should be relatively trivial.
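One way to take the middle part is plain CHARINDEX/SUBSTRING arithmetic. This is a minimal sketch that assumes the message always consists of exactly three space-separated tokens, as in the sample data; the variable names are made up for the example.

```sql
DECLARE @msg varchar(100) = 'TICKET TESTQUEUE TSTMSG';

-- Position of the first space, then of the next space after it.
DECLARE @first int = CHARINDEX(' ', @msg);
DECLARE @second int = CHARINDEX(' ', @msg, @first + 1);

-- Everything strictly between the two spaces.
SELECT SUBSTRING(@msg, @first + 1, @second - @first - 1) AS MiddlePart;  -- TESTQUEUE
```

The same expression can be applied to the MessageText column directly; messages with fewer than two spaces would need an extra guard, because the computed length would then be negative.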

Using SQL Server 2005's XQuery select all nodes with a specific attribute value, or with that attribute missing

Update: giving a much more thorough example.
The first two solutions offered were right along the lines of what I was trying to say not to do. I can't know location, it needs to be able to look at the whole document tree. So a solution along these lines, with /Books/ specified as the context will not work:
SELECT x.query('.') FROM @xml.nodes('/Books/*[not(@ID) or @ID = 5]') x1(x)
Original question with better example:
Using SQL Server 2005's XQuery implementation I need to select all nodes in an XML document, just once each and keeping their original structure, but only if they are missing a particular attribute, or that attribute has a specific value (passed in by parameter). The query also has to work on the whole XML document (descendant-or-self axis) rather than selecting at a predefined depth.
That is to say, each individual node will appear in the resultant document only if it and every one of its ancestors are missing the attribute, or have the attribute with a single specific value.
For example:
If this were the XML:
DECLARE @Xml XML
SET @Xml =
N'
<Library>
<Novels>
<Novel category="1">Novel1</Novel>
<Novel category="2">Novel2</Novel>
<Novel>Novel3</Novel>
<Novel category="4">Novel4</Novel>
</Novels>
<Encyclopedias>
<Encyclopedia>
<Volume>A-F</Volume>
<Volume category="2">G-L</Volume>
<Volume category="3">M-S</Volume>
<Volume category="4">T-Z</Volume>
</Encyclopedia>
</Encyclopedias>
<Dictionaries category="1">
<Dictionary>Webster</Dictionary>
<Dictionary>Oxford</Dictionary>
</Dictionaries>
</Library>
'
A parameter of 1 for category would result in this:
<Library>
<Novels>
<Novel category="1">Novel1</Novel>
<Novel>Novel3</Novel>
</Novels>
<Encyclopedias>
<Encyclopedia>
<Volume>A-F</Volume>
</Encyclopedia>
</Encyclopedias>
<Dictionaries category="1">
<Dictionary>Webster</Dictionary>
<Dictionary>Oxford</Dictionary>
</Dictionaries>
</Library>
A parameter of 2 for category would result in this:
<Library>
<Novels>
<Novel category="2">Novel2</Novel>
<Novel>Novel3</Novel>
</Novels>
<Encyclopedias>
<Encyclopedia>
<Volume>A-F</Volume>
<Volume category="2">G-L</Volume>
</Encyclopedia>
</Encyclopedias>
</Library>
I know XSLT is perfectly suited for this job, but it's not an option. We have to accomplish this entirely in SQL Server 2005. Any implementations not using XQuery are fine too, as long as it can be done entirely in T-SQL.
It's not clear to me from your example what you're actually trying to achieve. Do you want to return a new XML with all the nodes stripped out except those that fulfill the condition? If yes, then this looks like a job for an XSLT transform, which I don't think is built into MSSQL 2005 (it can be added as a UDF: http://www.topxml.com/rbnews/SQLXML/re-23872_Performing-XSLT-Transforms-on-XML-Data-Stored-in-SQL-Server-2005.aspx).
If you just need to return the list of nodes then you can use this expression:
//Book[not(@ID) or @ID = 5]
but I get the impression that it's not what you need. It would help if you can provide a clearer example.
Edit: This example is indeed more clear. The best that I could find is this:
SET @Xml.modify('delete(//*[@category!=1])')
SELECT @Xml
The idea is to delete from the XML all the nodes that you don't need, so you remain with the original structure and the needed nodes. I tested with your two examples and it produced the wanted result.
However modify has some restrictions - it seems you can't use it in a select statement, it has to modify data in place. If you need to return such data with a select you could use a temporary table in which to copy the original data and then update that table. Something like this:
INSERT INTO #temp VALUES(@Xml)
UPDATE #temp SET data.modify('delete(//*[@category!=2])')
Hope that helps.
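Since the question asks for the category to be passed in as a parameter, the hard-coded value can presumably be replaced with sql:variable(). The following is a sketch on a trimmed copy of the sample XML; @category is a hypothetical parameter name, and DECLARE and SET are kept separate so the syntax stays valid on SQL Server 2005.

```sql
DECLARE @Xml XML;
SET @Xml = N'<Library>
  <Novels>
    <Novel category="1">Novel1</Novel>
    <Novel category="2">Novel2</Novel>
    <Novel>Novel3</Novel>
  </Novels>
  <Dictionaries category="1">
    <Dictionary>Webster</Dictionary>
  </Dictionaries>
</Library>';

DECLARE @category INT;
SET @category = 1;

-- Delete every element that HAS a category attribute with a different value.
-- Elements without the attribute are kept: the comparison is false when
-- @category is an empty sequence.
SET @Xml.modify('delete(//*[@category != sql:variable("@category")])');

SELECT @Xml;
```

Deleting an element also removes its descendants, which matches the requirement that a node survives only if all of its ancestors pass the test.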
The question is not really clear, but is this what you're looking for?
DECLARE @Xml AS XML
SET @Xml =
N'
<Books>
<Book ID="1">Book1</Book>
<Book ID="2">Book2</Book>
<Book ID="3">Book3</Book>
<Book>Book4</Book>
<Book ID="5">Book5</Book>
<Book ID="6">Book6</Book>
<Book>Book7</Book>
<Book ID="8">Book8</Book>
</Books>
'
DECLARE @BookID AS INT
SET @BookID = 5
DECLARE @Result AS XML
SET @result = (SELECT @xml.query('//Book[not(@ID) or @ID = sql:variable("@BookID")]'))
SELECT @result
