How to retrieve XML that contains CDATA, from Database - sql-server

I need to retrieve XML in the following format
<mv>
<v><![CDATA[some_inner_xml_1]]></v>
<v><![CDATA[some_inner_xml_2]]></v>
</mv>
I just learned that data in <v /> will be some other XML. When I thought that data will be an integer, I wrote this and it worked
select IdentifierText as 'v' from ipmruntime.RecordsToExport where BatchID = 5 for xml path(''), Root('mv')
I tried the syntax 'v!cdata', but SQL Server rejects it. I don't know where to put CDATA in the column name.
I tried another syntax
SELECT
1 AS Tag,
null AS Parent,
IdentifierText as 'mv!1!v!cdata'
FROM ipmruntime.RecordsToExport
where BatchID = 5
FOR XML EXPLICIT, root('mv')
It results in almost what I need
<mv><mv><v><![CDATA[47f81be4-b54f-4703-840b-62b306c40842]]></v></mv><mv><v><![CDATA[3ba36a1f-bf75-4ed9-911e-26f10fba5587]]></v></mv></mv>
Or, if I use 'v!1' in the same query, it gives me <mv><v></v><v></v></mv>, but then where does CDATA go?
But the CDATA version has each <v> wrapped in its own <mv>. Obviously, I am not great with the XML/SQL Server combo...

You can do it this way:
select
1 as Tag,
null as Parent,
IdentifierText as [v!1!!CDATA] -- [ElementName!TagNumber!AttributeName!Directive]; AttributeName stays empty for the CDATA directive
from ipmruntime.RecordsToExport
where BatchID = 5
for xml explicit, root('mv')
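A self-contained sketch of the same `[v!1!!CDATA]` column, assuming a couple of hypothetical XML fragments in an inline `VALUES` list in place of `ipmruntime.RecordsToExport`. It also shows a caveat: the CDATA sections only survive while the result stays a string. Once it is converted to the `xml` data type (with `TYPE`, or by assigning it to an `xml` variable or column), SQL Server drops the CDATA wrapper and stores the same content escaped instead.

```sql
-- Hypothetical inline rows standing in for ipmruntime.RecordsToExport.
DECLARE @asText nvarchar(max) = (
    SELECT 1    AS Tag,
           NULL AS Parent,
           r.IdentifierText AS [v!1!!CDATA]   -- element v, tag 1, no attribute, CDATA directive
    FROM (VALUES (N'<a>1</a>'), (N'<b>2</b>')) AS r(IdentifierText)
    FOR XML EXPLICIT, ROOT('mv'));

-- String result: each <v> holds a CDATA section.
SELECT @asText AS AsText;

-- xml result: same content, but the CDATA wrapper is gone and '<' is escaped.
DECLARE @asXml xml = @asText;
SELECT CAST(@asXml AS nvarchar(max)) AS AsXmlSerialized,
       @asXml.value('(/mv/v)[1]', 'nvarchar(max)') AS FirstValue;
```

If the consumer insists on seeing literal CDATA, keep the result as text all the way to the file or client; do not route it through an `xml` column.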


Is that valid XML and how to replicate with SQL Server

I do have to replicate an XML file with SQL Server and I am now stumbling over the following structure inside the XML file and I don't know how to replicate that.
The structure looks like this at the moment for certain tags:
<ART_TAG1>
<UNLIMITED/>
</ART_TAG1>
<ART_TAG2>
<ART_TAG3>
<Data_Entry/>
</ART_TAG3>
</ART_TAG2>
I am wondering whether this is proper XML, since the data inside (UNLIMITED and Data_Entry) is written as self-closing tags. The XML validator https://www.w3schools.com/xml/xml_validator.asp tells me it is correct. But now I am struggling to replicate it with Transact-SQL.
The closest I can get is the following T-SQL script, which obviously does not produce the same structure as the original.
SELECT 'UNLIMITED' as 'ART_TAG1'
, 'Data_Entry' as 'ART_TAG2/ART_TAG3'
FOR XML PATH(''), ROOT('root')
<root>
<ART_TAG1>UNLIMITED</ART_TAG1>
<ART_TAG2>
<ART_TAG3>Data_Entry</ART_TAG3>
</ART_TAG2>
</root>
If I understand correctly, your question is:
How can I write my query so that it creates those <SomeElement /> tags?
Look at this:
--This will create filled nodes
SELECT 'outer' AS [OuterNode/@attr]
,'inner' AS [OuterNode/InnerNode]
FOR XML PATH('row');
--The empty string counts as content, so an empty element is created
SELECT 'outer' AS [OuterNode/@attr]
,'' AS [OuterNode/InnerNode]
FOR XML PATH('row');
--A missing value (NULL) is omitted by default
SELECT 'outer' AS [OuterNode/@attr]
,NULL AS [OuterNode/InnerNode]
FOR XML PATH('row');
--Now check what happens here:
--First XML has an empty element, while the second uses the self-closing element
DECLARE @xml1 XML=
N'<row>
<OuterNode attr="outer">
<InnerNode></InnerNode>
</OuterNode>
</row>';
DECLARE @xml2 XML=
N'<row>
<OuterNode attr="outer">
<InnerNode/>
</OuterNode>
</row>';
SELECT @xml1,@xml2;
The result is the same for both...
Some background: Semantically the empty element <element></element> is exactly the same as the self-closing element <element />. It should not make any difference, whether you use the one or the other. If your consumer cannot deal with this, it is a problem in the reading part.
Yes, you can force any content into XML on string level, but - as the example shows above - this is just a (dangerous) hack.
XML within T-SQL returns - by default - a missing node as NULL and an empty element as empty (depending on the datatype, and beware of the difference between an element and its text() node).
In short: This is nothing you should have to think about...
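Applied to the question's structure, a minimal sketch: an empty string under a path alias produces an empty element, so both tags come straight out of `FOR XML PATH` without any string manipulation. With `TYPE` and a cast back to text, SQL Server serializes the empty elements in their self-closing form.

```sql
-- Empty strings create empty elements (a NULL would omit the element instead).
DECLARE @result xml = (
    SELECT '' AS [ART_TAG1/UNLIMITED],
           '' AS [ART_TAG2/ART_TAG3/Data_Entry]
    FOR XML PATH(''), ROOT('root'), TYPE);

-- Serialized text: UNLIMITED and Data_Entry appear as self-closing tags.
SELECT CAST(@result AS nvarchar(max)) AS Serialized;
```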

Select XML multiple only a few nodes with the same name

I'm trying to construct a soap message, and I was able to construct the entire message using a single select. Except the problem is, on only a few occasions the same node name is repeated twice.
So for example the required output result should be like so, with two separate id root nodes:
<SoapDocument>
<recordTarget>
<patientRole>
<id root="1.2.3.4" extension="1234567" />
<id root="1.2.3.5.6" extension="0123456789" />
</patientRole>
</recordTarget>
</SoapDocument>
I tried to use my sparse knowledge of xpath to construct the node names like so:
select
'1.2.3.4' AS 'recordTarget/patientRole/id[1]/@root',
'1234567' AS 'recordTarget/patientRole/id[1]/@extension',
'1.2.3.5.6' AS 'recordTarget/patientRole/id[2]/@root',
'0123456789' AS 'recordTarget/patientRole/id[2]/@extension'
FOR XML PATH('SoapDocument'),TYPE
Apparently XPath predicates like id[1] and id[2] can't be used in column names like that? Am I missing something here, or should the notation be different? What would be the easiest way to construct the desired result?
From your question I assume this is not tabular data but fixed values, and that you are creating a medical document, presumably a CDA.
Try this:
SELECT
(
SELECT
'1.2.3.4' AS 'id/@root',
'1234567' AS 'id/@extension',
'',
'1.2.3.5.6' AS 'id/@root',
'0123456789' AS 'id/@extension'
FOR XML PATH('patientRole'),TYPE
) AS [SoapDocument/recordTarget]
FOR XML PATH('')
The result:
<SoapDocument>
<recordTarget>
<patientRole>
<id root="1.2.3.4" extension="1234567" />
<id root="1.2.3.5.6" extension="0123456789" />
</patientRole>
</recordTarget>
</SoapDocument>
Some explanation: The empty, unnamed column in the middle closes the first <id>, which allows you to place two elements with the same name in one query. There are various approaches to get this into your surrounding tags. This is just one possibility.
UPDATE
I'd like to point to BdR's own answer! Great finding and worth an up-vote!
A little more elaboration on the answer from Shnugo, as it got me trying out some things using an "empty column".
If you do not give the empty column a name, it resets to the XML root node, so the following columns start again from the XML root of the selection you are in at that point. However, if you explicitly name the empty separator column, the following columns continue in the hierarchy set by that column name.
So the selection below will also result in the desired result. It's subtly different, but in my case it allows me to avoid using subselections.
select
'1.2.3.4' AS 'recordTarget/patientRole/id/@root',
'1234567' AS 'recordTarget/patientRole/id/@extension',
'' AS 'recordTarget/patientRole',
'1.2.3.5.6' AS 'recordTarget/patientRole/id/@root',
'0123456789' AS 'recordTarget/patientRole/id/@extension'
FOR XML PATH('SoapDocument'),TYPE
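To see the difference between an unnamed and a named separator side by side, here is a sketch that runs the flat query twice. Following the behaviour described in this answer, the unnamed `''` column should close every open element, so the second `id` lands in a new `recordTarget`, while `'' AS [recordTarget/patientRole]` closes only the `id`, so both ids share one `patientRole`. The counts in the final `SELECT` make the difference visible without reading the XML by eye.

```sql
-- Unnamed '' column: closes all open elements back to <SoapDocument>.
DECLARE @unnamed xml = (
    SELECT '1.2.3.4'    AS [recordTarget/patientRole/id/@root],
           '1234567'    AS [recordTarget/patientRole/id/@extension],
           '',
           '1.2.3.5.6'  AS [recordTarget/patientRole/id/@root],
           '0123456789' AS [recordTarget/patientRole/id/@extension]
    FOR XML PATH('SoapDocument'), TYPE);

-- Named '' column: closes only <id>, stays inside recordTarget/patientRole.
DECLARE @named xml = (
    SELECT '1.2.3.4'    AS [recordTarget/patientRole/id/@root],
           '1234567'    AS [recordTarget/patientRole/id/@extension],
           ''           AS [recordTarget/patientRole],
           '1.2.3.5.6'  AS [recordTarget/patientRole/id/@root],
           '0123456789' AS [recordTarget/patientRole/id/@extension]
    FOR XML PATH('SoapDocument'), TYPE);

SELECT @unnamed.value('count(/SoapDocument/recordTarget)', 'int') AS RecordTargetsUnnamed,
       @named.value('count(/SoapDocument/recordTarget)', 'int')   AS RecordTargetsNamed,
       @named.value('count(/SoapDocument/recordTarget/patientRole/id)', 'int') AS IdsNamed;
```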
This should do the job:
WITH CTE AS (
SELECT *
FROM (VALUES('1.2.3.4','1234567'),
('1.2.3.5.6','0123456789')) V ([root], [extension]))
SELECT (SELECT (SELECT (SELECT [root] AS [@root],
[extension] AS [@extension]
FROM CTE
FOR XML PATH('id'), TYPE)
FOR XML PATH('patientRole'), TYPE)
FOR XML PATH ('recordTarget'), TYPE)
FOR XML PATH ('SoapDocument');

Storing the text of a stored procedure in an XML data type in SQL Server

I need to store the text of all of the stored procedures in a database in an XML data type. When I use FOR XML PATH, the text of the stored procedure contains serialized characters like &#xD; and &#xA; for CRLF and &quot;, etc. I need the text to be stored in the XML structure without these characters, because it will be used to recreate the stored procedure.
This is the query that I use for FOR XML PATH:
SELECT
[View].name AS "#VName", [Module].definition AS "#VDefinition"
FROM
sys.views AS [View]
INNER JOIN
sys.sql_modules AS [Module] ON [Module].object_id = [View].object_id
FOR XML PATH ('View'), TYPE
I read that I should use CDATA for the text, using FOR XML EXPLICIT. However, when I run the following query and view the XML data, it contains those characters as well. I need the text to be plain text without these characters.
This is my query:
SELECT
1 AS Tag,
0 AS Parent,
NULL AS [Database1!1],
NULL AS [StoredProcedure!2!VName],
NULL AS [StoredProcedure!2!cdata]
UNION ALL
SELECT
2 AS Tag,
1 AS Parent,
NULL,
[StoredProcedure].name as [StoredProcedure!2!!CDATA],
[Module].definition as [StoredProcedure!2!!CDATA]
FROM
sys.procedures AS [StoredProcedure]
INNER JOIN
sys.sql_modules [Module] ON [StoredProcedure].object_id = [Module].object_id
WHERE
[StoredProcedure].name NOT LIKE '%diagram%'
FOR XML EXPLICIT
How can I store the text of the stored procedures as plain text? Or, when I parse the XML data to recreate the stored procedure, can I deserialize it so that it does not have those characters?
Ideally, I would like to use FOR XML PATH but if that is not possible I will use FOR XML EXPLICIT.
If you want to store data with special characters within XML, there are two options (plus a joke option)
escaping
CDATA
Just to mention it: converting everything to base64 or similar would work too :-)
The point is: You do not need this!
The only reason for CDATA (at least for me) is manually created content (copy'n'paste or typing). Whenever you build your XML automatically, you should rely on the implicitly applied escaping.
Why does it bother you how the data looks within the XML?
If you read this properly (not with SUBSTRING or other string based methods), you will get it back in the original look.
Try this:
DECLARE @TextWithSpecialCharacters NVARCHAR(100)=N'€ This is' + CHAR(13) + 'strange <ups, angular brackets! > And Ampersand &&&';
SELECT @TextWithSpecialCharacters FOR XML PATH('test');
returns the escaped form
<test>€ This is&#x0D;strange &lt;ups, angular brackets! &gt; And Ampersand &amp;&amp;&amp;</test>
But this...
SELECT (SELECT @TextWithSpecialCharacters FOR XML PATH('test'),TYPE).value('/test[1]','nvarchar(100)');
...returns
€ This is
strange <ups, angular brackets! > And Ampersand &&&
Microsoft decided not to support CDATA in FOR XML at all (except in EXPLICIT mode, which is a pain in the neck...)
Read two related answers (by me :-) about CDATA)
https://stackoverflow.com/a/38547537/5089204
https://stackoverflow.com/a/39034049/5089204 (with further links...)
When I use FOR XML PATH, the text within the stored procedure contains serialized data characters like &#xD; and &#xA; for CRLF and &quot;, etc.
Yes, because that's how XML works. To take a clearer example, suppose your sproc contained this text:
IF @someString = '<'
then to store it in XML, there must be some kind of encoding applied, since you can't have a bare < in the middle of your XML (I hope you can see why).
The real question is then not 'how do I stop my text being encoded when I store it as XML', but rather (as you guess might be the case):
Or when I parse the xml data type to recreate the stored procedure can I deserialize it so that it does not have those characters?
Yes, this is the approach you should be looking at.
You don't show us how you're getting your text out of the XML at the moment. The key thing to remember is that you can't (or rather shouldn't) treat XML as 'text with extra bits'; you should use methods that understand XML.
If you're extracting the text in T-SQL itself, use the various XQuery options. If in C#, use any of the various XML libraries. Just don't do a substring operation and expect that to work...
An example, if you are extracting in T-SQL:
DECLARE @someRandomText nvarchar(max) = 'I am some arbitrary text, eg a sproc definition.
I contain newlines
And arbitrary characters such as < > &
The end.';
-- Pack into XML
DECLARE @asXml xml = ( SELECT @someRandomText FOR XML PATH ('Example'), TYPE );
SELECT @asXml;
-- Extract
DECLARE @textOut nvarchar(max) = ( SELECT @asXml.value('.', 'nvarchar(max)') ) ;
SELECT @textOut;
But you can find many many tutorials on how to get values out of xml-typed data; this is just an example.
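For the original goal (store every procedure, recreate it later), a sketch of the full round trip with several rows. To stay self-contained it uses a table variable with two made-up definitions (`dbo.DemoProc1` and `dbo.DemoProc2` are hypothetical) instead of `sys.procedures` joined to `sys.sql_modules`; in a real database you would select `name` and `definition` from those views as in the question. `nodes()` turns each `StoredProcedure` element back into a row, and `value()` returns the decoded text, so the escaping never reaches the recreated script.

```sql
-- Hypothetical definitions standing in for sys.sql_modules.definition.
DECLARE @procs TABLE (name sysname, definition nvarchar(max));
INSERT INTO @procs (name, definition)
VALUES (N'DemoProc1', N'CREATE PROCEDURE dbo.DemoProc1 AS' + NCHAR(13) + NCHAR(10)
                    + N'IF 1 < 2 AND ''a'' <> ''b'' SELECT ''"quoted" & more'';'),
       (N'DemoProc2', N'CREATE PROCEDURE dbo.DemoProc2 AS' + NCHAR(13) + NCHAR(10)
                    + N'SELECT 42;');

-- Pack: FOR XML PATH escapes <, &, CR and friends automatically.
DECLARE @packed xml = (
    SELECT p.name       AS [@Name],
           p.definition AS [Definition]
    FROM @procs AS p
    FOR XML PATH('StoredProcedure'), ROOT('StoredProcedures'), TYPE);

-- Unpack: one row per procedure, with the original text restored.
SELECT sp.n.value('@Name', 'sysname')                        AS name,
       sp.n.value('(Definition/text())[1]', 'nvarchar(max)') AS definition
FROM @packed.nodes('/StoredProcedures/StoredProcedure') AS sp(n);
```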
SELECT
1 as Tag,
0 as Parent,
[View].name AS 'StoredProcedure!1!Name',
[Module].definition AS 'StoredProcedure!1!Definition!cdata'
FROM sys.views AS [View]
INNER JOIN sys.sql_modules AS [Module] ON [Module].object_id = [View].object_id
FOR XML EXPLICIT
Sample of the output from Adventureworks2012:
<StoredProcedure Name="vStoreWithContacts">
<Definition><![CDATA[
CREATE VIEW [Sales].[vStoreWithContacts] AS
SELECT
s.[BusinessEntityID]
,s.[Name]
,ct.[Name] AS [ContactType]
,p.[Title]
,p.[FirstName]
,p.[MiddleName]
,p.[LastName]
,p.[Suffix]
,pp.[PhoneNumber]
,pnt.[Name] AS [PhoneNumberType]
,ea.[EmailAddress]
,p.[EmailPromotion]
FROM [Sales].[Store] s
INNER JOIN [Person].[BusinessEntityContact] bec
ON bec.[BusinessEntityID] = s.[BusinessEntityID]
INNER JOIN [Person].[ContactType] ct
ON ct.[ContactTypeID] = bec.[ContactTypeID]
INNER JOIN [Person].[Person] p
ON p.[BusinessEntityID] = bec.[PersonID]
LEFT OUTER JOIN [Person].[EmailAddress] ea
ON ea.[BusinessEntityID] = p.[BusinessEntityID]
LEFT OUTER JOIN [Person].[PersonPhone] pp
ON pp.[BusinessEntityID] = p.[BusinessEntityID]
LEFT OUTER JOIN [Person].[PhoneNumberType] pnt
ON pnt.[PhoneNumberTypeID] = pp.[PhoneNumberTypeID];
]]></Definition>
</StoredProcedure>
<StoredProcedure Name="vStoreWithAddresses">
<Definition><![CDATA[
CREATE VIEW [Sales].[vStoreWithAddresses] AS
SELECT
s.[BusinessEntityID]
,s.[Name]
,at.[Name] AS [AddressType]
,a.[AddressLine1]
,a.[AddressLine2]
,a.[City]
,sp.[Name] AS [StateProvinceName]
,a.[PostalCode]
,cr.[Name] AS [CountryRegionName]
FROM [Sales].[Store] s
INNER JOIN [Person].[BusinessEntityAddress] bea
ON bea.[BusinessEntityID] = s.[BusinessEntityID]
INNER JOIN [Person].[Address] a
ON a.[AddressID] = bea.[AddressID]
INNER JOIN [Person].[StateProvince] sp
ON sp.[StateProvinceID] = a.[StateProvinceID]
INNER JOIN [Person].[CountryRegion] cr
ON cr.[CountryRegionCode] = sp.[CountryRegionCode]
INNER JOIN [Person].[AddressType] at
ON at.[AddressTypeID] = bea.[AddressTypeID];
]]></Definition>
</StoredProcedure>
As you can see, there are no &#xD; / &#xA; / &quot; etc., and newline characters are represented as actual new lines.

SQL Server 'To XML' Tag Name

I have the following code in the select block of my query which picks out rows from a table and outputs them in XML:
select ...
...
,substring(
(
Select RC_1.Master_Code AS [TopographyTDR]
From apex.Histo_Result_Coding as RC_1
Where RC_1.Histo_Report = Histo_Result_Coding.Histo_Report
ORDER BY RC_1.Histo_report
For XML auto
), 1, 1000) [TDRCodes]
...
and this gives an output similar to that shown below:
<RC_1 TopographyTDR="T77100"/><RC_1 TopographyTDR="T77100"/>
<RC_1 TopographyTDR="T01000"/><RC_1 TopographyTDR="T01000"/>
<RC_1 TopographyTDR="EGFR "/> <RC_1 TopographyTDR="GHER2"/>
<RC_1 TopographyTDR="T04020"/><RC_1 TopographyTDR="T04020"/>
<RC_1 TopographyTDR="T77100"/><RC_1 TopographyTDR="T77100"/>
This is the correct data, but I need the tag to be 'TopographyTDR' without the RC_1. i.e. the data should look like:
<TopographyTDR="T77100"/><TopographyTDR="T77100"/>
<TopographyTDR="T01000"/><TopographyTDR="T01000"/>
<TopographyTDR="EGFR "/> <TopographyTDR="GHER2"/>
<TopographyTDR="T04020"/><TopographyTDR="T04020"/>
<TopographyTDR="T77100"/><TopographyTDR="T77100"/>
Is there a simple way to do this? i.e. to avoid having the table name appear in the XML tag text?
Thanks in advance.
You can use for xml path instead of for xml auto and specify tag names explicitly.
Something like:
Select RC_1.Master_Code AS 'TopographyTDR'
From apex.Histo_Result_Coding as RC_1
Where RC_1.Histo_Report = Histo_Result_Coding.Histo_Report
ORDER BY RC_1.Histo_report
for XML path('')
Update:
Looking at your desired output more precisely, it doesn't look like valid XML.
Apart from the missing root node (which could be omitted for simplicity, I suppose), this format has a fundamental problem: a tag like <TopographyTDR="T77100"/> in fact doesn't have a tag name, only an attribute TopographyTDR with the value T77100. Are you sure you want such pseudo-XML data?
Your desired format is not allowed. An XML node must have a tag name and a content or attributes. And you'll need a root...
You must use PATH instead of AUTO. Look at this:
select top 3 name
from sys.objects
for xml path(''),ROOT('root');
select top 3 name AS [#attrib]
from sys.objects
for xml path('item'),ROOT('root')
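The closest valid XML to the desired output keeps `TopographyTDR` as the element name and moves the value into an attribute. A sketch, assuming a few of the sample codes in an inline `VALUES` list instead of `apex.Histo_Result_Coding`; the attribute name `code` and the root name `TDRCodes` are made up for illustration.

```sql
-- Hypothetical rows standing in for apex.Histo_Result_Coding.Master_Code.
DECLARE @codes xml = (
    SELECT c.Master_Code AS [@code]      -- 'code' is a hypothetical attribute name
    FROM (VALUES ('T77100'), ('T01000'), ('EGFR')) AS c(Master_Code)
    FOR XML PATH('TopographyTDR'), ROOT('TDRCodes'), TYPE);

-- Each row becomes an element like <TopographyTDR code="T77100"/>.
SELECT @codes;
```

Dropping `ROOT('TDRCodes')` gives a fragment without a root, which is what the original subquery produced; whether that is acceptable depends on the consumer.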

Delete empty XML nodes using T-SQL FOR XML PATH

I'm using FOR XML PATH to construct XML out of a table in SQL Server 2008R2. The XML has to be constructed as follows:
<Root>
<OuterElement>
<NumberNode>1</NumberNode>
<FormattedNumberNode>0001</FormattedNumberNode>
<InnerContainerElement>
<InnerNodeOne>0240</InnerNodeOne>
<InnerNodeStartDate>201201</InnerNodeStartDate>
</InnerContainerElement>
</OuterElement>
</Root>
According to the schema files, the InnerContainerElement is optional, while the InnerNodeOne is required. The schema files aren't set up by me, are quite complex, referring each other and not having explicit XSD-namespaces, so I can't easily load them into the database.
The XML has to be created from a table, which is filled using the following query:
SELECT
1 AS NumberNode
, '0001' AS [FormattedNumberNode]
, '0240' AS [InnerNodeOne]
, '201201' AS [InnerNodeStartDate]
INTO #temporaryXMLStore
UNION
SELECT
2 AS NumberNode
, '0001' AS [FormattedNumberNode]
, NULL AS [InnerNodeOne]
, NULL AS [InnerNodeStartDate]
I can think of two ways to construct this XML with FOR XML PATH.
1) Using 'InnerContainerElement' as named result from an XML subquery:
SELECT
NumberNode
, [FormattedNumberNode]
, (
SELECT
[InnerNodeOne]
, [InnerNodeStartDate]
FOR XML PATH(''), TYPE
) AS [InnerContainerElement]
FROM #temporaryXMLStore
FOR XML PATH('OuterElement'), ROOT('Root'), TYPE
2) Using 'InnerContainerElement' as an output element from an XML subquery, but without naming it:
SELECT
NumberNode
, [FormattedNumberNode]
, (
SELECT
[InnerNodeOne]
, [InnerNodeStartDate]
FOR XML PATH('InnerContainerElement'), TYPE
)
FROM #temporaryXMLStore
FOR XML PATH('OuterElement'), ROOT('Root'), TYPE
However, neither of them gives the desired result: in both cases, the result looks like
<Root>
<OuterElement>
<NumberNode>1</NumberNode>
<FormattedNumberNode>0001</FormattedNumberNode>
<InnerContainerElement>
<InnerNodeOne>0240</InnerNodeOne>
<InnerNodeStartDate>201201</InnerNodeStartDate>
</InnerContainerElement>
</OuterElement>
<OuterElement>
<NumberNode>2</NumberNode>
<FormattedNumberNode>0001</FormattedNumberNode>
<InnerContainerElement></InnerContainerElement>
<!-- Or, when using the second codeblock: <InnerContainerElement /> -->
</OuterElement>
</Root>
Whenever InnerContainerElement is empty, it is still displayed as an empty element. This is invalid according to the schema: whenever the element InnerContainerElement is in the XML, InnerNodeOne is required too.
How do I construct my FOR XML PATH query in such a way that the InnerContainerElement is left out whenever it's empty?
You need to make sure that the InnerContainerElement has zero rows for the case when there is no content.
select T.NumberNode,
T.FormattedNumberNode,
(
select T.InnerNodeOne,
T.InnerNodeStartDate
where T.InnerNodeOne is not null or
T.InnerNodeStartDate is not null
for xml path('InnerContainerElement'), type
)
from #temporaryXMLStore as T
for xml path('OuterElement'), root('Root')
Or you could specify the element InnerContainerElement as a part of a column alias.
select T.NumberNode,
T.FormattedNumberNode,
T.InnerNodeOne as 'InnerContainerElement/InnerNodeOne',
T.InnerNodeStartDate as 'InnerContainerElement/InnerNodeStartDate'
from #temporaryXMLStore as T
for xml path('OuterElement'), root('Root')
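A self-contained sketch comparing both answers, with the two rows of `#temporaryXMLStore` inlined as a `VALUES` list (same values as in the question). In the first query the `WHERE` clause makes the nested query return zero rows for row 2; in the second, `FOR XML PATH` omits NULL columns and only opens `InnerContainerElement` when one of its child columns produces output. Either way, row 2 should come out without the container.

```sql
-- Answer 1: nested query that returns no row when both inner values are NULL.
DECLARE @viaWhere xml = (
    SELECT T.NumberNode,
           T.FormattedNumberNode,
           (SELECT T.InnerNodeOne, T.InnerNodeStartDate
            WHERE T.InnerNodeOne IS NOT NULL OR T.InnerNodeStartDate IS NOT NULL
            FOR XML PATH('InnerContainerElement'), TYPE)
    FROM (VALUES (1, '0001', '0240', '201201'),
                 (2, '0001', NULL,   NULL))
         AS T(NumberNode, FormattedNumberNode, InnerNodeOne, InnerNodeStartDate)
    FOR XML PATH('OuterElement'), ROOT('Root'), TYPE);

-- Answer 2: container element as part of the column alias.
DECLARE @viaPaths xml = (
    SELECT T.NumberNode,
           T.FormattedNumberNode,
           T.InnerNodeOne       AS [InnerContainerElement/InnerNodeOne],
           T.InnerNodeStartDate AS [InnerContainerElement/InnerNodeStartDate]
    FROM (VALUES (1, '0001', '0240', '201201'),
                 (2, '0001', NULL,   NULL))
         AS T(NumberNode, FormattedNumberNode, InnerNodeOne, InnerNodeStartDate)
    FOR XML PATH('OuterElement'), ROOT('Root'), TYPE);

SELECT @viaWhere AS ViaWhere, @viaPaths AS ViaPaths;
```

The alias-based version needs no nested query, which keeps it simpler; the nested version is handy when the emptiness rule is more complex than "all children are NULL".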
