XML parse issue from a json field - snowflake-cloud-data-platform

I have a json column with an XML field I am trying to parse
{
"guid": "ba410633-d191-deab-fe63-23f732c517aa",
"id": 1847510,
"request":<?xml version='1.0' encoding='UTF-8'?><ConnectRequest xmlns='http://www.acme.com/NetConnect' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xsi:schemaLocation='http://www.acme.com/API'><EAI>QWE</EAI><DBHost>TEST</DBHost><ReferenceId>aa</ReferenceId><Request xmlns='http://www.acme.com' version='1.0'><Products><PreciseIDServer><XMLVersion>5.0</XMLVersion><Subscriber><Preamble>ABC</Preamble><OpInitials>BCD</OpInitials><SubCode>2436170</SubCode></Subscriber></PreciseIDServer></Products></Request></ConnectRequest>"
}
XML field is like this
<?xml version='1.0' encoding='UTF-8'?>
<ConnectRequest xmlns='http://www.acme.com/NetConnect' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xsi:schemaLocation='http://www.acme.com/API'>
<EAI>QWE</EAI>
<DBHost>TEST</DBHost>
<ReferenceId>aa</ReferenceId>
<Request xmlns='http://www.acme.com' version='1.0'>
<Products>
<PreciseIDServer>
<XMLVersion>5.0</XMLVersion>
<Subscriber>
<Preamble>ABC</Preamble>
<OpInitials>BCD</OpInitials>
<SubCode>2436170</SubCode>
</Subscriber>
</PreciseIDServer>
</Products>
</Request>
</ConnectRequest>
I have tried the following but no luck, it returns null all the time
select xmlget(request, 'DBHost'):"$" as DBHost
from (select json_field:request::variant request from table)
Is it a datatype issue? I was able to parse another column from a table which has a variant column with xml data using the same xmlget.

with
XML(DBHOST, PRODUCTS, PRECISEIDSERVER, SUBCODE) as
(
select get(xmlget(parse_xml(json_field:request::string), 'DBHost'), '$')::string as DBHOST,
get(xmlget(parse_xml(json_field:request::string), 'Request'), '$')::string as PRODUCTS,
get(xmlget(parse_xml(PRODUCTS::string), 'PreciseIDServer'), '$')::string as PreciseIDServer,
get(xmlget(parse_json(PreciseIDServer)[1], 'SubCode'), '$')::int as SourceCode
from MY_TABLE
)
select DBHOST, SUBCODE from XML;
XMLGET would work that way on a variant column that contains only XML. In this case, the XML is not the variant. The JSON is the variant and the XML is one of the properties. It needs to be treated as a string property, and then manipulated as XML. To do that you need to convert it from a JSON property using ::string. You then need to use PARSE_XML to convert the string into a variant. Finally, you can use xmlget on the variant resulting from PARSE_XML and use GET with the final parameter of $ to indicate that you just want the value, not the begin and end tags too. The final ::string converts that final result into a string so that it doesn't get wrapped in double quotes.
One note - there's a missing close tag in the sample xml that fails when using the PARSE_XML function. I added the close tag for testing. The XML must be valid in order to convert from a string to an XML variant.
Edit: Updated to a CTE to add parsing for SubCode. To get at an XML node that's nested a few layers deep, a good approach is taking it step by step in columns with a CTE to select only the ones you want.

Related

Is that valid XML and how to replicate with SQL Server

I do have to replicate an XML file with SQL Server and I am now stumbling over the following structure inside the XML file and I don't know how to replicate that.
The structure looks like this at the moment for certain tags:
<ART_TAG1>
<UNMLIMITED/>
</ART_TAG1>
<ART_TAG2>
<ART_TAG3>
<Data_Entry/>
</ART_TAG3>
</ART_TAG2>
I am wondering if this is proper XML that the data inside (unlimited and Data_Entry) is enclosed with a closing XML tag. The XML validator https://www.w3schools.com/xml/xml_validator.asp is telling me this is correct. But now I am struggling with replicating that with Transact-SQL.
If I try to replicate that I can only come up with the following TSQL script, which obviously does not fully look like the original.
SELECT 'UNLIMITED' as 'ART_TAG1'
, 'Data_Entry' as 'ART_TAG2/ART_TAG3'
FOR XML PATH(''), ROOT('root')
<root>
<ART_TAG1>UNLIMITED</ART_TAG1>
<ART_TAG2>
<ART_TAG3>Data_Entry</ART_TAG3>
</ART_TAG2>
</root>
If I get this correctly, your question is:
How can I put my query to create those <SomeElement /> tags?
Look at this:
--This will create filled nodes
SELECT 'outer' AS [OuterNode/#attr]
,'inner' AS [OuterNode/InnerNode]
FOR XML PATH('row');
--The empty string is some kind of content
SELECT 'outer' AS [OuterNode/#attr]
,'' AS [OuterNode/InnerNode]
FOR XML PATH('row');
--the missing value (NULL) is omited by default
SELECT 'outer' AS [OuterNode/#attr]
,NULL AS [OuterNode/InnerNode]
FOR XML PATH('row');
--Now check what happens here:
--First XML has an empty element, while the second uses the self-closing element
DECLARE #xml1 XML=
N'<row>
<OuterNode attr="outer">
<InnerNode></InnerNode>
</OuterNode>
</row>';
DECLARE #xml2 XML=
N'<row>
<OuterNode attr="outer">
<InnerNode/>
</OuterNode>
</row>';
SELECT #xml1,#xml2;
The result is the same for both...
Some background: Semantically the empty element <element></element> is exactly the same as the self-closing element <element />. It should not make any difference, whether you use the one or the other. If your consumer cannot deal with this, it is a problem in the reading part.
Yes, you can force any content into XML on string level, but - as the example shows above - this is just a (dangerous) hack.
XML within T-SQL returns - by default - a missing node as NULL and an empty element as empty (depending on the datatype, and beware of the difference between an element and its text() node).
In short: This is nothing you should have to think about...

Select XML multiple only a few nodes with the same name

I'm trying to construct a soap message, and I was able to construct the entire message using a single select. Except the problem is, on only a few occasions the same node name is repeated twice.
So for example the required output result should be like so, with two separate id root nodes:
<SoapDocument>
<recordTarget>
<patientRole>
<id root="1.2.3.4" extension="1234567" />
<id root="1.2.3.5.6" extension="0123456789" />
</patientRole>
</recordTarget>
</SoapDocument>
I tried to use my sparse knowledge of xpath to construct the node names like so:
select
'1.2.3.4' AS 'recordTarget/patientRole/id[1]/#root',
'1234567' AS 'recordTarget/patientRole/id[1]/#extension',
'1.2.3.5.6' AS 'recordTarget/patientRole/id[2]/#root',
'0123456789' AS 'recordTarget/patientRole/id[2]/#extension'
FOR XML PATH('SoapDocument'),TYPE
Apparently xpath naming can't be applied to column names id[1] and id[2] like that? Am I missing something here or should the notation be different? What would be the easiest way to constuct the desired result?
From your question I assume, this is not tabular data, but fixed values and you are creating a medical document, assumably a CDA.
Try this:
SELECT
(
SELECT
'1.2.3.4' AS 'id/#root',
'1234567' AS 'id/#extension',
'',
'1.2.3.5.6' AS 'id/#root',
'0123456789' AS 'id/#extension'
FOR XML PATH('patientRole'),TYPE
) AS [SoapDocument/recordTarget]
FOR XML PATH('')
The result:
<SoapDocument>
<recordTarget>
<patientRole>
<id root="1.2.3.4" extension="1234567" />
<id root="1.2.3.5.6" extension="0123456789" />
</patientRole>
</recordTarget>
</SoapDocument>
Some explanation: The empty element in the middle allows you to place two elements with the same name in one query. There are various approaches how you get this into your surrounding tags. This is just one possibility.
UPDATE
I'd like to point to BdR's own answer! Great finding and worth an up-vote!
A little more elaboration on the answer from Shnugo, as it got me trying out some things using an "empty column".
If you do not give the emtpy column a name, it will reset to the XML root node. So the following columns will start from the XML root of the selection you are in at that point. However, if you explicitly name the empty separator column, then the following columns will continue in the hierarchy as set by that column name.
So the selection below will also result in the desired result. It's subtly different, but in my case it allows me to avoid using subselections.
select
'1.2.3.4' AS 'recordTarget/patientRole/id/#root',
'1234567' AS 'recordTarget/patientRole/id/#extension',
'' AS 'recordTarget/patientRole',
'1.2.3.5.6' AS 'recordTarget/patientRole/id/#root',
'0123456789' AS 'recordTarget/patientRole/id/#extension'
FOR XML PATH('SoapDocument'),TYPE
This should do the job:
WITH CTE AS (
SELECT *
FROM (VALUES('1.2.3.4','1234567'),
('1.2.3.5.6','0123456789')) V ([root], [extension]))
SELECT (SELECT (SELECT (SELECT [root] AS [#root],
[extension] AS [#extension]
FROM CTE
FOR XML PATH('id'), TYPE)
FOR XML PATH('patientRole'), TYPE)
FOR XML PATH ('recordTarget'), TYPE)
FOR XML PATH ('SoapDocument');

Extracting XML in a column from a SQL Server database

I have read dozens of posts and have tried numerous SQL queries to try and get this figured out. Sadly, I'm not a SQL expert (not even a novice) nor am I an XML expert. I understand basic queries from SQL, and understand XML tags, mostly.
I'm trying to query a database table, and have the data show a list of values from a column that contains XML. I'll give you an example of the data. I won't burden you with everything I have tried.
Here is an example of field inside of the column I need. So this is just one row, I would need to query the whole table to get all of the data I need.
When I select * from [table name] it returns hundreds of rows and when I double click in the column name of 'Document' on one row, I get the information I need.
It looks like this:
<code_set xmlns="">
<name>ExampleCodeTable</name>
<last_updated>2010-08-30T17:49:58.7919453Z</last_updated>
<code id="1" last_updated="2010-01-20T17:46:35.1658253-07:00"
start_date="1998-12-31T17:00:00-07:00"
end_date="9999-12-31T16:59:59.9999999-07:00">
<entry locale="en-US" name="T" description="Test1" />
</code>
<code id="2" last_updated="2010-01-20T17:46:35.1658253-07:00"
start_date="1998-12-31T17:00:00-07:00"
end_date="9999-12-31T16:59:59.9999999-07:00">
<entry locale="en-US" name="Z" description="Test2" />
</code>
<displayExpression>[Code] + ' - ' + [Description]</displayExpression>
<sortColumn>[Description]</sortColumn>
</code_set>
Ideally I would write it so it runs the query on the table and produces results like this:
Code Description
--------------------
(Data) (Data)
Any ideas? Is it even possible? The dozens of things I have tried that are always posted in stack, either return Nulls or fail.
Thanks for your help
Try something like this:
SELECT
CodeSetId = xc.value('#id', 'int'),
Description = xc.value('(entry/#description)[1]', 'varchar(50)')
FROM
dbo.YourTableNameHere
CROSS APPLY
YourXmlColumn.nodes('/code_set/code') AS XT(XC)
This basically uses the built-in XQuery to get an "in-memory" table (XT) with a single column (XC), each containing an XML fragment that represents each <code> node inside your <code_set> root node.
Once you have each of these XML fragments, you can use the .value() XQuery operator to "reach in" and grab some pieces of information from it, e.g. it's #id (attribute by the name of id), or the #description attribute on the contained <entry> subelement.
The following query will read the xml field in every row, then shred certain values into a tabular result set.
SELECT
-- get attribute [attribute name] from the parent node
parent.value('./#attribute name','varchar(max)') as ParentAttributeValue,
-- get the text value of the first child node
child.value('./text()', 'varchar(max)') as ChildNodeValueFromFirstChild,
-- get attribute attribute [attribute name] from the first child node
child.value('./#attribute name', 'varchar(max)') as ChildAttributeValueFromFirstChild
FROM
[table name]
CROSS APPLY
-- create a handle named parent that references that <parent node> in each row
[xml field name].nodes('//xpath to parent name') AS ParentName(parent)
CROSS APPLY
-- create a handle named child that references first <child node> in each row
parent.nodes('(xpath from parent/to child)[0]') AS FirstChildNode(child)
GO
Please provide the exact values you want to shred from the XML for a more precise answer.

SQL Server : read XML data

I have this query taken from the site www.SQLauthority.com:
DECLARE #MyXML XML
SET #MyXML = '<SampleXML>
<Colors>
<Color1>White</Color1>
<Color2>Blue</Color2>
<Color3>Black</Color3>
<Color4 Special="Light">Green</Color4>
<Color5>Red</Color5>
</Colors>
<Fruits>
<Fruits1>Apple</Fruits1>
<Fruits2>Pineapple</Fruits2>
<Fruits3>Grapes</Fruits3>
<Fruits4>Melon</Fruits4>
</Fruits>
</SampleXML>'
SELECT
a.b.value('Colors[1]/Color1[1]','varchar(10)') AS Color1,
a.b.value('Colors[1]/Color2[1]','varchar(10)') AS Color2,
a.b.value('Colors[1]/Color3[1]','varchar(10)') AS Color3,
a.b.value('Colors[1]/Color4[1]/#Special','varchar(10)')+' '+
+a.b.value('Colors[1]/Color4[1]','varchar(10)') AS Color4,
a.b.value('Colors[1]/Color5[1]','varchar(10)') AS Color5,
a.b.value('Fruits[1]/Fruits1[1]','varchar(10)') AS Fruits1,
a.b.value('Fruits[1]/Fruits2[1]','varchar(10)') AS Fruits2,
a.b.value('Fruits[1]/Fruits3[1]','varchar(10)') AS Fruits3,
a.b.value('Fruits[1]/Fruits4[1]','varchar(10)') AS Fruits4
FROM #MyXML.nodes('SampleXML') a(b)
I am not getting a better picture of how the nodes fetching from the xml data.
I have few queries regarding this.
what is a(b) in this?
how the structure will change if i have another node inside colors and all the existing child nodes appended to that?
ie:
<Colorss>
<Colors>
<Color1>White</Color1>
<Color2>Blue</Color2>
<Color3>Black</Color3>
<Color4 Special="Light">Green</Color4>
<Color5>Red</Color5>
</Colors>
<Colorss>
<Fruits>
<Fruits1>Apple</Fruits1>
<Fruits2>Pineapple</Fruits2>
<Fruits3>Grapes</Fruits3>
<Fruits4>Melon</Fruits4>
</Fruits>
what does it mean by a.b.value? When I mouse over it shows a is derived table. Can I check value of the table a?
Any help in this will be appreciated.
what is a(b) in this?
The call to .nodes('SampleXML') is a XQuery function which returns a pseudo table which contains one column of an XML fragment for each of the elements that this XPath expression matches - and the a(b) is the table alias (a) for that column, and b is the name of the column in that pseudo table containing the XML fragments.
what does it mean by a.b.value?
This is based on the above - a is the table alias for that temporary, inline pseudo table, b is the column name for the column in that table, and .value() is another XQuery function that will extract a single value from XML, based on the XPath expression (first argument) and it will return it as the datatype specified in the second argument.
You should check out those introductions to XQuery support in SQL Server to understand better:
Introduction to XQuery in SQL Server 2005
XQuery basics
and there are numerous other introductions and tutorials on XQuery - just search with your favorite search engine and you'll get tons of hits!
here's my stab # it:
a-refers to root;b-refers to root and child node
DECLARE #MyXML XML
SET #MyXML = '<SampleXML>
<Colors>
<Color1>White</Color1>
<Color2>Blue</Color2>
<Color3>Black</Color3>
<Color4 Special="Light">Green</Color4>
<Color5>Red
<Color6>Black44</Color6>
<Color7>Black445</Color7>
</Color5>
</Colors>
<Fruits>
<Fruits1>Apple</Fruits1>
<Fruits2>Pineapple</Fruits2>
<Fruits3>Grapes</Fruits3>
<Fruits4>Melon</Fruits4>
</Fruits>
</SampleXML>'
to get an inner child
SELECT
a.c.value('Colors1/Color11','varchar(10)') AS Color1,
a.c.value('Colors1/Color21','varchar(10)') AS Color2,
a.c.value('Colors1/Color31','varchar(10)') AS Color3,
a.c.value('Colors1/Color41/#Special','varchar(10)') AS Color4,
a.c.value('Colors1/Color51','varchar(10)') AS Color5,
a.c.value('Colors1/Color51/Color71','varchar(50)') AS Color6a,
a.c.value('Colors1/Color51/Color61','varchar(50)') AS Color6b, a.c.value('Fruits1/Fruits11','varchar(10)') AS Fruits1,
a.c.value('Fruits1/Fruits21','varchar(10)') AS Fruits2,
a.c.value('Fruits1/Fruits31','varchar(10)') AS Fruits3,
a.c.value('Fruits1/Fruits41','varchar(10)') AS Fruits4
FROM #MyXML.nodes('SampleXML') a(c)
A nodes() method invocation with the query expression /root/Color(n) would return a rowset with three rows, each containing a logical copy of the original XML document, and with the context item set to one of the nodes
see here

SQL Server Querying An XML Field

I have a table that contains some meta data in an XML field.
For example
<Meta>
<From>tst#test.com</From>
<To>
<Address>testing#123.com</Address>
<Address>2#2.com</Address>
</To>
<Subject>ESubject Goes Here</Subject>
</Meta>
I want to then be able to query this field to return the following results
From To Subject
tst#test.com testing#123.com Subject Goes Here
tst#test.com 2#2.com Subject Goes Here
I've written the following query
SELECT
MetaData.query('data(/Meta/From)') AS [From],
MetaData.query('data(/Meta/To/Address)') AS [To],
MetaData.query('data(/Meta/Subject)') AS [Subject]
FROM
Documents
However this only returns one record for that XML field. It combines both the 2 addresses into one result. Is it possible for split these on to separate records?
The result I'm getting is
From To Subject
tst#test.com testing#123.com 2#2.com Subject Goes Here
Thanks
Gav
You need to return the XML and then parse it using something like the following code:
StringReader stream = new StringReader(stringFromSQL);
XmlReader reader = XmlReader.Create(stream);
while (reader.Read())
{
// Do stuff
}
Where stringFromSQL is the whole string as read from your table.

Resources