Snowflake - parsing a specific value from XML

I am trying to get a specific value from XML.
My XML looks like this example:
<w:KV w:Key="cardtype" w:Value="MC" />
I need to get the value for the "cardtype" key; in this case the result should be "MC".
What query should I use?

You can use the GET function, with the attribute name prefixed by @:
select get( parse_xml('<w:KV w:Key="cardtype" w:Value="MC" />') , '@w:Value' ) ;
This returns "MC".
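If the element sits among several w:KV siblings, the lookup also has to pick the one whose key matches. Here is a minimal sketch, assuming a hypothetical wrapper element named w:Settings (not in the question) with more than one child, so that its "$" content is an array that FLATTEN can walk:

```sql
-- Hypothetical sample: a wrapper with two w:KV children.
with doc as (
    select parse_xml(
        '<w:Settings><w:KV w:Key="cardtype" w:Value="MC" /><w:KV w:Key="currency" w:Value="USD" /></w:Settings>'
    ) as x
)
select get(kv.value, '@w:Value')::string as card_type
from doc,
     lateral flatten(input => doc.x:"$") kv          -- one row per child element
where get(kv.value, '@w:Key')::string = 'cardtype';  -- keep only the matching key
```

With a single child, "$" holds one object rather than an array, so the GET call from the answer above is the simpler option.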

Related

XML parse issue from a json field

I have a JSON column with an XML field that I am trying to parse:
{
"guid": "ba410633-d191-deab-fe63-23f732c517aa",
"id": 1847510,
"request": "<?xml version='1.0' encoding='UTF-8'?><ConnectRequest xmlns='http://www.acme.com/NetConnect' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xsi:schemaLocation='http://www.acme.com/API'><EAI>QWE</EAI><DBHost>TEST</DBHost><ReferenceId>aa</ReferenceId><Request xmlns='http://www.acme.com' version='1.0'><Products><PreciseIDServer><XMLVersion>5.0</XMLVersion><Subscriber><Preamble>ABC</Preamble><OpInitials>BCD</OpInitials><SubCode>2436170</SubCode></Subscriber></PreciseIDServer></Products></Request></ConnectRequest>"
}
The XML field looks like this:
<?xml version='1.0' encoding='UTF-8'?>
<ConnectRequest xmlns='http://www.acme.com/NetConnect' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xsi:schemaLocation='http://www.acme.com/API'>
<EAI>QWE</EAI>
<DBHost>TEST</DBHost>
<ReferenceId>aa</ReferenceId>
<Request xmlns='http://www.acme.com' version='1.0'>
<Products>
<PreciseIDServer>
<XMLVersion>5.0</XMLVersion>
<Subscriber>
<Preamble>ABC</Preamble>
<OpInitials>BCD</OpInitials>
<SubCode>2436170</SubCode>
</Subscriber>
</PreciseIDServer>
</Products>
</Request>
</ConnectRequest>
I have tried the following without luck; it always returns NULL:
select xmlget(request, 'DBHost'):"$" as DBHost
from (select json_field:request::variant request from table)
Is it a datatype issue? I was able to parse another column from a table which has a variant column with xml data using the same xmlget.
with
XML(DBHOST, PRODUCTS, PRECISEIDSERVER, SUBCODE) as
(
select get(xmlget(parse_xml(json_field:request::string), 'DBHost'), '$')::string as DBHOST,
get(xmlget(parse_xml(json_field:request::string), 'Request'), '$')::string as PRODUCTS,
get(xmlget(parse_xml(PRODUCTS::string), 'PreciseIDServer'), '$')::string as PreciseIDServer,
get(xmlget(parse_json(PreciseIDServer)[1], 'SubCode'), '$')::int as SourceCode
from MY_TABLE
)
select DBHOST, SUBCODE from XML;
XMLGET would work that way on a variant column that contains only XML. In this case, the XML is not the variant. The JSON is the variant and the XML is one of the properties. It needs to be treated as a string property, and then manipulated as XML. To do that you need to convert it from a JSON property using ::string. You then need to use PARSE_XML to convert the string into a variant. Finally, you can use xmlget on the variant resulting from PARSE_XML and use GET with the final parameter of $ to indicate that you just want the value, not the begin and end tags too. The final ::string converts that final result into a string so that it doesn't get wrapped in double quotes.
One note - there's a missing close tag in the sample xml that fails when using the PARSE_XML function. I added the close tag for testing. The XML must be valid in order to convert from a string to an XML variant.
Edit: Updated to a CTE to add parsing for SubCode. To get at an XML node that's nested a few layers deep, a good approach is taking it step by step in columns with a CTE to select only the ones you want.
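As an alternative to the step-by-step columns, nested XMLGET calls can walk down to the node directly. This is only a sketch: the inline sample stands in for MY_TABLE.json_field and is a trimmed, namespace-free copy of the XML from the question, and the names src and doc are made up for illustration.

```sql
-- Hypothetical inline sample standing in for MY_TABLE.json_field.
with src as (
    select parse_json($$
        {"id": 1847510,
         "request": "<ConnectRequest><DBHost>TEST</DBHost><Request><Products><PreciseIDServer><Subscriber><SubCode>2436170</SubCode></Subscriber></PreciseIDServer></Products></Request></ConnectRequest>"}
    $$) as json_field
),
doc as (
    -- JSON property -> string -> XML variant
    select parse_xml(json_field:request::string) as x
    from src
)
select
    xmlget(x, 'DBHost'):"$"::string as dbhost,
    -- each XMLGET returns the first child element with the given tag name
    xmlget(xmlget(xmlget(xmlget(xmlget(x, 'Request'), 'Products'),
           'PreciseIDServer'), 'Subscriber'), 'SubCode'):"$"::int as subcode
from doc;
```

The nesting gets hard to read beyond a few levels, which is where the CTE approach above is easier to maintain.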

Change value of a xml root node attribute

I am trying to modify an attribute of the XML root node using XQuery in T-SQL, but I can't get it to work. My XML has a namespace in it and I can't bypass it. When I query the value I retrieve it successfully because I use: ;WITH XMLNAMESPACES (DEFAULT 'some:namespace:here:v1'). I also tried 'declare default element namespace "some:namespace:here:v1";' in the XQuery, but it does not seem to work.
Any ideas how I can achieve this?
This is an example of the XML I am trying to modify.
DECLARE @XML_TO_READ XML = N'
<F2101 xmlns="some:namespace:here:v1" propertyToModify="valueToModify">
<person xmlns="some:namespace:here:v1" anotherPropertyToModify="anotherValueToModify" />
</F2101>'
I retrieve the value like this:
;WITH XMLNAMESPACES (DEFAULT 'some:namespace:here:v1')
SELECT propertyToModify =
@XML_TO_READ.value('(/F2101/@propertyToModify)[1]', 'nvarchar(50)')
And I tried to modify (update) the value like this:
SET @XML_TO_READ.modify('
declare default element namespace "some:namespace:here:v1";
replace value of (/F2101/propertyToModify/text())[1] with ("modifiedValue")')
I tried multiple solutions but I did not find anything that would work for my special case here.
Thanks in advance.
Your statement should be:
SET @XML_TO_READ.modify('
declare default element namespace "some:namespace:here:v1";
replace value of (/F2101/@propertyToModify)[1] with ("modifiedValue")')
Note the "@propertyToModify" rather than "propertyToModify/text()": an attribute is addressed with @ and has no text() child.
See also the documentation for replace value of (XML DML).
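The same pattern should work for the nested person attribute, and the new value can come from a T-SQL variable through sql:variable(). This is a sketch only: @newValue is a name introduced here for illustration, and the sample XML is the one from the question.

```sql
DECLARE @XML_TO_READ XML = N'
<F2101 xmlns="some:namespace:here:v1" propertyToModify="valueToModify">
  <person xmlns="some:namespace:here:v1" anotherPropertyToModify="anotherValueToModify" />
</F2101>';

-- Hypothetical variable holding the replacement value.
DECLARE @newValue NVARCHAR(50) = N'modifiedValue';

-- The namespace is declared in the XQuery prolog because
-- WITH XMLNAMESPACES cannot be attached to a SET statement.
SET @XML_TO_READ.modify('
  declare default element namespace "some:namespace:here:v1";
  replace value of (/F2101/person/@anotherPropertyToModify)[1]
  with sql:variable("@newValue")');

SELECT @XML_TO_READ;
```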

SQL XML parsing using a attribute value supplied by another field in the same row

Context: I'm scraping some XML form descriptions from a Web Services table in hopes of using that name to identify what the user has inputted as response. Since this description changes for each step (row) of the process and each product I want something that can evaluate dynamically.
What I tried: The following was quite useful, but it returns each dynamic attribute query result in its own field, and using a COALESCE to reduce the results to one field would lead to its own complications: Get values from XML tags with dynamically specified data fields
Current Attempt:
I'm using the following code to generate the attribute name that I will use in the next step to query the attribute's value:
case when left([Return], 5) = '<?xml'
then lower(cast([Return] as xml).value('(/response/form/*/@name)[1]','varchar(30)'))
else ''
end as [FormRequest]
And as part of step 2 I have used the STUFF function to try and make the row-level query possible
case when len(FormRequest)>0
then stuff( ',' + 'cast([tmpFormResponse] as xml).value(''(/wrapper/@' + [FormRequest] + ')[1]'',''varchar(max)'')', 1, 1, '')
else ''
end as [FormResponse]
Instead of seeing 1 returned as my FormResponse field value for the submit attribute, it returns the query text itself -- cast([tmpFormResponse] as xml).value('(/wrapper/@submit)[1]','varchar(max)') -- rather than the result of evaluating it.
How should I action the value method so that I can dynamically strip out the response per row of XML data in tmpFormResponse based on the field value in the FormRequest field?
Thanx
You can check this out:
DECLARE @xml XML=
N'<root>
<SomeAttributes a="a" b="b" c="c"/>
<SomeAttributes a="aa" b="bb" c="cc"/>
</root>';
DECLARE @localName NVARCHAR(100)='b';
SELECT sa.value(N'(./@*[local-name()=sql:variable("@localName")])[1]','nvarchar(max)')
FROM @xml.nodes(N'/root/SomeAttributes') AS A(sa)
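To take the attribute name from another field in the same row, as the question asks, sql:column() can replace sql:variable(). This is a rough sketch: the table variable @t, its sample rows, and the shape of the wrapper element are assumptions based on the column names and XPath in the question.

```sql
-- Hypothetical sample rows; only the column names come from the question.
DECLARE @t TABLE (FormRequest VARCHAR(30), tmpFormResponse NVARCHAR(MAX));
INSERT INTO @t (FormRequest, tmpFormResponse)
VALUES ('submit', N'<wrapper submit="1" cancel="0" />'),
       ('cancel', N'<wrapper submit="1" cancel="0" />');

SELECT t.FormRequest,
       -- match the attribute whose name equals this row's FormRequest value
       x.doc.value('(/wrapper/@*[local-name()=sql:column("t.FormRequest")])[1]',
                   'varchar(max)') AS FormResponse
FROM @t AS t
CROSS APPLY (SELECT CAST(t.tmpFormResponse AS XML)) AS x(doc);
```

This avoids building the query text as a string, which is why the STUFF attempt returned the text instead of a value.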
I ended up hacking together a solution by using PATINDEX and CHARINDEX to look for the value of the [FormRequest] field in the tmpFormResponse field.

Trouble with XQuery SQL statement to get correct parent/child value

XML Schema: (Assumes that the XMLNAMESPACES are already set AS a and b)
<Layer1 xmlns="a">
<Layer2 xmlns="b">
<Layer3>
<id>val1</id>
<data>False</data>
</Layer3>
<Layer3>
<id>val2</id>
<data>True</data>
</Layer3>
</Layer2>
</Layer1>
I am using this bit of SQL to attempt to accomplish my task:
ITEM.value('(/a:Layer1/b:Layer2/b:Layer3)[1]', 'varchar(max)') AS ReturnValue
What I am trying to do is get only the true values where id='val2' AND data='True'. Something like this:
ITEM.value('(/a:Layer1/b:Layer2/b:Layer3[id="val2" and data="True"]/b:data)[0]', 'varchar(max)') AS ReturnValue
For some reason the above returns null. I'd assume it is a syntax error.
In this query I would like the value to be returned as True from any Layer3 Parent where the id and data are following the conditions. I'd appreciate any help and thank you in advance.
From your code I take it that this is SQL Server...
Your XML defines a default namespace on the first node <Layer1> and a different default namespace on <Layer2>.
That means that all elements below <Layer2> are in this new default namespace.
Your code is missing the namespace prefixes here: [id="val2" and data="True"], but I do not really understand what you are actually trying to achieve...
DECLARE @ITEM XML=
N'<Layer1 xmlns="a">
<Layer2 xmlns="b">
<Layer3>
<id>val1</id>
<data>False</data>
</Layer3>
<Layer3>
<id>val2</id>
<data>True</data>
</Layer3>
</Layer2>
</Layer1>';
--This query will find the value "True" within <data> below a <Layer3>, where the <id> is "val2":
WITH XMLNAMESPACES('a' AS a,'b' AS b)
SELECT @ITEM.value('(/a:Layer1/b:Layer2/b:Layer3[b:id="val2"]/b:data/text())[1]', 'varchar(max)');
You might use *: in front of every name as a namespace wildcard:
SELECT @ITEM.value('(/*:Layer1/*:Layer2/*:Layer3[*:id="val2"]/*:data/text())[1]', 'varchar(max)');
One more approach is to define the inner default namespace like this, which saves some typing:
WITH XMLNAMESPACES('a' AS a, DEFAULT 'b')
SELECT @ITEM.value('(/a:Layer1/Layer2/Layer3[id="val2"]/data/text())[1]', 'varchar(max)');
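Two more details from the question are worth covering. XQuery positions start at 1, so the [0] in the original attempt never selects a node. Both conditions can also go into one predicate once each name carries its prefix. A minimal sketch using the sample XML from above; the HasMatch column with exist() is an addition for illustration, not something the question asked for:

```sql
DECLARE @ITEM XML =
N'<Layer1 xmlns="a">
  <Layer2 xmlns="b">
    <Layer3><id>val1</id><data>False</data></Layer3>
    <Layer3><id>val2</id><data>True</data></Layer3>
  </Layer2>
</Layer1>';

WITH XMLNAMESPACES('a' AS a, 'b' AS b)
SELECT
    -- both conditions in one predicate, every name prefixed, position [1]
    @ITEM.value('(/a:Layer1/b:Layer2/b:Layer3[b:id="val2" and b:data="True"]/b:data/text())[1]',
                'varchar(max)') AS ReturnValue,
    -- exist() only reports whether such a Layer3 is present
    @ITEM.exist('/a:Layer1/b:Layer2/b:Layer3[b:id="val2" and b:data="True"]') AS HasMatch;
```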

How do you add attributes to an existing Root XML Node using TSQL?

I've constructed some XML in TSQL.
declare @requestXML xml
set @requestXML = (
select @dataXML
for xml raw ('rtEvent')
)
The general output for what I now have follows the pattern resembling this:
<rtEvent>
<ctx>
.....
</ctx>
</rtEvent>
What I'd like to do now is add some attributes and values to the rtEvent root
element node but I'm not certain how to achieve it.
I've looked at the Modify method of the XML object and have observed the insert, replace value of, and delete operations but cannot seem to figure out how to use any of them to achieve the results I'm after.
Basically, I want to be able to modify the root node to reflect something like:
<rtEvent type="customType" email="someaddress@domain.com"
origin="eCommerce" wishedChannel="0" externalId="5515">
<ctx>
...
</ctx>
</rtEvent>
Should I be using the documented XML.Modify or is there a better method? How should it be done?
Just in case you wanted to see the modify method way of doing it:
DECLARE @requestXML XML = '<rtEvent><ctx>...</ctx></rtEvent>'
SET @requestXML.modify(
'insert
(
attribute type {"customType"},
attribute email {"someaddress@domain.com"},
attribute origin {"eCommerce"},
attribute wishedChannel {"0"},
attribute externalId {"5515"}
)
into (/rtEvent)[1]')
SELECT @requestXML
It returns this:
<rtEvent type="customType" email="someaddress@domain.com" origin="eCommerce" wishedChannel="0" externalId="5515">
<ctx>...</ctx>
</rtEvent>
It is better to use FOR XML PATH, which lets you specify element names and aliases as you like:
SELECT 'SomeContext' AS [ctx]
FOR XML PATH('rtEvent')
This will return this:
<rtEvent>
<ctx>SomeContext</ctx>
</rtEvent>
But with the right attributes you get this:
SELECT 'customType' AS [@type]
,'someaddress@domain.com' AS [@email]
,'eCommerce' AS [@origin]
,0 AS [@wishedChannel]
,5515 AS [@externalId]
,'SomeContext' AS [ctx]
FOR XML PATH('rtEvent')
The result:
<rtEvent type="customType" email="someaddress@domain.com" origin="eCommerce" wishedChannel="0" externalId="5515">
<ctx>SomeContext</ctx>
</rtEvent>
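If the inner XML already exists in a variable, as with @dataXML in the question, it can be placed inside the new root in the same FOR XML PATH query. This is a sketch under assumptions: the content of @dataXML is a placeholder, and only two of the attributes are shown.

```sql
-- Placeholder content; the question does not show what @dataXML holds.
DECLARE @dataXML XML = N'<ctx>SomeContext</ctx>';

DECLARE @requestXML XML = (
    SELECT 'customType' AS [@type],    -- attribute columns must come first
           'eCommerce'  AS [@origin],
           @dataXML     AS [*]         -- XML value is inserted as child content
    FOR XML PATH('rtEvent'), TYPE
);

SELECT @requestXML;
```

Building the root and its attributes in one pass avoids a separate modify() call afterwards.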
