Parse XML column in SQL Server - sql-server

I have an XML column in SQL Server. Example data is below:
<row id="AC.1.TR.AUD.12800............" xml:space="preserve">
<c1>AC</c1>
<c2>1</c2>
<c3>TR</c3>
<c4>AUD</c4>
<c5>12800</c5>
<c17>20150129</c17>
<c18>CREDIT</c18>
<c18 m="2">DEBIT</c18>
<c19>4289540.22</c19>
<c19 m="2">-17955</c19>
<c20 m="2" />
<c21 m="2" />
<c22>52287350.51</c22>
<c22 m="2">-218862.47</c22>
<c23>-688471.2</c23>
<c23 m="2" />
<c24 m="2">2881.77</c24>
<c32 />
</row>
Columns c18 through c24 are associated with each other. If c18 holds two values, there will be two groups. However, none of the tags inside the two groups are mandatory. I need to parse these into a normal table structure.
Here is the proper output:
RECID                       C18     C19         C22          C23        C24
AC.1.TR.AUD.12800.........  CREDIT  4289540.22  52287350.51  -688471.2
AC.1.TR.AUD.12800.........  DEBIT   -17955.00   -218862.47              2881.77
Note: I have tried the nodes() and value() methods, but neither helps me get the related values across the tags.

You could use a PIVOT to get the row data into columns:
SELECT ID, C18,C19,C22,C23,C24
FROM
(
SELECT
Loc.value('../@id', 'varchar(255)') ID,
isnull(Loc.value('@m', 'varchar(100)'),'') m,
Loc.value('local-name(.)[1]', 'varchar(100)') c,
Loc.value('.', 'varchar(100)') cValue
FROM @XML.nodes('/row/child::node()') as T(Loc)
) AS SourceTable
PIVOT
(
Max(cValue)
FOR c IN (C18,C19,C22,C23,C24)
) AS PivotTable
order by 1, 2
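The grouping that the PIVOT performs can be checked outside the database. The following is a minimal Python sketch, not the answer's method: it uses the standard library's ElementTree on a trimmed copy of the sample row. The names `COLUMNS` and `pivot_row` are hypothetical, and it assumes that an element without an `m` attribute belongs to group 1 (the SQL above maps a missing `m` to an empty string instead).

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

xml_text = """<row id="AC.1.TR.AUD.12800............" xml:space="preserve">
<c18>CREDIT</c18>
<c18 m="2">DEBIT</c18>
<c19>4289540.22</c19>
<c19 m="2">-17955</c19>
<c20 m="2" />
<c22>52287350.51</c22>
<c22 m="2">-218862.47</c22>
<c23>-688471.2</c23>
<c23 m="2" />
<c24 m="2">2881.77</c24>
</row>"""

# Columns to pivot on; other elements (for example c20) are ignored.
COLUMNS = ["c18", "c19", "c22", "c23", "c24"]


def pivot_row(text):
    """Return one output row per m group: [id, m, c18, c19, c22, c23, c24]."""
    row = ET.fromstring(text)
    groups = defaultdict(dict)
    for child in row:
        if child.tag in COLUMNS:
            m = child.get("m", "1")  # assumption: no m attribute means group 1
            groups[m][child.tag] = child.text or ""
    return [
        [row.get("id"), m] + [groups[m].get(c, "") for c in COLUMNS]
        for m in sorted(groups, key=int)
    ]


rows = pivot_row(xml_text)
for r in rows:
    print(r)
```

Each group keeps an empty string where a tag is absent or empty, which mirrors the blank cells in the expected output.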

Related

Split XML field into multiple delimited values - SQL

I have some XML content in a single field; I want to split each XML field into multiple rows.
The XML is something like this:
<env>
<id>id1<\id>
<DailyProperties>
<date>01/01/2022<\date>
<value>1<\value>
<\DailyProperties>
<DailyProperties>
<date>05/05/2022<\date>
<value>2<\value>
<\DailyProperties>
<\env>
I want to put everything in a table as:
ID DATE VALUE
id1 01/01/2022 1
id1 05/05/2022 2
So far I have managed to parse the XML value, and I have found something online to split a string into multiple rows (like this), but that requires the string to have some kind of delimiter. I did this:
SELECT
ID,
XMLDATA.X.query('/env/DailyProperties/date').value('.', 'varchar(100)') as r_date,
XMLDATA.X.query('/env/DailyProperties/value').value('.', 'varchar(100)') as r_value
from tableX
outer apply xmlData.nodes('.') as XMLDATA(X)
WHERE ID = 'id1'
but I get all values without a delimiter, as such:
01/10/202202/10/202203/10/202204/10/202205/10/202206/10/202207/10/202208/10/202209/10/202210/10/2022
Or, as in my example:
ID R_DATE R_VALUE
id01 01/01/202205/05/2022 12
I have found out that XQuery has a last() function that returns the last value parsed; in my XML example it returns only 05/05/2022, so there should be some way to add a delimiter. The number of rows can vary, because the number of days for which I have a value can vary.
Please try the following solution.
I had to fix your XML to make it well-formed.
SQL
DECLARE @tbl TABLE (id INT IDENTITY PRIMARY KEY, xmldata XML);
INSERT INTO @tbl (xmldata) VALUES
(N'<env>
<id>id1</id>
<DailyProperties>
<date>01/01/2022</date>
<value>1</value>
</DailyProperties>
<DailyProperties>
<date>05/05/2022</date>
<value>2</value>
</DailyProperties>
</env>');
SELECT p.value('(id/text())[1]','VARCHAR(20)') AS id
, c.value('(date/text())[1]','VARCHAR(10)') AS [date]
, c.value('(value/text())[1]','INT') AS [value]
FROM @tbl
CROSS APPLY xmldata.nodes('/env') AS t1(p)
OUTER APPLY t1.p.nodes('DailyProperties') AS t2(c);
Output
id   date        value
id1  01/01/2022  1
id1  05/05/2022  2
Yitzhak beat me to it by 2 min. Nonetheless, here's what I have:
--==== XML Data:
DECLARE @xml XML =
'<env>
<id>id1</id>
<DailyProperties>
<date>01/01/2022</date>
<value>1</value>
</DailyProperties>
<DailyProperties>
<date>05/05/2022</date>
<value>2</value>
</DailyProperties>
</env>';
--==== Solution:
SELECT
ID = ff2.xx.value('(text())[1]','varchar(20)'),
[Date] = ff.xx.value('(date/text())[1]', 'date'),
[Value] = ff.xx.value('(value/text())[1]', 'int')
FROM (VALUES(@xml)) AS f(X)
CROSS APPLY f.X.nodes('env/DailyProperties') AS ff(xx)
CROSS APPLY f.X.nodes('env/id') AS ff2(xx);
Returns:
ID Date Value
-------------------- ---------- -----------
id1 2022-01-01 1
id1 2022-05-05 2
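To see why shredding per DailyProperties node gives separate rows while a whole-document query concatenates the values, here is a small Python sketch using the standard library's ElementTree. It is only an illustration of the same idea, not part of either answer; the function name `daily_rows` is hypothetical.

```python
import xml.etree.ElementTree as ET

xml_text = """<env>
<id>id1</id>
<DailyProperties>
<date>01/01/2022</date>
<value>1</value>
</DailyProperties>
<DailyProperties>
<date>05/05/2022</date>
<value>2</value>
</DailyProperties>
</env>"""


def daily_rows(text):
    """Return one (id, date, value) tuple per DailyProperties element."""
    env = ET.fromstring(text)
    env_id = env.findtext("id")
    return [
        (env_id, dp.findtext("date"), int(dp.findtext("value")))
        for dp in env.findall("DailyProperties")
    ]


rows = daily_rows(xml_text)

# Reading every date in one go, as the question's query does, glues them together.
all_dates = "".join(d.text for d in ET.fromstring(xml_text).iter("date"))

print(rows)
print(all_dates)
```

Iterating over the repeating element first, then reading its children, plays the same role as `nodes('env/DailyProperties')` followed by `value()` in the SQL above.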

Best way to rename XML tag name in SQL Server

Imagine I have xml just like this:
declare @pxml xml =
'<MediaClass>
<MediaStream2Client>
<Title>Test</Title>
<Type>Book</Type>
<Price>1.00</Price>
</MediaStream2Client>
</MediaClass>
'
The stream number in the <MediaStream2Client> tag can be any number from 1 to 100, so I can't simply parse everything from a fixed <MediaStream2Client> tag. Is there a way to remove the digits from this tag name in SQL Server, using something like grep functionality?
XPath queries can be constructed dynamically and/or reference SQL variables or columns, as in the following example:
declare @pxml xml = '<MediaClass>
<MediaStream1Client>
<Title>Test1</Title>
<Type>Book1</Type>
<Price>1.00</Price>
</MediaStream1Client>
<MediaStream10Client>
<Title>Test10</Title>
<Type>Book10</Type>
<Price>10.00</Price>
</MediaStream10Client>
<MediaStream100Client>
<Title>Test100</Title>
<Type>Book100</Type>
<Price>100.00</Price>
</MediaStream100Client>
</MediaClass>';
select
ElementName,
MediaStreamClient.value('(Title/text())[1]', 'nvarchar(50)') as Title,
MediaStreamClient.value('(Type/text())[1]', 'nvarchar(50)') as [Type],
MediaStreamClient.value('(Price/text())[1]', 'decimal(18,2)') as Price
from (
--This is just for this example, normally you'd use a Tally Table here...
select top 100 row_number() over (order by a.object_id, a.column_id, b.object_id, b.column_id)
from sys.columns a, sys.columns b
) Tally(N)
cross apply (select concat('MediaStream', N, 'Client')) dyn(ElementName)
cross apply @pxml.nodes('/MediaClass/*[local-name(.) = sql:column("ElementName")]') MediaClass(MediaStreamClient);
This returns the results:
ElementName           Title    Type     Price
MediaStream1Client    Test1    Book1    1.00
MediaStream10Client   Test10   Book10   10.00
MediaStream100Client  Test100  Book100  100.00
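For comparison, the grep-style matching the question asks about is easy to express outside SQL Server. The sketch below is in Python with the standard `re` and ElementTree modules; it is not the answer's method, and the names `STREAM_TAG` and `media_rows` are hypothetical. It matches any child whose name is `MediaStream`, one or more digits, then `Client`.

```python
import re
import xml.etree.ElementTree as ET

xml_text = """<MediaClass>
<MediaStream1Client>
<Title>Test1</Title>
<Type>Book1</Type>
<Price>1.00</Price>
</MediaStream1Client>
<MediaStream10Client>
<Title>Test10</Title>
<Type>Book10</Type>
<Price>10.00</Price>
</MediaStream10Client>
<MediaStream100Client>
<Title>Test100</Title>
<Type>Book100</Type>
<Price>100.00</Price>
</MediaStream100Client>
</MediaClass>"""

# Element names of the form MediaStream<digits>Client.
STREAM_TAG = re.compile(r"MediaStream\d+Client")


def media_rows(text):
    """Return (element name, title, type, price) for every matching child."""
    root = ET.fromstring(text)
    return [
        (el.tag, el.findtext("Title"), el.findtext("Type"), el.findtext("Price"))
        for el in root
        if STREAM_TAG.fullmatch(el.tag)
    ]


rows = media_rows(xml_text)
for r in rows:
    print(r)
```

The regular expression replaces the tally of candidate names used in the SQL version, since the element name is tested directly against a pattern.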

How to transform semicolon separated strings of column names and values into a new table

I am relatively new to Snowflake and am struggling a bit with setting up a transformation for a semi-structured dataset. I have several log data batches, where each batch (table row in Snowflake) has the following columns: LOG_ID, COLUMN_NAMES, and LOG_ENTRIES.
COLUMN_NAMES contains a semicolon-separated list of column names, e.g.:
“TIMESTAMP;Sensor A;Sensor B”, “TIMESTAMP;Sensor B;Sensor C”
LOG_ENTRIES:entry contains a semicolon separated list of values, e.g.
“2020-02-11 09:08:19; 99.24;12.25”
The COLUMN_NAMES string can be different between log batches (Snowflake rows), but the names in the order they appear describe the content of the LOG_ENTRIES column values of the same row. My goal is to transform the data into a table that has column names for all unique values present in the COLUMN_NAMES column, e.g.:
LOG_ID  TIMESTAMP            Sensor A  Sensor B  Sensor C
1       2020-02-11 09:08:19  99.24     12.25     NaN
2       2020-02-11 09:10:44  NaN       13.32     0.947
Can this be achieved with a snowflake script, and if so, how? :)
Best regards,
Johan
You should use the SPLIT_TO_TABLE function, split the two values, and join them by index.
After that, all you have to do is use PIVOT to invert the table.
Sample data:
create or replace table splittable (LOG_ID int, COLUMN_NAMES varchar, LOG_ENTRIES varchar);
insert into splittable (LOG_ID, COLUMN_NAMES, LOG_ENTRIES)
values (1, 'TIMESTAMP;Sensor A;Sensor B', '2020-02-11 09:08:19;99.24;12.25'),
(2, 'TIMESTAMP;Sensor B;Sensor C', '2020-02-11 09:10:44;13.32;0.947');
Solution proposal:
WITH src AS (
select LOG_ID, cn.VALUE as COLUMN_NAMES, le.VALUE as LOG_ENTRIES
from splittable as st,
lateral split_to_table(st.COLUMN_NAMES, ';') as cn,
lateral split_to_table(st.LOG_ENTRIES, ';') as le
where cn.INDEX = le.INDEX
)
select * from src
pivot (min(LOG_ENTRIES) for COLUMN_NAMES in ('TIMESTAMP','Sensor A','Sensor B','Sensor C'))
order by LOG_ID;
Reference: SPLIT_TO_TABLE, PIVOT
If the column list is variable and you can't define it up front, you will have to write some kind of generator; this may help: CREATE A DYNAMIC PIVOT IN SNOWFLAKE
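The split, join-by-index, and pivot steps can be sketched in plain Python to check the expected shape of the result, including the case where the column list is not known in advance. This is only an illustration of the logic, not Snowflake code; `pivot_batches` is a hypothetical name, and missing sensors are represented as `None` here.

```python
batches = [
    (1, "TIMESTAMP;Sensor A;Sensor B", "2020-02-11 09:08:19;99.24;12.25"),
    (2, "TIMESTAMP;Sensor B;Sensor C", "2020-02-11 09:10:44;13.32;0.947"),
]


def pivot_batches(batches):
    """Pair names with values by position, then lay them out under one header."""
    records = []
    columns = []  # union of all column names, in first-seen order
    for log_id, names, entries in batches:
        values = (v.strip() for v in entries.split(";"))
        record = dict(zip(names.split(";"), values))
        for name in record:
            if name not in columns:
                columns.append(name)
        records.append((log_id, record))
    header = ["LOG_ID"] + columns
    rows = [[log_id] + [rec.get(c) for c in columns] for log_id, rec in records]
    return header, rows


header, rows = pivot_batches(batches)
print(header)
for r in rows:
    print(r)
```

Pairing by position with `zip` corresponds to the `cn.INDEX = le.INDEX` condition, and collecting the union of names is the part a dynamic pivot has to generate.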
You could transform the data into an ACTUAL semi-structured data type that you can then natively query using Snowflake SQL.
WITH x AS (
SELECT column_names, log_entries
FROM (VALUES ('TIMESTAMP_;SENSOR1','2021-02-01'||';1.2')) x (column_names, log_entries)
),
y AS (
SELECT *
FROM x,
LATERAL FLATTEN(input => split(column_names,';')) f
),
z AS (
SELECT *
FROM x,
LATERAL FLATTEN(input => split(log_entries,';')) f
)
SELECT listagg(('"'||y.value||'":"'||z.value||'"'),',') as cnt
, parse_json('{'||cnt||'}') as var
FROM y
JOIN z
ON y.seq = z.seq
AND y.index = z.index
GROUP BY y.seq;

Executing a SELECT statement on a cast(text) column that has XML

I am trying to retrieve all values in the XML that match the values defined in the WHERE clause, but I am only retrieving the first record and not the subsequent records that match the IN operator. I need to CAST a text column to XML and then retrieve the records, but I am not able to make this work. Any help or direction would be appreciated.
Here is the XML:
<Payment>
<CoverageCd>COLL</CoverageCd>
<LossTypeCd>COLL</LossTypeCd>
<ClaimStatusCd>C</ClaimStatusCd>
<LossPaymentAmt>14596</LossPaymentAmt>
</Payment>
<Payment>
<CoverageCd>LIAB</CoverageCd>
<LossTypeCd>PD</LossTypeCd>
<ClaimStatusCd>C</ClaimStatusCd>
<LossPaymentAmt>3480</LossPaymentAmt>
</Payment>
Here is my SQL code:
SELECT
ad.AplusDataSysID,
CAST(ad.xmlAplus AS XML).value('(/ISO/PassportSvcRs/Reports/Report/ReportData/ISO/PassportSvcRs/PassportInqRs/Match/Claim/Payment/LossTypeCd)[1]','varchar(max)') AS LossTypeCode
FROM
[dbo].[AUT_Policy] p
INNER JOIN
[dbo].[IP_Policy] ip ON p.PolicySysID = ip.Aut_PolicyID
INNER JOIN
[dbo].[AUT_AplusData] ad ON ip.PolicySysID = ad.PolicySysID
WHERE
CAST(ad.xmlAplus AS XML).value('(/ISO/PassportSvcRs/Reports/Report/ReportData/ISO/PassportSvcRs/PassportInqRs/Match/Claim/Payment/LossTypeCd)[1]', 'VARCHAR(MAX)') IN ('BI','PD','COLL','COMP','PIP','UM','MEDPY','TOWL','RENT','OTHR');
Here is my SQL result:
Here is what the SQL result should look like:
It would appear that the XML nodes method is what you need.
-- Sample data
DECLARE @AUT_AplusData TABLE (AplusDataSysID INT, xmlAplus TEXT);
INSERT @AUT_AplusData VALUES (1,
'<Payment>
<CoverageCd>COLL</CoverageCd>
<LossTypeCd>COLL</LossTypeCd>
<ClaimStatusCd>C</ClaimStatusCd>
<LossPaymentAmt>14596</LossPaymentAmt>
</Payment>
<Payment>
<CoverageCd>LIAB</CoverageCd>
<LossTypeCd>PD</LossTypeCd>
<ClaimStatusCd>C</ClaimStatusCd>
<LossPaymentAmt>3480</LossPaymentAmt>
</Payment>');
-- Solution
SELECT
AplusDataSysID = ad.AplusDataSysID,
LossTypeCd = pay.loss.value('(LossTypeCd/text())[1]', 'varchar(8000)')
FROM @AUT_AplusData AS ad
CROSS APPLY (VALUES(CAST(ad.xmlAplus AS XML))) AS x(xmlAplus)
CROSS APPLY x.xmlAplus.nodes('/Payment') AS pay(loss);
Returns:
AplusDataSysID LossTypeCd
---------------- ---------------
1 COLL
1 PD
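The difference between taking the first match (the `[1]` in the question's query) and iterating over every Payment node can be illustrated outside the database. This is a minimal Python sketch with the standard library's ElementTree, not the answer's code; `loss_type_codes` and `WANTED` are hypothetical names, and the fragment is wrapped in a made-up `<root>` element because ElementTree needs a single root.

```python
import xml.etree.ElementTree as ET

fragment = """<Payment>
<CoverageCd>COLL</CoverageCd>
<LossTypeCd>COLL</LossTypeCd>
<ClaimStatusCd>C</ClaimStatusCd>
<LossPaymentAmt>14596</LossPaymentAmt>
</Payment>
<Payment>
<CoverageCd>LIAB</CoverageCd>
<LossTypeCd>PD</LossTypeCd>
<ClaimStatusCd>C</ClaimStatusCd>
<LossPaymentAmt>3480</LossPaymentAmt>
</Payment>"""

# Codes from the question's IN (...) list.
WANTED = {"BI", "PD", "COLL", "COMP", "PIP", "UM", "MEDPY", "TOWL", "RENT", "OTHR"}


def loss_type_codes(fragment, wanted):
    """Return the LossTypeCd of every Payment whose code is in `wanted`."""
    root = ET.fromstring(f"<root>{fragment}</root>")  # hypothetical wrapper element
    codes = [p.findtext("LossTypeCd") for p in root.findall("Payment")]
    return [c for c in codes if c in wanted]


all_codes = loss_type_codes(fragment, WANTED)

# Taking only the first match, as the [1] predicate does, loses the second payment.
first_only = ET.fromstring(f"<root>{fragment}</root>").find("Payment/LossTypeCd").text

print(all_codes)
print(first_only)
```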

SQL Server group by key and sum over an xml column

As stated in the title, I have a table with a composite key (main key and position key) and a Value column that contains XML (with a fixed schema).
The XML looks like this:
<Data>
<ItemACount></ItemACount>
<ItemBCount></ItemBCount>
</Data>
Both ItemACount and ItemBCount represent positive integers.
I would like to group records that have the same main key (but different position keys) and then, for each group, calculate the sum of ItemACount and of ItemBCount.
I wrote the SQL code below:
SELECT
MainKey AS MainKey
SUM ( [Value].value('/Data/ItemACount/@value') ) AS TotalItemACount ,
SUM ( [Value].value('/Data/ItemBCount/@value') ) AS TotalItemBCount
FROM
[dbo].[tblItems]
GROUP BY
[MainKey]
But I get a syntax error:
Cannot find either column "Value" or the user-defined function or aggregate "Value.value", or the name is ambiguous.
I would like to understand which is the correct syntax.
Try this
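-- value() needs a singleton XPath, hence (...)[1], plus the target SQL type.
-- If the counts are element text rather than a value attribute,
-- use (/Data/ItemACount/text())[1] instead.
SELECT
MainKey AS MainKey,
SUM ( [Value].value('(/Data/ItemACount/@value)[1]', 'int')) AS TotalItemACount ,
SUM ( [Value].value('(/Data/ItemBCount/@value)[1]', 'int')) AS TotalItemBCount
FROM [dbo].[tblItems]
GROUP BY [MainKey]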
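As a cross-check of the totals such a query should produce, here is a minimal Python sketch of the same group-and-sum. The sample rows are made up for illustration, `totals_by_key` is a hypothetical name, and the sketch assumes the counts are stored as element text, as the sample XML in the question suggests.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical rows: (MainKey, PositionKey, Value XML).
rows = [
    ("A", 1, "<Data><ItemACount>2</ItemACount><ItemBCount>5</ItemBCount></Data>"),
    ("A", 2, "<Data><ItemACount>3</ItemACount><ItemBCount>1</ItemBCount></Data>"),
    ("B", 1, "<Data><ItemACount>7</ItemACount><ItemBCount>0</ItemBCount></Data>"),
]


def totals_by_key(rows):
    """Sum ItemACount and ItemBCount per MainKey."""
    totals = defaultdict(lambda: [0, 0])
    for main_key, _position_key, value in rows:
        data = ET.fromstring(value)
        totals[main_key][0] += int(data.findtext("ItemACount"))
        totals[main_key][1] += int(data.findtext("ItemBCount"))
    return {key: tuple(pair) for key, pair in totals.items()}


totals = totals_by_key(rows)
print(totals)
```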

Resources