Executing a SELECT statement on a cast(text) column that has XML - sql-server

I am trying to retrieve every value in the XML that matches the values listed in the WHERE clause, but I only get the first record, not the subsequent records that match the IN list. I need to CAST a text column to XML and then retrieve the records, but I cannot make this work. Any help/direction would be appreciated.
Here is the XML:
<Payment>
<CoverageCd>COLL</CoverageCd>
<LossTypeCd>COLL</LossTypeCd>
<ClaimStatusCd>C</ClaimStatusCd>
<LossPaymentAmt>14596</LossPaymentAmt>
</Payment>
<Payment>
<CoverageCd>LIAB</CoverageCd>
<LossTypeCd>PD</LossTypeCd>
<ClaimStatusCd>C</ClaimStatusCd>
<LossPaymentAmt>3480</LossPaymentAmt>
</Payment>
Here is my SQL code:
SELECT
ad.AplusDataSysID,
CAST(ad.xmlAplus AS XML).value('(/ISO/PassportSvcRs/Reports/Report/ReportData/ISO/PassportSvcRs/PassportInqRs/Match/Claim/Payment/LossTypeCd)[1]','varchar(max)') AS LossTypeCode
FROM
[dbo].[AUT_Policy] p
INNER JOIN
[dbo].[IP_Policy] ip ON p.PolicySysID = ip.Aut_PolicyID
INNER JOIN
[dbo].[AUT_AplusData] ad ON ip.PolicySysID = ad.PolicySysID
WHERE
CAST(ad.xmlAplus AS XML).value('(/ISO/PassportSvcRs/Reports/Report/ReportData/ISO/PassportSvcRs/PassportInqRs/Match/Claim/Payment/LossTypeCd)[1]', 'VARCHAR(MAX)') IN ('BI','PD','COLL','COMP','PIP','UM','MEDPY','TOWL','RENT','OTHR');
My query returns only the first LossTypeCd in the XML.
What I need is one row per matching Payment (for the sample above, both COLL and PD).

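It would appear that the XML nodes method is what you need. value() with a [1] predicate returns at most one value per row, so only the first Payment's LossTypeCd comes back. nodes() shreds the XML into one row per Payment element, and value() can then read LossTypeCd from each of those rows.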
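-- Sample data
DECLARE @AUT_AplusData TABLE (AplusDataSysID INT, xmlAplus TEXT);
INSERT @AUT_AplusData VALUES (1,
'<Payment>
<CoverageCd>COLL</CoverageCd>
<LossTypeCd>COLL</LossTypeCd>
<ClaimStatusCd>C</ClaimStatusCd>
<LossPaymentAmt>14596</LossPaymentAmt>
</Payment>
<Payment>
<CoverageCd>LIAB</CoverageCd>
<LossTypeCd>PD</LossTypeCd>
<ClaimStatusCd>C</ClaimStatusCd>
<LossPaymentAmt>3480</LossPaymentAmt>
</Payment>');
-- Solution: nodes() returns one row per <Payment>; value() then reads that row's LossTypeCd.
-- Against the real data, use the full path from the question, ending in .../Claim/Payment
SELECT
AplusDataSysID = ad.AplusDataSysID,
LossTypeCd = pay.loss.value('(LossTypeCd/text())[1]', 'varchar(8000)')
FROM @AUT_AplusData AS ad
CROSS APPLY (VALUES(CAST(ad.xmlAplus AS XML))) AS x(xmlAplus)
CROSS APPLY x.xmlAplus.nodes('/Payment') AS pay(loss)
-- Keep only the loss types from the question's IN list
WHERE pay.loss.value('(LossTypeCd/text())[1]', 'varchar(8000)')
IN ('BI','PD','COLL','COMP','PIP','UM','MEDPY','TOWL','RENT','OTHR');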
Returns:
AplusDataSysID LossTypeCd
---------------- ---------------
1 COLL
1 PD
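To see the difference between value() with [1] and nodes() outside SQL Server, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It is not T-SQL: the names first_loss_type, loss_types and parse_fragment are hypothetical, and the fragment is wrapped in an invented <root> element because ElementTree, unlike SQL Server's xml type, rejects input with more than one top-level element.

```python
import xml.etree.ElementTree as ET

# Sample fragment from the question: two top-level <Payment> elements.
FRAGMENT = """<Payment>
<CoverageCd>COLL</CoverageCd>
<LossTypeCd>COLL</LossTypeCd>
<ClaimStatusCd>C</ClaimStatusCd>
<LossPaymentAmt>14596</LossPaymentAmt>
</Payment>
<Payment>
<CoverageCd>LIAB</CoverageCd>
<LossTypeCd>PD</LossTypeCd>
<ClaimStatusCd>C</ClaimStatusCd>
<LossPaymentAmt>3480</LossPaymentAmt>
</Payment>"""

# The codes from the question's IN list.
WANTED = {"BI", "PD", "COLL", "COMP", "PIP", "UM", "MEDPY", "TOWL", "RENT", "OTHR"}


def parse_fragment(fragment):
    # Hypothetical <root> wrapper: ElementTree needs a single root element.
    return ET.fromstring(f"<root>{fragment}</root>")


def first_loss_type(fragment):
    # Analogue of .value('(.../LossTypeCd)[1]', ...): only the first match.
    return parse_fragment(fragment).findtext("Payment/LossTypeCd")


def loss_types(fragment, wanted=WANTED):
    # Analogue of .nodes('/Payment'): one "row" per <Payment> element,
    # then the IN-list filter applied to each row's LossTypeCd.
    rows = []
    for payment in parse_fragment(fragment).findall("Payment"):
        code = payment.findtext("LossTypeCd")
        if code in wanted:
            rows.append(code)
    return rows


print(first_loss_type(FRAGMENT))  # COLL
print(loss_types(FRAGMENT))       # ['COLL', 'PD']
```

The single-value version can never see the second Payment, which is exactly the symptom in the question; shredding first and filtering per row is what makes the IN list apply to every Payment.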

Related

Split XML field into multiple delimited values - SQL

I have some XML content in a single field; I want to split each XML field into multiple rows.
The XML is something like this:
<env>
<id>id1<\id>
<DailyProperties>
<date>01/01/2022<\date>
<value>1<\value>
<\DailyProperties>
<DailyProperties>
<date>05/05/2022<\date>
<value>2<\value>
<\DailyProperties>
<\env>
I want to put everything in a table as:
ID DATE VALUE
id1 01/01/2022 1
id1 05/05/2022 2
So far I have managed to parse the XML value, and I found something online that splits a string into multiple rows (like this), but it requires the string to have some kind of delimiter. I did this:
SELECT
ID,
XMLDATA.X.query('/env/DailyProperties/date').value('.', 'varchar(100)') as r_date,
XMLDATA.X.query('/env/DailyProperties/value').value('.', 'varchar(100)') as r_value
from tableX
outer apply xmlData.nodes('.') as XMLDATA(X)
WHERE ID = 'id1'
but I get all values without a delimiter, as such:
01/10/202202/10/202203/10/202204/10/202205/10/202206/10/202207/10/202208/10/202209/10/202210/10/2022
Or, as in my example:
ID R_DATE R_VALUE
id01 01/01/202205/05/2022 12
I found that XQuery has a last() function that returns the last value parsed; in my XML example it returns only 05/05/2022, so there should be something that handles adding a delimiter. The number of rows can vary, since the number of days for which I have a value varies.
Please try the following solution.
I had to fix your XML to make it well-formed: closing tags use a forward slash (</id>), not a backslash (<\id>).
SQL
DECLARE @tbl TABLE (id INT IDENTITY PRIMARY KEY, xmldata XML);
INSERT INTO @tbl (xmldata) VALUES
(N'<env>
<id>id1</id>
<DailyProperties>
<date>01/01/2022</date>
<value>1</value>
</DailyProperties>
<DailyProperties>
<date>05/05/2022</date>
<value>2</value>
</DailyProperties>
</env>');
SELECT p.value('(id/text())[1]','VARCHAR(20)') AS id
, c.value('(date/text())[1]','VARCHAR(10)') AS [date]
, c.value('(value/text())[1]','INT') AS [value]
FROM @tbl
CROSS APPLY xmldata.nodes('/env') AS t1(p)
OUTER APPLY t1.p.nodes('DailyProperties') AS t2(c);
Output
id   date        value
---- ----------- -----
id1  01/01/2022  1
id1  05/05/2022  2
Yitzhak beat me to it by 2 min. Nonetheless, here's what I have:
--==== XML Data:
DECLARE @xml XML =
'<env>
<id>id1</id>
<DailyProperties>
<date>01/01/2022</date>
<value>1</value>
</DailyProperties>
<DailyProperties>
<date>05/05/2022</date>
<value>2</value>
</DailyProperties>
</env>';
--==== Solution:
SELECT
ID = ff2.xx.value('(text())[1]','varchar(20)'),
[Date] = ff.xx.value('(date/text())[1]', 'date'),
[Value] = ff.xx.value('(value/text())[1]', 'int')
FROM (VALUES(@xml)) AS f(X)
CROSS APPLY f.X.nodes('env/DailyProperties') AS ff(xx)
CROSS APPLY f.X.nodes('env/id') AS ff2(xx);
Returns:
ID Date Value
-------------------- ---------- -----------
id1 2022-01-01 1
id1 2022-05-05 2
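Both answers rely on nodes() producing one row per DailyProperties element. The original query instead called query('/env/DailyProperties/date').value('.', ...), and the string value of several nodes is the concatenation of all their text, which is why the dates ran together. A minimal Python sketch of both behaviours, using xml.etree.ElementTree on the corrected XML; concatenated_dates and shred are hypothetical names, and the dates are kept as strings because 01/01/2022 and 05/05/2022 do not reveal whether the format is dd/mm or mm/dd.

```python
import xml.etree.ElementTree as ET

# The corrected XML from the answers (forward slashes in closing tags).
DOC = """<env>
<id>id1</id>
<DailyProperties>
<date>01/01/2022</date>
<value>1</value>
</DailyProperties>
<DailyProperties>
<date>05/05/2022</date>
<value>2</value>
</DailyProperties>
</env>"""


def concatenated_dates(doc):
    # What the original query effectively did: take the text of every
    # matching <date> node and glue it together with no delimiter.
    env = ET.fromstring(doc)
    return "".join(d.text for d in env.findall("DailyProperties/date"))


def shred(doc):
    # What nodes('/env') + nodes('DailyProperties') do: one row per
    # <DailyProperties>, each paired with the single <id> of its <env>.
    env = ET.fromstring(doc)
    env_id = env.findtext("id")
    return [
        (env_id, dp.findtext("date"), int(dp.findtext("value")))
        for dp in env.findall("DailyProperties")
    ]


print(concatenated_dates(DOC))  # 01/01/202205/05/2022
for row in shred(DOC):
    print(row)
```

No delimiter is needed once each DailyProperties element becomes its own row, so the number of days per document can vary freely.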

Parse XML - Retrieve the Portion Between the Double Quotes

I have the following XML in an XML column in SQL Server. Using the code below, I can retrieve the data between the tags and list it in table format. I can retrieve the values between all the tags except the attribute value in double quotes. I can get the value X just fine, but I need to get the 6 that is between the double quotes in this part: <Organization501cInd organization501cTypeTxt="6">X</Organization501cInd>
WITH XMLNAMESPACES (DEFAULT 'http://www.irs.gov/efile')
SELECT ID, FilingYear, FilingPeriod, FilingType, [FileName]
, Organization501c3Ind = c.value('(//Organization501c3Ind/text())[1]','varchar(MAX)')
, Organization501cInd = c.value('(//Organization501cInd/text())[1]','varchar(MAX)')
, Organization501cTypeTxt = c.value('(//Organization501cTypeTxt/text())[1]','varchar(MAX)')
FROM Form990
CROSS APPLY XMLData.nodes('//Return') AS t(c)
CROSS APPLY XMLData.nodes('//Return/ReturnHeader/Filer') AS t2(c2)
XML:
<ReturnData documentCnt="2">
<IRS990 documentId="IRS990-01" referenceDocumentId="IRS990ScheduleO-01" referenceDocumentName="IRS990ScheduleO ReasonableCauseExplanation" softwareId="19009670">
<PrincipalOfficerNm>CAREY BAKER</PrincipalOfficerNm>
<USAddress>
<AddressLine1Txt>PO BOX 11275</AddressLine1Txt>
<CityNm>TALLAHASSEE</CityNm>
<StateAbbreviationCd>FL</StateAbbreviationCd>
<ZIPCd>32302</ZIPCd>
</USAddress>
<GrossReceiptsAmt>104241</GrossReceiptsAmt>
<GroupReturnForAffiliatesInd>false</GroupReturnForAffiliatesInd>
<Organization501cInd organization501cTypeTxt="6">X</Organization501cInd>
Thoughts?
Without a minimal reproducible example by the OP, shooting from the hip.
SQL
-- DDL and sample data population, start
DECLARE @Form990 TABLE (ID INT IDENTITY PRIMARY KEY, XMLData XML);
INSERT INTO @Form990(XMLData) VALUES
(N'<Return xmlns="http://www.irs.gov/efile" returnVersion="2019v5.1">
<ReturnData documentCnt="2">
<IRS990 documentId="IRS990-01" referenceDocumentId="IRS990ScheduleO-01" referenceDocumentName="IRS990ScheduleO ReasonableCauseExplanation" softwareId="19009670">
<PrincipalOfficerNm>CAREY BAKER</PrincipalOfficerNm>
<USAddress>
<AddressLine1Txt>PO BOX 11275</AddressLine1Txt>
<CityNm>TALLAHASSEE</CityNm>
<StateAbbreviationCd>FL</StateAbbreviationCd>
<ZIPCd>32302</ZIPCd>
</USAddress>
<GrossReceiptsAmt>104241</GrossReceiptsAmt>
<GroupReturnForAffiliatesInd>false</GroupReturnForAffiliatesInd>
<Organization501c3Ind>X</Organization501c3Ind>
<Organization501cInd organization501cTypeTxt="6">X</Organization501cInd>
</IRS990>
</ReturnData>
</Return>');
-- DDL and sample data population, end
WITH XMLNAMESPACES (DEFAULT 'http://www.irs.gov/efile')
SELECT -- ID, FilingYear, FilingPeriod, FilingType, [FileName]
Organization501c3Ind = c.value('(Organization501c3Ind/text())[1]','varchar(MAX)')
, Organization501cInd = c.value('(Organization501cInd/text())[1]','varchar(MAX)')
, Organization501cTypeTxt = c.value('(Organization501cInd/@organization501cTypeTxt)[1]','varchar(MAX)')
FROM @Form990
CROSS APPLY XMLData.nodes('/Return/ReturnData/IRS990') AS t(c)
Output
+----------------------+---------------------+-------------------------+
| Organization501c3Ind | Organization501cInd | Organization501cTypeTxt |
+----------------------+---------------------+-------------------------+
| X | X | 6 |
+----------------------+---------------------+-------------------------+
NOTE: XML element and attribute names are case-sensitive. For example, Organization501cTypeTxt will not match an attribute named organization501cTypeTxt.
When extracting attributes, you need to use the @ accessor in your XPath query. Try something like the following...
WITH XMLNAMESPACES (DEFAULT 'http://www.irs.gov/efile')
SELECT ID, FilingYear, FilingPeriod, FilingType, [FileName],
Organization501cInd = c2.value('(Organization501cInd/text())[1]','varchar(MAX)'),
organization501cTypeTxt = c2.value('(Organization501cInd/@organization501cTypeTxt)[1]','varchar(MAX)')
FROM Form990
CROSS APPLY XMLData.nodes('/Return/ReturnData') AS t(c)
CROSS APPLY t.c.nodes('IRS990') AS t2(c2);
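The distinction both answers draw, element text versus an attribute value, is not specific to SQL Server. As a minimal Python sketch, xml.etree.ElementTree reads element text with findtext() and attributes with get(). The default namespace has to be mapped to a prefix for element lookups (the prefix efile is an arbitrary choice), while unprefixed attributes belong to no namespace, which is also why @organization501cTypeTxt works under WITH XMLNAMESPACES (DEFAULT ...). The sample is a trimmed copy of the first answer's XML, and org_flags is a hypothetical name.

```python
import xml.etree.ElementTree as ET

NS = {"efile": "http://www.irs.gov/efile"}  # arbitrary prefix for the default namespace

# Trimmed version of the sample document from the first answer.
DOC = """<Return xmlns="http://www.irs.gov/efile" returnVersion="2019v5.1">
  <ReturnData documentCnt="2">
    <IRS990 documentId="IRS990-01">
      <Organization501c3Ind>X</Organization501c3Ind>
      <Organization501cInd organization501cTypeTxt="6">X</Organization501cInd>
    </IRS990>
  </ReturnData>
</Return>"""


def org_flags(doc):
    root = ET.fromstring(doc)  # root is <Return>
    irs990 = root.find("efile:ReturnData/efile:IRS990", NS)
    ind = irs990.find("efile:Organization501cInd", NS)
    return {
        # Element text, like .value('(Organization501c3Ind/text())[1]', ...)
        "Organization501c3Ind": irs990.findtext("efile:Organization501c3Ind", namespaces=NS),
        "Organization501cInd": ind.text,
        # Attribute value, like .value('(Organization501cInd/@organization501cTypeTxt)[1]', ...)
        "Organization501cTypeTxt": ind.get("organization501cTypeTxt"),
    }


print(org_flags(DOC))
```

Asking for the attribute with the wrong case (ind.get("Organization501cTypeTxt")) returns None, the Python counterpart of the case-sensitivity note above.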

Why is TRY_PARSE so slow?

I have this query that basically returns (right now) only 10 rows as results:
select *
FROM Table1 as o
inner join Table2 as t on t.Field1 = o.Field2
where Code = 123456 and t.FakeData is not null
Now, suppose I want to parse the field FakeData (which, unfortunately, can contain different types of data, from DateTime to surname etc., since it is an nvarchar(70)) for display and/or filtering:
select *, TRY_PARSE(t.FakeData as date USING 'en-GB') as RealDate
FROM Table1 as o
inner join Table2 as t on t.Field1 = o.Field2
where Code = 123456 and t.FakeData is not null
The query then takes ten times as long to execute.
Where am I going wrong? How can I speed it up?
I can't modify the database; I'm just a customer who reads the data.
The TSQL documentation for TRY_PARSE makes the following observation:
Keep in mind that there is a certain performance overhead in parsing the string value.
NB: I am assuming your typical date format would be dd/mm/yyyy.
The following is something of a shot in the dark that might help. By progressively checking whether each nvarchar value is a date candidate, it is possible to reduce the number of calls to that function. Note that a value established in one APPLY can then be referenced in a subsequent APPLY:
CREATE TABLE mytable(
FakeData NVARCHAR(60) NOT NULL
);
INSERT INTO mytable(FakeData) VALUES (N'oiwsuhd ouhw dcouhw oduch woidhc owihdc oiwhd cowihc');
INSERT INTO mytable(FakeData) VALUES (N'9603200-0297r2-0--824');
INSERT INTO mytable(FakeData) VALUES (N'12/03/1967');
INSERT INTO mytable(FakeData) VALUES (N'12/3/2012');
INSERT INTO mytable(FakeData) VALUES (N'3/3/1812');
INSERT INTO mytable(FakeData) VALUES (N'ohsw dciuh iuh pswiuh piwsuh cpiuwhs dcpiuhws ipdcu wsiu');
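select
t.FakeData, oa3.RealDate
from mytable as t
outer apply (
-- compute the length once so the later applies can reuse it
select len(FakeData) as fd_len
) oa1
outer apply (
-- a dd/mm/yyyy candidate is at most 10 characters long and contains exactly two '/'
select case when oa1.fd_len > 10 then 0
when len(replace(FakeData,'/','')) + 2 = oa1.fd_len then 1
else 0
end as is_candidate
) oa2
outer apply (
-- only candidates reach the comparatively expensive TRY_PARSE
select case when oa2.is_candidate = 1 then TRY_PARSE(t.FakeData as date USING 'en-GB') end as RealDate
) oa3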
FakeData                                                 RealDate
-------------------------------------------------------- ----------
oiwsuhd ouhw dcouhw oduch woidhc owihdc oiwhd cowihc     null
9603200-0297r2-0--824                                    null
12/03/1967                                               1967-03-12
12/3/2012                                                2012-03-12
3/3/1812                                                 1812-03-03
ohsw dciuh iuh pswiuh piwsuh cpiuwhs dcpiuhws ipdcu wsiu null
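The idea behind the staged APPLYs is to run a cheap test first and reserve the expensive parse for values that could plausibly be dates. A minimal Python sketch of the same filter-then-parse pattern, with datetime.strptime standing in for TRY_PARSE ... USING 'en-GB' (assuming, as the answer does, dd/mm/yyyy); is_candidate, real_date and the parse_calls counter are hypothetical names used only for illustration.

```python
from datetime import datetime

SAMPLES = [
    "oiwsuhd ouhw dcouhw oduch woidhc owihdc oiwhd cowihc",
    "9603200-0297r2-0--824",
    "12/03/1967",
    "12/3/2012",
    "3/3/1812",
    "ohsw dciuh iuh pswiuh piwsuh cpiuwhs dcpiuhws ipdcu wsiu",
]

parse_calls = 0  # counts how often the "expensive" parse actually runs


def is_candidate(s):
    # Same cheap test as oa2: at most 10 characters and exactly two '/'.
    return len(s) <= 10 and len(s.replace("/", "")) + 2 == len(s)


def real_date(s):
    global parse_calls
    if not is_candidate(s):
        return None  # skip the parse entirely, like the CASE in oa3
    parse_calls += 1
    try:
        return datetime.strptime(s, "%d/%m/%Y").date()  # dd/mm/yyyy
    except ValueError:
        return None  # mirrors TRY_PARSE returning NULL on failure


results = [real_date(s) for s in SAMPLES]
print(results)
print(parse_calls)  # 3
```

Only three of the six sample values ever reach the parser; on a real table the saving depends entirely on how many rows fail the cheap test.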

Remove string portion from inconsistent string of comma-separated values

SQL Server 2017 on Azure.
Given a field called Categories in a table called dbo.sources:
ID Categories
1 ABC01, FFG02, ERERE, CC201
2 GDF01, ABC01, GREER, DS223
3 DSF12, GREER
4 ABC01
5 NULL
What is the syntax for a query that would remove ABC01 from any record where it exists, but keep the other codes in the string?
Results would be:
ID Categories
1 FFG02, ERERE, CC201
2 GDF01, GREER, DS223
3 DSF12, GREER
4 NULL
5 NULL
Normalising and then denormalising your data, you can do this:
USE Sandbox;
GO
CREATE TABLE dbo.Sources (ID int,
Categories varchar(MAX));
INSERT INTO dbo.Sources
VALUES (1,'ABC01,FFG02,ERERE,CC201'), --I assume you don't really have the spaces after the commas
(2,'GDF01,ABC01,GREER,DS223'),
(3,'DSF12,GREER'),
(4,'ABC01'),
(5,NULL);
GO
DECLARE @Source varchar(5) = 'ABC01'; --Value to remove
WITH CTE AS(
SELECT S.ID,
STRING_AGG(NULLIF(SS.[value],@Source),',') WITHIN GROUP(ORDER BY S.ID) AS Categories
FROM dbo.Sources S
CROSS APPLY STRING_SPLIT(S.Categories,',') SS
GROUP BY S.ID)
UPDATE S
SET Categories = C.Categories
FROM dbo.Sources S
JOIN CTE C ON S.ID = C.ID;
GO
SELECT ID,
Categories
FROM dbo.Sources
GO
DROP TABLE dbo.Sources;
Although this may seem like overkill compared to REPLACE, it shows why normalising the data in the first place is a far better idea, and how simple it actually is to do so.
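You can use REPLACE as follows (this assumes codes are separated by ', ' as in the question; note that it would also match ABC01 inside a longer code):
update dbo.sources set
Categories = nullif(replace(replace(replace(Categories,'ABC01, ',''),', ABC01',''),'ABC01',''),'')
where Categories like '%ABC01%'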
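The split, filter and re-join idea behind the first answer can be sketched in a few lines of Python (not T-SQL). This minimal version assumes codes are separated by ', ' as in the question and returns None when nothing is left, matching the expected NULL for IDs 4 and 5; remove_code is a hypothetical name. The last line shows why a REPLACE that also strips every ', ' is risky: it glues the remaining codes together.

```python
SOURCES = {
    1: "ABC01, FFG02, ERERE, CC201",
    2: "GDF01, ABC01, GREER, DS223",
    3: "DSF12, GREER",
    4: "ABC01",
    5: None,
}


def remove_code(categories, code="ABC01", sep=", "):
    if categories is None:
        return None  # NULL stays NULL
    kept = [c for c in categories.split(sep) if c != code]  # exact match only
    return sep.join(kept) or None  # an empty result becomes NULL


for id_, categories in SOURCES.items():
    print(id_, remove_code(categories))

# A blanket REPLACE of ', ' loses the separators between the remaining codes.
print(SOURCES[1].replace("ABC01", "").replace(", ", ""))  # FFG02ERERECC201
```

Because the comparison is per code rather than per substring, a value such as XABC01 is left alone, which string REPLACE cannot guarantee.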

Creating permutation via recursive CTE in SQL server?

Looking at:
;WITH cte AS(
SELECT 1 AS x UNION
SELECT 2 AS x UNION
SELECT 3 AS x
)
I can create a permutation table for all 3 values:
SELECT T1.x , y=T2.x , z=t3.x
FROM cte T1
JOIN cte T2
ON T1.x != T2.x
JOIN cte T3
ON T2.x != T3.x AND T1.x != T3.x
This uses the power of SQL's cartesian product plus eliminating equal values.
OK.
But is it possible to enhance this recursive pseudo-CTE:
;WITH cte AS(
SELECT 1 AS x , 2 AS y , 3 AS z
UNION ALL
...
)
SELECT * FROM cte
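So that it yields the same result as the join above (all six permutations of 1, 2 and 3, spread across columns x, y and z)?
NB: There are other solutions on SO that use a recursive CTE, but they produce a string representation of the permutations rather than spreading them across columns.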
I tried to do the whole thing in a CTE.
However, trying to "redefine" a rowset dynamically is a little tricky. While the task is relatively easy using dynamic SQL, doing it without dynamic SQL poses some issues.
While this answer may not be the most efficient or straightforward, or even correct in the sense that it's not all CTE, it may give others a basis to work from.
To best understand my approach, read the comments. It may also be worthwhile to look at each CTE expression in turn by altering the line of code below in the main block and commenting out the section after it.
SELECT * FROM <CTE NAME>
Good luck.
IF OBJECT_ID('tempdb..#cteSchema') IS NOT NULL
DROP Table #cteSchema
GO
-- BASE CTE
;WITH cte AS( SELECT 1 AS x, 2 AS y, 3 AS z),
-- So we know what columns we have from the CTE we extract it to XML
Xml_Schema AS ( SELECT CONVERT(XML,(SELECT * FROM cte FOR XML PATH(''))) AS MySchema ),
-- Next we need to get a list of the columns from the CTE, by querying the XML, getting the values and assigning a num to the column
MyColumns AS (SELECT D.ROWS.value('fn:local-name(.)','SYSNAME') AS ColumnName,
D.ROWS.value('.','SYSNAME') as Value,
ROW_NUMBER() OVER (ORDER BY D.ROWS.value('fn:local-name(.)','SYSNAME')) AS Num
FROM Xml_Schema
CROSS APPLY Xml_Schema.MySchema.nodes('/*') AS D(ROWS) ),
-- How many columns we have in the CTE; used a couple of times below
ColumnStats AS (SELECT MAX(NUM) AS ColumnCount FROM MyColumns),
-- Create a cartesian product of the column names and values, so now we get each column with its possible values,
-- so {x=1, x=2, x=3, y=1, y=2, y=3, z=1, z=2, z=3} -- you get the idea.
PossibleValues AS (SELECT MyC.ColumnName, MyC.Num AS ColumnNum, MyColumns.Value, MyColumns.Num,
ROW_NUMBER() OVER (ORDER BY MyC.ColumnName, MyColumns.Value, MyColumns.Num ) AS ID
FROM MyColumns
CROSS APPLY MyColumns MyC
),
-- Now that we have the possible values of each "column", we concatenate the values together using this recursive CTE.
AllRawXmlRows AS (SELECT CONVERT(VARCHAR(MAX),'<'+ISNULL((SELECT ColumnName FROM MyColumns WHERE MyColumns.Num = 1),'')+'>'+Value) as ConcatedValue, Value,ID, Counterer = 1 FROM PossibleValues
UNION ALL
SELECT CONVERT(VARCHAR(MAX),CONVERT(VARCHAR(MAX), AllRawXmlRows.ConcatedValue)+'</'+(SELECT ColumnName FROM MyColumns WHERE MyColumns.Num = Counterer)+'><'+(SELECT ColumnName FROM MyColumns WHERE MyColumns.Num = Counterer+1)+'>'+CONVERT(VARCHAR(MAX),PossibleValues.Value)) AS ConcatedValue, PossibleValues.Value, PossibleValues.ID,
Counterer = Counterer+1
FROM AllRawXmlRows
INNER JOIN PossibleValues ON AllRawXmlRows.ConcatedValue NOT LIKE '%'+PossibleValues.Value+'%' -- I hate this, there has to be a better way of making sure we don't duplicate values....
AND AllRawXmlRows.ID <> PossibleValues.ID
AND Counterer < (SELECT ColumnStats.ColumnCount FROM ColumnStats)
),
-- The above made a list but was missing the final closing XML element, so we add it.
-- We also restrict the list to the items that contain all columns; the section above builds it up over many columns.
XmlRows AS (SELECT DISTINCT
ConcatedValue +'</'+(SELECT ColumnName FROM MyColumns WHERE MyColumns.Num = Counterer)+'>'
AS ConcatedValue
FROM AllRawXmlRows WHERE Counterer = (SELECT ColumnStats.ColumnCount FROM ColumnStats)
),
-- Wrap the output in row and table tags to create the final XML
FinalXML AS (SELECT (SELECT CONVERT(XML,(SELECT CONVERT(XML,ConcatedValue) FROM XmlRows FOR XML PATH('row'))) FOR XML PATH('table') )as XMLData),
-- Prepare a CTE that represents the structure of the original CTE with
DataTable AS (SELECT cte.*, XmlData
FROM FinalXML, cte)
--SELECT * FROM <CTE NAME>
-- GETS destination columns with XML data.
SELECT *
INTO #cteSchema
FROM DataTable
DECLARE @XML VARCHAR(MAX) ='';
SELECT @Xml = XMLData FROM #cteSchema --Extract XML Data from the temp table
ALTER TABLE #cteSchema DROP Column XMLData -- Removes the superfluous column
DECLARE @h INT
EXECUTE sp_xml_preparedocument @h OUTPUT, @XML
SELECT *
FROM OPENXML(@h, '/table/row', 2)
WITH #cteSchema -- just use the #cteSchema to define the structure of the xml that has been constructed
EXECUTE sp_xml_removedocument @h
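The recursive member of AllRawXmlRows grows each partial row by one value it does not already contain, stopping once every column is filled. Stripped of the XML plumbing, that anchor-plus-recursive-step idea looks like this minimal Python sketch; perms_recursive is a hypothetical name, and, like the NOT LIKE check, it assumes the input values are distinct.

```python
from itertools import permutations


def perms_recursive(values):
    # Anchor member: one single-column row per value.
    rows = [(v,) for v in values]
    # Recursive member: extend every partial row with each value it does not
    # already contain, until each row has one column per input value.
    while rows and len(rows[0]) < len(values):
        rows = [row + (v,) for row in rows for v in values if v not in row]
    return rows


for row in perms_recursive((1, 2, 3)):
    print(row)

print(perms_recursive((1, 2, 3)) == list(permutations((1, 2, 3))))  # True
```

Membership of a value in a tuple is an exact test, so it avoids the substring pitfall the comment in the recursive CTE complains about (for example, 1 matching inside 12).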
How about translating 1,2,3 into a column, which will look exactly like the example you started from, and using the same approach?
;WITH origin (x,y,z) AS (
SELECT 1,2,3
), translated (x) AS (
SELECT col
FROM origin
UNPIVOT ( col FOR cols IN (x,y,z)) AS up
)
SELECT T1.x , y=T2.x , z=t3.x
FROM translated T1
JOIN translated T2
ON T1.x != T2.x
JOIN translated T3
ON T2.x != T3.x AND T1.x != T3.x
ORDER BY 1,2,3
If I understood the request correctly, this might just do the trick.
To run it on more columns, you just need to add them to the origin CTE definition and the UNPIVOT column list, and add one more self-join (with its != conditions) per extra column.
Now, I don't know how you pass your 1..n values for it to be dynamic, but if you tell me, I could try to edit the script to make it dynamic too.
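To check the result, note that the triple self-join with != predicates is simply "all ordered triples with no repeated value". A minimal Python sketch (hypothetical names, assuming distinct input values) compares that filter with itertools.permutations, which also shows why each extra column costs one more self-join in SQL but no code change here.

```python
from itertools import permutations, product


def perms_via_join(values):
    # Cartesian product of three copies (T1, T2, T3), keeping only rows where
    # T1.x != T2.x, T2.x != T3.x and T1.x != T3.x, then ORDER BY 1,2,3.
    return sorted(
        (a, b, c)
        for a, b, c in product(values, repeat=3)
        if a != b and b != c and a != c
    )


def perms_any_width(values):
    # One tuple per permutation, as wide as the input; no extra "joins" needed.
    return sorted(permutations(values))


print(perms_via_join((1, 2, 3)))
print(perms_via_join((1, 2, 3)) == perms_any_width((1, 2, 3)))  # True
```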
