I have some XML content in a single field; I want to split each xml field in multiple rows.
The XML is something like that:
<env>
<id>id1<\id>
<DailyProperties>
<date>01/01/2022<\date>
<value>1<\value>
<\DailyProperties>
<DailyProperties>
<date>05/05/2022<\date>
<value>2<\value>
<\DailyProperties>
<\env>
I want to put everything in a table as:
ID DATE VALUE
id1 01/01/2022 1
id1 05/05/2022 2
For now I managed to parse the xml value, and I have found something online to get a string into multiple rows (like this), but my string should have some kind of delimiter. I did this:
SELECT
ID,
XMLDATA.X.query('/env/DailyProperties/date').value('.', 'varchar(100)') as r_date,
XMLDATA.X.query('/env/DailyProperties/value').value('.', 'varchar(100)') as r_value
from tableX
outer apply xmlData.nodes('.') as XMLDATA(X)
WHERE ID = 'id1'
but I get all values without a delimiter, as such:
01/10/202202/10/202203/10/202204/10/202205/10/202206/10/202207/10/202208/10/202209/10/202210/10/2022
Or, as in my example:
ID R_DATE R_VALUE
id01 01/01/202205/05/2022 12
I have found out that XQuery has a last() function that return the last value parsed; in my xml example it will return only 05/05/2022, so it should exists something for address the adding of a delimiter. The number of rows could vary, as it could vary the number of days of which I have a value.
Please try the following solution.
I had to fix your XML to make it well-formed.
SQL
DECLARE #tbl TABLE (id INT IDENTITY PRIMARY KEY, xmldata XML);
INSERT INTO #tbl (xmldata) VALUES
(N'<env>
<id>id1</id>
<DailyProperties>
<date>01/01/2022</date>
<value>1</value>
</DailyProperties>
<DailyProperties>
<date>05/05/2022</date>
<value>2</value>
</DailyProperties>
</env>');
SELECT p.value('(id/text())[1]','VARCHAR(20)') AS id
, c.value('(date/text())[1]','VARCHAR(10)') AS [date]
, c.value('(value/text())[1]','INT') AS [value]
FROM #tbl
CROSS APPLY xmldata.nodes('/env') AS t1(p)
OUTER APPLY t1.p.nodes('DailyProperties') AS t2(c);
Output
id
date
value
id1
01/01/2022
1
id1
05/05/2022
2
Yitzhak beat me to it by 2 min. Nonetheless, here's what I have:
--==== XML Data:
DECLARE #xml XML =
'<env>
<id>id1</id>
<DailyProperties>
<date>01/01/2022</date>
<value>1</value>
</DailyProperties>
<DailyProperties>
<date>05/05/2022</date>
<value>2</value>
</DailyProperties>
</env>';
--==== Solution:
SELECT
ID = ff2.xx.value('(text())[1]','varchar(20)'),
[Date] = ff.xx.value('(date/text())[1]', 'date'),
[Value] = ff.xx.value('(value/text())[1]', 'int')
FROM (VALUES(#xml)) AS f(X)
CROSS APPLY f.X.nodes('env/DailyProperties') AS ff(xx)
CROSS APPLY f.X.nodes('env/id') AS ff2(xx);
Returns:
ID Date Value
-------------------- ---------- -----------
id1 2022-01-01 1
id1 2022-05-05 2
Imagine I have xml just like this:
declare #pxml xml =
'<MediaClass>
<MediaStream2Client>
<Title>Test</Title>
<Type>Book</Type>
<Price>1.00</Price>
</MediaStream2Client>
</MediaClass>
'
Number of stream in tag <MediaStream2Client> can be random number from 1 to 100, so I can't simply parse everything from tag <MediaStream2Client>. Is there a way to remove any digit from this tag in SQL server using grep functionality?
XPath queries can be constructed dynamically and/or contain SQL variables or columns such as the following example...
declare #pxml xml = '<MediaClass>
<MediaStream1Client>
<Title>Test1</Title>
<Type>Book1</Type>
<Price>1.00</Price>
</MediaStream1Client>
<MediaStream10Client>
<Title>Test10</Title>
<Type>Book10</Type>
<Price>10.00</Price>
</MediaStream10Client>
<MediaStream100Client>
<Title>Test100</Title>
<Type>Book100</Type>
<Price>100.00</Price>
</MediaStream100Client>
</MediaClass>';
select
ElementName,
MediaStreamClient.value('(Title/text())[1]', 'nvarchar(50)') as Title,
MediaStreamClient.value('(Type/text())[1]', 'nvarchar(50)') as [Type],
MediaStreamClient.value('(Price/text())[1]', 'decimal(18,2)') as Price
from (
--This is just for this example, normally you'd use a Tally Table here...
select top 100 row_number() over (order by a.object_id, a.column_id, b.object_id, b.column_id)
from sys.columns a, sys.columns b
) Tally(N)
cross apply (select concat('MediaStream', N, 'Client')) dyn(ElementName)
cross apply #pxml.nodes('/MediaClass/*[local-name(.) = sql:column("ElementName")]') MediaClass(MediaStreamClient);
This returns the results:
ElementName
Title
Type
Price
MediaStream1Client
Test1
Book1
1.00
MediaStream10Client
Test10
Book10
10.00
MediaStream100Client
Test100
Book100
100.00
I am trying to query XML with SQL. Suppose I have the following XML.
<xml>
<dataSetData>
<text>ABC</text>
</dataSetData>
<generalData>
<id>123</id>
<text>text data</text>
</generalData>
<generalData>
<id>456</id>
<text>text data 2</text>
</generalData>
<specialData>
<id>123</id>
<text>special data text</text>
</specialData>
<specialData>
<id>456</id>
<text>special data text 2</text>
</specialData>
</xml>
I want to write a SELECT query that returns 2 rows as follows:
DataSetData | GeneralDataID | GeneralDataText | SpecialDataTest
ABC | 123 | text data | special data text
ABC | 456 | text data 2 | special data text 2
My current approach is as follows:
SELECT
dataset.nodes.value('(dataSetData/text)[1]', 'nvarchar(500)'),
general.nodes.value('(generalData/text)[1]', 'nvarchar(500)'),
special.nodes.value('(specialData/text)[1]', 'nvarchar(500)'),
FROM #MyXML.nodes('xml') AS dataset(nodes)
OUTER APPLY #MyXML.nodes('xml/generalData') AS general(nodes)
OUTER APPLY #MyXML.nodes('xml/specialData') AS special(nodes)
WHERE
general.nodes.value('(generalData/text/id)[1]', 'nvarchar(500)') = special.nodes.value('(specialData/text/id)[1]', 'nvarchar(500)')
What I do not like here is that I have to use OUTER APPLY twice and that I have to use the WHERE clause to JOIN the correct elements.
My question therefore is: Is it possible to construct the query in a way where I do not have to use the WHERE clause in such a way, because I am pretty sure that this affects performance very negatively if files become larger.
Shouldn't it be possible to JOIN the correct nodes (that is, the corresponding generalData and specialData nodes) with some XPATH statement?
Your XPath expressions are completely off.
Please try the following. It is pretty efficient. You can test its performance with a large XML.
SQL
-- DDL and sample data population, start
DECLARE #xml XML =
N'<xml>
<dataSetData>
<text>ABC</text>
</dataSetData>
<generalData>
<id>123</id>
<text>text data</text>
</generalData>
<generalData>
<id>456</id>
<text>text data 2</text>
</generalData>
<specialData>
<id>123</id>
<text>special data text</text>
</specialData>
<specialData>
<id>456</id>
<text>special data text 2</text>
</specialData>
</xml>';
-- DDL and sample data population, end
SELECT c.value('(dataSetData/text/text())[1]', 'VARCHAR(20)') AS DataSetData
, g.value('(id/text())[1]', 'INT') AS GeneralDataID
, g.value('(text/text())[1]', 'VARCHAR(30)') AS GeneralDataText
, sp.value('(id/text())[1]', 'INT') AS SpecialDataID
, sp.value('(text/text())[1]', 'VARCHAR(30)') AS SpecialDataTest
FROM #xml.nodes('/xml') AS t(c)
OUTER APPLY c.nodes('generalData') AS general(g)
OUTER APPLY c.nodes('specialData') AS special(sp)
WHERE g.value('(id/text())[1]', 'INT') = sp.value('(id/text())[1]', 'INT');
Output
+-------------+---------------+-----------------+---------------+---------------------+
| DataSetData | GeneralDataID | GeneralDataText | SpecialDataID | SpecialDataTest |
+-------------+---------------+-----------------+---------------+---------------------+
| ABC | 123 | text data | 123 | special data text |
| ABC | 456 | text data 2 | 456 | special data text 2 |
+-------------+---------------+-----------------+---------------+---------------------+
I want to suggest one more solution:
DECLARE #xml XML=
N'<xml>
<dataSetData>
<text>ABC</text>
</dataSetData>
<generalData>
<id>123</id>
<text>text data</text>
</generalData>
<generalData>
<id>456</id>
<text>text data 2</text>
</generalData>
<specialData>
<id>123</id>
<text>special data text</text>
</specialData>
<specialData>
<id>456</id>
<text>special data text 2</text>
</specialData>
</xml>';
--The query
SELECT #xml.value('(/xml/dataSetData/text/text())[1]','varchar(100)')
,B.*
,#xml.value('(/xml/specialData[(id/text())[1] cast as xs:int? = sql:column("B.General_Id")]/text/text())[1]','varchar(100)') AS Special_Text
FROM #xml.nodes('/xml/generalData') A(gd)
CROSS APPLY(SELECT A.gd.value('(id/text())[1]','int') AS General_Id
,A.gd.value('(text/text())[1]','varchar(100)') AS General_Text) B;
The idea in short:
We can read the <dataSetData>, as it is not repeating, directly from the variable.
We can use .nodes() to get a derived set of all <generalData> entries.
Now the magic trick: I use APPLY to get the values from the XML as regular columns into the result set.
This trick allows now to use sql:column() in order to build a XQuery predicate to find the corresponding <specialData>.
One more approach with FLWOR
You might try this:
SELECT #xml.query
('
<xml>
{
for $i in distinct-values(/xml/generalData/id/text())
return
<combined dsd="{/xml/dataSetData/text/text()}"
id="{$i}"
gd="{/xml/generalData[id=$i]/text/text()}"
sd="{/xml/specialData[id=$i]/text/text()}"/>
}
</xml>
');
The result
<xml>
<combined dsd="ABC" id="123" gd="text data" sd="special data text" />
<combined dsd="ABC" id="456" gd="text data 2" sd="special data text 2" />
</xml>
The idea in short:
With the help of distinct-values() we get a list of all id values in your XML
we can iterate this and pick the corresponding values
We return the result as a re-structured XML
Now you can use .nodes('/xml/combined') against this new XML and retrieve all values easily.
Performance test
I just want to add a performance test:
CREATE TABLE dbo.TestXml(TheXml XML);
INSERT INTO dbo.TestXml VALUES
(
(
SELECT 'blah1' AS [dataSetData/text]
,(SELECT o.[object_id] AS [id]
,o.[name] AS [text]
FROM sys.objects o
FOR XML PATH('generalData'),TYPE)
,(SELECT o.[object_id] AS [id]
,o.create_date AS [text]
FROM sys.objects o
FOR XML PATH('specialData'),TYPE)
FOR XML PATH('xml'),TYPE
)
)
,(
(
SELECT 'blah2' AS [dataSetData/text]
,(SELECT o.[object_id] AS [id]
,o.[name] AS [text]
FROM sys.objects o
FOR XML PATH('generalData'),TYPE)
,(SELECT o.[object_id] AS [id]
,o.create_date AS [text]
FROM sys.objects o
FOR XML PATH('specialData'),TYPE)
FOR XML PATH('xml'),TYPE
)
)
,(
(
SELECT 'blah3' AS [dataSetData/text]
,(SELECT o.[object_id] AS [id]
,o.[name] AS [text]
FROM sys.objects o
FOR XML PATH('generalData'),TYPE)
,(SELECT o.[object_id] AS [id]
,o.create_date AS [text]
FROM sys.objects o
FOR XML PATH('specialData'),TYPE)
FOR XML PATH('xml'),TYPE
)
);
GO
--just a dummy call to avoid *first call bias*
SELECT x.query('.') FROM dbo.TestXml
CROSS APPLY TheXml.nodes('/xml//*') A(x)
GO
DECLARE #t DATETIME2=SYSUTCDATETIME();
--My first approach
SELECT TheXml.value('(/xml/dataSetData/text/text())[1]','varchar(100)') AS DataSetValue
,B.*
,TheXml.value('(/xml/specialData[(id/text())[1] cast as xs:int? = sql:column("B.General_Id")]/text/text())[1]','varchar(100)') AS Special_Text
INTO dbo.testResult1
FROM dbo.TestXml
CROSS APPLY TheXml.nodes('/xml/generalData') A(gd)
CROSS APPLY(SELECT A.gd.value('(id/text())[1]','int') AS General_Id
,A.gd.value('(text/text())[1]','varchar(100)') AS General_Text) B;
SELECT DATEDIFF(MILLISECOND,#t,SYSUTCDATETIME());
GO
DECLARE #t DATETIME2=SYSUTCDATETIME();
--My second approach
SELECT B.c.value('#dsd','varchar(100)') AS dsd
,B.c.value('#id','int') AS id
,B.c.value('#gd','varchar(100)') AS gd
,B.c.value('#sd','varchar(100)') AS sd
INTO dbo.TestResult2
FROM dbo.TestXml
CROSS APPLY (SELECT TheXml.query
('
<xml>
{
for $i in distinct-values(/xml/generalData/id/text())
return
<combined dsd="{/xml/dataSetData/text/text()}"
id="{$i}"
gd="{/xml/generalData[id=$i]/text/text()}"
sd="{/xml/specialData[id=$i]/text/text()}"/>
}
</xml>
') AS ResultXml) A
CROSS APPLY A.ResultXml.nodes('/xml/combined') B(c)
SELECT DATEDIFF(MILLISECOND,#t,SYSUTCDATETIME());
GO
DECLARE #t DATETIME2=SYSUTCDATETIME();
--Yitzhak'S approach
SELECT c.value('(dataSetData/text/text())[1]', 'VARCHAR(20)') AS DataSetData
, g.value('(id/text())[1]', 'INT') AS GeneralDataID
, g.value('(text/text())[1]', 'VARCHAR(30)') AS GeneralDataText
, sp.value('(id/text())[1]', 'INT') AS SpecialDataID
, sp.value('(text/text())[1]', 'VARCHAR(30)') AS SpecialDataTest
INTO dbo.TestResult3
FROM dbo.TestXml
CROSS APPLY TheXml.nodes('/xml') AS t(c)
OUTER APPLY c.nodes('generalData') AS general(g)
OUTER APPLY c.nodes('specialData') AS special(sp)
WHERE g.value('(id/text())[1]', 'INT') = sp.value('(id/text())[1]', 'INT');
SELECT DATEDIFF(MILLISECOND,#t,SYSUTCDATETIME());
GO
SELECT * FROM TestResult1;
SELECT * FROM TestResult2;
SELECT * FROM TestResult3;
GO
--careful with real data!
DROP TABLE testResult1
DROP TABLE testResult2
DROP TABLE testResult3
DROP TABLE dbo.TestXml;
The result is clearly pointing against XQuery. (Someone might say so sad! now :-) ).
The predicate approach is by far the slowest (4700ms). The FLWOR approach is on rank 2 (1200ms) and the winner is - tatatataaaaa - Yitzhak's approach (400ms, by factor ~10!).
Which solution is best for you, will depend on the actual data (count of elements per XML, count of XMLs and so on). But the visual elegance is - regrettfully - not the only parameter for this choice :-)
Sorry to add this as another answer, but I don't want to add to the other answer. It's big enough already :-)
A combination of Yitzhak's and mine is even faster:
--This is the additional code to be placed into the performance comparison
DECLARE #t DATETIME2=SYSUTCDATETIME();
SELECT TheXml.value('(/xml/dataSetData/text/text())[1]', 'VARCHAR(20)') AS DataSetData
,B.*
, sp.value('(id/text())[1]', 'INT') AS SpecialDataID
, sp.value('(text/text())[1]', 'VARCHAR(30)') AS SpecialDataTest
INTO dbo.TestResult4
FROM dbo.TestXml
CROSS APPLY TheXml.nodes('/xml/generalData') AS A(g)
CROSS APPLY(SELECT g.value('(id/text())[1]', 'INT') AS GeneralDataID
, g.value('(text/text())[1]', 'VARCHAR(30)') AS GeneralDataText) B
OUTER APPLY TheXml.nodes('/xml/specialData[id=sql:column("B.GeneralDataID")]') AS special(sp);
SELECT DATEDIFF(MILLISECOND,#t,SYSUTCDATETIME());
The idea in short:
We read the <dataSetData> directly (no repetition)
We use APPLY .nodes() to get all <generalData>
We use APPLY SELECT to fetch the values of <generalData> elements as real columns.
We use another APPLY .nodes() to fetch the corresponding <specialData> elements
One advantage of this solution: If there might be more than one special-data entry per general-data element, this would work too.
This is now the fastest in my test (~300ms).
I am trying to retrieve all values in the XML that contains defined values in the WHERE clause but I am only retrieving the first record and not the subsequent records in the IN operator. I am needing to the CAST a text column to XML and then retrieve the records but I am not able to make this work. Any help/direction would be appreciated.
Here is the XML:
<Payment>
<CoverageCd>COLL</CoverageCd>
<LossTypeCd>COLL</LossTypeCd>
<ClaimStatusCd>C</ClaimStatusCd>
<LossPaymentAmt>14596</LossPaymentAmt>
</Payment>
<Payment>
<CoverageCd>LIAB</CoverageCd>
<LossTypeCd>PD</LossTypeCd>
<ClaimStatusCd>C</ClaimStatusCd>
<LossPaymentAmt>3480</LossPaymentAmt>
</Payment>
Here is my SQL code:
SELECT
ad.AplusDataSysID,
CAST(ad.xmlAplus AS XML).value('(/ISO/PassportSvcRs/Reports/Report/ReportData/ISO/PassportSvcRs/PassportInqRs/Match/Claim/Payment/LossTypeCd)[1]','varchar(max)') AS LossTypeCode
FROM
[dbo].[AUT_Policy] p
INNER JOIN
[dbo].[IP_Policy] ip ON p.PolicySysID = ip.Aut_PolicyID
INNER JOIN
[dbo].[AUT_AplusData] ad ON ip.PolicySysID = ad.PolicySysID
WHERE
CAST(ad.xmlAplus AS XML).value('(/ISO/PassportSvcRs/Reports/Report/ReportData/ISO/PassportSvcRs/PassportInqRs/Match/Claim/Payment/LossTypeCd)[1]', 'VARCHAR(MAX)') IN ('BI','PD','COLL','COMP','PIP','UM','MEDPY','TOWL','RENT','OTHR');
Here is my SQL result:
Here is what the SQL result should look like:
It would appear that the XML nodes method is what you need.
-- Sample data
DECLARE #AUT_AplusData TABLE (AplusDataSysID INT, xmlAplus TEXT);
INSERT #AUT_AplusData VALUES (1,
'<Payment>
<CoverageCd>COLL</CoverageCd>
<LossTypeCd>COLL</LossTypeCd>
<ClaimStatusCd>C</ClaimStatusCd>
<LossPaymentAmt>14596</LossPaymentAmt>
</Payment>
<Payment>
<CoverageCd>LIAB</CoverageCd>
<LossTypeCd>PD</LossTypeCd>
<ClaimStatusCd>C</ClaimStatusCd>
<LossPaymentAmt>3480</LossPaymentAmt>
</Payment>');
-- Solution
SELECT
AplusDataSysID = ad.AplusDataSysID,
LossTypeCd = pay.loss.value('(LossTypeCd/text())[1]', 'varchar(8000)')
FROM #AUT_AplusData AS ad
CROSS APPLY (VALUES(CAST(ad.xmlAplus AS XML))) AS x(xmlAplus)
CROSS APPLY x.xmlAplus.nodes('/Payment') AS pay(loss);
Returns:
AplusDataSysID LossTypeCd
---------------- ---------------
1 COLL
1 PD
I have a table like below. ID column should be added into XML tag as Value with respective ID Column value and return as one XML tag
Ex:
Specification | ID
<car><Name>BMW</Name></car> | 245
<car><Name>Audi</Name></car> | 290
<car><Name>Skoda</Name></car>| 295
Output Should be as
| Specification |
| <car><Name>BMW</Name><ID>245</ID></car><car><Name>Audi</Name><ID>290</ID></car><car><Name>Audi</Name><ID>295</ID></car> |
You could use modify() and insert xml like this
DECLARE #SampleData AS TABLE
(
Specification xml,
Id int
)
INSERT INTO #SampleData
VALUES
('<car><Name>BMW</Name></car>', 245),
('<car><Name>Audi</Name></car>', 290),
('<car><Name>Skoda</Name></car>', 295)
Update #SampleData SET Specification.modify('insert <Id>{sql:column("sd.Id")}</Id> into (/car)[1]')
FROM #SampleData sd
SELECT sd.* FROM #SampleData sd
Demo link: Rextester
Some reference links:
modify xml
insert xml
this is one way I can think of. You can strip the value of the car name using the .node, and reconstruct the XML. Something like,
CREATE TABLE #Temp
(
Specification XML,
ID INT
)
INSERT #Temp ( Specification, ID )
VALUES
('<car><Name>BMW</Name></car>', 245),
('<car><Name>Audi</Name></car>', 290),
('<car><Name>Skoda</Name></car>', 295)
SELECT * FROM
(
SELECT
Val.value('(text())[1]', 'varchar(32)') AS Name,
T.ID
FROM #Temp AS T
CROSS APPLY Specification.nodes('car/Name') AS Id(Val)
) X
FOR XML PATH('Name'), ROOT ('Car')
DROP TABLE #Temp