I am trying to query XML with SQL. Suppose I have the following XML.
<xml>
<dataSetData>
<text>ABC</text>
</dataSetData>
<generalData>
<id>123</id>
<text>text data</text>
</generalData>
<generalData>
<id>456</id>
<text>text data 2</text>
</generalData>
<specialData>
<id>123</id>
<text>special data text</text>
</specialData>
<specialData>
<id>456</id>
<text>special data text 2</text>
</specialData>
</xml>
I want to write a SELECT query that returns 2 rows as follows:
DataSetData | GeneralDataID | GeneralDataText | SpecialDataTest
ABC | 123 | text data | special data text
ABC | 456 | text data 2 | special data text 2
My current approach is as follows:
SELECT
dataset.nodes.value('(dataSetData/text)[1]', 'nvarchar(500)'),
general.nodes.value('(generalData/text)[1]', 'nvarchar(500)'),
special.nodes.value('(specialData/text)[1]', 'nvarchar(500)'),
FROM @MyXML.nodes('xml') AS dataset(nodes)
OUTER APPLY @MyXML.nodes('xml/generalData') AS general(nodes)
OUTER APPLY @MyXML.nodes('xml/specialData') AS special(nodes)
WHERE
general.nodes.value('(generalData/text/id)[1]', 'nvarchar(500)') = special.nodes.value('(specialData/text/id)[1]', 'nvarchar(500)')
What I do not like here is that I have to use OUTER APPLY twice and that I have to use the WHERE clause to JOIN the correct elements.
My question therefore is: Is it possible to construct the query in a way where I do not have to use the WHERE clause in such a way, because I am pretty sure that this affects performance very negatively if files become larger.
Shouldn't it be possible to JOIN the correct nodes (that is, the corresponding generalData and specialData nodes) with some XPATH statement?
Your XPath expressions are completely off.
Please try the following. It is pretty efficient. You can test its performance with a large XML.
-- DDL and sample data population, start
DECLARE @xml XML =
N'<xml>
<dataSetData>
<text>ABC</text>
</dataSetData>
<generalData>
<id>123</id>
<text>text data</text>
</generalData>
<generalData>
<id>456</id>
<text>text data 2</text>
</generalData>
<specialData>
<id>123</id>
<text>special data text</text>
</specialData>
<specialData>
<id>456</id>
<text>special data text 2</text>
</specialData>
</xml>';
-- DDL and sample data population, end
SELECT c.value('(dataSetData/text/text())[1]', 'VARCHAR(20)') AS DataSetData
, g.value('(id/text())[1]', 'INT') AS GeneralDataID
, g.value('(text/text())[1]', 'VARCHAR(30)') AS GeneralDataText
, sp.value('(id/text())[1]', 'INT') AS SpecialDataID
, sp.value('(text/text())[1]', 'VARCHAR(30)') AS SpecialDataTest
FROM @xml.nodes('/xml') AS t(c)
OUTER APPLY c.nodes('generalData') AS general(g)
OUTER APPLY c.nodes('specialData') AS special(sp)
WHERE g.value('(id/text())[1]', 'INT') = sp.value('(id/text())[1]', 'INT');
Output
+-------------+---------------+-----------------+---------------+---------------------+
| DataSetData | GeneralDataID | GeneralDataText | SpecialDataID | SpecialDataTest |
+-------------+---------------+-----------------+---------------+---------------------+
| ABC | 123 | text data | 123 | special data text |
| ABC | 456 | text data 2 | 456 | special data text 2 |
+-------------+---------------+-----------------+---------------+---------------------+
I want to suggest one more solution:
DECLARE @xml XML=
N'<xml>
<dataSetData>
<text>ABC</text>
</dataSetData>
<generalData>
<id>123</id>
<text>text data</text>
</generalData>
<generalData>
<id>456</id>
<text>text data 2</text>
</generalData>
<specialData>
<id>123</id>
<text>special data text</text>
</specialData>
<specialData>
<id>456</id>
<text>special data text 2</text>
</specialData>
</xml>';
--The query
SELECT @xml.value('(/xml/dataSetData/text/text())[1]','varchar(100)')
,B.*
,@xml.value('(/xml/specialData[(id/text())[1] cast as xs:int? = sql:column("B.General_Id")]/text/text())[1]','varchar(100)') AS Special_Text
FROM @xml.nodes('/xml/generalData') A(gd)
CROSS APPLY(SELECT A.gd.value('(id/text())[1]','int') AS General_Id
,A.gd.value('(text/text())[1]','varchar(100)') AS General_Text) B;
The idea in short:
We can read the <dataSetData>, as it is not repeating, directly from the variable.
We can use .nodes() to get a derived set of all <generalData> entries.
Now the magic trick: I use APPLY to get the values from the XML as regular columns into the result set.
This trick now allows us to use sql:column() to build an XQuery predicate that finds the corresponding <specialData>.
One more approach with FLWOR
You might try this:
SELECT @xml.query
('
<xml>
{
for $i in distinct-values(/xml/generalData/id/text())
return
<combined dsd="{/xml/dataSetData/text/text()}"
id="{$i}"
gd="{/xml/generalData[id=$i]/text/text()}"
sd="{/xml/specialData[id=$i]/text/text()}"/>
}
</xml>
');
The result
<xml>
<combined dsd="ABC" id="123" gd="text data" sd="special data text" />
<combined dsd="ABC" id="456" gd="text data 2" sd="special data text 2" />
</xml>
The idea in short:
With the help of distinct-values() we get a list of all id values in your XML.
We can iterate over this list and pick the corresponding values.
We return the result as a re-structured XML.
Now you can use .nodes('/xml/combined') against this new XML and retrieve all values easily.
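That last step can be sketched as follows. This is only an illustration: the variable name @combined is made up here and simply holds the re-structured XML shown in the result above, and the column aliases are my own choice.

```sql
-- @combined is a hypothetical variable holding the FLWOR result shown above
DECLARE @combined XML =
N'<xml>
  <combined dsd="ABC" id="123" gd="text data" sd="special data text" />
  <combined dsd="ABC" id="456" gd="text data 2" sd="special data text 2" />
</xml>';

-- One row per <combined> element; each attribute is read with the @ accessor
SELECT c.value('@dsd', 'varchar(100)') AS DataSetData
      ,c.value('@id',  'int')          AS GeneralDataID
      ,c.value('@gd',  'varchar(100)') AS GeneralDataText
      ,c.value('@sd',  'varchar(100)') AS SpecialDataText
FROM @combined.nodes('/xml/combined') AS A(c);
```

In a real query you would apply .nodes() directly to the output of .query(), as the performance test does.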
Performance test
I just want to add a performance test:
CREATE TABLE dbo.TestXml(TheXml XML);
INSERT INTO dbo.TestXml VALUES
(
(
SELECT 'blah1' AS [dataSetData/text]
,(SELECT o.[object_id] AS [id]
,o.[name] AS [text]
FROM sys.objects o
FOR XML PATH('generalData'),TYPE)
,(SELECT o.[object_id] AS [id]
,o.create_date AS [text]
FROM sys.objects o
FOR XML PATH('specialData'),TYPE)
FOR XML PATH('xml'),TYPE
)
)
,(
(
SELECT 'blah2' AS [dataSetData/text]
,(SELECT o.[object_id] AS [id]
,o.[name] AS [text]
FROM sys.objects o
FOR XML PATH('generalData'),TYPE)
,(SELECT o.[object_id] AS [id]
,o.create_date AS [text]
FROM sys.objects o
FOR XML PATH('specialData'),TYPE)
FOR XML PATH('xml'),TYPE
)
)
,(
(
SELECT 'blah3' AS [dataSetData/text]
,(SELECT o.[object_id] AS [id]
,o.[name] AS [text]
FROM sys.objects o
FOR XML PATH('generalData'),TYPE)
,(SELECT o.[object_id] AS [id]
,o.create_date AS [text]
FROM sys.objects o
FOR XML PATH('specialData'),TYPE)
FOR XML PATH('xml'),TYPE
)
);
GO
--just a dummy call to avoid *first call bias*
SELECT x.query('.') FROM dbo.TestXml
CROSS APPLY TheXml.nodes('/xml//*') A(x)
GO
DECLARE @t DATETIME2=SYSUTCDATETIME();
--My first approach
SELECT TheXml.value('(/xml/dataSetData/text/text())[1]','varchar(100)') AS DataSetValue
,B.*
,TheXml.value('(/xml/specialData[(id/text())[1] cast as xs:int? = sql:column("B.General_Id")]/text/text())[1]','varchar(100)') AS Special_Text
INTO dbo.testResult1
FROM dbo.TestXml
CROSS APPLY TheXml.nodes('/xml/generalData') A(gd)
CROSS APPLY(SELECT A.gd.value('(id/text())[1]','int') AS General_Id
,A.gd.value('(text/text())[1]','varchar(100)') AS General_Text) B;
SELECT DATEDIFF(MILLISECOND,@t,SYSUTCDATETIME());
GO
DECLARE @t DATETIME2=SYSUTCDATETIME();
--My second approach
SELECT B.c.value('@dsd','varchar(100)') AS dsd
,B.c.value('@id','int') AS id
,B.c.value('@gd','varchar(100)') AS gd
,B.c.value('@sd','varchar(100)') AS sd
INTO dbo.TestResult2
FROM dbo.TestXml
CROSS APPLY (SELECT TheXml.query
('
<xml>
{
for $i in distinct-values(/xml/generalData/id/text())
return
<combined dsd="{/xml/dataSetData/text/text()}"
id="{$i}"
gd="{/xml/generalData[id=$i]/text/text()}"
sd="{/xml/specialData[id=$i]/text/text()}"/>
}
</xml>
') AS ResultXml) A
CROSS APPLY A.ResultXml.nodes('/xml/combined') B(c);
SELECT DATEDIFF(MILLISECOND,@t,SYSUTCDATETIME());
GO
DECLARE @t DATETIME2=SYSUTCDATETIME();
--Yitzhak's approach
SELECT c.value('(dataSetData/text/text())[1]', 'VARCHAR(20)') AS DataSetData
, g.value('(id/text())[1]', 'INT') AS GeneralDataID
, g.value('(text/text())[1]', 'VARCHAR(30)') AS GeneralDataText
, sp.value('(id/text())[1]', 'INT') AS SpecialDataID
, sp.value('(text/text())[1]', 'VARCHAR(30)') AS SpecialDataTest
INTO dbo.TestResult3
FROM dbo.TestXml
CROSS APPLY TheXml.nodes('/xml') AS t(c)
OUTER APPLY c.nodes('generalData') AS general(g)
OUTER APPLY c.nodes('specialData') AS special(sp)
WHERE g.value('(id/text())[1]', 'INT') = sp.value('(id/text())[1]', 'INT');
SELECT DATEDIFF(MILLISECOND,@t,SYSUTCDATETIME());
GO
SELECT * FROM TestResult1;
SELECT * FROM TestResult2;
SELECT * FROM TestResult3;
GO
--careful with real data!
DROP TABLE testResult1
DROP TABLE testResult2
DROP TABLE testResult3
DROP TABLE dbo.TestXml;
The result clearly speaks against XQuery. (Someone might say "so sad!" now :-) ).
The predicate approach is by far the slowest (4700ms). The FLWOR approach comes second (1200ms), and the winner is - tatatataaaaa - Yitzhak's approach (400ms, faster than the predicate approach by a factor of ~10!).
Which solution is best for you will depend on the actual data (count of elements per XML, count of XMLs and so on). But visual elegance is - regrettably - not the only criterion for this choice :-)
Sorry to add this as another answer, but I don't want to add to the other answer. It's big enough already :-)
A combination of Yitzhak's and mine is even faster:
--This is the additional code to be placed into the performance comparison
DECLARE @t DATETIME2=SYSUTCDATETIME();
SELECT TheXml.value('(/xml/dataSetData/text/text())[1]', 'VARCHAR(20)') AS DataSetData
,B.*
, sp.value('(id/text())[1]', 'INT') AS SpecialDataID
, sp.value('(text/text())[1]', 'VARCHAR(30)') AS SpecialDataTest
INTO dbo.TestResult4
FROM dbo.TestXml
CROSS APPLY TheXml.nodes('/xml/generalData') AS A(g)
CROSS APPLY(SELECT g.value('(id/text())[1]', 'INT') AS GeneralDataID
, g.value('(text/text())[1]', 'VARCHAR(30)') AS GeneralDataText) B
OUTER APPLY TheXml.nodes('/xml/specialData[id=sql:column("B.GeneralDataID")]') AS special(sp);
SELECT DATEDIFF(MILLISECOND,@t,SYSUTCDATETIME());
The idea in short:
We read the <dataSetData> directly (no repetition)
We use APPLY .nodes() to get all <generalData>
We use APPLY SELECT to fetch the values of <generalData> elements as real columns.
We use another APPLY .nodes() to fetch the corresponding <specialData> elements
One advantage of this solution: If there might be more than one special-data entry per general-data element, this would work too.
This is now the fastest in my test (~300ms).
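To illustrate the point about several special-data entries per general-data element, here is a small sketch. The sample XML is made up for this purpose (one <generalData> with two matching <specialData> elements); the query is the same pattern as above, run against a variable instead of the test table.

```sql
-- Hypothetical sample: id 123 has two <specialData> entries
DECLARE @xml XML =
N'<xml>
  <dataSetData><text>ABC</text></dataSetData>
  <generalData><id>123</id><text>text data</text></generalData>
  <specialData><id>123</id><text>special data text</text></specialData>
  <specialData><id>123</id><text>another special text</text></specialData>
</xml>';

SELECT @xml.value('(/xml/dataSetData/text/text())[1]', 'VARCHAR(20)') AS DataSetData
      ,B.GeneralDataID
      ,B.GeneralDataText
      ,sp.value('(text/text())[1]', 'VARCHAR(30)') AS SpecialDataText
FROM @xml.nodes('/xml/generalData') AS A(g)
-- Fetch the <generalData> values as real columns so sql:column() can see them
CROSS APPLY (SELECT g.value('(id/text())[1]', 'INT') AS GeneralDataID
                   ,g.value('(text/text())[1]', 'VARCHAR(30)') AS GeneralDataText) AS B
-- One row per matching <specialData>; OUTER APPLY keeps the row if there is none
OUTER APPLY @xml.nodes('/xml/specialData[id=sql:column("B.GeneralDataID")]') AS special(sp);
```

This should return two rows for id 123, one per <specialData> entry, whereas the .value() predicate approach only picks the first match.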
Related
Imagine I have xml just like this:
declare @pxml xml =
'<MediaClass>
<MediaStream2Client>
<Title>Test</Title>
<Type>Book</Type>
<Price>1.00</Price>
</MediaStream2Client>
</MediaClass>
'
The number in the tag name <MediaStream2Client> can be any number from 1 to 100, so I can't simply hard-code the tag <MediaStream2Client> in the path. Is there a way to ignore the digits in this tag name in SQL Server, using something like grep functionality?
XPath queries can be constructed dynamically and/or contain SQL variables or columns such as the following example...
declare @pxml xml = '<MediaClass>
<MediaStream1Client>
<Title>Test1</Title>
<Type>Book1</Type>
<Price>1.00</Price>
</MediaStream1Client>
<MediaStream10Client>
<Title>Test10</Title>
<Type>Book10</Type>
<Price>10.00</Price>
</MediaStream10Client>
<MediaStream100Client>
<Title>Test100</Title>
<Type>Book100</Type>
<Price>100.00</Price>
</MediaStream100Client>
</MediaClass>';
select
ElementName,
MediaStreamClient.value('(Title/text())[1]', 'nvarchar(50)') as Title,
MediaStreamClient.value('(Type/text())[1]', 'nvarchar(50)') as [Type],
MediaStreamClient.value('(Price/text())[1]', 'decimal(18,2)') as Price
from (
--This is just for this example, normally you'd use a Tally Table here...
select top 100 row_number() over (order by a.object_id, a.column_id, b.object_id, b.column_id)
from sys.columns a, sys.columns b
) Tally(N)
cross apply (select concat('MediaStream', N, 'Client')) dyn(ElementName)
cross apply @pxml.nodes('/MediaClass/*[local-name(.) = sql:column("ElementName")]') MediaClass(MediaStreamClient);
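This returns the results:
+----------------------+---------+---------+--------+
| ElementName          | Title   | Type    | Price  |
+----------------------+---------+---------+--------+
| MediaStream1Client   | Test1   | Book1   | 1.00   |
| MediaStream10Client  | Test10  | Book10  | 10.00  |
| MediaStream100Client | Test100 | Book100 | 100.00 |
+----------------------+---------+---------+--------+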
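If the exact number does not matter, a possible alternative (my own sketch, not part of the approach above) is to filter on the element name inside the XQuery predicate, so no tally table is needed. The sample data, including the <Other> element, is hypothetical. Note that this matches any child whose name starts with "MediaStream", not only names of the form MediaStream<number>Client.

```sql
-- Hypothetical sample; <Other> is added only to show that non-matching elements are skipped
declare @pxml xml = '<MediaClass>
  <MediaStream1Client><Title>Test1</Title><Type>Book1</Type><Price>1.00</Price></MediaStream1Client>
  <MediaStream10Client><Title>Test10</Title><Type>Book10</Type><Price>10.00</Price></MediaStream10Client>
  <Other><Title>Ignored</Title></Other>
</MediaClass>';

select
    c.value('local-name(.)', 'nvarchar(128)') as ElementName, -- actual tag name of the matched element
    c.value('(Title/text())[1]', 'nvarchar(50)') as Title,
    c.value('(Type/text())[1]', 'nvarchar(50)') as [Type],
    c.value('(Price/text())[1]', 'decimal(18,2)') as Price
-- keep only children whose name begins with the 11 characters "MediaStream"
from @pxml.nodes('/MediaClass/*[substring(local-name(.), 1, 11) = "MediaStream"]') as MediaClass(c);
```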
I have the following XML that is in an XML column in SQL Server. I am able to retrieve the data between the tags and list it in table format using the code at the bottom. I can retrieve the values between all the tags except for the one I have in bold below that is in double quotes. I can get the value X just fine but I need to get the 6 that is in between the double quotes in this part: <Organization501cInd organization501cTypeTxt="6">X</Organization501cInd>
WITH XMLNAMESPACES (DEFAULT 'http://www.irs.gov/efile')
SELECT ID, FilingYear, FilingPeriod, FilingType, [FileName]
, Organization501c3Ind = c.value('(//Organization501c3Ind/text())[1]','varchar(MAX)')
, Organization501cInd = c.value('(//Organization501cInd/text())[1]','varchar(MAX)')
, Organization501cTypeTxt = c.value('(//Organization501cTypeTxt/text())[1]','varchar(MAX)')
FROM Form990
CROSS APPLY XMLData.nodes('//Return') AS t(c)
CROSS APPLY XMLData.nodes('//Return/ReturnHeader/Filer') AS t2(c2)
XML:
<ReturnData documentCnt="2">
<IRS990 documentId="IRS990-01" referenceDocumentId="IRS990ScheduleO-01" referenceDocumentName="IRS990ScheduleO ReasonableCauseExplanation" softwareId="19009670">
<PrincipalOfficerNm>CAREY BAKER</PrincipalOfficerNm>
<USAddress>
<AddressLine1Txt>PO BOX 11275</AddressLine1Txt>
<CityNm>TALLAHASSEE</CityNm>
<StateAbbreviationCd>FL</StateAbbreviationCd>
<ZIPCd>32302</ZIPCd>
</USAddress>
<GrossReceiptsAmt>104241</GrossReceiptsAmt>
<GroupReturnForAffiliatesInd>false</GroupReturnForAffiliatesInd>
<Organization501cInd organization501cTypeTxt="6">X</Organization501cInd>
Thoughts?
Without a minimal reproducible example by the OP, shooting from the hip.
-- DDL and sample data population, start
DECLARE @Form990 TABLE (ID INT IDENTITY PRIMARY KEY, XMLData XML);
INSERT INTO @Form990(XMLData) VALUES
(N'<Return xmlns="http://www.irs.gov/efile" returnVersion="2019v5.1">
<ReturnData documentCnt="2">
<IRS990 documentId="IRS990-01" referenceDocumentId="IRS990ScheduleO-01" referenceDocumentName="IRS990ScheduleO ReasonableCauseExplanation" softwareId="19009670">
<PrincipalOfficerNm>CAREY BAKER</PrincipalOfficerNm>
<USAddress>
<AddressLine1Txt>PO BOX 11275</AddressLine1Txt>
<CityNm>TALLAHASSEE</CityNm>
<StateAbbreviationCd>FL</StateAbbreviationCd>
<ZIPCd>32302</ZIPCd>
</USAddress>
<GrossReceiptsAmt>104241</GrossReceiptsAmt>
<GroupReturnForAffiliatesInd>false</GroupReturnForAffiliatesInd>
<Organization501c3Ind>X</Organization501c3Ind>
<Organization501cInd organization501cTypeTxt="6">X</Organization501cInd>
</IRS990>
</ReturnData>
</Return>');
-- DDL and sample data population, end
WITH XMLNAMESPACES (DEFAULT 'http://www.irs.gov/efile')
SELECT -- ID, FilingYear, FilingPeriod, FilingType, [FileName]
Organization501c3Ind = c.value('(Organization501c3Ind/text())[1]','varchar(MAX)')
, Organization501cInd = c.value('(Organization501cInd/text())[1]','varchar(MAX)')
, Organization501cTypeTxt = c.value('(Organization501cInd/@organization501cTypeTxt)[1]','varchar(MAX)')
FROM @Form990
CROSS APPLY XMLData.nodes('/Return/ReturnData/IRS990') AS t(c)
Output
+----------------------+---------------------+-------------------------+
| Organization501c3Ind | Organization501cInd | Organization501cTypeTxt |
+----------------------+---------------------+-------------------------+
| X | X | 6 |
+----------------------+---------------------+-------------------------+
NOTE: XML element and attribute names are case-sensitive. i.e.: Organization501cTypeTxt will not match an attribute named organization501cTypeTxt.
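A tiny sketch of that note, using a stripped-down fragment without the namespace (the variable name @x and the column aliases are just for illustration):

```sql
DECLARE @x XML = N'<Organization501cInd organization501cTypeTxt="6">X</Organization501cInd>';

SELECT @x.value('(/Organization501cInd/@organization501cTypeTxt)[1]', 'varchar(10)') AS ExactCase  -- attribute name matches: returns 6
      ,@x.value('(/Organization501cInd/@Organization501cTypeTxt)[1]', 'varchar(10)') AS WrongCase; -- upper-case O matches nothing: returns NULL
```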
When extracting attributes you need to use the @ accessor in your XPath query. Try something like the following...
WITH XMLNAMESPACES (DEFAULT 'http://www.irs.gov/efile')
SELECT ID, FilingYear, FilingPeriod, FilingType, [FileName],
Organization501cInd = c2.value('(Organization501cInd/text())[1]','varchar(MAX)'),
organization501cTypeTxt = c2.value('(Organization501cInd/@organization501cTypeTxt)[1]','varchar(MAX)')
FROM Form990
CROSS APPLY XMLData.nodes('/Return/ReturnData') AS t(c)
CROSS APPLY t.c.nodes('IRS990') AS t2(c2);
I have data in the below format, I would like to find the users that match any and all of the words within the comma delimited skills column:
Name | id | skills |
-------------------------------------------------------
Bbarker | 5987 | Needles, Pins, Surgery, Word, Excel |
CJerald | 5988 | Bartender, Shots |
RSarah | 5600 | Pins, Ground, Hot, Coffee |
So if I am searching for "Needles, Pins", it should return Bbarker's and RSarah's rows.
How would I achieve something like this using SQL?
I don't even know where to begin or what to search for; any help in the right direction would be great!
Thanks!
Poor design aside, sometimes we are stuck and have to deal with that poor design.
I agree that if you have the option of redesigning, I would pursue that route; in the meantime, there are ways you can deal with delimited data.
If you are on SQL Server 2016+ (database compatibility level 130 or higher), there is a built-in function called STRING_SPLIT() that can be used.
If you are on a version prior to SQL Server 2016, you basically have to convert to XML as a workaround.
Here's a working example of both that you can explore:
DECLARE @TestData TABLE
(
[Name] NVARCHAR(100)
, [Id] INT
, [skills] NVARCHAR(100)
);
--Test data
INSERT INTO @TestData (
[Name]
, [Id]
, [skills]
)
VALUES ( 'Bbarker', 5987, 'Needles, Pins, Surgery, Word, Excel' )
, ( 'CJerald', 5988, 'Bartender, Shots' )
, ( 'RSarah', 5600, 'Pins, Ground, Hot, Coffee' );
--search words
DECLARE @Search NVARCHAR(100) = 'Needles, Pins';
--sql server 2016+ using STRING_SPLIT
SELECT DISTINCT [a].*
FROM @TestData [a]
CROSS APPLY STRING_SPLIT([a].[skills], ',') [sk] --split your column
CROSS APPLY STRING_SPLIT(@Search, ',') [srch] --split your search
WHERE LTRIM(RTRIM([sk].[value])) = LTRIM(RTRIM([srch].[value])); --filter where they equal
--Prior to sql server 2016, convert XML
SELECT DISTINCT [td].*
FROM @TestData [td]
--below we are converting to xml and then spliting those out for your column
CROSS APPLY (
SELECT [Split].[a].[value]('.', 'NVARCHAR(MAX)') [value]
FROM (
SELECT CAST('<X>' + REPLACE([td].[skills], ',', '</X><X>') + '</X>' AS XML) AS [String]
) AS [A]
CROSS APPLY [String].[nodes]('/X') AS [Split]([a])
) AS [sk]
--same here for the search
CROSS APPLY (
SELECT [Split].[a].[value]('.', 'NVARCHAR(MAX)') [value]
FROM (
SELECT CAST('<X>' + REPLACE(@Search, ',', '</X><X>') + '</X>' AS XML) AS [String]
) AS [A]
CROSS APPLY [String].[nodes]('/X') AS [Split]([a])
) AS [srch]
WHERE LTRIM(RTRIM([sk].[value])) = LTRIM(RTRIM([srch].[value])); --then as before where those are equal
Both will get you the output of:
Name Id skills
---------- ------- ------------------------------------
Bbarker 5987 Needles, Pins, Surgery, Word, Excel
RSarah 5600 Pins, Ground, Hot, Coffee
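Both queries above return rows that match any of the search words. If you need rows that match all of them, one possible sketch (my own variation, SQL Server 2016+ only) is to group per row and compare the number of distinct matched words with the number of distinct search words. The variable @SearchCount is introduced here just for illustration.

```sql
DECLARE @TestData TABLE ([Name] NVARCHAR(100), [Id] INT, [skills] NVARCHAR(100));
INSERT INTO @TestData ([Name], [Id], [skills])
VALUES ('Bbarker', 5987, 'Needles, Pins, Surgery, Word, Excel')
     , ('CJerald', 5988, 'Bartender, Shots')
     , ('RSarah', 5600, 'Pins, Ground, Hot, Coffee');

DECLARE @Search NVARCHAR(100) = 'Needles, Pins';

-- How many distinct words are we searching for?
DECLARE @SearchCount INT =
    (SELECT COUNT(DISTINCT LTRIM(RTRIM([value]))) FROM STRING_SPLIT(@Search, ','));

SELECT [td].[Name], [td].[Id], [td].[skills]
FROM @TestData [td]
CROSS APPLY STRING_SPLIT([td].[skills], ',') [sk]   --split the column
JOIN STRING_SPLIT(@Search, ',') [srch]              --split the search
  ON LTRIM(RTRIM([sk].[value])) = LTRIM(RTRIM([srch].[value]))
GROUP BY [td].[Name], [td].[Id], [td].[skills]
--keep only rows where every search word was found
HAVING COUNT(DISTINCT LTRIM(RTRIM([srch].[value]))) = @SearchCount;
```

With the sample data this returns only Bbarker, because RSarah has Pins but not Needles.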
How about this?
SELECT DISTINCT Name, id
FROM table
WHERE skills LIKE '%Needles%'
OR skills LIKE '%Pins%'
Looking at :
;WITH cte AS(
SELECT 1 AS x UNION
SELECT 2 AS x UNION
SELECT 3 AS x
)
I can create permutation table for all 3 values :
SELECT T1.x , y=T2.x , z=t3.x
FROM cte T1
JOIN cte T2
ON T1.x != T2.x
JOIN cte T3
ON T2.x != T3.x AND T1.x != T3.x
This uses the power of SQL's cartesian product plus eliminating equal values.
OK.
But is it possible to enhance this recursive pseudo CTE :
;WITH cte AS(
SELECT 1 AS x , 2 AS y , 3 AS z
UNION ALL
...
)
SELECT * FROM cte
So that it will yield same result as :
NB there are other solutions in SO that uses recursive CTE , but it is not spread to columns , but string representation of the permutations
I tried to do the lot in a CTE.
However, trying to "redefine" a rowset dynamically is a little tricky. While the task is relatively easy using dynamic SQL, doing it without poses some issues.
While this answer may not be the most efficient or straightforward, or even correct in the sense that it's not all CTE, it may give others a basis to work from.
To best understand my approach, read the comments; it may also be worthwhile to look at each CTE expression in turn by altering the line below in the main block and commenting out the section that follows it.
SELECT * FROM <CTE NAME>
Good luck.
IF OBJECT_ID('tempdb..#cteSchema') IS NOT NULL
DROP Table #cteSchema
GO
-- BASE CTE
;WITH cte AS( SELECT 1 AS x, 2 AS y, 3 AS z),
-- So we know what columns we have from the CTE we extract it to XML
Xml_Schema AS ( SELECT CONVERT(XML,(SELECT * FROM cte FOR XML PATH(''))) AS MySchema ),
-- Next we need to get a list of the columns from the CTE, by querying the XML, getting the values and assigning a num to the column
MyColumns AS (SELECT D.ROWS.value('fn:local-name(.)','SYSNAME') AS ColumnName,
D.ROWS.value('.','SYSNAME') as Value,
ROW_NUMBER() OVER (ORDER BY D.ROWS.value('fn:local-name(.)','SYSNAME')) AS Num
FROM Xml_Schema
CROSS APPLY Xml_Schema.MySchema.nodes('/*') AS D(ROWS) ),
-- How many columns we have in the CTE, used a couple of times below
ColumnStats AS (SELECT MAX(NUM) AS ColumnCount FROM MyColumns),
-- create a cartesian product of the column names and values, so now we get each column with it's possible values,
-- so {x=1, x =2, x=3, y=1, y=2, y=3, z=1, z=2, z=3} -- you get the idea.
PossibleValues AS (SELECT MyC.ColumnName, MyC.Num AS ColumnNum, MyColumns.Value, MyColumns.Num,
ROW_NUMBER() OVER (ORDER BY MyC.ColumnName, MyColumns.Value, MyColumns.Num ) AS ID
FROM MyColumns
CROSS APPLY MyColumns MyC
),
-- Now that we have the possible values of each "column", we have to concatenate the values together using this recursive CTE.
AllRawXmlRows AS (SELECT CONVERT(VARCHAR(MAX),'<'+ISNULL((SELECT ColumnName FROM MyColumns WHERE MyColumns.Num = 1),'')+'>'+Value) as ConcatedValue, Value,ID, Counterer = 1 FROM PossibleValues
UNION ALL
SELECT CONVERT(VARCHAR(MAX),CONVERT(VARCHAR(MAX), AllRawXmlRows.ConcatedValue)+'</'+(SELECT ColumnName FROM MyColumns WHERE MyColumns.Num = Counterer)+'><'+(SELECT ColumnName FROM MyColumns WHERE MyColumns.Num = Counterer+1)+'>'+CONVERT(VARCHAR(MAX),PossibleValues.Value)) AS ConcatedValue, PossibleValues.Value, PossibleValues.ID,
Counterer = Counterer+1
FROM AllRawXmlRows
INNER JOIN PossibleValues ON AllRawXmlRows.ConcatedValue NOT LIKE '%'+PossibleValues.Value+'%' -- I hate this, there has to be a better way of making sure we don't duplicate values....
AND AllRawXmlRows.ID <> PossibleValues.ID
AND Counterer < (SELECT ColumnStats.ColumnCount FROM ColumnStats)
),
-- The above made a list but was missing the final closing XML element, so we add it.
-- We also restrict the list to the items that contain all columns; the section above builds it up over many columns
XmlRows AS (SELECT DISTINCT
ConcatedValue +'</'+(SELECT ColumnName FROM MyColumns WHERE MyColumns.Num = Counterer)+'>'
AS ConcatedValue
FROM AllRawXmlRows WHERE Counterer = (SELECT ColumnStats.ColumnCount FROM ColumnStats)
),
-- Wrap the output in row and table tags to create the final XML
FinalXML AS (SELECT (SELECT CONVERT(XML,(SELECT CONVERT(XML,ConcatedValue) FROM XmlRows FOR XML PATH('row'))) FOR XML PATH('table') )as XMLData),
-- Prepare a CTE that represents the structure of the original CTE with
DataTable AS (SELECT cte.*, XmlData
FROM FinalXML, cte)
--SELECT * FROM <CTE NAME>
-- GETS destination columns with XML data.
SELECT *
INTO #cteSchema
FROM DataTable
DECLARE @XML VARCHAR(MAX) ='';
SELECT @XML = XMLData FROM #cteSchema --Extract XML Data from the temp table
ALTER TABLE #cteSchema DROP Column XMLData -- Removes the superfluous column
DECLARE @h INT
EXECUTE sp_xml_preparedocument @h OUTPUT, @XML
SELECT *
FROM OPENXML(@h, '/table/row', 2)
WITH #cteSchema -- just use the #cteSchema to define the structure of the xml that has been constructed
EXECUTE sp_xml_removedocument @h
How about translating 1,2,3 into a column, which will look exactly like the example you started from, and use the same approach ?
;WITH origin (x,y,z) AS (
SELECT 1,2,3
), translated (x) AS (
SELECT col
FROM origin
UNPIVOT ( col FOR cols IN (x,y,z)) AS up
)
SELECT T1.x , y=T2.x , z=t3.x
FROM translated T1
JOIN translated T2
ON T1.x != T2.x
JOIN translated T3
ON T2.x != T3.x AND T1.x != T3.x
ORDER BY 1,2,3
If I understood the request correctly, this might just do the trick.
To run it on more columns, you just need to add them to the origin CTE definition and the UNPIVOT column list (plus one more self-join per added column).
Now, I don't know how you pass your 1 - n values for it to be dynamic, but if you tell me, I could try to edit the script to be dynamic too.
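As a sketch of what the extension to a fourth value might look like (the column name w and the value 4 are made up for this example), note that the final SELECT also needs one more self-join:

```sql
;WITH origin (w,x,y,z) AS (
SELECT 1,2,3,4
), translated (v) AS (
-- turn the single row of columns into one row per value
SELECT col
FROM origin
UNPIVOT ( col FOR cols IN (w,x,y,z)) AS up
)
SELECT w = T1.v, x = T2.v, y = T3.v, z = T4.v
FROM translated T1
JOIN translated T2
ON T2.v <> T1.v
JOIN translated T3
ON T3.v NOT IN (T1.v, T2.v)
JOIN translated T4
ON T4.v NOT IN (T1.v, T2.v, T3.v) -- each position excludes all values already used
ORDER BY 1,2,3,4;
```

With four distinct values this yields 4! = 24 rows.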
I'm trying to pull values from an XML string that is stored in an ntext field.
With this code I'm getting the correct results, but there is another case where the XML root is different (ListB instead of ListA). Can I use CASE/IF or anything else to take the other root into account?
SELECT
List_Name.value('name[1]', 'nvarchar(max)') as List_Name
FROM Lists
CROSS APPLY (SELECT CAST(content AS XML)) AS A(B)
CROSS APPLY A.B.nodes('//ListA') AS Lists(List_Name)
thanks in advance
1) You should store XML values in an XML column, not in an ntext column.
2) I'm not sure if this is what you need, but you could look at this example:
SELECT z.XmlCol.value('(text())[1]', 'NVARCHAR(50)') AS li_element_text,
z.XmlCol.value('(@type)[1]', 'NVARCHAR(50)') AS li_type_attribute
FROM
(
SELECT CONVERT(XML, N'<ol><li type="disc">A1</li><li type="square">A2</li></ol>')
UNION ALL
SELECT CONVERT(XML, N'<ul><li type="circle">B1</li></ul>')
) x(XmlCol)
CROSS APPLY x.XmlCol.nodes('/*') y(XmlCol)
CROSS APPLY y.XmlCol.nodes('li') z(XmlCol)
Output:
li_element_text li_type_attribute
--------------- -----------------
A1 disc
A2 square
B1 circle
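Applied to the ListA / ListB case from the question, the same idea might look like the sketch below. The table variable @Lists and its sample rows are assumptions made for this example (NVARCHAR(MAX) stands in for the ntext column), and the <name> child element is taken from the question's query.

```sql
-- Hypothetical stand-in for the Lists table from the question
DECLARE @Lists TABLE (content NVARCHAR(MAX));
INSERT INTO @Lists (content)
VALUES (N'<ListA><name>First list</name></ListA>')
     , (N'<ListB><name>Second list</name></ListB>');

SELECT L.n.value('local-name(.)', 'nvarchar(128)') AS Root_Name -- which root this row came from
      ,L.n.value('(name/text())[1]', 'nvarchar(max)') AS List_Name
FROM @Lists AS src
CROSS APPLY (SELECT CAST(src.content AS XML)) AS A(B)
-- accept either root element; use '/*' instead if any root is acceptable
CROSS APPLY A.B.nodes('/*[local-name(.) = "ListA" or local-name(.) = "ListB"]') AS L(n);
```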