XPath in T-SQL query - sql-server

I have two tables, XMLtable and filterTable.
I need all the XMLtable.ID values from XMLtable where the data in Col_X contains MyElement, the contents of which matches filterColumn in filterTable.
The XML for each row in Col_X may contain multiple MyElement elements, and I want that ID if ANY of those elements matches ANY of the values in filterColumn.
The problem is that those columns are actually of varchar(max) datatype, and the table itself is huge (like 50GB huge). So this query needs to be as optimized as possible.
Here's an example for where I am now, which merely returns the row where the first matching element equals one of the ones I'm looking for. Due to a plethora of different error messages I can't seem to be able to change this to compare to all of the same named elements as I want to.
SELECT ID,
CAST(Col_X AS XML).value('(//*[local-name()=''MyElement''])[1]', N'varchar(25)')
FROM XMLtable
...and then compare the results to filterTable. This already takes 5+ minutes.
What I'm trying to achieve is something like:
SELECT ID
FROM XMLtable
WHERE CAST(Col_X AS XML).query('(//*[local-name()=''MyElement''])')
IN (SELECT filterColumn FROM filterTable)
The only way I can currently achieve this is to use the LIKE operator, which takes like a thousand times longer.
Now, obviously it's not an option to start changing the datatypes of the columns or anything else. This is what I have to work with. :)

Try this:
SELECT
ID,
MyElementValue
FROM
(
SELECT ID, myE.value('(./text())[1]', N'VARCHAR(25)') AS 'MyElementValue'
FROM XMLTable
CROSS APPLY (SELECT CAST(Col_X AS XML)) as X(Col_X)
CROSS APPLY X.Col_X.nodes('(//*[local-name()="MyElement"])') as T2(myE)
) T1
WHERE MyElementValue IN (SELECT filterColumn FROM filterTable)
and this:
SELECT
ID,
MyElementValue
FROM
(
SELECT ID, myE.value('(./text())[1]', N'VARCHAR(25)') AS 'MyElementValue'
FROM XMLTable
CROSS APPLY (SELECT CAST(Col_X AS XML)) as X(Col_X)
CROSS APPLY X.Col_X.nodes('//MyElement') as T2(myE)
) T1
WHERE MyElementValue IN (SELECT filterColumn FROM filterTable)
Update
I think that you are experiencing what is described in the article "Compute Scalars, Expressions and Execution Plan Performance": the cast to XML is deferred and repeated for each call to the value function. The test you should make is to change the datatype of Col_X to XML.
If that is not an option you could query the rows you need from XMLTable into a temporary table that has an XML column and then do the query above against the temporary table without the need to cast to XML.
CREATE TABLE #XMLTable
(
ID int,
Col_X xml
)
INSERT INTO #XMLTable(ID, Col_X)
SELECT ID, Col_X
FROM XMLTable
SELECT
ID,
MyElementValue
FROM
(
SELECT ID, myE.value('(./text())[1]', N'varchar(25)') AS 'MyElementValue'
FROM #XMLTable
CROSS APPLY Col_X.nodes('//MyElement') as T2(myE)
) T1
WHERE MyElementValue IN (SELECT filterColumn FROM filterTable)
DROP TABLE #XMLTable

You could try something like this. It does at least functionally do what you want, I believe. You'll have to explore its performance with your data set empirically.
SELECT ID
FROM
(
SELECT xt.ID, CAST(xt.Col_X AS XML) [content] FROM XMLTable AS xt
) AS src
INNER JOIN FilterTable AS f
ON f.filterColumn IN
(
SELECT
elt.value('.', 'varchar(25)')
FROM src.content.nodes('//MyElement') AS T(elt)
)

I finally got this working, and with far better performance than I expected. Below is the script that finally produced the correct result in 5 - 6 minutes.
SELECT ID, myE.value('.', N'VARCHAR(25)') AS 'MyElementValue'
FROM (SELECT ID, CAST(Col_X AS XML) AS Col_X
FROM XMLTable) T1
CROSS APPLY Col_X.nodes('(//*[local-name()=''MyElement''])') T2(myE)
WHERE myE.value('.', N'varchar(25)') IN (SELECT filterColumn FROM filterTable)
Thanks for the help tho people!
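Since the question asks only for the IDs where ANY element matches, a correlated EXISTS may be worth trying as well. This is only a sketch against the table and column names from the question, and its performance on a 50GB table is untested. Unlike the queries above, it returns each ID at most once, because it does not produce one row per matching element.

```sql
-- Sketch only: returns each XMLtable.ID once if any MyElement value
-- appears in filterTable.filterColumn. Names are taken from the question.
SELECT xt.ID
FROM XMLtable AS xt
CROSS APPLY (SELECT CAST(xt.Col_X AS XML)) AS X(Col_X)
WHERE EXISTS
(
    SELECT 1
    FROM X.Col_X.nodes('//*[local-name()="MyElement"]') AS T(myE)
    WHERE T.myE.value('(./text())[1]', 'varchar(25)')
          IN (SELECT f.filterColumn FROM filterTable AS f)
);
```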

Related

Set Multiple XMLNAMESPACES to account for different versions of SSRS/SQL in an SQL Query

I am trying to query an SSRS Report Server Database searching for any report that uses a specific stored procedure using LIKE statement.
Code below works perfectly:
WITH XMLNAMESPACES
( DEFAULT
'http://schemas.microsoft.com/sqlserver/reporting/2016/01/reportdefinition'
, 'http://schemas.microsoft.com/SQLServer/reporting/reportdesigner' AS ReportDefinition )
SELECT
CATDATA.Name AS ReportName
,CATDATA.Path AS ReportPathLocation
,xmlcolumn.value('(@Name)[1]', 'VARCHAR(250)') AS DataSetName
,xmlcolumn.value('(Query/DataSourceName)[1]','VARCHAR(250)') AS DataSoureName
,xmlcolumn.value('(Query/CommandText)[1]','VARCHAR(2500)') AS CommandText
FROM (
SELECT C.Name
,C.Path
,CONVERT(XML,CONVERT(VARBINARY(MAX),C.Content)) AS reportXML
FROM ReportServer.dbo.Catalog C
WHERE C.Content is not null
AND C.Type = 2
) CATDATA
CROSS APPLY reportXML.nodes('/Report/DataSets/DataSet') xmltable ( xmlcolumn )
WHERE
xmlcolumn.value('(Query/CommandText)[1]','VARCHAR(500)') LIKE '%sp_%'
ORDER BY CATDATA.Name
However I want to query across multiple XML namespaces to account for the changes in SSRS/SQL versions over time to ensure the query doesn't miss any records.
'http://schemas.microsoft.com/sqlserver/reporting/2010/01/reportdefinition'
'http://schemas.microsoft.com/sqlserver/reporting/2008/01/reportdefinition'
I am still a relative beginner with XML, any ideas or advice how I would go about doing this?
Many thanks
If you are not bothered about nodes in different namespaces clashing, then you can just ignore the namespaces altogether, by using *:node
SELECT
CATDATA.Name AS ReportName
,CATDATA.Path AS ReportPathLocation
,xmlcolumn.value('(@Name)[1]', 'VARCHAR(250)') AS DataSetName
,xmlcolumn.value('(*:Query/*:DataSourceName/text())[1]','VARCHAR(250)') AS DataSoureName
,xmlcolumn.value('(*:Query/*:CommandText/text())[1]','VARCHAR(2500)') AS CommandText
FROM (
SELECT C.Name
,C.Path
,CONVERT(XML,CONVERT(VARBINARY(MAX),C.Content)) AS reportXML
FROM ReportServer.dbo.Catalog C
WHERE C.Content is not null
AND C.Type = 2
) CATDATA
CROSS APPLY reportXML.nodes('/*:Report/*:DataSets/*:DataSet') xmltable ( xmlcolumn )
WHERE
xmlcolumn.value('(*:Query/*:CommandText/text())[1]','VARCHAR(500)') LIKE '%sp_%'
ORDER BY CATDATA.Name
You should always use /text() to get the inner text of a node, for performance reasons.
Note that you can merge your WHERE filter into the nodes filter and do it directly in XQuery:
CROSS APPLY reportXML.nodes(
'/*:Report/*:DataSets/*:DataSet[
*:Query[*:CommandText[
contains(text()[1], "sp_")
]]]') xmltable ( xmlcolumn )
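To see the namespace wildcard in isolation, here is a small self-contained sketch. The table variable and the two sample report documents are made up for illustration; only the namespace URIs come from the question. Both rows should come back even though the documents use different default namespaces.

```sql
-- Hypothetical sample data: two tiny report definitions in different namespaces.
DECLARE @reports TABLE (ReportName varchar(50), reportXML xml);

INSERT INTO @reports (ReportName, reportXML) VALUES
('R2016', N'<Report xmlns="http://schemas.microsoft.com/sqlserver/reporting/2016/01/reportdefinition"><DataSets><DataSet Name="ds1"><Query><CommandText>sp_GetOrders</CommandText></Query></DataSet></DataSets></Report>'),
('R2008', N'<Report xmlns="http://schemas.microsoft.com/sqlserver/reporting/2008/01/reportdefinition"><DataSets><DataSet Name="ds2"><Query><CommandText>sp_GetUsers</CommandText></Query></DataSet></DataSets></Report>');

-- *: matches the element name in any namespace, so no WITH XMLNAMESPACES is needed.
SELECT r.ReportName,
       ds.value('(@Name)[1]', 'varchar(250)') AS DataSetName,
       ds.value('(*:Query/*:CommandText/text())[1]', 'varchar(2500)') AS CommandText
FROM @reports AS r
CROSS APPLY r.reportXML.nodes('/*:Report/*:DataSets/*:DataSet') AS x(ds);
```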

SQL Server: Run separate CTE for each record without cursor or function

Given table A with a column LocationID and many records,
Is it possible to fully run a CTE for each record, without using a cursor (while fetch loop) or function (via cross apply)?
I can't run a CTE from the table A because the CTE will go deep through a parent-child hierarchical table (ParentID, ChildID) to find all descendants of a specific type, for each LocationID of table A. It seems that if I do CTE using table A, it will mix the children of all LocationID in table A.
Basically I need to separately run a CTE, for each LocationID of table A, and put in a table with LocationID and ChildID columns (LocationID are the ones from table A and ChildID are all descendants of a specific type found via CTE).
This is your basic layout.
;with CTE AS
(
select .......
)
select *
from CTE
cross apply (select distinct Location from TableA) a
where CTE.Location=a.Location
Some sample data and expected results will provide for a better answer.
You could do something like this:
Declare @LocationID As Int
Select
LocationID
, 0 as Processed
Into #Temp_Table
From TableA
While Exists (Select Top 1 1 From #Temp_Table Where Processed = 0)
Begin
Select Top 1 @LocationID = LocationID
From #Temp_Table
Where Processed = 0
Order By LocationID
/* Do your processing here */
-- Mark the row as done so the loop terminates
Update #Temp_Table Set
Processed = 1
Where LocationID = @LocationID
End
It's still RBAR but (in my environment, at least) it's way faster than a cursor.
I was able to find a solution. I just had to keep the original LocationID as a reference, then in the CTE results, which will include all possible records as it goes deep into the list, I apply the filter I need. Yes, all records are mixed in the results, however because the reference to origin table's LocationID was kept (as OriginalParentID) I'm still able to retrieve it.
;WITH CTE AS
(
--Original list of parents
SELECT a.LocationID AS OriginalParentID, l.ParentID, l.ChildID, l.ChildType
FROM TableA a
INNER JOIN tblLocationHierarchy l ON l.ParentID = a.LocationID
UNION ALL
--Getting all descendants of the original list of parents
SELECT CTE.OriginalParentID, l.ParentID, l.ChildID, l.ChildType
FROM tblLocationHierarchy l
INNER JOIN CTE ON CTE.ChildID = l.ParentID
)
SELECT OriginalParentID, ChildID
FROM CTE
--Filtering is done here
WHERE ChildType = ...
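For anyone who wants to try the idea without the real tables, here is a minimal self-contained sketch. The table variables, the sample rows, and the 'Desk' filter value are invented for illustration; only the column names follow the query above.

```sql
-- Hypothetical sample data standing in for TableA and tblLocationHierarchy.
DECLARE @TableA TABLE (LocationID int);
DECLARE @Hierarchy TABLE (ParentID int, ChildID int, ChildType varchar(10));

INSERT INTO @TableA VALUES (1), (2);
INSERT INTO @Hierarchy VALUES
(1, 10, 'Room'), (10, 11, 'Desk'),
(2, 20, 'Room'), (20, 21, 'Desk'), (21, 22, 'Desk');

;WITH CTE AS
(
    -- Anchor: direct children, tagged with the LocationID they started from
    SELECT a.LocationID AS OriginalParentID, h.ChildID, h.ChildType
    FROM @TableA AS a
    INNER JOIN @Hierarchy AS h ON h.ParentID = a.LocationID
    UNION ALL
    -- Recursive step: carry OriginalParentID down to every descendant
    SELECT c.OriginalParentID, h.ChildID, h.ChildType
    FROM @Hierarchy AS h
    INNER JOIN CTE AS c ON c.ChildID = h.ParentID
)
SELECT OriginalParentID, ChildID
FROM CTE
WHERE ChildType = 'Desk'
ORDER BY OriginalParentID, ChildID;
-- With this sample data: (1, 11), (2, 21), (2, 22)
```

Because OriginalParentID is carried through every level, the descendants of different starting locations stay distinguishable even though a single CTE processes them together.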

SQL queries combined into one row

I'm having some difficulty combining the following queries, so that the results display in one row rather than in multiple rows:
SELECT value FROM dbo.parameter WHERE name='xxxxx.name'
SELECT dbo.contest.name AS Event_Name
FROM contest
INNER JOIN open_box on open_box.contest_id = contest.id
GROUP BY dbo.contest.name
SELECT COUNT(*) AS total_people FROM open_option
SELECT SUM(scanned) AS TotalScanned,SUM(number) AS Totalnumber
FROM dbo.open_box
GROUP BY contest_id
SELECT COUNT(*) AS reff FROM [open]
WHERE refer = 'True'
I would like the results of these queries displayed as separate columns in a single row. Any help is appreciated!
Tab's solution is fine, I just wanted to show an alternative way of doing this. The following statement uses subqueries to get the information in one row:
SELECT
[xxxx.name]=(SELECT value FROM dbo.parameter WHERE name='xxxxx.name'),
[Event Name]=(SELECT dbo.contest.name
FROM contest
INNER JOIN open_box on open_box.contest_id = contest.id
GROUP BY dbo.contest.name),
[Total People]=(SELECT COUNT(*) FROM open_option),
[Total Scanned]=(SELECT SUM(scanned)
FROM dbo.open_box
GROUP BY contest_id),
[Total Number]=(SELECT SUM(number)
FROM dbo.open_box
GROUP BY contest_id),
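Ref=(SELECT COUNT(*) FROM [open] WHERE refer = 'True');
This requires the Total Scanned and Total Number to be queried separately. Note that each subquery must return at most one row, so the GROUP BY subqueries only work if there is a single contest.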
Update: if you then want to INSERT that into another table there are essentially two ways to do that.
Create the table directly from the SELECT statement:
SELECT
-- the fields from the first query
INTO
[database_name].[schema_name].[new_table_name]; -- creates table new_table_name
Insert into a table that already exists, using INSERT ... SELECT:
INSERT INTO [database_name].[schema_name].[existing_table_name](
-- the fields in the existing_table_name
)
SELECT
-- the fields from the first query
Just CROSS JOIN the five queries as derived tables:
SELECT * FROM (
Query1
) AS q1
CROSS JOIN (
Query2
) AS q2
CROSS JOIN (...
Assuming that each of your individual queries only returns one row, then this CROSS JOIN should result in only one row.
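As a rough illustration of the CROSS JOIN approach, here is a self-contained sketch. The derived tables use literal values in place of the real queries, so the column names and data are placeholders, not the asker's schema.

```sql
-- Each derived table returns exactly one row, so the CROSS JOIN yields one row.
SELECT q1.ParamValue, q2.TotalPeople, q3.TotalScanned
FROM (
    SELECT 'demo' AS ParamValue                     -- stands in for the parameter lookup
) AS q1
CROSS JOIN (
    SELECT COUNT(*) AS TotalPeople
    FROM (VALUES (1), (2), (3)) AS v(n)             -- stands in for open_option
) AS q2
CROSS JOIN (
    SELECT SUM(s.scanned) AS TotalScanned
    FROM (VALUES (5), (7)) AS s(scanned)            -- stands in for open_box
) AS q3;
-- With this sample data: demo, 3, 12
```

If any derived table returns several rows (for example a GROUP BY with more than one group), the row count multiplies, so it is worth checking each query on its own first.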

SELECT INTO query

I have to write a SELECT INTO T-SQL script for a table which has columns acc_number, history_number and note.
How do I generate an incrementing value of history_number for each record being inserted via SELECT INTO?
Note, that the value for history_number comes off as a different value for each account from a different table.
SELECT history_number = IDENTITY(INT,1,1),
... etc...
INTO NewTable
FROM ExistingTable
WHERE ...
You could use ROW_NUMBER instead of IDENTITY, i.e. ROW_NUMBER() OVER (ORDER BY ...)
SELECT acc_number
,o.historynumber
,note
,o.historynumber+DENSE_RANK() OVER (Partition By acc_number ORDER BY Note) AS NewHistoryNumber
--Or some other order by probably a timestamp...
FROM Table t
INNER JOIN OtherTable o
ON ....
This will give you an incremented count for each acc_number, starting from its current history number. I suggest you use a better ORDER BY in the rank, but there was not enough info in the question.
Suppose your SELECT statement is like this
SELECT acc_number,
history_number,
note
FROM [Table]
Try this Query as below.
SELECT ROW_NUMBER() OVER (ORDER BY acc_number) ID,
acc_number,
history_number,
note
INTO [NewTable]
FROM [Table]
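Putting the two suggestions together, a per-account counter that continues from a starting value held in another table could look roughly like this. The temp tables, their contents, and the ORDER BY column are assumptions made for the sake of a runnable example.

```sql
-- Hypothetical source rows and per-account starting history numbers.
CREATE TABLE #Notes (acc_number int, note varchar(50));
CREATE TABLE #LastHistory (acc_number int, history_number int);

INSERT INTO #Notes VALUES (100, 'a'), (100, 'b'), (200, 'c');
INSERT INTO #LastHistory VALUES (100, 5), (200, 40);

-- ROW_NUMBER restarts at 1 for each account and is added to that account's base value.
SELECT n.acc_number,
       h.history_number
         + ROW_NUMBER() OVER (PARTITION BY n.acc_number ORDER BY n.note) AS history_number,
       n.note
INTO #NewTable
FROM #Notes AS n
INNER JOIN #LastHistory AS h ON h.acc_number = n.acc_number;

SELECT acc_number, history_number, note
FROM #NewTable
ORDER BY acc_number, history_number;
-- With this sample data: (100, 6, a), (100, 7, b), (200, 41, c)

DROP TABLE #Notes, #LastHistory, #NewTable;
```

ROW_NUMBER is used here rather than DENSE_RANK so that two rows with the same note still get different numbers.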

SQL Server count(*) issue

I am using SQL Server 2008 Enterprise. The following statement is part of a stored procedure, and when I press F5 in SQL Server Management Studio to run it I get the compile error shown below. When I remove count(*), all errors disappear. Any ideas what is wrong with count(*)? Another question: my goal is to return the total number of matching results while returning only part of the result set to implement paging (using tt.rowNum between @startPos and @requireCount + @startPos-1). Any ideas how to implement that?
SELECT *
FROM (SELECT count(*), t.id,t.AdditionalInfo, ROW_NUMBER()
OVER (order by t.id) AS rowNum
FROM dbo.foo t
CROSS APPLY t.AdditionalInfo.nodes('/AdditionalInfo')
AS MyTestXMLQuery(AdditionalInfo)
WHERE
(Tag4=''+@InputTag4+'' OR Tag5=''+@InputTag5+'')
and (MyTestXMLQuery.AdditionalInfo.value
('(Item[@Name="Tag1"]/@Value)[1]', 'varchar(50)')
LIKE '%'+@Query+'%'
or MyTestXMLQuery.AdditionalInfo.value
('(Item[@Name="Tag2"]/@Value)[1]', 'varchar(50)')
LIKE '%'+@Query+'%'
or MyTestXMLQuery.AdditionalInfo.value
('(Item[@Name="Tag3"]/@Value)[1]', 'varchar(50)')
LIKE '%'+@Query+'%') ) tt
WHERE tt.rowNum between @startPos and @requireCount + @startPos-1
Error message,
Column 'dbo.foo.ID' is invalid in the select list
because it is not contained in either an aggregate function
or the GROUP BY clause.
No column name was specified for column 1 of 'tt'.
thanks in advance,
George
Replace it with
SELECT count(*) over() AS [Count]
It needs an alias as it is a column in a derived table.
The empty over() clause will return the count in the whole derived table. Is that what you need?
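A small self-contained sketch of that idea combined with ROW_NUMBER paging follows. The VALUES list stands in for the filtered rows of dbo.foo, and the parameter values are arbitrary.

```sql
DECLARE @startPos int = 2, @requireCount int = 2;

SELECT tt.id, tt.TotalCount
FROM
(
    SELECT v.id,
           COUNT(*) OVER () AS TotalCount,            -- total matches, repeated on every row
           ROW_NUMBER() OVER (ORDER BY v.id) AS rowNum
    FROM (VALUES (10), (20), (30), (40), (50)) AS v(id)  -- placeholder for the real query
) AS tt
WHERE tt.rowNum BETWEEN @startPos AND @requireCount + @startPos - 1;
-- With this sample data: ids 20 and 30, each with TotalCount = 5
```

Because the window count is computed inside the derived table, before the paging filter is applied, every returned row carries the total number of matches rather than the page size.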
You generally can't mix aggregate functions and normal field selections without a GROUP BY clause.
In queries where you are only selecting a COUNT(*) it assumes you mean to lump everything together in one group. Once you select another field (without a corresponding GROUP BY), you introduce a contradiction to that assumption and it will not execute.
You need to have a GROUP BY clause. Try this:
SELECT *
FROM (SELECT
count(*) AS c, t.id,t.AdditionalInfo
FROM
dbo.foo t
CROSS APPLY
t.AdditionalInfo.nodes('/AdditionalInfo') AS MyTestXMLQuery(AdditionalInfo)
WHERE
(Tag4=''+@InputTag4+'' OR Tag5=''+@InputTag5+'')
and (MyTestXMLQuery.AdditionalInfo.value('(Item[@Name="Tag1"]/@Value)[1]', 'varchar(50)') LIKE '%'+@Query+'%'
or MyTestXMLQuery.AdditionalInfo.value('(Item[@Name="Tag2"]/@Value)[1]', 'varchar(50)') LIKE '%'+@Query+'%'
or MyTestXMLQuery.AdditionalInfo.value('(Item[@Name="Tag3"]/@Value)[1]', 'varchar(50)') LIKE '%'+@Query+'%')
GROUP BY t.id,t.AdditionalInfo
) tt
WHERE tt.rowNum between @startPos and @requireCount + @startPos-1
There might be more. Not sure.
Either way, it would do you a lot of good to learn about the theory behind the relational database model. This query needs a lot more help than what I just added. I mean it needs A LOT more help.
Edit: You also can't have a ROW_NUMBER() in a query that selects COUNT(*). What would you be trying to number? The number of Counts?
A guess, cause I can't run it, but try Changing it to:
Select * From
(Select count(*), t.id, t.AdditionalInfo, ROW_NUMBER()
OVER (order by t.id) AS rowNum
From dbo.foo t
CROSS APPLY t.AdditionalInfo.nodes('/AdditionalInfo')
AS MyTestXMLQuery(AdditionalInfo)
Where
(Tag4=''+@InputTag4+'' OR Tag5=''+@InputTag5+'')
and (MyTestXMLQuery.AdditionalInfo.value
('(Item[@Name="Tag1"]/@Value)[1]', 'varchar(50)')
LIKE '%'+@Query+'%'
or MyTestXMLQuery.AdditionalInfo.value
('(Item[@Name="Tag2"]/@Value)[1]', 'varchar(50)')
LIKE '%'+@Query+'%'
or MyTestXMLQuery.AdditionalInfo.value
('(Item[@Name="Tag3"]/@Value)[1]', 'varchar(50)')
LIKE '%'+@Query+'%')
Group By t.id, t.AdditionalInfo ) tt
Where tt.rowNum between @startPos and @requireCount + @startPos-1
