I've got 34 rows in a database table; each row has a column containing XML. The XML is actually stored in an NVARCHAR(MAX) column, not an XML column.
For each row I am selecting values from the XML elements as a single result set. The performance is pretty poor. I've tried two different queries: the first takes roughly 22 seconds to execute and the second takes 7.
Even at 7 seconds this is far slower than optimal; I'm hoping for 1-2 seconds at most.
So then I read a rumor online that if you convert the NVARCHAR data to XML using a temp table or table variable, you will get a performance gain, which at least in my case was true... It now executes in under a second. What I'm looking for now is an explanation of why these two approaches actually affect performance.
22 seconds:
SELECT
c.ID,
c.ChannelName,
[Name] = d.c.value('name[1]','varchar(100)'),
[Type] = d.c.value('transportName[1]','varchar(100)'),
[Enabled] = d.c.value('enabled[1]','BIT'),
[Queued] = d.c.value('properties[1]/destinationConnectorProperties[1]/queueEnabled[1]','varchar(100)'),
[RetryInterval] = d.c.value('properties[1]/destinationConnectorProperties[1]/retryIntervalMillis[1]','INT'),
[MaxRetries] = d.c.value('properties[1]/destinationConnectorProperties[1]/retryCount[1]','INT'),
[RotateQueue] = d.c.value('properties[1]/destinationConnectorProperties[1]/rotate[1]','BIT'),
[ThreadCount] = d.c.value('properties[1]/destinationConnectorProperties[1]/threadCount[1]','INT'),
[WaitForPrevious] = d.c.value('waitForPrevious[1]','BIT'),
[Destination] = COALESCE(
d.c.value('properties[1]/channelId[1]','varchar(100)'),
d.c.value('properties[1]/remoteAddress[1]','varchar(100)'),
d.c.value('properties[1]/wsdlUrl[1]','varchar(1024)')),
[DestinationPort] = COALESCE(
d.c.value('properties[1]/remotePort[1]','varchar(100)'),
d.c.value('properties[1]/port[1]','varchar(1024)')),
[Service] = d.c.value('properties[1]/service[1]','varchar(1024)'),
[Operation] = d.c.value('properties[1]/operation[1]','varchar(1024)')
FROM
(
SELECT
[ID],
[ChannelName] = [Name],
[CFG] = Convert(XML, Channel)
FROM
dbo.CHANNEL
) c
CROSS APPLY c.CFG.nodes('/channel/destinationConnectors/connector') d(c)
7 seconds, due to the use of text(). I have no idea why text() speeds things up:
SELECT
c.ID,
c.ChannelName,
[Name] = d.c.value('(name/text())[1]','varchar(100)'),
[Type] = d.c.value('(transportName/text())[1]','varchar(100)'),
[Enabled] = d.c.value('(enabled/text())[1]','BIT'),
[Queued] = d.c.value('(properties/destinationConnectorProperties/queueEnabled/text())[1]','varchar(100)'),
[RetryInterval] = d.c.value('(properties/destinationConnectorProperties/retryIntervalMillis/text())[1]','INT'),
[MaxRetries] = d.c.value('(properties/destinationConnectorProperties/retryCount/text())[1]','INT'),
[RotateQueue] = d.c.value('(properties/destinationConnectorProperties/rotate/text())[1]','BIT'),
[ThreadCount] = d.c.value('(properties/destinationConnectorProperties/threadCount/text())[1]','INT'),
[WaitForPrevious] = d.c.value('(waitForPrevious/text())[1]','BIT'),
[Destination] = COALESCE(
d.c.value('(properties/channelId/text())[1]','varchar(100)'),
d.c.value('(properties/remoteAddress/text())[1]','varchar(100)'),
d.c.value('(properties/wsdlUrl/text())[1]','varchar(1024)')),
[DestinationPort] = COALESCE(
d.c.value('(properties/remotePort/text())[1]','varchar(100)'),
d.c.value('(properties/port/text())[1]','varchar(1024)')),
[Service] = d.c.value('(properties/service/text())[1]','varchar(1024)'),
[Operation] = d.c.value('(properties/operation/text())[1]','varchar(1024)')
FROM
(
SELECT
[ID],
[ChannelName] = [Name],
[CFG] = Convert(XML, Channel)
FROM
dbo.CHANNEL
) c
CROSS APPLY c.CFG.nodes('/channel/destinationConnectors/connector') d(c)
This query uses the text() approach but first converts the NVARCHAR column to an XML column in a table variable. It executes in less than a second...
DECLARE @Xml AS TABLE (
[ID] NVARCHAR(36) NOT NULL PRIMARY KEY,
[Name] NVARCHAR(100) NOT NULL,
[CFG] XML NOT NULL
);
INSERT INTO @Xml (ID, Name, CFG)
SELECT
c.ID,
c.Name,
Convert(XML, c.Channel)
FROM
[dbo].[CHANNEL] c;
SELECT
c.ID,
c.ChannelName,
[Name] = d.c.value('(name/text())[1]','varchar(100)'),
[Type] = d.c.value('(transportName/text())[1]','varchar(100)'),
[Enabled] = d.c.value('(enabled/text())[1]','BIT'),
[Queued] = d.c.value('(properties/destinationConnectorProperties/queueEnabled/text())[1]','varchar(100)'),
[RetryInterval] = d.c.value('(properties/destinationConnectorProperties/retryIntervalMillis/text())[1]','INT'),
[MaxRetries] = d.c.value('(properties/destinationConnectorProperties/retryCount/text())[1]','INT'),
[RotateQueue] = d.c.value('(properties/destinationConnectorProperties/rotate/text())[1]','BIT'),
[ThreadCount] = d.c.value('(properties/destinationConnectorProperties/threadCount/text())[1]','INT'),
[WaitForPrevious] = d.c.value('(waitForPrevious/text())[1]','BIT'),
[Destination] = COALESCE(
d.c.value('(properties/channelId/text())[1]','varchar(100)'),
d.c.value('(properties/remoteAddress/text())[1]','varchar(100)'),
d.c.value('(properties/wsdlUrl/text())[1]','varchar(1024)')),
[DestinationPort] = COALESCE(
d.c.value('(properties/remotePort/text())[1]','varchar(100)'),
d.c.value('(properties/port/text())[1]','varchar(1024)')),
[Service] = d.c.value('(properties/service/text())[1]','varchar(1024)'),
[Operation] = d.c.value('(properties/operation/text())[1]','varchar(1024)')
FROM
(
SELECT
[ID],
[ChannelName] = [Name],
[CFG]
FROM
@Xml
) c
CROSS APPLY c.CFG.nodes('/channel/destinationConnectors/connector') d(c)
I can give you one answer and one guess:
First I use a declared table variable to mock up your scenario:
DECLARE @tbl TABLE(s NVARCHAR(MAX));
INSERT INTO @tbl VALUES
(N'<root>
<SomeElement>This is first text of element1
<InnerElement>This is text of inner element1</InnerElement>
This is second text of element1
</SomeElement>
<SomeElement>This is first text of element2
<InnerElement>This is text of inner element2</InnerElement>
This is second text of element2
</SomeElement>
</root>')
,(N'<root>
<SomeElement>This is first text of elementA
<InnerElement>This is text of inner elementA</InnerElement>
This is second text of elementA
</SomeElement>
<SomeElement>This is first text of elementB
<InnerElement>This is text of inner elementB</InnerElement>
This is second text of elementB
</SomeElement>
</root>');
--This query will read the XML with a cast out of a sub-select. You might use a CTE instead, but this should be syntactic sugar only...
SELECT se.value(N'(.)[1]','nvarchar(max)') SomeElementsContent
,se.value(N'(InnerElement)[1]','nvarchar(max)') InnerElementsContent
,se.value(N'(./text())[1]','nvarchar(max)') ElementsFirstText
,se.value(N'(./text())[2]','nvarchar(max)') ElementsSecondText
FROM (SELECT CAST(s AS XML) FROM @tbl) AS tbl(TheXml)
CROSS APPLY TheXml.nodes(N'/root/SomeElement') AS A(se);
--The second part writes the typed XML into a table variable and reads from there:
DECLARE @tbl2 TABLE(x XML)
INSERT INTO @tbl2
SELECT CAST(s AS XML) FROM @tbl;
SELECT se.value(N'(.)[1]','nvarchar(max)') SomeElementsContent
,se.value(N'(InnerElement)[1]','nvarchar(max)') InnerElementsContent
,se.value(N'(./text())[1]','nvarchar(max)') ElementsFirstText
,se.value(N'(./text())[2]','nvarchar(max)') ElementsSecondText
FROM @tbl2 t2
CROSS APPLY t2.x.nodes(N'/root/SomeElement') AS A(se);
Why is /text() faster than without /text()?
If you look at my example, the content of an element is everything from the opening tag down to the closing tag. The text() of an element is the floating text between these tags. You can see this in the results of the SELECT above. The text() is actually one separately stored portion in a tree structure (read the next section). Fetching it is a one-step action. Otherwise a complex structure has to be analysed to find everything between the opening tag and its corresponding closing tag, even if there is nothing else than the text().
Why should I store XML in the appropriate type?
XML is not just text with some silly extra characters! It is a document with a complex structure. The XML is not stored as the text you see; it is stored in a tree structure. Whenever you cast a string which represents XML into real XML, this very expensive work must be done. When the XML is presented to you (or any other output), the representing string is (re)built from scratch.
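A hedged sketch of acting on this advice, if the string column must stay (the ChannelXml column name is my invention, not from the question): persist the typed XML alongside the string once, so the expensive parse happens at write time rather than on every query.
--Sketch: parse each string exactly once, at write time (ChannelXml is hypothetical)
ALTER TABLE dbo.CHANNEL ADD ChannelXml XML NULL;

UPDATE dbo.CHANNEL
SET ChannelXml = CONVERT(XML, Channel);

--Queries can then read ChannelXml directly, with no per-query CONVERT.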
Why is the pre-cast approach faster?
This is guessing...
In my example both approaches are quite equal and lead to (almost) the same execution plan.
SQL Server will not work everything down the way you might expect. This is not a procedural system where you state do this, then do this, and after that do this!. You tell the engine what you want, and the engine decides how to do this best. And the engine is pretty good at this!
Before execution starts, the engine tries to estimate the costs of the approaches. CONVERT (or CAST) is a rather cheap operation. It could be that the engine decides to work down the list of your calls and do the cast for each single need over and over, because it thinks that this is cheaper than the expensive creation of a derived table...
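If that guess is right, another hedged option besides the table variable is to materialize the cast once per row with CROSS APPLY. A sketch against the question's dbo.CHANNEL table (only one extracted column shown; verify against the actual plan before relying on it):
SELECT
c.ID,
[ChannelName] = c.[Name],
[ConnectorName] = d.c.value('(name/text())[1]','varchar(100)')
FROM dbo.CHANNEL c
CROSS APPLY (SELECT CONVERT(XML, c.Channel)) x(CFG)
CROSS APPLY x.CFG.nodes('/channel/destinationConnectors/connector') d(c);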
Related
I'm trying to extract the data from SQL server execution plans in a generic way.
As an example the execution plan for
SELECT *
FROM sys.all_objects o1
as shown in SSMS is below
The UI shows nodes along with costs for each node and percentages. How can I extract this from the underlying XML into a table structure?
I've tried to query the XML by myself, but it seems that the XML structure changes from query to query.
This should get you started (DB Fiddle example).
DECLARE @X XML = N'<?xml version="1.0" encoding="utf-16"?><ShowPlanXML ...';
DECLARE @Nodes TABLE
(
PlanId INT,
NodeId INT,
PhysicalOp VARCHAR(200),
EstimatedTotalSubtreeCost FLOAT,
EstimatedOperatorCost FLOAT,
ParentNodeId INT NULL,
PRIMARY KEY(PlanId, NodeId)
);
WITH XMLNAMESPACES (default 'http://schemas.microsoft.com/sqlserver/2004/07/showplan'),
plans AS
(
SELECT ROW_NUMBER() over (order by (SELECT NULL)) as PlanId, qp.query('.') as plan_xml
FROM @X.nodes('//QueryPlan') n(qp)
)
INSERT @Nodes(PlanId, NodeId, PhysicalOp, EstimatedTotalSubtreeCost, ParentNodeId)
SELECT PlanId,
NodeId = relop.value('@NodeId', 'int'),
PhysicalOp = relop.value('@PhysicalOp', 'varchar(200)'),
EstimatedTotalSubtreeCost = relop.value('@EstimatedTotalSubtreeCost', 'float'),
/*XPath ancestor axis not supported so just go up a few levels and look for the closest ancestor RelOp*/
ParentNodeId = COALESCE(
relop.value('..[local-name() = "RelOp"]/@NodeId', 'int'),
relop.value('../..[local-name() = "RelOp"]/@NodeId', 'int'),
relop.value('../../..[local-name() = "RelOp"]/@NodeId', 'int'),
relop.value('../../../..[local-name() = "RelOp"]/@NodeId', 'int')
)
FROM plans
CROSS APPLY plan_xml.nodes('//RelOp') n(relop);
UPDATE N1
SET EstimatedOperatorCost = EstimatedTotalSubtreeCost - ISNULL((SELECT SUM(EstimatedTotalSubtreeCost) FROM @Nodes N2 WHERE N1.PlanId = N2.PlanId AND N2.ParentNodeId = N1.NodeId),0)
FROM @Nodes N1;
SELECT *,
EstPctOperatorCost = FORMAT(EstimatedOperatorCost/MAX(EstimatedTotalSubtreeCost) OVER (PARTITION BY PlanId), 'P0')
FROM @Nodes
The execution plan is a tree - there are likely more elegant ways of getting the parent operator than my attempt!
The above is not battle tested across a sample size of more than two execution plans so you may well encounter issues with it that you will need to fix.
You can visit the URI http://schemas.microsoft.com/sqlserver/2004/07/showplan to see information about the various schemas, though for some reason I've never gotten to the bottom of, it displays "The request is blocked." for me unless I use incognito mode.
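If you need plan XML to paste into @X, one way to capture it (a sketch; you could also pull plans from sys.dm_exec_query_plan) is the estimated-plan setting:
SET SHOWPLAN_XML ON;
GO
SELECT *
FROM sys.all_objects o1;
GO
SET SHOWPLAN_XML OFF;
GO
--The SELECT now returns the ShowPlanXML document instead of the data.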
Environment: SQL Server 2019 (v15).
I have a large query that uses too much space when run as a single SELECT statement. When I try to run it, I get the following error:
Could not allocate a new page for database 'TEMPDB' because of insufficient disk space in filegroup 'DEFAULT'.
However, the problem breaks down naturally into a dozen or so pieces, so I wrote a WHILE loop to iterate through each piece and insert into a results table. Unfortunately, the first iteration of the WHILE loop also returns the same tempdb space error. All the WHILE loop is doing is changing a few values in the WHERE clause.
The key thing confusing me here is that when I manually run one iteration of the INSERT statement, absent all looping logic, it works perfectly.
Manually coding the first iteration to use the first institution_name just works, so I don't think the joins here are going wrong and causing the error.
WITH my_cte AS
(
SELECT [columns]
FROM mytable a
INNER JOIN bigtable b ON a.institution_name = b.institution_name
AND a.personID = b.personID
WHERE a.institution_name = 'ABC'
AND b.institution_name = 'ABC'
)
INSERT INTO results (personID, institution_name, ...)
SELECT personID, institution_name, [some aggregations]
FROM my_cte
GROUP BY personID, institution_name;
The version with the WHILE loop fails. I need to run the query with different values for institution_name.
Here I show three different values but even just the first iteration fails.
DECLARE @INSTITUTION varchar(10)
DECLARE @COUNTER int
SET @COUNTER = 0
DECLARE @LOOKUP table (temp_val varchar(10), temp_id int)
INSERT INTO @LOOKUP (temp_val, temp_id)
VALUES ('ABC', 1), ('DEF', 2), ('GHI', 3)
WHILE @COUNTER < 3
BEGIN
SET @COUNTER = @COUNTER + 1
SELECT @INSTITUTION = temp_val
FROM @LOOKUP
WHERE temp_id = @COUNTER;
WITH my_cte AS
(
SELECT [columns]
FROM mytable a
INNER JOIN bigtable b ON a.institution_name = b.institution_name
AND a.personID = b.personID
WHERE a.institution_name = @INSTITUTION
AND b.institution_name = @INSTITUTION
)
INSERT INTO results (personID, institution_name, ...)
SELECT personID, institution_name, [some aggregations]
FROM my_cte
GROUP BY personID, institution_name
END
As I write this question, I have quite literally just copy-pasted the INSERT statement a dozen times, changed the relevant WHERE clause each time, and run it without errors. Could it be some kind of datatype issue, where the query can properly subset when a string literal is put in the WHERE clause, but the lookup on my table variable fails due to the datatype? I notice that mytable.institution_name is varchar(10) while bigtable.institution_name is nvarchar(10). Setting the lookup table to use nvarchar(10) didn't fix it either.
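For what it's worth, one thing the datatype observation suggests trying (a sketch, not a confirmed fix): declare the variable as nvarchar(10) so it matches bigtable exactly, and add OPTION (RECOMPILE) so each iteration is compiled with the current value, the way the hand-coded literal version is:
DECLARE @INSTITUTION nvarchar(10) = N'ABC'; --set inside the loop as before

WITH my_cte AS
(
SELECT [columns]
FROM mytable a
INNER JOIN bigtable b ON a.institution_name = b.institution_name
AND a.personID = b.personID
WHERE a.institution_name = @INSTITUTION
AND b.institution_name = @INSTITUTION
)
INSERT INTO results (personID, institution_name, ...)
SELECT personID, institution_name, [some aggregations]
FROM my_cte
GROUP BY personID, institution_name
OPTION (RECOMPILE); --plan built for the actual value, as with the literal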
I have the following tables:
tbl_File:
FileID | Filename
-----------------
1 | test.jpg
and
tbl_Tag:
TagID | TagName
---------------
1 | Red
and
tbl_TagFile:
ID | TagID | FileID
-------------------
1 | 1 | 1
I need to pass a non-inclusive query against these tables. For example, imagine a list of checkboxes to select one or more tags, and then a search button. I need to pass the TagIDs to the query as a pipe-delimited string, such as "1|2|5|".
The search results need to be non-inclusive, in the sense that a file must meet all the criteria: if 3 tags are selected, the results are to be files that have all 3 tags associated with them.
I think I've made this too complicated. I tried iterating over the tags using CHARINDEX and such to work my way through the string, but it seems there must be an easier way.
I'd like to do this as a function... Such as
SELECT FileID, Filename
FROM tbl_Files
WHERE dbo.udf_FileExistswithTags(@Tags, FileID) = 1
Any efficient way to do this?
It doesn't sound from your example scenario that the actual "need" is to pass a pipe-delimited string. I would highly suggest abandoning that idea and using a Table-Valued Parameter in your stored procedure. This has numerous advantages: you will not hit a datatype limit or a "number of parameters" limit that might occur with very large sets of criteria. Additionally, it avoids any need to run a (potentially very slow) UDF.
Split the string into tokens on the application side, and then insert each token as a row in the TVP. Example below:
Create the TVP type in your database:
CREATE TYPE [dbo].[FileNameType] AS TABLE
(
fileName varchar(1000)
)
On the application side, build your list of filename tokens into a recordset:
private static List<SqlDataRecord> BuildFileNameTokenRecords(IEnumerable<string> tokens)
{
var records = new List<SqlDataRecord>();
foreach (string token in tokens)
{
    var record = new SqlDataRecord(
        new SqlMetaData[]
        {
            // variable-length types need an explicit maxLength; 1000 matches the TVP
            new SqlMetaData("fileName", SqlDbType.VarChar, 1000),
        }
    );
    record.SetString(0, token); // set the row's single column to the token value
    records.Add(record);
}
return records;
}
Wherever you run your proc from (rough code here):
var records = BuildFileNameTokenRecords(listofstrings);
var sqlCmd = sqlDb.GetStoredProcCommand("FileExists");
sqlDb.AddInParameter(sqlCmd, "tvpFilenameTokens", SqlDbType.Structured, records);
ExecuteNonQuery(sqlCmd);
Filtering your select statement then simply becomes a matter of joining on the tokens in the table parameter. Something like this:
CREATE PROCEDURE dbo.FileExists
(
-- Put additional parameters here
@tvpFilenameTokens dbo.FileNameType READONLY
)
AS
BEGIN
SELECT FileID, Filename
FROM tbl_Files INNER JOIN @tvpFilenameTokens
ON tbl_Files.Filename = @tvpFilenameTokens.fileName
END
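A quick way to exercise the proc from T-SQL, without the application layer (a sketch using the type and proc defined above):
DECLARE @tokens dbo.FileNameType;
INSERT INTO @tokens (fileName) VALUES ('test.jpg'), ('photo.png');
EXEC dbo.FileExists @tvpFilenameTokens = @tokens;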
Here is an option that should scale. All of the functionality is available back to SQL Server 2005. It uses a CTE to separate the portion of the query that finds only the FileIDs that have all of the TagIDs passed in, and then that list of FileIDs is joined to the [File] table to get the details. It also uses an INNER JOIN instead of an IN list to match the TagIDs.
Please note that the example below uses a SQLCLR splitter that is freely available in the SQL# library (which I wrote, but this function is in the Free version). The specific splitter used is not the important part; it should just be one that is either SQLCLR, an inline tally table (like the one used in @wewesthemenace's answer), or the XML method. Just don't use a splitter based on a WHILE loop or a recursive CTE.
---- TEST SETUP
DECLARE @File TABLE
(
FileID INT NOT NULL PRIMARY KEY,
[Filename] NVARCHAR(200) NOT NULL
);
DECLARE @TagFile TABLE
(
TagID INT NOT NULL,
FileID INT NOT NULL,
PRIMARY KEY (TagID, FileID)
);
INSERT INTO @File VALUES (1, 'File1.txt');
INSERT INTO @File VALUES (2, 'File2.txt');
INSERT INTO @File VALUES (3, 'File3.txt');
INSERT INTO @TagFile VALUES (1, 1);
INSERT INTO @TagFile VALUES (2, 1);
INSERT INTO @TagFile VALUES (5, 1);
INSERT INTO @TagFile VALUES (1, 2);
INSERT INTO @TagFile VALUES (2, 2);
INSERT INTO @TagFile VALUES (4, 2);
INSERT INTO @TagFile VALUES (1, 3);
INSERT INTO @TagFile VALUES (2, 3);
INSERT INTO @TagFile VALUES (5, 3);
INSERT INTO @TagFile VALUES (6, 3);
---- DONE WITH TEST SETUP
DECLARE @TagsToGet VARCHAR(100); -- this would be the proc input parameter
SET @TagsToGet = '1|2|5';
CREATE TABLE #Tags (TagID INT NOT NULL PRIMARY KEY);
DECLARE @NumTags INT;
INSERT INTO #Tags (TagID)
SELECT split.SplitVal
FROM SQL#.String_Split4k(@TagsToGet, '|', 1) split;
SET @NumTags = @@ROWCOUNT;
;WITH files AS
(
SELECT tf.FileID
FROM @TagFile tf
INNER JOIN #Tags tg
ON tg.TagID = tf.TagID
GROUP BY tf.FileID
HAVING COUNT(*) = @NumTags
)
SELECT fl.*
FROM @File fl
INNER JOIN files
ON files.FileID = fl.FileID
ORDER BY fl.[Filename] ASC;
DROP TABLE #Tags; -- don't need this if code above is placed in a proc
Results:
FileID Filename
1 File1.txt
3 File3.txt
Notes
As much as I love TVPs (and I do, when they are done correctly and used appropriately), I would say that they are a bit much for this type of small scale, single dimensional array scenario. There won't really be any performance gain over using a SQLCLR streaming TVF string splitter but it would require more app code and the additional User-Defined Table Type, which can't be updated without first dropping all procs that reference it. That doesn't happen all of the time, but needs to be considered in terms of long-term maintenance costs.
The JOIN between TagFile and the temporary table populated from the split operation should be much more efficient than using an IN list with a subquery for the split operation. An IN list is short-hand for all of the values in it to be their own OR conditions. Hence the JOIN is a fully set-based approach that lets the Query Optimizer do its thang.
The structure I used for the test @TagFile table only has the two relevant IDs in it: TagID and FileID. It does not have the ID field that I assume is an IDENTITY field on this table. Unless there is a very specific reason for needing that IDENTITY field, I would suggest removing it. It adds no inherent benefit, as the combination of TagID and FileID is a natural key (i.e. it is both NOT NULL and unique). And if the clustered PK of this table were simply those two fields, the JOIN to the temp table of the split-out TagIDs would be quite fast, even with millions of rows in TagFile.
One reason that this approach works so much better than trying to handle this via a function per FileID (outside of the obvious set-based is better than cursor-based reason) is that the list of TagIDs is the same for all files to be checked. So splitting that out more than one time is a waste of effort.
By not splitting the TagID list inline in the query I am able to capture the number of elements in that list with no additional effort. Hence this saves from needing to do a secondary calculation.
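Incidentally, on SQL Server 2016 or later the built-in STRING_SPLIT could stand in for the SQL# call above (a sketch; note STRING_SPLIT does not guarantee output order, which does not matter for this use):
INSERT INTO #Tags (TagID)
SELECT CONVERT(INT, [value])
FROM STRING_SPLIT(@TagsToGet, '|');
SET @NumTags = @@ROWCOUNT;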
Here is a function called DelimitedSplit8K by Jeff Moden, which splits strings of length up to 8000. For more info, read this: http://www.sqlservercentral.com/articles/Tally+Table/72993/
CREATE FUNCTION [dbo].[DelimitedSplit8K](
@pString VARCHAR(8000), --WARNING!!! DO NOT USE MAX DATA-TYPES HERE! IT WILL KILL PERFORMANCE!
@pDelimiter CHAR(1)
)
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
WITH E1(N) AS (--10E+1 or 10 rows
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
),
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (
SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT 1 UNION ALL
SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString, t.N, 1) = @pDelimiter
),
cteLen(N1, L1) AS(--==== Return start and length (for use in substring)
SELECT
s.N1,
ISNULL(NULLIF(CHARINDEX(@pDelimiter, @pString, s.N1), 0) - s.N1, 8000)
FROM cteStart s
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT
ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
Item = SUBSTRING(@pString, l.N1, l.L1)
FROM cteLen l
Your query would now be:
DECLARE @pString VARCHAR(8000) = '1|3|5'
SELECT
f.*
FROM tbl_File f
INNER JOIN tbl_TagFile tf ON tf.FileID = f.FileID
WHERE
tf.TagID IN(SELECT CAST(Item AS INT) FROM dbo.DelimitedSplit8K(@pString, '|'))
GROUP BY f.FileID, f.FileName
HAVING COUNT(tf.ID) = (LEN(@pString) - LEN(REPLACE(@pString,'|','')) + 1)
The expression below counts the number of TagIDs in the parameter by counting the occurrences of the delimiter | and adding 1:
(LEN(@pString) - LEN(REPLACE(@pString,'|','')) + 1)
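A quick sanity check of both the splitter and the counting expression (a sketch):
DECLARE @s VARCHAR(8000) = '1|3|5';

SELECT ItemNumber, Item
FROM dbo.DelimitedSplit8K(@s, '|'); --returns (1,'1'), (2,'3'), (3,'5')

SELECT (LEN(@s) - LEN(REPLACE(@s,'|','')) + 1) AS TagCount; --returns 3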
Here is an option that does not require UDFs.
It can be argued that this is also complicated.
DECLARE @TagList VARCHAR(50)
-- pass in this
SET @TagList = '1|3|6'
SELECT
FinalSet.FileID,
FinalSet.Tag,
FinalSet.TotalMatches
FROM
(
SELECT
tbl_TagFile.FileID,
tbl_TagFile.Tag,
COUNT(*) OVER(PARTITION BY tbl_TagFile.FileID) TotalMatches
FROM
(
SELECT 1 FileID, '1' Tag UNION ALL
SELECT 1 , '2' UNION ALL
SELECT 1 , '3' UNION ALL
SELECT 1 , '6' UNION ALL
SELECT 2 , '1' UNION ALL
SELECT 2 , '3'
) tbl_TagFile
INNER JOIN
(
SELECT tbl_Tag.Tag
FROM
(
SELECT '1' Tag UNION ALL
SELECT '2' UNION ALL
SELECT '3' UNION ALL
SELECT '4' UNION ALL
SELECT '5' UNION ALL
SELECT '6'
) tbl_Tag
WHERE '|' + @TagList + '|' LIKE '%|' + Tag + '|%'
) LimitedTagTable
ON LimitedTagTable.Tag = tbl_TagFile.Tag
) FinalSet
WHERE
FinalSet.TotalMatches = (LEN(@TagList) - LEN(REPLACE(@TagList,'|','')) + 1)
There are some complications in this around data types, indexes and so on, but you can see the concept: you only get the records that match the string you passed in.
subtable LimitedTagTable is your tag list filtered by your input pipe delimited string
subtable FinalSet joins your limited tag list to your list of files
column TotalMatches works out how many tag matches your file had
Finally this line limits the output to those files that had enough matches:
FinalSet.TotalMatches = (LEN(@TagList) - LEN(REPLACE(@TagList,'|','')) + 1)
Please experiment with different inputs and datasets and see if it suits as I have made a number of assumptions.
I'm answering my own question, in hopes that someone can let me know if/how flawed it is. So far it seems to be working, but this is just early testing.
Function:
ALTER FUNCTION [dbo].[udf_FileExistsByTags]
(
@FileID int
,@Tags nvarchar(max)
)
RETURNS bit
AS
BEGIN
DECLARE @Exists bit = 0
DECLARE @Count int = 0
DECLARE @TagTable TABLE ( FileID int, TagID int )
DECLARE @Tag int
WHILE len(@Tags) > 0
BEGIN
SET @Tag = CAST(LEFT(@Tags, charindex('|', @Tags + '|') -1) as int)
SET @Count = @Count + 1
IF EXISTS (SELECT * FROM tbl_FileTag WHERE FileID = @FileID AND TagID = @Tag )
BEGIN
INSERT INTO @TagTable ( FileID, TagID ) VALUES ( @FileID, @Tag )
END
SET @Tags = STUFF(@Tags, 1, charindex('|', @Tags + '|'), '')
END
SET @Exists = CASE WHEN @Count = (SELECT COUNT(*) FROM @TagTable) THEN 1 ELSE 0 END
RETURN @Exists
END
Then in the query:
SELECT * FROM tbl_File a WHERE dbo.udf_FileExistsByTags(a.FileID, @Tags) = 1
So now I'm looking for errors.
What do you think? Probably not very efficient, but this search will only be used on a periodic basis.
I have a SourceTable and a table variable @TQueries containing various T-SQL predicates that target SourceTable.
The expected result is to dynamically generate SELECT statements that return a list of Id's as specified by the predicates in #TQueries. Each dynamically generated SELECT statement also needs to execute in a particular order, and the final set of values needs to be unique and the ordering must be preserved.
Fortunately, there's a limit to how many values need to be retrieved and how many dynamic queries need to be generated. The Id list should contain at most 10 Ids, and we don't expect more than 7 queries.
The following is a sample of this setup, not the actual data/database:
-- Set up some test data, this is quick and dirty just to provide some data to test against
IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[SourceTable]') AND type in (N'U'))
BEGIN
-- Create a numbers table, sorta
SELECT TOP 20
IDENTITY(INT,1,1) AS Id,
ABS(CHECKSUM(NewId())) % 100 AS [SomeValue]
INTO [SourceTable]
FROM sysobjects a
END
DECLARE @TQueries TABLE (
[Ordinal] INT,
[WherePredicate] NVARCHAR(MAX),
[OrderByPredicate] NVARCHAR(MAX)
);
-- Simulate SELECTs with different order by that get different data due to varying WHERE clauses and ORDER conditions
INSERT INTO @TQueries VALUES ( 1, N'[Id] IN (6,11,13,7,10,3,15)', '[SomeValue] ASC' ) -- Sort Asc
INSERT INTO @TQueries VALUES ( 2, N'[Id] IN (9,15,14,20,17)', '[SomeValue] DESC' ) -- Sort Desc
INSERT INTO @TQueries VALUES ( 3, N'[Id] IN (20,10,1,16,11,19,9,15,17,6,2,3,13)', 'NEWID()' ) -- Sort Random
My main issue has been avoiding the use of a CURSOR or iterating through the rows one by one. The closest I've come to a set operation that meets these criteria is using a table variable to store the results of each query, or a massive CTE.
Suggestions and comments are welcome.
Here's a solution that builds a single statement both to run all the queries and to return the results.
It uses a similar approach as in your answer when iterating over the @TQueries table, i.e. it also uses {...} tokens where column values from @TQueries should go, and it puts the values there with nested REPLACE() calls.
Other than that, it heavily depends on ranking functions, and I'm not sure it doesn't really abuse them. You'd need to test this method before deciding whether it's better or worse than the one you've got so far.
DECLARE @QueryTemplate nvarchar(max), @FinalSQL nvarchar(max);
SET @QueryTemplate =
N'SELECT
[Id],
QueryRank = {Ordinal},
RowRank = ROW_NUMBER() OVER (ORDER BY {OrderByPredicate})
FROM [dbo].[SourceTable]
WHERE {WherePredicate}
';
SET @FinalSQL =
N'WITH AllData AS (
' +
SUBSTRING(
(
SELECT
'UNION ALL ' +
REPLACE(REPLACE(REPLACE(@QueryTemplate,
'{Ordinal}' , [Ordinal] ),
'{OrderByPredicate}', [OrderByPredicate]),
'{WherePredicate}' , [WherePredicate] )
FROM @TQueries
ORDER BY [Ordinal]
FOR XML PATH (''), TYPE
).value('.', 'nvarchar(max)'),
11, -- starting just after the first 'UNION ALL '
CAST(0x7FFFFFFF AS int) -- max int; no need to specify the exact length
) +
'),
RankedData AS (
SELECT
[Id],
QueryRank,
RowRank,
ValueRank = ROW_NUMBER() OVER (PARTITION BY [Id] ORDER BY QueryRank)
FROM AllData
)SELECT TOP (@top)
[Id]
FROM RankedData
WHERE ValueRank = 1
ORDER BY
QueryRank,
RowRank
';
PRINT @FinalSQL;
EXECUTE sp_executesql @FinalSQL, N'@top int', 10;
Basically, every subquery gets these auxiliary columns:
QueryRank – a constant value (within the subquery's result set) derived from [Ordinal];
RowRank – a ranking assigned to a row based on the [OrderByPredicate].
The result sets are UNIONed and then every entry of every unique value is again ranked (ValueRank) based on the query ranking.
When pulling the final result set, duplicates are suppressed (by the condition ValueRank = 1), and QueryRank and RowRank are used in the ORDER BY clause to preserve the original row order.
I used EXECUTE sp_executesql @query instead of EXECUTE (@query), because the former allows you to add parameters to the query. In particular, I parametrised the number of results to return (the argument of TOP). But you could certainly concatenate that value into the dynamic script directly, just like the other things, if you prefer EXECUTE () over EXECUTE sp_executesql.
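To illustrate the difference (a minimal sketch against the sample SourceTable):
DECLARE @q nvarchar(max) =
N'SELECT TOP (@top) [Id] FROM [dbo].[SourceTable] ORDER BY [Id]';

EXECUTE sp_executesql @q, N'@top int', @top = 10; -- parameterized
EXECUTE (N'SELECT TOP (10) [Id] FROM [dbo].[SourceTable] ORDER BY [Id]'); -- concatenated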
If you like, you can try this query at SQL Fiddle. (Note: the SQL Fiddle version replaces the @TQueries table variable with a TQueries table.)
This is what I've managed to piece together, cobbled from my original response and improved upon by comments from @AndriyM.
DECLARE @sql_prefix NVARCHAR(MAX);
SET @sql_prefix =
N'DECLARE @TResults TABLE (
[Ordinal] INT IDENTITY(1,1),
[ContentItemId] INT
);
DECLARE @max INT, @top INT;
SELECT @max = 10;';
DECLARE @sql_insert_template NVARCHAR(MAX), @sql_body NVARCHAR(MAX);
SET @sql_insert_template =
N'SELECT @top = @max - COUNT(*) FROM @TResults;
INSERT INTO @TResults
SELECT TOP (@top) [Id]
FROM [dbo].[SourceTable]
WHERE
{WherePredicate}
AND NOT EXISTS (
SELECT 1
FROM @TResults AS [tr]
WHERE [tr].[ContentItemId] = [SourceTable].[Id]
)
ORDER BY {OrderByPredicate};';
WITH Query ([Ordinal],[SqlCommand]) AS (
SELECT
[Ordinal],
REPLACE(REPLACE(@sql_insert_template, '{WherePredicate}', [WherePredicate]), '{OrderByPredicate}', [OrderByPredicate])
FROM @TQueries
)
SELECT
@sql_body = @sql_prefix + (
SELECT [SqlCommand]
FROM Query
ORDER BY [Ordinal] ASC
FOR XML PATH(''),TYPE).value('.', 'nvarchar(max)') + CHAR(13)+CHAR(10)
+N' SELECT * FROM @TResults ORDER BY [Ordinal]';
EXEC(@sql_body);
The basic idea is to use a table variable to hold the results of each query. I create a template for the SQL and replace the values in the template based on what is stored in @TQueries.
Once the entire script is completed I run it with EXEC.
On SQL Server 2008 R2, I am trying to read XML value as table.
So far, I am here :
DECLARE @XMLValue AS XML;
SET @XMLValue = '<SearchQuery>
<ResortID>1453</ResortID>
<CheckInDate>2011-10-27</CheckInDate>
<CheckOutDate>2011-11-04</CheckOutDate>
<Room>
<NumberOfADT>2</NumberOfADT>
<CHD>
<Age>10</Age>
</CHD>
<CHD>
<Age>12</Age>
</CHD>
</Room>
<Room>
<NumberOfADT>1</NumberOfADT>
</Room>
<Room>
<NumberOfADT>1</NumberOfADT>
<CHD>
<Age>7</Age>
</CHD>
</Room>
</SearchQuery>';
SELECT
Room.value('(NumberOfADT)[1]', 'INT') AS NumberOfADT
FROM @XMLValue.nodes('/SearchQuery/Room') AS SearchQuery(Room);
As you can see, the Room node sometimes has CHD child nodes and sometimes doesn't.
Assume that I am getting this XML value as a stored procedure parameter. I need to work with the values in order to query my database tables. What would be the best way to read this XML parameter entirely?
EDIT
I think I need to express what I am expecting in return here. The script below creates the table I need:
DECLARE @table AS TABLE(
ResortID INT,
CheckInDate DATE,
CheckOutDate DATE,
NumberOfADT INT,
CHDCount INT,
CHDAges NVARCHAR(100)
);
For the XML value I have provided above, the INSERT T-SQL below is suitable:
INSERT INTO @table VALUES(1453, '2011-10-27', '2011-11-04', 2, 2, '10;12');
INSERT INTO @table VALUES(1453, '2011-10-27', '2011-11-04', 1, 0, NULL);
INSERT INTO @table VALUES(1453, '2011-10-27', '2011-11-04', 1, 1, '7');
CHDCount is the number of CHD nodes under the Room node. Also, there is one table row per Room node, so three Room nodes give me three rows.
Actually, this code is for a hotel reservation search query, so I need to work with these values I got from the XML parameter to query my tables and return available rooms. I am telling you this because maybe it helps you guys to see it through. I am not looking for complete code for a room reservation system. That would be so selfish.
select S.X.value('ResortID[1]', 'int') as ResortID,
S.X.value('CheckInDate[1]', 'date') as CheckInDate,
S.X.value('CheckOutDate[1]', 'date') as CheckOutDate,
R.X.value('NumberOfADT[1]', 'int') as NumberOfADT,
R.X.value('count(CHD)', 'int') as CHDCount,
stuff((select ';'+C.X.value('.', 'varchar(3)')
from R.X.nodes('CHD/Age') as C(X)
for xml path('')), 1, 1, '') as CHDAges
from @XMLValue.nodes('/SearchQuery') as S(X)
cross apply S.X.nodes('Room') as R(X)
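Against the sample XML above, this returns the three rows described in the question:
ResortID  CheckInDate  CheckOutDate  NumberOfADT  CHDCount  CHDAges
1453      2011-10-27   2011-11-04    2            2         10;12
1453      2011-10-27   2011-11-04    1            0         NULL
1453      2011-10-27   2011-11-04    1            1         7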
This should get you close:
SELECT ResortID = @xmlvalue.value('(//ResortID)[1]', 'int')
, CheckInDate = @xmlvalue.value('(//CheckInDate)[1]', 'date')
, CheckOutDate = @xmlvalue.value('(//CheckOutDate)[1]', 'date')
, NumberOfAdt = Room.value('(NumberOfADT)[1]', 'INT')
, CHDCount = Room.value('count(./CHD)', 'int')
, CHDAges = Room.query('for $c in ./CHD
return concat(($c/Age)[1], ";")').value('(.)[1]',
'varchar(100)')
FROM @XMLValue.nodes('/SearchQuery/Room') AS SearchQuery ( Room ) ;