I'm trying to select the max id from any n number linked tuples in a sql server db - we are writing an upgrade script for some data sets within an app, and need to know what the highest upgrade available is based on what the data's version is currently. For example, using the following simplified table 'versions':
oldVersionId newVersionId
1 2
2 3
3 4
10 11
We know we are version 1, and want to get the highest version out there that we can upgrade to; which would return 4 in this case, and not 11. We can have 0-n number of upgradable versions available at any given time. I'm not an sql wiz, and could only think to query using a variable number of chained selects:
select newVersion from versions where oldVersionId = (select newVersion from versions where oldVersionId = 1)
But it not an n numbered search, and won't return correctly if the number of elements is greater or less then the given. Is sql capable of performing such a query, and what elements / keywords should I be looking at to write one?
Solution:
You learn something new every day - I needed a hybrid of the two answers. Turns out sql can query a dataset using the tree-like child/parent linking that I'm way more comfortable with in OO languages.
In sql server you can set up a recursive tree walking call using a aliased table. You need an anchor and then the recursive bit. The first call is the anchor, I can use any value in the table, or a list of values, etc. The second select call just says to use the rest of the table to scan against.
Here is the syntax:
--Create the new alias (s)
;with s (oldVersionId, newVersionId) as
(
--set up the anchor node,
select oldVersionId, newVersionId from #t
where oldVersionId = 1
-- join it to the rest of the table, denoting that we only want nodes
-- where the old version is represented as a new version later
union all
select t.oldVersionId, t.newVersionId from #t as t
inner join s on t.oldVersionId = s.newVersionId
)
--Return the max value from the nodes I collected
select max(s.newVersionId) from s
Here is the solution with CTE - loop thru data while we have next match on oldVersionId = newVersionId:
declare #t table (
oldVersionId int,
newVersionId int )
insert into #t values (1,2)
insert into #t values (2,3)
insert into #t values (3,4)
insert into #t values (10,11)
insert into #t values (11,12)
insert into #t values (14,15)
declare #startVer int
set #startVer = 1
;with s (oldVersionId, newVersionId) as
(
select top 1 oldVersionId, newVersionId from #t
where oldVersionId = #startVer
union all
select t.oldVersionId, t.newVersionId from #t as t
inner join s on t.oldVersionId = s.newVersionId
)
select max(s.newVersionId)
from s
option (maxrecursion 0)
And here is solution without CTE - search for the last record which has newVersionId equals to 1 (1st version) plus the sum of imcremental updates to this version:
select max(t1.newVersionId)
from #t t1
where t1.oldVersionId >= #startVer
and t1.newVersionId = #startVer + (
select sum(newVersionId - oldVersionId)
from #t
where
oldVersionId >= #startVer and
oldVersionId < t1.newVersionId)
This problem can be solved by writing recursive queries to traverse recursive hierarchies in a table.
http://www.mssqltips.com/sqlservertip/1520/recursive-queries-using-common-table-expressions-cte-in-sql-server/
http://msdn.microsoft.com/en-us/library/ms186243.aspx
You effectively need to identify the final node in a linked list--seems to me your best bet would be to use the recursive features of CTEs to get to your 'max' version, but I'm not familiar enough with CTEs to get it working.
The following gets to the right answer, but only because I know how many links this particular dummy table will require beforehand; thus, not ideal.
CREATE TABLE #temp (
oldversionID SMALLINT,
newversionID SMALLINT )
INSERT INTO #temp
VALUES (1,2)
INSERT INTO #temp
VALUES (2,3)
INSERT INTO #temp
VALUES (3,4)
INSERT INTO #temp
VALUES (10,11);
select t1.oldversionID, t3.newversionID from #temp t1
inner join #temp t2
on t1.newversionId = t2.oldversionID
inner join #temp t3
on t2.newversionId = t3.oldversionID
Your problem is pretty much same as How to get the parent given a child in SQL SERVER 2005
I think same CTE will work for you.
Related
I have a list of teams in one table and list of cases in another table. I have to allocate a unique random case number to each one of the members in the team. What is the best way to generate unique random case number for each team member. I have read about NewID() and CRYPT_GEN_RANDOM(4) functions. I tried using them but not getting unique number for each team member. Can some one please help me. Thanks for your time. I am using SQL 2008.
I have a 'Teams' table which has team members, their ids(TM1,TM2 etc.) and their names.
I have another 'Cases' table which has ID numbers like 1,2,3,4 etc. I want to allocate random case to each team member. The desired output should be as below.
Team member Random_case_allocated
TM1 3
TM2 5
TM3 7
TM4 2
TM5 8
TM6 6
I have tried
SELECT TOP 1 id FROM cases
ORDER BY CRYPT_GEN_RANDOM(4)
It is giving the same id for all team members. I want a different case id for each team member. Can someone please help. Thank you.
The TOP(1) ORDER BY NEWID() will not work the way you are trying to get it to work here. The TOP is telling the query engine you are only interested on the first record of the result set. You need to have the NEWID() evaluate for each record. You can force this inside of a window function, such as ROW_NUMBER(). This could optimized I would imagine, however, it was what I could come up with from the top of my head. Please note, this is not nearly a truly random algorithm.
UPDATED With Previous Case Exclusions
DECLARE #User TABLE(UserId INT)
DECLARE #Case TABLE(CaseID INT)
DECLARE #UserCase TABLE (UserID INT, CaseID INT, DateAssigned DATETIME)
DECLARE #CaseCount INT =10
DECLARE #SaveCaseID INT = #CaseCount
DECLARE #UserCount INT = 100
DECLARE #NumberOfUserAllocatedAtStart INT= 85
WHILE(#CaseCount > 0)BEGIN
INSERT #Case VALUES(#CaseCount)
SET #CaseCount = #CaseCount-1
END
DECLARE #RandomCaseID INT
WHILE(#UserCount > 0)BEGIN
INSERT #User VALUES(#UserCount)
SET #UserCount = #UserCount-1
IF(#NumberOfUserAllocatedAtStart > 0 )BEGIN
SET #RandomCaseID = (ABS(CHECKSUM(NewId())) % (#SaveCaseID))+1
INSERT #UserCase SELECT #UserCount,#RandomCaseID,DATEADD(MONTH,-3,GETDATE())
SET #RandomCaseID = (ABS(CHECKSUM(NewId())) % (#SaveCaseID))+1
INSERT #UserCase SELECT #UserCount,#RandomCaseID,DATEADD(MONTH,-5,GETDATE())
SET #RandomCaseID = (ABS(CHECKSUM(NewId())) % (#SaveCaseID))+1
INSERT #UserCase SELECT #UserCount,#RandomCaseID,DATEADD(MONTH,-2,GETDATE())
SET #NumberOfUserAllocatedAtStart=#NumberOfUserAllocatedAtStart-1
END
END
;WITH RowNumberWithNewID AS
(
SELECT
U.UserID, C.CaseID, UserCase_CaseID = UC.CaseID,
RowNumber = ROW_NUMBER() OVER (PARTITION BY U.UserID ORDER BY NEWID())
FROM
#User U
INNER JOIN #Case C ON 1=1
LEFT OUTER JOIN #UserCase UC ON UC.UserID=U.UserID AND UC.CaseID=C.CaseID AND UC.DateAssigned > DATEADD(MONTH, -4, UC.DateAssigned)
WHERE
UC.CaseID IS NULL OR UC.CaseID <> C.CaseID
)
SELECT
UserID,
CaseID,
PreviousCases = STUFF((SELECT ', '+CONVERT(NVARCHAR(10), UC.CaseID) FROM #UserCase UC WHERE UC.UserID=RN.UserID FOR XML PATH('')),1,1,'')
FROM RowNumberWithNewID RN
WHERE
RN.RowNumber=1
Hi I have a view which is used in lots of search queries in my application.
The issue is application queries which use this view is is running very slow.I am investigating this and i found out a particular portion on the view definition which is making it slow.
create view Demoview AS
Select
p.Id as Id,
----------,
STUFF((SELECT ',' + [dbo].[OnlyAlphaNum](colDesc)
FROM dbo.ContactInfoDetails cd
WHERE pp.FormId = f.Id AND ppc.PageId = pp.Id
FOR XML PATH('')), 1, 1, '') AS PhoneNumber,
p.FirstName as Fname,
From
---
This is one of the column in the view.
The scalar function [OnlyAlphaNum] is making it slow,as it stops parallel execution of the query.
The function is as below;
CREATE FUNCTION [dbo].[OnlyAlphaNum]
(
#String VARCHAR(MAX)
)
RETURNS VARCHAR(MAX)
WITH SCHEMABINDING
AS
BEGIN
WHILE PATINDEX('%[^A-Z0-9]%', #String) > 0
SET #String = STUFF(#String, PATINDEX('%[^A-Z0-9]%', #String), 1, '')
RETURN #String
END
How can i convert it into an inline function.?
I tried with CASE ,but not successful.I have read that CTE is a good option.
Any idea how to tackle this problem.?
I already did this; you can read more about it here.
The function:
CREATE FUNCTION dbo.alphaNumericOnly8K(#pString varchar(8000))
RETURNS TABLE WITH SCHEMABINDING AS RETURN
/****************************************************************************************
Purpose:
Given a varchar(8000) string or smaller, this function strips all but the alphanumeric
characters that exist in #pString.
Compatibility:
SQL Server 2008+, Azure SQL Database, Azure SQL Data Warehouse & Parallel Data Warehouse
Parameters:
#pString = varchar(8000); Input string to be cleaned
Returns:
AlphaNumericOnly - varchar(8000)
Syntax:
--===== Autonomous
SELECT ca.AlphaNumericOnly
FROM dbo.AlphaNumericOnly(#pString) ca;
--===== CROSS APPLY example
SELECT ca.AlphaNumericOnly
FROM dbo.SomeTable st
CROSS APPLY dbo.AlphaNumericOnly(st.SomeVarcharCol) ca;
Programmer's Notes:
1. Based on Jeff Moden/Eirikur Eiriksson's DigitsOnlyEE function. For more details see:
http://www.sqlservercentral.com/Forums/Topic1585850-391-2.aspx#bm1629360
2. This is an iTVF (Inline Table Valued Function) that performs the same task as a
scalar user defined function (UDF) accept that it requires the APPLY table operator.
Note the usage examples below and see this article for more details:
http://www.sqlservercentral.com/articles/T-SQL/91724/
The function will be slightly more complicated to use than a scalar UDF but will yeild
much better performance. For example - unlike a scalar UDF, this function does not
restrict the query optimizer's ability generate a parallel query plan. Initial testing
showed that the function generally gets a
3. AlphaNumericOnly runs 2-4 times faster when using make_parallel() (provided that you
have two or more logical CPU's and MAXDOP is not set to 1 on your SQL Instance).
4. This is an iTVF (Inline Table Valued Function) that will be used as an iSF (Inline
Scalar Function) in that it returns a single value in the returned table and should
normally be used in the FROM clause as with any other iTVF.
5. CHECKSUM returns an INT and will return the exact number given if given an INT to
begin with. It's also faster than a CAST or CONVERT and is used as a performance
enhancer by changing the bigint of ROW_NUMBER() to a more appropriately sized INT.
6. Another performance enhancement is using a WHERE clause calculation to prevent
the relatively expensive XML PATH concatentation of empty strings normally
determined by a CASE statement in the XML "loop".
7. Note that AlphaNumericOnly returns an nvarchar(max) value. If you are returning small
numbers consider casting or converting yout values to a numeric data type if you are
inserting the return value into a new table or using it for joins or comparison
purposes.
8. AlphaNumericOnly is deterministic; for more about deterministic and nondeterministic
functions see https://msdn.microsoft.com/en-us/library/ms178091.aspx
Usage Examples:
--===== 1. Basic use against a literal
SELECT ao.AlphaNumericOnly
FROM samd.alphaNumericOnly8K('xxx123abc999!!!') ao;
--===== 2. Against a table
DECLARE #sampleTxt TABLE (txtID int identity, txt varchar(100));
INSERT #sampleTxt(txt) VALUES ('!!!A555A!!!'),(NULL),('AAA.999');
SELECT txtID, OldTxt = txt, AlphaNumericOnly
FROM #sampleTxt st
CROSS APPLY samd.alphaNumericOnly8K(st.txt);
---------------------------------------------------------------------------------------
Revision History:
Rev 00 - 20150526 - Inital Creation - Alan Burstein
Rev 00 - 20150526 - 3rd line in WHERE clause to correct something that was missed
- Eirikur Eiriksson
Rev 01 - 20180624 - ADDED ORDER BY N; now performing CHECKSUM conversion to INT inside
the final cte (digitsonly) so that ORDER BY N does not get sorted.
****************************************************************************************/
WITH
E1(N) AS
(
SELECT N
FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))x(N)
),
iTally(N) AS
(
SELECT TOP (LEN(ISNULL(#pString,CHAR(32)))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM E1 a CROSS JOIN E1 b CROSS JOIN E1 c CROSS JOIN E1 d
)
SELECT AlphaNumericOnly =
(
SELECT SUBSTRING(#pString,CHECKSUM(N),1)
FROM iTally
WHERE
((ASCII(SUBSTRING(#pString,CHECKSUM(N),1)) - 48) & 0x7FFF) < 10
OR ((ASCII(SUBSTRING(#pString,CHECKSUM(N),1)) - 65) & 0x7FFF) < 26
OR ((ASCII(SUBSTRING(#pString,CHECKSUM(N),1)) - 97) & 0x7FFF) < 26
ORDER BY N
FOR XML PATH('')
);
Note the examples in the code comments:
--===== 1. Basic use against a literal
SELECT ao.AlphaNumericOnly
FROM samd.alphaNumericOnly8K('xxx123abc999!!!') ao;
--===== 2. Against a table
DECLARE #sampleTxt TABLE (txtID int identity, txt varchar(100));
INSERT #sampleTxt(txt) VALUES ('!!!A555A!!!'),(NULL),('AAA.999');
SELECT txtID, OldTxt = txt, AlphaNumericOnly
FROM #sampleTxt st
CROSS APPLY samd.alphaNumericOnly8K(st.txt);
Returns:
AlphaNumericOnly
-------------------
xxx123abc999
txtID OldTxt AlphaNumericOnly
----------- ------------- -----------------
1 !!!A555A!!! A555A
2 NULL NULL
3 AAA.999 AAA999
It's the fastest of it's kind. It runs extra fast with a parallel execution plan. To force a parallel execution plan, grab a copy of make_parallel by Adam Machanic. Then you would run it like this:
--===== 1. Basic use against a literal
SELECT ao.AlphaNumericOnly
FROM dbo.alphaNumericOnly8K('xxx123abc999!!!') ao
CROSS APPLY dbo.make_parallel();
--===== 2. Against a table
DECLARE #sampleTxt TABLE (txtID int identity, txt varchar(100));
INSERT #sampleTxt(txt) VALUES ('!!!A555A!!!'),(NULL),('AAA.999');
SELECT txtID, OldTxt = txt, AlphaNumericOnly
FROM #sampleTxt st
CROSS APPLY dbo.alphaNumericOnly8K(st.txt)
CROSS APPLY dbo.make_parallel();
Surely there is scope to improve this. test it out.
;WITH CTE AS (
SELECT (CASE WHEN PATINDEX('%[^A-Z0-9]%', D.Name) > 0
THEN STUFF(D.Name, PATINDEX('%[^A-Z0-9]%', D.Name), 1, '')
ELSE D.NAME
END ) NameString
FROM #dept D
UNION ALL
SELECT STUFF(C.NameString, PATINDEX('%[^A-Z0-9]%', C.NameString), 1, '')
FROM CTE C
WHERE PATINDEX('%[^A-Z0-9]%', C.NameString) > 0
)
Select STUFF((SELECT ',' + E.NameString from CTE E
WHERE PATINDEX('%[^A-Z0-9]%', E.NameString) = 0
FOR XML PATH('')), 1, 1, '') AS NAME
I have a table like:
TemplateBody
---------------------------------------------------------------------
1.This is To inform #FirstName# about the issues regarding #Location#
Here the key strings are #FirstName# and #Location# which are distinguished by hash tags.
I have another table with the replacement values:
Variables | TemplateValues
-----------------------------
1.#FirstName# | Joseph William
2.#Location# | Alaska
I need to replace these two key strings with their values in the first table.
There are several ways this can be done. I'll list two ways. Each one has advantages and disadvantages. I would personally use the first one (Dynamic SQL).
1. Dynamic SQL
Advantages: Fast, doesn't require recursion
Disadvantages: Can't be used to update table variables
2. Recursive CTE
Advantages: Allows updates of table variables
Disadvantages: Requires recursion and is memory intensive, recursive CTE's are slow
1.A. Dynamic SQL: Regular tables and Temporary tables.
This example uses a temporary table as the text source:
CREATE TABLE #tt_text(templatebody VARCHAR(MAX));
INSERT INTO #tt_text(templatebody)VALUES
('This is to inform #first_name# about the issues regarding #location#');
CREATE TABLE #tt_repl(variable VARCHAR(256),template_value VARCHAR(8000));
INSERT INTO #tt_repl(variable,template_value)VALUES
('#first_name#','Joseph William'),
('#location#','Alaska');
DECLARE #rep_call NVARCHAR(MAX)='templatebody';
SELECT
#rep_call='REPLACE('+#rep_call+','''+REPLACE(variable,'''','''''')+''','''+REPLACE(template_value,'''','''''')+''')'
FROM
#tt_repl;
DECLARE #stmt NVARCHAR(MAX)='SELECT '+#rep_call+' FROM #tt_text';
EXEC sp_executesql #stmt;
/* Use these statements if you want to UPDATE the source rather than SELECT from it
DECLARE #stmt NVARCHAR(MAX)='UPDATE #tt_text SET templatebody='+#rep_call;
EXEC sp_executesql #stmt;
SELECT * FROM #tt_text;*/
DROP TABLE #tt_repl;
DROP TABLE #tt_text;
1.B. Dynamic SQL: Table variables.
Requires to have the table defined as a specific table type. Example type definition:
CREATE TYPE dbo.TEXT_TABLE AS TABLE(
id INT IDENTITY(1,1) PRIMARY KEY,
templatebody VARCHAR(MAX)
);
GO
Define a table variable of this type, and use it in a Dynamic SQL statement as follows. Note that updating a table variable this way is not possible.
DECLARE #tt_text dbo.TEXT_TABLE;
INSERT INTO #tt_text(templatebody)VALUES
('This is to inform #first_name# about the issues regarding #location#');
DECLARE #tt_repl TABLE(id INT IDENTITY(1,1),variable VARCHAR(256),template_value VARCHAR(8000));
INSERT INTO #tt_repl(variable,template_value)VALUES
('#first_name#','Joseph William'),
('#location#','Alaska');
DECLARE #rep_call NVARCHAR(MAX)='templatebody';
SELECT
#rep_call='REPLACE('+#rep_call+','''+REPLACE(variable,'''','''''')+''','''+REPLACE(template_value,'''','''''')+''')'
FROM
#tt_repl;
DECLARE #stmt NVARCHAR(MAX)='SELECT '+#rep_call+' FROM #tt_text';
EXEC sp_executesql #stmt,N'#tt_text TEXT_TABLE READONLY',#tt_text;
2. Recursive CTE:
The only reasons why you would write this using a recursive CTE is that you intend to update a table variable, or you are not allowed to use Dynamic SQL somehow (eg company policy?).
Note that the default maximum recursion level is 100. If you have more than a 100 replacement variables you should increase this level by adding OPTION(MAXRECURSION 32767) at the end of the query (see Query Hints - MAXRECURSION).
DECLARE #tt_text TABLE(id INT IDENTITY(1,1),templatebody VARCHAR(MAX));
INSERT INTO #tt_text(templatebody)VALUES
('This is to inform #first_name# about the issues regarding #location#');
DECLARE #tt_repl TABLE(id INT IDENTITY(1,1),variable VARCHAR(256),template_value VARCHAR(8000));
INSERT INTO #tt_repl(variable,template_value)VALUES
('#first_name#','Joseph William'),
('#location#','Alaska');
;WITH cte AS (
SELECT
t.id,
l=1,
templatebody=REPLACE(t.templatebody,r.variable,r.template_value)
FROM
#tt_text AS t
INNER JOIN #tt_repl AS r ON r.id=1
UNION ALL
SELECT
t.id,
l=l+1,
templatebody=REPLACE(t.templatebody,r.variable,r.template_value)
FROM
cte AS t
INNER JOIN #tt_repl AS r ON r.id=t.l+1
)
UPDATE
#tt_text
SET
templatebody=cte.templatebody
FROM
#tt_text AS t
INNER JOIN cte ON
cte.id=t.id
WHERE
cte.l=(SELECT MAX(id) FROM #tt_repl);
/* -- if instead you wanted to select the replaced strings, comment out
-- the above UPDATE statement, and uncomment this SELECT statement:
SELECT
templatebody
FROM
cte
WHERE
l=(SELECT MAX(id) FROM #tt_repl);*/
SELECT*FROM #tt_text;
As long as the values for the Variables are unique ('#FirstName#' etc.) you can join each variable to the table containing TemplateBody:
select replace(replace(t.TemplateBody,'#FirstName#',variable.theVariable),'#Location#',variable2.theVariable)
from
[TemplateBodyTable] t
left join
(
select TemplateValues theVariable,Variables
from [VariablesTable] v
) variable on variable.Variables='#FirstName#'
left join
(
select TemplateValues theVariable,Variables
from [VariablesTable] v
) variable2 on variable2.Variables='#Location#'
A common table expression would allow you to loop through your templates and replace all variables in that template using a variables table. If you have a lot of variables, the level of recursion might go beyond the default limit of 100 recursions. You can play with the MAXRECURSION option according to your need.
DECLARE #Templates TABLE(Body nvarchar(max));
INSERT INTO #Templates VALUES ('This is to inform #FirstName# about the issues regarding #Location#');
DECLARE #Variables TABLE(Name nvarchar(50), Value nvarchar(max));
INSERT INTO #Variables VALUES ('#FirstName#', 'Joseph William'),
('#Location#', 'Alaska');
WITH replacing(Body, Level) AS
(
SELECT t.Body, 1 FROM #Templates t
UNION ALL
SELECT REPLACE(t.Body, v.Name, v.Value), t.Level + 1
FROM replacing t INNER JOIN #Variables v ON PATINDEX('%' + v.Name + '%', t.Body) > 0
)
SELECT TOP 1 r.Body
FROM replacing r
WHERE r.Level = (SELECT MAX(Level) FROM replacing)
OPTION (MAXRECURSION 0);
I have the following tables:
tbl_File:
FileID | Filename
-----------------
1 | test.jpg
and
tbl_Tag:
TagID | TagName
---------------
1 | Red
and
tbl_TagFile:
ID | TagID | FileID
-------------------
1 | 1 | 1
I need to pass a non-inclusive query against these tables. For example, imagine a list of checkboxes to select one or more tags, and then a search button. I need to pass the TagID's to the query as a PIPE delimited string, such as "1|2|5|"
The search results need to be non-inclusive, such as if it must meet all the criteria. If 3 tags are selected, the results are to be files that have all 3 tags associated with them.
I think I've made this too complicated, but tried iterating over the tags using charindex and stuff to work my way through the string, but it seems there must be an easier way.
I'd like to do this as a function... Such as
SELECT FileID, Filename
FROM tbl_Files
WHERE dbo.udf_FileExistswithTags(#Tags, FileID) = 1
Any efficient way to do this?
It doesn't sound from your example scenario that the actual "need" is to pass a pipe-delimited string. I would highly suggest abandoning that idea and using a Table Value Parameter in your stored procedure. This has numerous advantages in that you will not hit a datatype limit or a "number of parameters" limit that might occur with very large sets of criteria. Additionally it gets away from any need to run a (potentially very slow) UDF.
Split the string into tokens on the application side, and then insert each token as a row in the TVP. Example below:
Create the TVP type in your database:
CREATE TYPE [dbo].[FileNameType] AS TABLE
(
fileName varchar(1000)
)
On the application side, build your list of filename tokens into a recordset:
private static List<SqlDataRecord> BuildFileNameTokenRecords(IEnumerable<string> tokens)
{
var records = new List<SqlDataRecord>();
foreach (string token in tokens){
var record = new SqlDataRecord(
new SqlMetaData[]
{
new SqlMetaData("fileName", SqlDbType.Varchar),
}
);
records.Add(record);
}
return records;
}
Wherever you run your proc from (rough code here):
var records = BuildFileNameTokenRecords(listofstrings);
var sqlCmd = sqlDb.GetStoredProcCommand("FileExists");
sqlDb.AddInParameter(sqlCmd, "tvpFilenameTokens", SqlDbType.Structured, records);
ExecuteNonQuery(sqlCmd);
Filtering your select statement then simply becomes a matter of joining on the tokens in the table parameter. Something like this:
CREATE PROCEDURE dbo.FileExists
(
-- Put additional parameters here
#tvpFilenameTokens dbo.FileNameType READONLY,
)
AS
BEGIN
SELECT FileID, Filename
FROM tbl_Files INNER JOIN #tvpFilenameTokens
ON tbl_Files.FileID = #tvpFilenameTokens.fileName
END
Here is an option that should scale. All of the functionality is available back to SQL Server 2005. It uses a CTE to separate the portion of the query that finds only the FileIDs that have all of the TagIDs passed in, and then that list of FileIDs is joined to the [File] table to get the details. It also uses an INNER JOIN instead of an IN list to match the TagID's.
Please note that the example below uses a SQLCLR splitter that is freely available in the SQL# library (which I wrote, but this function is in the Free version). The specific splitter used is not the important part; it should just be one that is either SQLCLR, an inline tally-table (like the one used in #wewesthemenace's answer), or is the XML method. Just don't use a splitter based on a WHILE-loop or a recursive CTE.
---- TEST SETUP
DECLARE #File TABLE
(
FileID INT NOT NULL PRIMARY KEY,
[Filename] NVARCHAR(200) NOT NULL
);
DECLARE #TagFile TABLE
(
TagID INT NOT NULL,
FileID INT NOT NULL,
PRIMARY KEY (TagID, FileID)
);
INSERT INTO #File VALUES (1, 'File1.txt');
INSERT INTO #File VALUES (2, 'File2.txt');
INSERT INTO #File VALUES (3, 'File3.txt');
INSERT INTO #TagFile VALUES (1, 1);
INSERT INTO #TagFile VALUES (2, 1);
INSERT INTO #TagFile VALUES (5, 1);
INSERT INTO #TagFile VALUES (1, 2);
INSERT INTO #TagFile VALUES (2, 2);
INSERT INTO #TagFile VALUES (4, 2);
INSERT INTO #TagFile VALUES (1, 3);
INSERT INTO #TagFile VALUES (2, 3);
INSERT INTO #TagFile VALUES (5, 3);
INSERT INTO #TagFile VALUES (6, 3);
---- DONE WITH TEST SETUP
DECLARE #TagsToGet VARCHAR(100); -- this would be the proc input parameter
SET #TagsToGet = '1|2|5';
CREATE TABLE #Tags (TagID INT NOT NULL PRIMARY KEY);
DECLARE #NumTags INT;
INSERT INTO #Tags (TagID)
SELECT split.SplitVal
FROM SQL#.String_Split4k(#TagsToGet, '|', 1) split;
SET #NumTags = ##ROWCOUNT;
;WITH files AS
(
SELECT tf.FileID
FROM #TagFile tf
INNER JOIN #Tags tg
ON tg.TagID = tf.TagID
GROUP BY tf.FileID
HAVING COUNT(*) = #NumTags
)
SELECT fl.*
FROM #File fl
INNER JOIN files
ON files.FileID = fl.FileID
ORDER BY fl.[Filename] ASC;
DROP TABLE #Tags; -- don't need this if code above is placed in a proc
Results:
FileID Filename
1 File1.txt
3 File3.txt
Notes
As much as I love TVPs (and I do, when they are done correctly and used appropriately), I would say that they are a bit much for this type of small scale, single dimensional array scenario. There won't really be any performance gain over using a SQLCLR streaming TVF string splitter but it would require more app code and the additional User-Defined Table Type, which can't be updated without first dropping all procs that reference it. That doesn't happen all of the time, but needs to be considered in terms of long-term maintenance costs.
The JOIN between TagFile and the temporary table populated from the split operation should be much more efficient than using an IN list with a subquery for the split operation. An IN list is short-hand for all of the values in it to be their own OR conditions. Hence the JOIN is a fully set-based approach that lets the Query Optimizer do its thang.
The structure I used for the test #TagFile table only has the two relevant IDs in it: TagID and FileID. It does not have the ID field that I assume is an IDENTITY field on this table. Unless there is a very specific reason for needing that IDENTITY field, I would suggest removing it. It adds to inherent benefit as the combination of TagID and FileID is a natural key (i.e. it is both NOT NULL and Unique). And if the Clustered PK of this table were simply those two fields, the JOIN to the temp table of those split-out TagIDs would be quite fast, even with millions of rows in TagFile.
One reason that this approach works so much better than trying to handle this via a function per FileID (outside of the obvious set-based is better than cursor-based reason) is that the list of TagIDs is the same for all files to be checked. So splitting that out more than one time is a waste of effort.
By not splitting the TagID list inline in the query I am able to capture the number of elements in that list with no additional effort. Hence this saves from needing to do a secondary calculation.
Here is a function called DelimitedSplit8K by Jeff Moden. This is used to split strings of length up to 8000. For more info, read this: http://www.sqlservercentral.com/articles/Tally+Table/72993/
CREATE FUNCTION [dbo].[DelimitedSplit8K](
#pString VARCHAR(8000), --WARNING!!! DO NOT USE MAX DATA-TYPES HERE! IT WILL KILL PERFORMANCE!
#pDelimiter CHAR(1)
)
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
WITH E1(N) AS (--10E+1 or 10 rows
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
),
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (
SELECT TOP (ISNULL(DATALENGTH(#pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT 1 UNION ALL
SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(#pString, t.N, 1) = #pDelimiter
),
cteLen(N1, L1) AS(--==== Return start and length (for use in substring)
SELECT
s.N1,
ISNULL(NULLIF(CHARINDEX(#pDelimiter, #pString, s.N1), 0) - s.N1, 8000)
FROM cteStart s
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT
ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
Item = SUBSTRING(#pString, l.N1, l.L1)
FROM cteLen l
Your query would now be:
DECLARE #pString VARCHAR(8000) = '1|3|5'
SELECT
f.*
FROM tbl_File f
INNER JOIN tbl_TagFile tf ON tf.FileID = f.FileID
WHERE
tf.TagID IN(SELECT CAST(item AS INT) FROM dbo.DelimitedSplit8K(#pString, '|'))
GROUP BY f.FileID, f.FileName
HAVING COUNT(tf.ID) = (LEN(#pString) - LEN(REPLACE(#pString,'|','')) + 1)
The statement below counts the number of TagID in the parameter by counting the occurrence of the delimiter | + 1.
(LEN(#pString) - LEN(REPLACE(#pString,'|','')) + 1)
Here is an option that does not require UDF's.
It can be argued that this is also complicated.
DECLARE #TagList VARCHAR(50)
-- pass in this
SET #TagList = '1|3|6'
SELECT
FinalSet.FileID,
FinalSet.Tag,
FinalSet.TotalMatches
FROM
(
SELECT
tbl_TagFile.FileID,
tbl_TagFile.Tag,
COUNT(*) OVER(PARTITION BY tbl_TagFile.FileID) TotalMatches
FROM
(
SELECT 1 FileID, '1' Tag UNION ALL
SELECT 1 , '2' UNION ALL
SELECT 1 , '3' UNION ALL
SELECT 1 , '6' UNION ALL
SELECT 2 , '1' UNION ALL
SELECT 2 , '3'
) tbl_TagFile
INNER JOIN
(
SELECT tbl_Tag.Tag
FROM
(
SELECT '1' Tag UNION ALL
SELECT '2' UNION ALL
SELECT '3' UNION ALL
SELECT '4' UNION ALL
SELECT '5' UNION ALL
SELECT '6'
) tbl_Tag
WHERE '|' + #TagList + '|' LIKE '%|' + Tag + '|%'
) LimitedTagTable
ON LimitedTagTable.Tag = tbl_TagFile.Tag
) FinalSet
WHERE
FinalSet.TotalMatches = (LEN(#TagList) - LEN(REPLACE(#TagList,'|','')) + 1)
There's some complications in this around data types and indexes and stuff but you can see the concept - you are only getting the records that match your passed in string.
subtable LimitedTagTable is your tag list filtered by your input pipe delimited string
subtable FinalSet joins your limited tag list to your list of files
column TotalMatches works out how many tag matches your file had
Finally this line limits the output to those files that had enough matches:
FinalSet.TotalMatches = (LEN(#TagList) - LEN(REPLACE(#TagList,'|','')) + 1)
Please experiment with different inputs and datasets and see if it suits as I have made a number of assumptions.
I'm answering my own question, in hopes that someone can let me know if/how flawed it is. So far it seems to be working but just early testing.
Function:
ALTER FUNCTION [dbo].[udf_FileExistsByTags]
(
#FileID int
,#Tags nvarchar(max)
)
RETURNS bit
AS
BEGIN
DECLARE #Exists bit = 0
DECLARE #Count int = 0
DECLARE #TagTable TABLE ( FileID int, TagID int )
DECLARE #Tag int
WHILE len(#Tags) > 0
BEGIN
SET #Tag = CAST(LEFT(#Tags, charindex('|', #Tags + '|') -1) as int)
SET #Count = #Count + 1
IF EXISTS (SELECT * FROM tbl_FileTag WHERE FileID = #FileID AND TagID = #Tag )
BEGIN
INSERT INTO #TagTable ( FileID, TagID ) VALUES ( #FileID, #Tag )
END
SET #Tags = STUFF(#Tags, 1, charindex('|', #Tags + '|'), '')
END
SET #Exists = CASE WHEN #Count = (SELECT COUNT(*) FROM #TagTable) THEN 1 ELSE 0 END
RETURN #Exists
END
Then in the query:
SELECT * FROM tbl_File a WHERE dbo.udf_FileExistsByTags(a.FileID, #Tags) = 1
So now I'm looking for errors.
What do you think? Probably not every efficient, however this search will be used only on a periodic basis.
This is my table
BasketId(int) BasketName(varchar) BasketFruits(xml)
1 Gold <FRUITS><FID>1</FID><FID>2</FID><FID>3</FID><FID>4</FID><FID>5</FID><FID>6</FID></FRUITS>
2 Silver <FRUITS><FID>1</FID><FID>2</FID><FID>3</FID><FID>4</FID></FRUITS>
3 Bronze <FRUITS><FID>3</FID><FID>4</FID><FID>5</FID></FRUITS>
I need to search for the basket which has FID values 1 and 3
so that in this case i would get Gold and Silver
Although i've reached to the result where i can search for a SINGLE FID value like 1
using this code:
declare #fruitId varchar(10);
set #fruitId=1;
select * from Baskets
WHERE BasketFruits.exist('//FID/text()[contains(.,sql:variable("#fruitId"))]') = 1
HAD it been T-SQL i would have used the IN Clause like this
SELECT * FROM Baskets where FID in (1,3)
Any help/workaround appreciated...
First option would be to add another exist the where clause.
declare #fruitId1 int;
set #fruitId1=1;
declare #fruitId2 int;
set #fruitId2=3;
select *
from #Test
where
BasketFruits.exist('/FRUITS/FID[.=sql:variable("#fruitId1")]')=1 and
BasketFruits.exist('/FRUITS/FID[.=sql:variable("#fruitId2")]')=1
Another version would be to use both variables in the xquery statement, counting the hits.
select *
from #Test
where BasketFruits.value(
'count(distinct-values(/FRUITS/FID[.=(sql:variable("#fruitId1"),sql:variable("#fruitId2"))]))', 'int') = 2
The two queries above will work just fine if you know how many FID parameters you are going to use when you write the query. If you are in a situation where the number of FID's vary you could use something like this instead.
declare #FIDs xml = '<FID>1</FID><FID>3</FID>'
;with cteParam(FID) as
(
select T.N.value('.', 'int')
from #FIDs.nodes('FID') as T(N)
)
select T.BasketName
from #Test as T
cross apply T.BasketFruits.nodes('/FRUITS/FID') as F(FID)
inner join cteParam as p
on F.FID.value('.', 'int') = P.FID
group by T.BasketName
having count(T.BasketName) = (select count(*) from cteParam)
Build the #FIDs variable as an XML to hold the values you want to use in the query.
You can test the last query here: https://data.stackexchange.com/stackoverflow/q/101600/relational-division-with-xquery
It is a bit more involved than I hoped it would be - but this solution works.
Basically, I'm using a CTE (Common Table Expression) which breaks up the table and cross joins all values from the <FID> nodes to the basket names.
From that CTE, I select those baskets that contain both a value of 1 and 3.
DECLARE #Test TABLE (BasketID INT, BasketName VARCHAR(20), BasketFruits XML)
INSERT INTO #TEST
VALUES(1, 'Gold', '<FRUITS><FID>1</FID><FID>2</FID><FID>3</FID><FID>4</FID><FID>5</FID><FID>6</FID></FRUITS>'),
(2, 'Silver', '<FRUITS><FID>1</FID><FID>2</FID><FID>3</FID><FID>4</FID></FRUITS>'),
(3, 'Bronze', '<FRUITS><FID>3</FID><FID>4</FID><FID>5</FID></FRUITS>')
;WITH IDandFID AS
(
SELECT
t.BasketID,
t.BasketName,
FR.FID.value('(.)[1]', 'int') AS 'FID'
FROM #Test t
CROSS APPLY basketfruits.nodes('/FRUITS/FID') AS FR(FID)
)
SELECT DISTINCT
BasketName
FROM
IDandFID i1
WHERE
EXISTS(SELECT * FROM IDandFID i2 WHERE i1.BasketID = i2.BasketID AND i2.FID = 1)
AND EXISTS(SELECT * FROM IDandFID i3 WHERE i1.BasketID = i3.BasketID AND i3.FID = 3)
Running this query, I do get the expected output of:
BasketName
----------
Gold
Silver
Is this too trivial?
SELECT * FROM Baskets WHERE BasketFruits LIKE '%<FID>1</FID>%' AND BasketFruits LIKE '%<FID>3</FID>%'