SQL Server 2008 split string fails due to ampersand - sql-server

I have created a function to attempt to replicate the STRING_SPLIT function that is now in SQL Server 2016.
So far I have got this:
CREATE FUNCTION MySplit
(@delimited NVARCHAR(MAX), @delimiter NVARCHAR(100))
RETURNS @t TABLE
(
-- Id column can be commented out, not required for splitting the string
id INT IDENTITY(1,1), -- I use this column for numbering split parts
val NVARCHAR(MAX)
)
AS
BEGIN
DECLARE @xml XML
SET @xml = N'<root><r>' + REPLACE(@delimited,@delimiter,'</r><r>') + '</r></root>'
INSERT INTO @t(val)
SELECT
r.value('.','varchar(max)') AS item
FROM
@xml.nodes('//root/r') AS records(r)
RETURN
END
GO
And it does work, but it will not split the text string if any part of it contains an ampersand [ & ].
I have found hundreds of examples of splitting a string, but none seem to deal with special characters.
So using this:
select *
from MySplit('Test1,Test2,Test3', ',')
works ok, but
select *
from MySplit('Test1 & Test4,Test2,Test3', ',')
does not. It fails with
XML parsing: line 1, character 17, illegal name character.
What have I done wrong?
UPDATE
Firstly, thanks to @marcs for showing me the error of my ways in writing this question.
Secondly, thanks to everyone for the help below, especially @PanagiotisKanavos and @MatBailie.
As this is throwaway code for migrating data from an old system to a new one, I have chosen to use @MatBailie's solution: quick and very dirty, but also perfect for this task.
In the future, though, I will be progressing down @PanagiotisKanavos's route.

Edit your function and replace every & with &amp;
This will remove the error. It happens because the XML parser cannot accept a bare &, which is a reserved character that starts an entity reference.

CREATE FUNCTION [dbo].[split_stringss](
@delimited NVARCHAR(MAX),
@delimiter NVARCHAR(100)
) RETURNS @t TABLE (id INT IDENTITY(1,1), val NVARCHAR(MAX))
AS
BEGIN
DECLARE @xml XML
DECLARE @var1 NVARCHAR(MAX)
SET @var1 = REPLACE(@delimited,'&','&amp;')
SET @xml = N'<t>' + REPLACE(@var1,@delimiter,'</t><t>') + '</t>'
INSERT INTO @t(val)
SELECT r.value('.','varchar(MAX)') AS item
FROM @xml.nodes('/t') AS records(r)
RETURN
END
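A quick check of the fixed function (a sketch, assuming the function above has been created; value() decodes &amp; back to & on the way out):
SELECT id, val FROM dbo.split_stringss('Test1 & Test4,Test2,Test3', ',');
-- id  val
-- 1   Test1 & Test4
-- 2   Test2
-- 3   Test3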

First of all, SQL Server 2016 introduced a STRING_SPLIT TVF. You can write CROSS APPLY STRING_SPLIT(thatField, ',') AS items
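On 2016 and later the failing input from the question works directly, with no escaping, because no XML is involved (a quick sketch):
SELECT value
FROM STRING_SPLIT('Test1 & Test4,Test2,Test3', ',');
-- Returns 'Test1 & Test4', 'Test2', 'Test3'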
In previous versions you still need to create a custom splitting function. There are various techniques. The fastest solution is a SQLCLR function.
In some cases the second fastest is what you used:
convert the text to XML and select the nodes. A well-known problem with this splitting technique is that illegal XML characters will break it, as you found out. That's why Aaron Bertrand doesn't consider this a generic splitter.
You can replace invalid characters with their encoded values, e.g. & with &amp;, but you have to be certain that your text will never contain such encodings.
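A minimal sketch of that escaping, applied before the XML cast (strictly only & and < break XML parsing, but > is commonly escaped as well; the caveat about pre-encoded input still applies):
DECLARE @s NVARCHAR(MAX) = N'Test1 & Test4,Test2,Test3';
SET @s = REPLACE(REPLACE(REPLACE(@s, '&', '&amp;'), '<', '&lt;'), '>', '&gt;');
-- @s can now safely be wrapped in <r>...</r> tags and CAST to XML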
Perhaps you should investigate different techniques, like the Moden function, which can be faster in many situations:
CREATE FUNCTION dbo.SplitStrings_Moden
(
@List NVARCHAR(MAX),
@Delimiter NVARCHAR(255)
)
RETURNS TABLE
WITH SCHEMABINDING AS
RETURN
WITH E1(N) AS ( SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1),
E2(N) AS (SELECT 1 FROM E1 a, E1 b),
E4(N) AS (SELECT 1 FROM E2 a, E2 b),
E42(N) AS (SELECT 1 FROM E4 a, E2 b),
cteTally(N) AS (SELECT 0 UNION ALL SELECT TOP (DATALENGTH(ISNULL(@List,1)))
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E42),
cteStart(N1) AS (SELECT t.N+1 FROM cteTally t
WHERE (SUBSTRING(@List,t.N,1) = @Delimiter OR t.N = 0))
SELECT Item = SUBSTRING(@List, s.N1, ISNULL(NULLIF(CHARINDEX(@Delimiter,@List,s.N1),0)-s.N1,8000))
FROM cteStart s;
Personally I created and use a SQLCLR UDF.
Another option is to avoid splitting altogether and pass table-valued parameters from the client to the server. Or use a micro-ORM like Dapper that can construct an IN (...) clause from a list of values (it expands the array into one parameter per element), eg:
var products = connection.Query<Product>("select * from products where id in @ids", new { ids = myIdArray });
An ORM like EF that supports LINQ can also generate an IN clause:
var products = from product in dbContext.Products
where myIdArray.Contains(product.Id)
select product;
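The SQL that EF emits for the Contains() call above is roughly the following (the exact shape varies by EF version, and the column list depends on the entity, so this is illustrative only):
SELECT [p].[Id], [p].[Name]
FROM [Products] AS [p]
WHERE [p].[Id] IN (1, 2, 5)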

Related

Split string to array using delimiter, getting second to last element in SELECT Statement

Heads!
In my database, I have a column that contains the following data (examples):
H-01-01-02-01
BLE-01-03-01
H-02-05-1.1-03
The task is to get the second to last element of the array if you would split that using the "-" character. The strings are of different length.
So this would be the result using the above mentioned data:
02
03
1.1
Basically I'm searching for an equivalent of the following Ruby statement for use in a SELECT statement in SQL Server:
"BLE-01-03-01".split("-")[-2]
Is this possible in any way in SQL Server? After spending some time searching for a solution, I only found ones that work for the last or first element.
Thanks very much for any clues or solutions!
PS: Version of SQL Server is Microsoft SQL Server 2012
As an alternative you can try this:
--A mockup table with some test data to simulate your issue
DECLARE @mockupTable TABLE (ID INT IDENTITY, YourColumn VARCHAR(50));
INSERT INTO @mockupTable VALUES
('H-01-01-02-01')
,('BLE-01-03-01')
,('H-02-05-1.1-03');
--The query
SELECT CastedToXml.value('/x[sql:column("CountOfFragments")-1][1]','nvarchar(10)') AS TheWantedFragment
FROM @mockupTable t
CROSS APPLY(SELECT CAST('<x>' + REPLACE(t.YourColumn,'-','</x><x>') + '</x>' AS XML))A(CastedToXml)
CROSS APPLY(SELECT CastedToXml.value('count(/x)','int')) B(CountOfFragments);
The idea in short:
The first APPLY will transform the string to an XML like this:
<x>H</x>
<x>01</x>
<x>01</x>
<x>02</x>
<x>01</x>
The second APPLY uses XQuery against this XML to get the count of fragments. As APPLY adds this count as a column to the result set, we can reference it with sql:column() to pick the wanted fragment by its position.
As I wrote in my comment - use CHARINDEX with REVERSE.
First, create and populate a sample table (please save us this step in your future questions):
DECLARE @T AS TABLE
(
Col Varchar(100)
);
INSERT INTO @T (Col) VALUES
('H-01-01-02-01'),
('BLE-01-03-01'),
('H-02-05-1.1-03');
The query:
SELECT Col,
LEFT(RIGHT(Col, AlmostLastDelimiter-1), AlmostLastDelimiter - LastDelimiter - 1) As SecondToLast
FROM @T
CROSS APPLY (SELECT CharIndex('-', Reverse(Col)) As LastDelimiter) As A
CROSS APPLY (SELECT CharIndex('-', Reverse(Col), LastDelimiter+1) As AlmostLastDelimiter) As B
Results:
Col SecondToLast
H-01-01-02-01 02
BLE-01-03-01 03
H-02-05-1.1-03 1.1
Similar to Zohar's solution, but using CTEs instead of CROSS APPLY to prevent redundancy. I personally find this easier to follow, as you can see what happens in each step. That doesn't make it a better solution though ;)
DECLARE @strings TABLE (data VARCHAR(50));
INSERT INTO @strings VALUES ('H-01-01-02-01') , ('BLE-01-03-01'), ('H-02-05-1.1-03');
WITH rev AS (
SELECT
data,
REVERSE(data) AS reversed
FROM
@strings),
first_hyphen AS (
SELECT
data,
reversed,
CHARINDEX('-', reversed) + 1 AS first_pos
FROM
rev),
second_hyphen AS (
SELECT
data,
reversed,
first_pos,
CHARINDEX('-', reversed, first_pos) AS second_pos
FROM
first_hyphen)
SELECT
data,
REVERSE(SUBSTRING(reversed, first_pos, second_pos - first_pos)) AS result
FROM
second_hyphen;
Results:
data result
H-01-01-02-01 02
BLE-01-03-01 03
H-02-05-1.1-03 1.1
Try this:
declare @input NVARCHAR(100)
declare @dlmt NVARCHAR(3);
declare @pos INT = 2
SET @input=REVERSE(N'H-02-05-1.1-03');
SET @dlmt=N'-';
SELECT
REVERSE( -- re-reverse the fragment, since the whole input was reversed before splitting
CAST(N'<x>'
+ REPLACE(
(SELECT REPLACE(@input,@dlmt,'#DLMT#') AS [*] FOR XML PATH(''))
,N'#DLMT#',N'</x><x>'
) + N'</x>' AS XML).value('/x[sql:variable("@pos")][1]','nvarchar(max)'));

Scalar function with WHILE loop to inline function

Hi, I have a view which is used in lots of search queries in my application.
The issue is that the application queries which use this view are running very slow. While investigating this, I found a particular portion of the view definition that is making it slow.
create view Demoview AS
Select
p.Id as Id,
----------,
STUFF((SELECT ',' + [dbo].[OnlyAlphaNum](colDesc)
FROM dbo.ContactInfoDetails cd
WHERE pp.FormId = f.Id AND ppc.PageId = pp.Id
FOR XML PATH('')), 1, 1, '') AS PhoneNumber,
p.FirstName as Fname,
From
---
This is one of the columns in the view.
The scalar function [OnlyAlphaNum] is making it slow, as it prevents a parallel execution plan for the query.
The function is as follows:
CREATE FUNCTION [dbo].[OnlyAlphaNum]
(
@String VARCHAR(MAX)
)
RETURNS VARCHAR(MAX)
WITH SCHEMABINDING
AS
BEGIN
WHILE PATINDEX('%[^A-Z0-9]%', @String) > 0
SET @String = STUFF(@String, PATINDEX('%[^A-Z0-9]%', @String), 1, '')
RETURN @String
END
How can I convert it into an inline function?
I tried with CASE, but wasn't successful. I have read that a CTE is a good option.
Any idea how to tackle this problem?
I already did this; you can read more about it here.
The function:
CREATE FUNCTION dbo.alphaNumericOnly8K(@pString varchar(8000))
RETURNS TABLE WITH SCHEMABINDING AS RETURN
/****************************************************************************************
Purpose:
Given a varchar(8000) string or smaller, this function strips all but the alphanumeric
characters that exist in @pString.
Compatibility:
SQL Server 2008+, Azure SQL Database, Azure SQL Data Warehouse & Parallel Data Warehouse
Parameters:
@pString = varchar(8000); Input string to be cleaned
Returns:
AlphaNumericOnly - varchar(8000)
Syntax:
--===== Autonomous
SELECT ca.AlphaNumericOnly
FROM dbo.AlphaNumericOnly(@pString) ca;
--===== CROSS APPLY example
SELECT ca.AlphaNumericOnly
FROM dbo.SomeTable st
CROSS APPLY dbo.AlphaNumericOnly(st.SomeVarcharCol) ca;
Programmer's Notes:
1. Based on Jeff Moden/Eirikur Eiriksson's DigitsOnlyEE function. For more details see:
http://www.sqlservercentral.com/Forums/Topic1585850-391-2.aspx#bm1629360
2. This is an iTVF (Inline Table Valued Function) that performs the same task as a
scalar user defined function (UDF) except that it requires the APPLY table operator.
Note the usage examples below and see this article for more details:
http://www.sqlservercentral.com/articles/T-SQL/91724/
The function will be slightly more complicated to use than a scalar UDF but will yield
much better performance. For example - unlike a scalar UDF, this function does not
restrict the query optimizer's ability to generate a parallel query plan. Initial testing
showed that the function generally gets a parallel execution plan.
3. AlphaNumericOnly runs 2-4 times faster when using make_parallel() (provided that you
have two or more logical CPU's and MAXDOP is not set to 1 on your SQL Instance).
4. This is an iTVF (Inline Table Valued Function) that will be used as an iSF (Inline
Scalar Function) in that it returns a single value in the returned table and should
normally be used in the FROM clause as with any other iTVF.
5. CHECKSUM returns an INT and will return the exact number given if given an INT to
begin with. It's also faster than a CAST or CONVERT and is used as a performance
enhancer by changing the bigint of ROW_NUMBER() to a more appropriately sized INT.
6. Another performance enhancement is using a WHERE clause calculation to prevent
the relatively expensive XML PATH concatenation of empty strings normally
determined by a CASE statement in the XML "loop".
7. Note that AlphaNumericOnly returns an nvarchar(max) value. If you are returning small
numbers consider casting or converting your values to a numeric data type if you are
inserting the return value into a new table or using it for joins or comparison
purposes.
8. AlphaNumericOnly is deterministic; for more about deterministic and nondeterministic
functions see https://msdn.microsoft.com/en-us/library/ms178091.aspx
Usage Examples:
--===== 1. Basic use against a literal
SELECT ao.AlphaNumericOnly
FROM samd.alphaNumericOnly8K('xxx123abc999!!!') ao;
--===== 2. Against a table
DECLARE @sampleTxt TABLE (txtID int identity, txt varchar(100));
INSERT @sampleTxt(txt) VALUES ('!!!A555A!!!'),(NULL),('AAA.999');
SELECT txtID, OldTxt = txt, AlphaNumericOnly
FROM @sampleTxt st
CROSS APPLY samd.alphaNumericOnly8K(st.txt);
---------------------------------------------------------------------------------------
Revision History:
Rev 00 - 20150526 - Initial Creation - Alan Burstein
Rev 00 - 20150526 - 3rd line in WHERE clause to correct something that was missed
- Eirikur Eiriksson
Rev 01 - 20180624 - ADDED ORDER BY N; now performing CHECKSUM conversion to INT inside
the final cte (digitsonly) so that ORDER BY N does not get sorted.
****************************************************************************************/
WITH
E1(N) AS
(
SELECT N
FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))x(N)
),
iTally(N) AS
(
SELECT TOP (LEN(ISNULL(@pString,CHAR(32)))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM E1 a CROSS JOIN E1 b CROSS JOIN E1 c CROSS JOIN E1 d
)
SELECT AlphaNumericOnly =
(
SELECT SUBSTRING(@pString,CHECKSUM(N),1)
FROM iTally
WHERE
((ASCII(SUBSTRING(@pString,CHECKSUM(N),1)) - 48) & 0x7FFF) < 10
OR ((ASCII(SUBSTRING(@pString,CHECKSUM(N),1)) - 65) & 0x7FFF) < 26
OR ((ASCII(SUBSTRING(@pString,CHECKSUM(N),1)) - 97) & 0x7FFF) < 26
ORDER BY N
FOR XML PATH('')
);
Note the examples in the code comments:
--===== 1. Basic use against a literal
SELECT ao.AlphaNumericOnly
FROM samd.alphaNumericOnly8K('xxx123abc999!!!') ao;
--===== 2. Against a table
DECLARE @sampleTxt TABLE (txtID int identity, txt varchar(100));
INSERT @sampleTxt(txt) VALUES ('!!!A555A!!!'),(NULL),('AAA.999');
SELECT txtID, OldTxt = txt, AlphaNumericOnly
FROM @sampleTxt st
CROSS APPLY samd.alphaNumericOnly8K(st.txt);
Returns:
AlphaNumericOnly
-------------------
xxx123abc999
txtID OldTxt AlphaNumericOnly
----------- ------------- -----------------
1 !!!A555A!!! A555A
2 NULL NULL
3 AAA.999 AAA999
It's the fastest of its kind. It runs extra fast with a parallel execution plan. To force a parallel execution plan, grab a copy of make_parallel by Adam Machanic. Then you would run it like this:
--===== 1. Basic use against a literal
SELECT ao.AlphaNumericOnly
FROM dbo.alphaNumericOnly8K('xxx123abc999!!!') ao
CROSS APPLY dbo.make_parallel();
--===== 2. Against a table
DECLARE @sampleTxt TABLE (txtID int identity, txt varchar(100));
INSERT @sampleTxt(txt) VALUES ('!!!A555A!!!'),(NULL),('AAA.999');
SELECT txtID, OldTxt = txt, AlphaNumericOnly
FROM @sampleTxt st
CROSS APPLY dbo.alphaNumericOnly8K(st.txt)
CROSS APPLY dbo.make_parallel();
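To plug this into the view from the question, the scalar call would be replaced by a CROSS APPLY of the iTVF. This is only a hedged sketch, since the view in the question is redacted: cd, colDesc, and the join predicate are placeholders lifted from that fragment.
SELECT p.Id AS Id,
STUFF((SELECT ',' + ao.AlphaNumericOnly
FROM dbo.ContactInfoDetails cd
CROSS APPLY dbo.alphaNumericOnly8K(cd.colDesc) ao
WHERE cd.SomeKey = p.Id -- placeholder predicate; the original WHERE clause is elided
FOR XML PATH('')), 1, 1, '') AS PhoneNumber
FROM dbo.SomeParentTable p; -- placeholder table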
Surely there is scope to improve this; test it out.
;WITH CTE AS (
SELECT (CASE WHEN PATINDEX('%[^A-Z0-9]%', D.Name) > 0
THEN STUFF(D.Name, PATINDEX('%[^A-Z0-9]%', D.Name), 1, '')
ELSE D.NAME
END ) NameString
FROM #dept D
UNION ALL
SELECT STUFF(C.NameString, PATINDEX('%[^A-Z0-9]%', C.NameString), 1, '')
FROM CTE C
WHERE PATINDEX('%[^A-Z0-9]%', C.NameString) > 0
)
Select STUFF((SELECT ',' + E.NameString from CTE E
WHERE PATINDEX('%[^A-Z0-9]%', E.NameString) = 0
FOR XML PATH('')), 1, 1, '') AS NAME

With SQL Server, How can I query a table based on a delimited string as the criteria?

I have the following tables:
tbl_File:
FileID | Filename
-----------------
1 | test.jpg
and
tbl_Tag:
TagID | TagName
---------------
1 | Red
and
tbl_TagFile:
ID | TagID | FileID
-------------------
1 | 1 | 1
I need to pass a non-inclusive query against these tables. For example, imagine a list of checkboxes to select one or more tags, and then a search button. I need to pass the TagID's to the query as a PIPE delimited string, such as "1|2|5|"
The search results need to be restrictive, i.e. a file must meet all the criteria: if 3 tags are selected, the results are to be files that have all 3 tags associated with them.
I think I've made this too complicated. I tried iterating over the tags using CHARINDEX and STUFF to work my way through the string, but it seems there must be an easier way.
I'd like to do this as a function... Such as
SELECT FileID, Filename
FROM tbl_Files
WHERE dbo.udf_FileExistswithTags(@Tags, FileID) = 1
Any efficient way to do this?
It doesn't sound from your example scenario that the actual "need" is to pass a pipe-delimited string. I would highly suggest abandoning that idea and using a Table-Valued Parameter (TVP) in your stored procedure. This has numerous advantages: you will not hit a datatype limit or a "number of parameters" limit that might occur with very large sets of criteria, and it avoids any need to run a (potentially very slow) UDF.
Split the string into tokens on the application side, and then insert each token as a row in the TVP. Example below:
Create the TVP type in your database:
CREATE TYPE [dbo].[FileNameType] AS TABLE
(
fileName varchar(1000)
)
On the application side, build your list of filename tokens into a recordset:
private static List<SqlDataRecord> BuildFileNameTokenRecords(IEnumerable<string> tokens)
{
var records = new List<SqlDataRecord>();
foreach (string token in tokens)
{
var record = new SqlDataRecord(
new SqlMetaData[]
{
// variable-length types need an explicit maximum length
new SqlMetaData("fileName", SqlDbType.VarChar, 1000),
}
);
record.SetString(0, token); // populate the row; without this the TVP rows are empty
records.Add(record);
}
return records;
}
Wherever you run your proc from (rough code here):
var records = BuildFileNameTokenRecords(listofstrings);
var sqlCmd = sqlDb.GetStoredProcCommand("FileExists");
sqlDb.AddInParameter(sqlCmd, "tvpFilenameTokens", SqlDbType.Structured, records);
ExecuteNonQuery(sqlCmd);
Filtering your select statement then simply becomes a matter of joining on the tokens in the table parameter. Something like this:
CREATE PROCEDURE dbo.FileExists
(
-- Put additional parameters here
@tvpFilenameTokens dbo.FileNameType READONLY
)
AS
BEGIN
SELECT FileID, Filename
FROM tbl_Files
INNER JOIN @tvpFilenameTokens AS tokens -- TVPs need an alias to be referenced in the join
ON tbl_Files.Filename = tokens.fileName
END
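For a quick smoke test from T-SQL (a sketch; it assumes the type and procedure above have been created, and that tbl_Files exists as in the question):
DECLARE @tokens dbo.FileNameType;
INSERT INTO @tokens (fileName) VALUES ('test.jpg'), ('other.jpg');
EXEC dbo.FileExists @tvpFilenameTokens = @tokens;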
Here is an option that should scale. All of the functionality is available back to SQL Server 2005. It uses a CTE to separate the portion of the query that finds only the FileIDs that have all of the TagIDs passed in, and then that list of FileIDs is joined to the [File] table to get the details. It also uses an INNER JOIN instead of an IN list to match the TagID's.
Please note that the example below uses a SQLCLR splitter that is freely available in the SQL# library (which I wrote, but this function is in the Free version). The specific splitter used is not the important part; it should just be one that is either SQLCLR, an inline tally-table (like the one used in @wewesthemenace's answer), or the XML method. Just don't use a splitter based on a WHILE loop or a recursive CTE.
---- TEST SETUP
DECLARE @File TABLE
(
FileID INT NOT NULL PRIMARY KEY,
[Filename] NVARCHAR(200) NOT NULL
);
DECLARE @TagFile TABLE
(
TagID INT NOT NULL,
FileID INT NOT NULL,
PRIMARY KEY (TagID, FileID)
);
INSERT INTO @File VALUES (1, 'File1.txt');
INSERT INTO @File VALUES (2, 'File2.txt');
INSERT INTO @File VALUES (3, 'File3.txt');
INSERT INTO @TagFile VALUES (1, 1);
INSERT INTO @TagFile VALUES (2, 1);
INSERT INTO @TagFile VALUES (5, 1);
INSERT INTO @TagFile VALUES (1, 2);
INSERT INTO @TagFile VALUES (2, 2);
INSERT INTO @TagFile VALUES (4, 2);
INSERT INTO @TagFile VALUES (1, 3);
INSERT INTO @TagFile VALUES (2, 3);
INSERT INTO @TagFile VALUES (5, 3);
INSERT INTO @TagFile VALUES (6, 3);
---- DONE WITH TEST SETUP
DECLARE @TagsToGet VARCHAR(100); -- this would be the proc input parameter
SET @TagsToGet = '1|2|5';
CREATE TABLE #Tags (TagID INT NOT NULL PRIMARY KEY);
DECLARE @NumTags INT;
INSERT INTO #Tags (TagID)
SELECT split.SplitVal
FROM SQL#.String_Split4k(@TagsToGet, '|', 1) split;
SET @NumTags = @@ROWCOUNT;
;WITH files AS
(
SELECT tf.FileID
FROM @TagFile tf
INNER JOIN #Tags tg
ON tg.TagID = tf.TagID
GROUP BY tf.FileID
HAVING COUNT(*) = @NumTags
)
SELECT fl.*
FROM @File fl
INNER JOIN files
ON files.FileID = fl.FileID
ORDER BY fl.[Filename] ASC;
DROP TABLE #Tags; -- don't need this if the code above is placed in a proc
Results:
FileID Filename
1 File1.txt
3 File3.txt
Notes
As much as I love TVPs (and I do, when they are done correctly and used appropriately), I would say that they are a bit much for this type of small-scale, single-dimensional array scenario. There won't really be any performance gain over using a SQLCLR streaming TVF string splitter, but it would require more app code and an additional User-Defined Table Type, which can't be updated without first dropping all procs that reference it. That doesn't happen all of the time, but it needs to be considered in terms of long-term maintenance costs.
The JOIN between TagFile and the temporary table populated from the split operation should be much more efficient than using an IN list with a subquery for the split operation. An IN list is short-hand for all of the values in it to be their own OR conditions. Hence the JOIN is a fully set-based approach that lets the Query Optimizer do its thang.
The structure I used for the test @TagFile table only has the two relevant IDs in it: TagID and FileID. It does not have the ID field that I assume is an IDENTITY field on this table. Unless there is a very specific reason for needing that IDENTITY field, I would suggest removing it. It adds no inherent benefit, as the combination of TagID and FileID is a natural key (i.e. it is both NOT NULL and unique). And if the Clustered PK of this table were simply those two fields, the JOIN to the temp table of those split-out TagIDs would be quite fast, even with millions of rows in TagFile.
One reason that this approach works so much better than trying to handle this via a function per FileID (outside of the obvious set-based is better than cursor-based reason) is that the list of TagIDs is the same for all files to be checked. So splitting that out more than one time is a waste of effort.
By not splitting the TagID list inline in the query I am able to capture the number of elements in that list with no additional effort. Hence this saves from needing to do a secondary calculation.
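A sketch of the natural-key DDL suggested in the notes above (hypothetical; the asker's real table, column types, and any foreign keys may differ):
CREATE TABLE dbo.tbl_TagFile
(
TagID INT NOT NULL,
FileID INT NOT NULL,
CONSTRAINT PK_TagFile PRIMARY KEY CLUSTERED (TagID, FileID) -- replaces the surrogate IDENTITY column
);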
Here is a function called DelimitedSplit8K by Jeff Moden. This is used to split strings of length up to 8000. For more info, read this: http://www.sqlservercentral.com/articles/Tally+Table/72993/
CREATE FUNCTION [dbo].[DelimitedSplit8K](
@pString VARCHAR(8000), --WARNING!!! DO NOT USE MAX DATA-TYPES HERE! IT WILL KILL PERFORMANCE!
@pDelimiter CHAR(1)
)
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
WITH E1(N) AS (--10E+1 or 10 rows
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
),
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (
SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT 1 UNION ALL
SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString, t.N, 1) = @pDelimiter
),
cteLen(N1, L1) AS(--==== Return start and length (for use in substring)
SELECT
s.N1,
ISNULL(NULLIF(CHARINDEX(@pDelimiter, @pString, s.N1), 0) - s.N1, 8000)
FROM cteStart s
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT
ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
Item = SUBSTRING(#pString, l.N1, l.L1)
FROM cteLen l
Your query would now be:
DECLARE @pString VARCHAR(8000) = '1|3|5'
SELECT
f.*
FROM tbl_File f
INNER JOIN tbl_TagFile tf ON tf.FileID = f.FileID
WHERE
tf.TagID IN(SELECT CAST(item AS INT) FROM dbo.DelimitedSplit8K(@pString, '|'))
GROUP BY f.FileID, f.FileName
HAVING COUNT(tf.ID) = (LEN(@pString) - LEN(REPLACE(@pString,'|','')) + 1)
The expression below counts the number of TagIDs in the parameter by counting the occurrences of the delimiter | and adding 1.
(LEN(@pString) - LEN(REPLACE(@pString,'|','')) + 1)
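A quick sanity check of the splitter and the delimiter-count expression (a sketch, assuming DelimitedSplit8K above is installed):
DECLARE @pString VARCHAR(8000) = '1|3|5';
SELECT ItemNumber, Item FROM dbo.DelimitedSplit8K(@pString, '|');
-- ItemNumber 1..3, Item '1', '3', '5'
SELECT (LEN(@pString) - LEN(REPLACE(@pString,'|','')) + 1) AS TagCount; -- 5 - 3 + 1 = 3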
Here is an option that does not require UDF's.
It can be argued that this is also complicated.
DECLARE @TagList VARCHAR(50)
-- pass in this
SET @TagList = '1|3|6'
SELECT
FinalSet.FileID,
FinalSet.Tag,
FinalSet.TotalMatches
FROM
(
SELECT
tbl_TagFile.FileID,
tbl_TagFile.Tag,
COUNT(*) OVER(PARTITION BY tbl_TagFile.FileID) TotalMatches
FROM
(
SELECT 1 FileID, '1' Tag UNION ALL
SELECT 1 , '2' UNION ALL
SELECT 1 , '3' UNION ALL
SELECT 1 , '6' UNION ALL
SELECT 2 , '1' UNION ALL
SELECT 2 , '3'
) tbl_TagFile
INNER JOIN
(
SELECT tbl_Tag.Tag
FROM
(
SELECT '1' Tag UNION ALL
SELECT '2' UNION ALL
SELECT '3' UNION ALL
SELECT '4' UNION ALL
SELECT '5' UNION ALL
SELECT '6'
) tbl_Tag
WHERE '|' + @TagList + '|' LIKE '%|' + Tag + '|%'
) LimitedTagTable
ON LimitedTagTable.Tag = tbl_TagFile.Tag
) FinalSet
WHERE
FinalSet.TotalMatches = (LEN(@TagList) - LEN(REPLACE(@TagList,'|','')) + 1)
There are some complications in this around data types and indexes and such, but you can see the concept - you are only getting the records that match your passed-in string.
subtable LimitedTagTable is your tag list filtered by your input pipe delimited string
subtable FinalSet joins your limited tag list to your list of files
column TotalMatches works out how many tag matches your file had
Finally this line limits the output to those files that had enough matches:
FinalSet.TotalMatches = (LEN(@TagList) - LEN(REPLACE(@TagList,'|','')) + 1)
Please experiment with different inputs and datasets and see if it suits as I have made a number of assumptions.
I'm answering my own question, in hopes that someone can let me know if/how flawed it is. So far it seems to be working, but this is only early testing.
Function:
ALTER FUNCTION [dbo].[udf_FileExistsByTags]
(
@FileID int
,@Tags nvarchar(max)
)
RETURNS bit
AS
BEGIN
DECLARE @Exists bit = 0
DECLARE @Count int = 0
DECLARE @TagTable TABLE ( FileID int, TagID int )
DECLARE @Tag int
WHILE len(@Tags) > 0
BEGIN
SET @Tag = CAST(LEFT(@Tags, charindex('|', @Tags + '|') - 1) as int)
SET @Count = @Count + 1
IF EXISTS (SELECT * FROM tbl_FileTag WHERE FileID = @FileID AND TagID = @Tag )
BEGIN
INSERT INTO @TagTable ( FileID, TagID ) VALUES ( @FileID, @Tag )
END
SET @Tags = STUFF(@Tags, 1, charindex('|', @Tags + '|'), '')
END
SET @Exists = CASE WHEN @Count = (SELECT COUNT(*) FROM @TagTable) THEN 1 ELSE 0 END
RETURN @Exists
END
Then in the query:
SELECT * FROM tbl_File a WHERE dbo.udf_FileExistsByTags(a.FileID, @Tags) = 1
So now I'm looking for errors.
What do you think? Probably not very efficient, but this search will be used only on a periodic basis.

Execute multiple dynamic T-SQL statements and obtain a limited number of unique values while preserving order

I have a SourceTable and a table variable @TQueries containing various T-SQL predicates that target SourceTable.
The expected result is to dynamically generate SELECT statements that return a list of Ids as specified by the predicates in @TQueries. Each dynamically generated SELECT statement also needs to execute in a particular order, and the final set of values needs to be unique and the ordering must be preserved.
Fortunately, there's a limit to how many values need to be retrieved and how many dynamic queries need to be generated. The Id list should contain at most 10 Ids, and we don't expect more than 7 queries.
The following is a sample of this setup, not the actual data/database:
-- Set up some test data, this is quick and dirty just to provide some data to test against
IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[SourceTable]') AND type in (N'U'))
BEGIN
-- Create a numbers table, sorta
SELECT TOP 20
IDENTITY(INT,1,1) AS Id,
ABS(CHECKSUM(NewId())) % 100 AS [SomeValue]
INTO [SourceTable]
FROM sysobjects a
END
DECLARE @TQueries TABLE (
[Ordinal] INT,
[WherePredicate] NVARCHAR(MAX),
[OrderByPredicate] NVARCHAR(MAX)
);
-- Simulate SELECTs with different ORDER BY that get different data due to varying WHERE clauses and ORDER conditions
INSERT INTO @TQueries VALUES ( 1, N'[Id] IN (6,11,13,7,10,3,15)', '[SomeValue] ASC' ) -- Sort Asc
INSERT INTO @TQueries VALUES ( 2, N'[Id] IN (9,15,14,20,17)', '[SomeValue] DESC' ) -- Sort Desc
INSERT INTO @TQueries VALUES ( 3, N'[Id] IN (20,10,1,16,11,19,9,15,17,6,2,3,13)', 'NEWID()' ) -- Sort Random
My main issue has been avoiding the use of a CURSOR or iterating through the rows one by one. The closest I've come to a set-based operation that meets these criteria is using a table variable to store the results of each query, or a massive CTE.
Suggestions and comments are welcome.
Here's a solution that builds a single statement both to run all the queries and to return the results.
It uses a similar approach as in your answer when iterating over the @TQueries table, i.e. it also uses {...} tokens where column values from @TQueries should go, and it puts the values there with nested REPLACE() calls.
Other than that, it heavily depends on ranking functions, and I'm not sure it doesn't abuse them. You'd need to test this method before deciding if it's better or worse than the one you've got so far.
DECLARE @QueryTemplate nvarchar(max), @FinalSQL nvarchar(max);
SET @QueryTemplate =
N'SELECT
[Id],
QueryRank = {Ordinal},
RowRank = ROW_NUMBER() OVER (ORDER BY {OrderByPredicate})
FROM [dbo].[SourceTable]
WHERE {WherePredicate}
';
SET @FinalSQL =
N'WITH AllData AS (
' +
SUBSTRING(
(
SELECT
'UNION ALL ' +
REPLACE(REPLACE(REPLACE(@QueryTemplate,
'{Ordinal}' , [Ordinal] ),
'{OrderByPredicate}', [OrderByPredicate]),
'{WherePredicate}' , [WherePredicate] )
FROM @TQueries
ORDER BY [Ordinal]
FOR XML PATH (''), TYPE
).value('.', 'nvarchar(max)'),
11, -- starting just after the first 'UNION ALL '
CAST(0x7FFFFFFF AS int) -- max int; no need to specify the exact length
) +
'),
RankedData AS (
SELECT
[Id],
QueryRank,
RowRank,
ValueRank = ROW_NUMBER() OVER (PARTITION BY [Id] ORDER BY QueryRank)
FROM AllData
)SELECT TOP (@top)
[Id]
FROM RankedData
WHERE ValueRank = 1
ORDER BY
QueryRank,
RowRank
';
PRINT @FinalSQL;
EXECUTE sp_executesql @FinalSQL, N'@top int', 10;
Basically, every subquery gets these auxiliary columns:
QueryRank – a constant value (within the subquery's result set) derived from [Ordinal];
RowRank – a ranking assigned to a row based on the [OrderByPredicate].
The result sets are UNIONed and then every entry of every unique value is again ranked (ValueRank) based on the query ranking.
When pulling the final result set, duplicates are suppressed (by the condition ValueRank = 1), and QueryRank and RowRank are used in the ORDER BY clause to preserve the original row order.
I used EXECUTE sp_executesql @query instead of EXECUTE (@query), because the former allows you to add parameters to the query. In particular, I parametrised the number of results to return (the argument of TOP). But you could certainly concatenate that value into the dynamic script directly, just like the other things, if you prefer EXECUTE () over EXECUTE sp_executesql.
If you like, you can try this query at SQL Fiddle. (Note: the SQL Fiddle version replaces the @TQueries table variable with the TQueries table.)
This is what I've managed to piece together, cobbled from my original response and improved upon by comments from @AndriyM
DECLARE @sql_prefix NVARCHAR(MAX);
SET @sql_prefix =
N'DECLARE @TResults TABLE (
[Ordinal] INT IDENTITY(1,1),
[ContentItemId] INT
);
DECLARE @max INT, @top INT;
SELECT @max = 10;';
DECLARE @sql_insert_template NVARCHAR(MAX), @sql_body NVARCHAR(MAX);
SET @sql_insert_template =
N'SELECT @top = @max - COUNT(*) FROM @TResults;
INSERT INTO @TResults
SELECT TOP (@top) [Id]
FROM [dbo].[SourceTable]
WHERE
{WherePredicate}
AND NOT EXISTS (
SELECT 1
FROM @TResults AS [tr]
WHERE [tr].[ContentItemId] = [SourceTable].[Id]
)
ORDER BY {OrderByPredicate};';
WITH Query ([Ordinal],[SqlCommand]) AS (
SELECT
[Ordinal],
REPLACE(REPLACE(@sql_insert_template, '{WherePredicate}', [WherePredicate]), '{OrderByPredicate}', [OrderByPredicate])
FROM @TQueries
)
SELECT
@sql_body = @sql_prefix + (
SELECT [SqlCommand]
FROM Query
ORDER BY [Ordinal] ASC
FOR XML PATH(''),TYPE).value('.', 'varchar(max)') + CHAR(13)+CHAR(10)
+N' SELECT * FROM @TResults ORDER BY [Ordinal]';
EXEC(@sql_body);
The basic idea is to use a table variable to hold the results of each query. I create a template for the SQL and replace the values in the template based on what is stored in @TQueries.
Once the entire script is completed I run it with EXEC.

SQL Server split CSV into multiple rows

I realize this question has been asked before, but I can't get it to work for some reason.
I'm using the split function from this SQL Team thread (second post) and the following queries.
--This query converts the interests field from text to varchar
select
cp.id
,cast(cp.interests as varchar(100)) as interests
into #client_profile_temp
from
client_profile cp
--This query is supposed to split the csv ("Golf","food") into multiple rows
select
cpt.id
,split.data
from
#client_profile_temp cpt
cross apply dbo.split(
#client_profile_temp.interests, ',') as split <--Error is on this line
However I'm getting an
Incorrect syntax near '.'
error where I've marked above.
In the end, I want
ID INTERESTS
000CT00002UA "Golf","food"
to be
ID INTERESTS
000CT00002UA "Golf"
000CT00002UA "food"
I'm using SQL Server 2008 and basing my attempt on this StackOverflow question. I'm fairly new to SQL, so any other words of wisdom would be appreciated as well.
TABLE
x-----------------x--------------------x
|       ID        |     INTERESTS      |
x-----------------x--------------------x
| 000CT00002UA    | Golf,food          |
| 000CT12303CB    | Cricket,Bat        |
x-----------------x--------------------x
METHOD 1 : Using XML format
SELECT ID,Split.a.value('.', 'VARCHAR(100)') 'INTERESTS'
FROM
(
-- To change ',' to any other delimiter, just change ',' before '</M><M>' to your desired one
SELECT ID, CAST ('<M>' + REPLACE(INTERESTS, ',', '</M><M>') + '</M>' AS XML) AS Data
FROM TEMP
) AS A
CROSS APPLY Data.nodes ('/M') AS Split(a)
SQL FIDDLE
METHOD 2 : Using function dbo.Split
SELECT a.ID, b.items
FROM #TEMP a
CROSS APPLY dbo.Split(a.INTERESTS, ',') b
SQL FIDDLE
And dbo.Split function is here.
CREATE FUNCTION [dbo].[Split](@String varchar(8000), @Delimiter char(1))
returns @temptable TABLE (items varchar(8000))
as
begin
declare @idx int
declare @slice varchar(8000)
select @idx = 1
if len(@String)<1 or @String is null return
while @idx!= 0
begin
set @idx = charindex(@Delimiter,@String)
if @idx!=0
set @slice = left(@String,@idx - 1)
else
set @slice = @String
if(len(@slice)>0)
insert into @temptable(Items) values(@slice)
set @String = right(@String,len(@String) - @idx)
if len(@String) = 0 break
end
return
end
FINAL RESULT
from
#client_profile_temp cpt
cross apply dbo.split(
#client_profile_temp.interests, ',') as split <--Error is on this line
I think the explicit naming of #client_profile_temp after you gave it an alias is a problem, try making that last line:
cpt.interests, ',') as split <--Error is on this line
EDIT: You say
I made this change and it didn't change anything
Try pasting the code below (into a new SSMS window)
create table #client_profile_temp
(id int,
interests varchar(500))
insert into #client_profile_temp
values
(5, 'Vodka,Potassium,Trigo'),
(6, 'Mazda,Boeing,Alcoa')
select
cpt.id
,split.data
from
#client_profile_temp cpt
cross apply dbo.split(cpt.interests, ',') as split
See if it works as you expect; I'm using sql server 2008 and that works for me to get the kind of results I think you want.
Any chance when you say "I made the change", you just changed a stored procedure but haven't run it, or changed a script that creates a stored procedure, and haven't run that, something along those lines? As I say, it seems to work for me.
As this is old, it seems the following works in SQL Azure (as of 3/2022).
The big changes: split.value instead of .data or .items as shown above; no AS before the alias after the function; and lastly STRING_SPLIT is the method.
select Id, split.value
from #reportTmp03 rpt
cross apply string_split(SelectedProductIds, ',') split
Try this:
--This query is supposed to split the csv ("Golf","food") into multiple rows
select
cpt.id
,split.data
from
#client_profile_temp cpt
cross apply dbo.split(cpt.interests, ',') as split
You must use the table alias instead of the table name as soon as you define one.
