Remove duplicates from a comma- or pipe-delimited string - sql-server

I have been looking into this for a while now and I cannot find a way to remove duplicate values from a string that is both comma-separated and pipe-separated in SQL Server.
Given the string
test1,test2,test1|test2,test3|test4,test4|test4
does anyone know how to return test1,test2,test3,test4?

Approach
The following approach can be used to de-duplicate a delimited list of values.
Use the REPLACE() function to convert different delimiters into the same delimiter.
Use the REPLACE() function to inject XML closing and opening tags to create an XML fragment
Use the CAST(expr AS XML) function to convert the above fragment into the XML data type
Use OUTER APPLY to apply the table-valued function nodes() to split the XML fragment into its constituent XML tags. This returns each XML tag on a separate row.
Extract just the value from each XML tag using the value() function, which returns the value as the specified data type.
Append a comma after the above-mentioned value.
Note that these values are returned on separate rows. The usage of the DISTINCT keyword now removes duplicate rows (i.e. values).
Use the FOR XML PATH('') clause to concatenate the values across multiple rows into a single row.
Query
Putting the above approach in query form:
SELECT DISTINCT PivotedTable.PivotedColumn.value('.','nvarchar(max)') + ','
FROM (
-- This query returns the following in the DataXml column:
-- <tag>test1</tag><tag>test2</tag><tag>test1</tag><tag>test2</tag><tag>test3</tag><tag>test4</tag><tag>test4</tag><tag>test4</tag>
-- i.e. it has turned the original delimited data into an XML fragment
SELECT
DataTable.DataColumn AS DataRaw
, CAST(
'<tag>'
-- First replace commas with pipes to have only a single delimiter
-- Then replace the pipe delimiters with a closing and opening tag
+ replace(replace(DataTable.DataColumn, ',','|'), '|','</tag><tag>')
-- Add a final set of closing tags
+ '</tag>'
AS XML) AS DataXml
FROM ( SELECT 'test1,test2,test1|test2,test3|test4,test4|test4' AS DataColumn) AS DataTable
) AS x
OUTER APPLY DataXml.nodes('tag') AS PivotedTable(PivotedColumn)
-- Running the query without the following line will return the data in separate rows
-- Running the query with the following line returns the rows concatenated, i.e. it returns:
-- test1,test2,test3,test4,
FOR XML PATH('')
Input & Result
Given the input:
test1,test2,test1|test2,test3|test4,test4|test4
The above query will return the result:
test1,test2,test3,test4,
Notice the trailing comma at the end. I'll leave it as an exercise to you to remove that.
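One possible way to handle the trailing comma, offered as a sketch rather than as the answer's own code: put the comma in front of each value instead of after it, then strip the first character with STUFF(). The DedupedList alias is made up for illustration, and the order of the values is not guaranteed.

```sql
-- Sketch: same XML split as above, but the comma is prepended
-- so that STUFF() can remove the single leading comma.
SELECT STUFF((
    SELECT DISTINCT ',' + PivotedTable.PivotedColumn.value('.', 'nvarchar(max)')
    FROM (
        SELECT CAST('<tag>'
                    + REPLACE(REPLACE(DataTable.DataColumn, ',', '|'), '|', '</tag><tag>')
                    + '</tag>' AS XML) AS DataXml
        FROM (SELECT 'test1,test2,test1|test2,test3|test4,test4|test4' AS DataColumn) AS DataTable
    ) AS x
    CROSS APPLY x.DataXml.nodes('tag') AS PivotedTable(PivotedColumn)
    FOR XML PATH('')
), 1, 1, '') AS DedupedList;   -- hypothetical column alias
```

STUFF(expr, 1, 1, '') deletes one character starting at position 1, which here is the leading comma.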
EDIT: Count of Duplicates
OP asked in a comment how to get the count of duplicates as well, in a separate column.
The simplest way is to use the above query without the last line, FOR XML PATH(''), and then count both all values and the distinct values returned by the SELECT expression in that query (i.e. PivotedTable.PivotedColumn.value('.','nvarchar(max)')). The difference between the count of all values and the count of distinct values is the count of duplicate values.
SELECT
COUNT(PivotedTable.PivotedColumn.value('.','nvarchar(max)')) AS CountOfAllValues
, COUNT(DISTINCT PivotedTable.PivotedColumn.value('.','nvarchar(max)')) AS CountOfUniqueValues
-- The difference of the previous two counts is the number of duplicate values
, COUNT(PivotedTable.PivotedColumn.value('.','nvarchar(max)'))
- COUNT(DISTINCT PivotedTable.PivotedColumn.value('.','nvarchar(max)')) AS CountOfDuplicateValues
FROM (
-- This query returns the following in the DataXml column:
-- <tag>test1</tag><tag>test2</tag><tag>test1</tag><tag>test2</tag><tag>test3</tag><tag>test4</tag><tag>test4</tag><tag>test4</tag>
-- i.e. it has turned the original delimited data into an XML fragment
SELECT
DataTable.DataColumn AS DataRaw
, CAST(
'<tag>'
-- First replace commas with pipes to have only a single delimiter
-- Then replace the pipe delimiters with a closing and opening tag
+ replace(replace(DataTable.DataColumn, ',','|'), '|','</tag><tag>')
-- Add a final set of closing tags
+ '</tag>'
AS XML) AS DataXml
FROM ( SELECT 'test1,test2,test1|test2,test3|test4,test4|test4' AS DataColumn) AS DataTable
) AS x
OUTER APPLY DataXml.nodes('tag') AS PivotedTable(PivotedColumn)
For the same input shown above, the output of this query is:
CountOfAllValues CountOfUniqueValues CountOfDuplicateValues
---------------- ------------------- ----------------------
8                4                   4
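If the count is wanted per value rather than as one total, a GROUP BY over the same split is one option. This is a sketch under that assumption (the comment does not say which count was meant); the column aliases Value, Occurrences and DuplicateCount are illustrative.

```sql
-- Sketch: one row per distinct value with how often it occurs.
SELECT v.Value,
       COUNT(*)     AS Occurrences,
       COUNT(*) - 1 AS DuplicateCount   -- copies beyond the first
FROM (
    SELECT PivotedTable.PivotedColumn.value('.', 'nvarchar(100)') AS Value
    FROM (
        SELECT CAST('<tag>'
                    + REPLACE(REPLACE('test1,test2,test1|test2,test3|test4,test4|test4', ',', '|'),
                              '|', '</tag><tag>')
                    + '</tag>' AS XML) AS DataXml
    ) AS x
    CROSS APPLY x.DataXml.nodes('tag') AS PivotedTable(PivotedColumn)
) AS v
GROUP BY v.Value;
```

For the sample input this gives test1 and test2 twice each, test3 once and test4 three times, which adds up to the 4 duplicates reported above.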

Solution to your problem is as given below :
DECLARE @Data_String AS VARCHAR(1000), @Result AS VARCHAR(1000) = ''
SET @Data_String = 'test1,test2,test1|test2,test3|test4,test4|test4'
-- Normalize the pipe delimiter to a comma
SET @Data_String = REPLACE(@Data_String, '|', ',')
-- Split via XML, keep the distinct values, and concatenate them with commas
SELECT @Result = @Result + col + ',' FROM (
SELECT DISTINCT t.c.value('.', 'varchar(100)') col FROM (
SELECT CAST('<A>' + REPLACE(@Data_String, ',', '</A><A>') + '</A>' AS XML) col1) data CROSS APPLY col1.nodes('/A') AS t(c)) Data
-- Strip the trailing comma
SELECT LEFT(@Result, LEN(@Result) - 1)
Result
test1,test2,test3,test4

DECLARE @string AS VARCHAR(1000)
SET @string = 'test1,test2,test1|test2,test3|test4,test4|test4'
SET @string = REPLACE(@string, '|', ',')
DECLARE @t TABLE (val VARCHAR(MAX))
DECLARE @xml XML
SET @xml = N'<root><r>' + REPLACE(@string, ',', '</r><r>') + '</r></root>'
INSERT INTO @t(val) SELECT r.value('.','VARCHAR(MAX)') AS Item FROM @xml.nodes('//root/r') AS RECORDS(r)
-- Number the rows within each value and delete every copy after the first
;WITH cte
AS (SELECT ROW_NUMBER() OVER (PARTITION BY val ORDER BY val DESC) RN
FROM @t)
DELETE FROM cte
WHERE RN > 1
-- The table variable now holds one row per distinct value
SELECT val FROM @t

Try the following SQL script:
declare @List nvarchar(max)='test1,test2,test1|test2,test3|test4,test4|test4';
declare @Delimiter CHAR(1) =','
declare @XML AS XML
declare @result varchar(max)
set @List=Replace(@List,'|',',')
--Select @List
SET @XML = CAST(('<X>'+REPLACE(@List,@Delimiter ,'</X><X>')+'</X>') AS XML)
-- Table variable holding every split value, duplicates included
DECLARE @temp TABLE (Data nvarchar(100))
INSERT INTO @temp
SELECT N.value('.', 'nvarchar(100)') AS Data FROM @XML.nodes('X') AS T(N)
--SELECT distinct * FROM @temp
-- Temp table holding only the distinct values
IF OBJECT_ID('tempdb..#temp') IS NOT NULL DROP TABLE #temp
Select distinct Data into #temp from @temp
SET @result = ''
select @result = @result + Data + ', ' from #temp
-- LEN ignores the trailing space, so this drops the final ', '
select SUBSTRING(@result, 0, LEN(@result))

I just tried the following script and it works perfectly:
declare @List VARCHAR(MAX)='test1,test2,test1|test2,test3|test4,test4|test4'
declare @Delim CHAR=','
DECLARE @ParsedList TABLE
(
Item VARCHAR(MAX)
)
DECLARE @list1 VARCHAR(MAX), @Pos INT, @rList VARCHAR(MAX)
set @List=Replace(@List,'|',',')
-- Append a final delimiter so the last item is picked up by the loop
SET @list = LTRIM(RTRIM(@list)) + @Delim
SET @pos = CHARINDEX(@delim, @list, 1)
WHILE @pos > 0
BEGIN
-- Take the item before the next delimiter and store it if it is not empty
SET @list1 = LTRIM(RTRIM(LEFT(@list, @pos - 1)))
IF @list1 <> ''
INSERT INTO @ParsedList VALUES (CAST(@list1 AS VARCHAR(MAX)))
SET @list = SUBSTRING(@list, @pos+1, LEN(@list))
SET @pos = CHARINDEX(@delim, @list, 1)
END
-- Concatenate the distinct items back into a comma-separated string
SELECT @rlist = COALESCE(@rlist+',','') + item
FROM (SELECT DISTINCT Item FROM @ParsedList) t
Select @rlist
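For completeness, on newer versions the whole task can be sketched with built-in functions. This assumes SQL Server 2017 or later (STRING_SPLIT needs 2016 and compatibility level 130, STRING_AGG needs 2017), which the question does not state; the DedupedList alias is illustrative.

```sql
DECLARE @List VARCHAR(MAX) = 'test1,test2,test1|test2,test3|test4,test4|test4';

-- Normalize pipes to commas, split, de-duplicate, and join again in sorted order
SELECT STRING_AGG(d.value, ',') WITHIN GROUP (ORDER BY d.value) AS DedupedList
FROM (
    SELECT DISTINCT value
    FROM STRING_SPLIT(REPLACE(@List, '|', ','), ',')
) AS d;
-- Returns: test1,test2,test3,test4
```

The DISTINCT has to happen in a derived table because STRING_AGG itself has no DISTINCT option.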

Related

SQL Server Conversion failed varchar to int

I have a table (no. 1) which has 10 columns. One of them, clm01, is an integer and does not allow NULL values.
There is a second table (no. 2) which has many columns. One of them, clm02, is a string type. An example of this column's data is 1,2,3.
I'd like to make a query like:
select *
from table1 t1, table2 t2
where t1.clm01 not in (t2.clm02)
For example, in table1 I have 5 records with the values 1,2,3,4,5 in clm01, and in table2 I have 1 record with clm02 = 1,2,3.
So I would like the query to return only the records with the values 4 and 5 in clm01.
Instead I get:
Conversion failed when converting the varchar value '1,2,3' to data type int
Any ideas?
Use the STRING_SPLIT() function to split the comma-separated values if you are using SQL Server 2016 or later.
SELECT *
FROM table1 t1
WHERE t1.clm01 NOT IN (SELECT Value FROM table2 t2
CROSS APPLY STRING_SPLIT(t2.clm02,','))
If you are using an earlier version of SQL Server, write a UDF to split the string and use that function in the CROSS APPLY clause.
CREATE FUNCTION [dbo].[SplitString]
(
@string NVARCHAR(MAX),
@delimiter CHAR(1)
)
RETURNS @output TABLE(Value NVARCHAR(MAX)
)
BEGIN
-- @start and @end bracket the current item within @string
DECLARE @start INT, @end INT
SELECT @start = 1, @end = CHARINDEX(@delimiter, @string)
WHILE @start < LEN(@string) + 1 BEGIN
-- No further delimiter: the current item runs to the end of the string
IF @end = 0
SET @end = LEN(@string) + 1
INSERT INTO @output (Value)
VALUES(SUBSTRING(@string, @start, @end - @start))
SET @start = @end + 1
SET @end = CHARINDEX(@delimiter, @string, @start)
END
RETURN
END
I decided to give you a couple of options but this really is a duplicate question I see pretty often.
There are two main ways of going about the problem.
1) Use LIKE to compare the strings, but you have to build the strings a little oddly to do it:
SELECT *
FROM
@Table1 t1
WHERE
NOT EXISTS (SELECT *
FROM @Table2 t2
WHERE ',' + t2.clm02 + ',' LIKE '%,' + CAST(t1.clm01 AS VARCHAR(15)) + ',%')
What you see is that ,1,2,3, is matched against %,clm01value,%. You must add the delimiter to both strings for this to work properly, and you have to cast/convert clm01 to a character data type. There are drawbacks to this solution, but if your data sets are straightforward it could work for you.
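A small illustration of why the delimiters have to be added on both sides; the literal values and the column aliases are made up for the example.

```sql
-- Looking for the value 1 in the list '10,20'
SELECT
    -- Without delimiters, '1' matches inside '10': a false positive
    CASE WHEN '10,20' LIKE '%' + '1' + '%' THEN 1 ELSE 0 END AS NaiveMatch,                    -- 1
    -- With delimiters, ',1,' does not occur in ',10,20,'
    CASE WHEN ',' + '10,20' + ',' LIKE '%,' + '1' + ',%' THEN 1 ELSE 0 END AS DelimitedMatch;  -- 0
```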
2) Split the comma-delimited string into rows and then use a LEFT JOIN, NOT EXISTS, or NOT IN. Here is a method that converts the CSV to XML and then splits it:
;WITH cteClm02Split AS (
SELECT
clm02
FROM
(SELECT
CAST('<X>' + REPLACE(clm02,',','</X><X>') + '</X>' AS XML) as xclm02
FROM
@Table2) t
CROSS APPLY (SELECT t.n.value('.','INT') clm02
FROM
t.xclm02.nodes('X') as t(n)) ca
)
SELECT t1.*
FROM
@Table1 t1
LEFT JOIN cteClm02Split t2
ON t1.clm01 = t2.clm02
WHERE
t2.clm02 IS NULL
OR use NOT EXISTS with same cte
SELECT t1.*
FROM
@Table1 t1
WHERE
NOT EXISTS (SELECT * FROM cteClm02Split t2 WHERE t1.clm01 = t2.clm02)
There are dozens of other ways to split delimited strings and you can choose whatever way works for you.
Note: I am not showing IN/NOT IN as an answer because I don't recommend the use of it. If you do use it make sure that you are never comparing a NULL in the select etc. Here is another good post concerning performance etc. NOT IN vs NOT EXISTS
Here are the table variables that were used:
DECLARE @Table1 AS TABLE (clm01 INT)
DECLARE @Table2 AS TABLE (clm02 VARCHAR(15))
INSERT INTO @Table1 VALUES (1),(2),(3),(4),(5)
INSERT INTO @Table2 VALUES ('1,2,3')
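To make the NULL caveat from the note concrete, here is a minimal sketch with a made-up table variable @Values. With the default ANSI_NULLS setting, a single NULL in the list makes NOT IN evaluate to UNKNOWN for every non-matching value, so no row comes back, while NOT EXISTS behaves as expected.

```sql
DECLARE @Values TABLE (v INT NULL);   -- hypothetical list that contains a NULL
INSERT INTO @Values VALUES (1), (2), (NULL);

-- 4 is not in the list, yet this returns no rows:
-- 4 <> NULL is UNKNOWN, so the NOT IN predicate is never TRUE
SELECT 'returned by NOT IN' AS Result
WHERE 4 NOT IN (SELECT v FROM @Values);

-- This returns one row: no row satisfies v = 4
SELECT 'returned by NOT EXISTS' AS Result
WHERE NOT EXISTS (SELECT * FROM @Values WHERE v = 4);
```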

How to use regular expression to remove number in MS SQL Server Management Studio?

I have a field in a table containing different IDs for different programmes like this:
ProgrammeID
-----------
Prog201604L
Prog201503L
Pro2015N
Pro2014N
Programme2010
Programme2011
Each programme ID has its own meaning. The number in the middle of the string indicates the year or month. It is obvious that Prog201604L and Prog201503L indicate the same programme but in different years (and the same goes for the rest). What I want to do is remove the numbers, so that after removal the ProgrammeID values look like this:
ProgrammeID
-----------
ProgL
ProgL
ProN
ProN
Programme
Programme
Then later I can aggregate these programmes together.
I am currently using SSMS 2012 and am not sure whether there is a SQL statement like RegEx. I have been searching for a long time, but the solutions online are mainly about Oracle and MySQL. What I found is PATINDEX(), and it seems to support regular expressions. Can anybody tell me how to create a pattern that suits my situation and what kind of statement I should use?
Thanks in advance
If the number part is always 6 characters long, the following can be used.
DECLARE @ProgrammeID VARCHAR(50) = 'Prog201604L'
SELECT STUFF(@ProgrammeID, PATINDEX( '%[0-9]%', @ProgrammeID), 6, '')
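A note on what PATINDEX() accepts: it takes LIKE-style wildcard patterns (%, _, [ ] and [^ ]), not full regular expressions, so there are no quantifiers such as + or {6}. A quick sketch of what it returns for the sample value; the column aliases are illustrative.

```sql
SELECT PATINDEX('%[0-9]%',  'Prog201604L') AS FirstDigit,     -- 5: position of the first digit
       PATINDEX('%[^0-9]%', 'Prog201604L') AS FirstNonDigit,  -- 1: position of the first non-digit
       PATINDEX('%[0-9]%',  'Prog')        AS NoDigitFound;   -- 0: no match
```

The 0 result for a string without digits is why the queries in this thread need a fallback for that case.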
If the number of digits is not fixed, the above can be extended:
CREATE TABLE #Programme ( ProgrammeID VARCHAR(50) )
INSERT INTO #Programme
VALUES
('Prog201604L')
,('Pro2015N')
,('Programme2010')
,('Prog2016L')
,('Pro2N')
,('Prog')
,('2010')
SELECT ProgrammeID,
ISNULL(
STUFF(ProgrammeID,
PATINDEX( '%[0-9]%', ProgrammeID), -- get number start index
IIF(PATINDEX( '%[0-9][a-z]%',ProgrammeID)= 0, PATINDEX( '%[0-9]',ProgrammeID), PATINDEX( '%[0-9][a-z]%',ProgrammeID)) + 1 -- get the last number index
- PATINDEX( '%[0-9]%', ProgrammeID), -- get the number character length
'')
,ProgrammeID) -- Where there are no numbers in the string you will get Null, replace it with actual string
AS [Without Numbers]
FROM #Programme
This will handle cases with a varying number of digits and even strings without any number.
Hope this helps
You can create a function and pass the value of each row to it,
as follows (just run this query):
Create Function [dbo].[RemoveNonAlphaCharacters](@Temp VarChar(1000))
Returns VarChar(1000)
AS
Begin
-- Pattern that matches any single character that is not a letter
Declare @KeepValues as varchar(50)
Set @KeepValues = '%[^a-z]%'
-- Remove one non-letter character per iteration until none remain
While PatIndex(@KeepValues, @Temp) > 0
Set @Temp = Stuff(@Temp, PatIndex(@KeepValues, @Temp), 1, '')
Return @Temp
End
GO
---Call it like this:
Declare @tbl table (ProgrammeID varchar(20))
insert into @tbl values ('Prog201604L'),('Prog201503L'),('Pro2015N'),('Pro2014N'),('Programme2010'),('Programme2011')
select * from @tbl
Select dbo.RemoveNonAlphaCharacters(ProgrammeID) from @tbl
How to strip all non-alphabetic characters from string in SQL Server?
Remove numbers from string sql server
One clever option is to take the substring of the ProgrammeID column from the left, until hitting the first number, and concatenate that with the reverse of the substring from the right until hitting the first number:
SELECT
SUBSTRING(ProgrammeID,
1,
PATINDEX('%[0-9]%', ProgrammeID) - 1) +
REVERSE(SUBSTRING(REVERSE(ProgrammeID),
1,
PATINDEX('%[0-9]%', REVERSE(ProgrammeID)) - 1))
FROM yourTable
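One caveat with this approach: when a value contains no digit at all, PATINDEX returns 0, the length passed to SUBSTRING becomes -1, and SQL Server raises an "Invalid length parameter" error. A possible guard, sketched with a made-up table variable @Programme and alias WithoutNumbers:

```sql
DECLARE @Programme TABLE (ProgrammeID VARCHAR(50));   -- hypothetical sample data
INSERT INTO @Programme VALUES ('Prog201604L');
INSERT INTO @Programme VALUES ('Programme');          -- no digits at all

SELECT ProgrammeID,
       CASE WHEN PATINDEX('%[0-9]%', ProgrammeID) = 0
            THEN ProgrammeID   -- nothing to remove: return the value unchanged
            ELSE SUBSTRING(ProgrammeID, 1, PATINDEX('%[0-9]%', ProgrammeID) - 1)
               + REVERSE(SUBSTRING(REVERSE(ProgrammeID), 1,
                                   PATINDEX('%[0-9]%', REVERSE(ProgrammeID)) - 1))
       END AS WithoutNumbers
FROM @Programme;
```

For these two rows the expected values are ProgL and Programme.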
I have created a user-defined function for SQL Server that removes non-numeric characters from a string expression.
We can modify it to do the opposite and remove numeric characters from the input string, as follows:
while patindex('%[0-9]%', @str) > 0
set @str = stuff(@str, patindex('%[0-9]%', @str), 1, '')
return @str
I hope it helps
Alan Burstein wrote an iTVF exactly for this. The function is called PatExclude8K. Here is the function definition (some comments removed):
CREATE FUNCTION dbo.PatExclude8K
(
@String VARCHAR(8000),
@Pattern VARCHAR(50)
)
/*******************************************************************************
Purpose:
Given a string (@String) and a pattern (@Pattern) of characters to remove,
remove the patterned characters from the string.
*******************************************************************************/
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
WITH
E1(N) AS (SELECT N FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS X(N)),
iTally(N) AS
(
SELECT TOP(CONVERT(INT,LEN(@String),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM E1 T1 CROSS JOIN E1 T2 CROSS JOIN E1 T3 CROSS JOIN E1 T4
)
SELECT NewString =
((
SELECT SUBSTRING(@String,N,1)
FROM iTally
WHERE 0 = PATINDEX(@Pattern,SUBSTRING(@String COLLATE Latin1_General_BIN,N,1))
FOR XML PATH(''),TYPE
).value('.[1]','varchar(8000)'));
GO
And here is how you would use it:
SELECT *
FROM #Programme p
CROSS APPLY dbo.PatExclude8K(p.ProgrammeID, '[0-9]');
Using your sample data, here is the result:
ProgrammeID          NewString
-------------------- -----------------
Prog201604L          ProgL
Prog201503L          ProgL
Pro2015N             ProN
Pro2014N             ProN
Programme2010        Programme
Programme2011        Programme
I created this solution building on a solution to extracting values from a comma separated list inside a string.
It seems to work fine and even to be a bit more efficient than using a WHILE loop, though I would welcome feedback on that assumption.
On a table with 461,358 rows it takes 3 minutes and 27 seconds (about 0.45 ms per row) when wrapped in a function.
select count(*)
from Mytable
where dbo.StripNumeric(inputFromUser) is null
Here are the solutions.
For stripping away numeric:
declare @input nvarchar(max) = null
select @input = '1a2 3b4' + char(13) + char(10) + '5(678)*90c'
DECLARE @output nvarchar(max) = '';
-- Recursive CTE that generates one row per character position in @input
WITH cte AS
(
SELECT cast(1 as int) as [index]
UNION ALL
SELECT [index]+ 1 as [index]
from cte
where [index] < len(@input)
)
-- Append the character only when it is not a digit
select @output = iif(PATINDEX('%[0-9]%', substring(@input, [index], 1))= 1, @output, @output + substring(@input, [index], 1))
from cte;
select iif(COALESCE( @output, '') = '', null, ltrim(rtrim(@output)))
For stripping away non-numeric:
declare @input nvarchar(max) = null
select @input = '1a2 3b4' + char(13) + char(10) + '5(678)*90c'
DECLARE @output nvarchar(max) = '';
-- Recursive CTE that generates one row per character position in @input
WITH cte AS
(
SELECT cast(1 as int) as [index]
UNION ALL
SELECT [index]+ 1 as [index]
from cte
where [index] < len(@input)
)
-- Append the character only when it is a digit
select @output = iif(PATINDEX('%[0-9]%', substring(@input, [index], 1))= 1, @output + substring(@input, [index], 1), @output)
from cte;
select iif(COALESCE( @output, '') = '', null, ltrim(rtrim(@output)))
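One thing to watch with the recursive CTE: SQL Server stops a recursion after 100 levels by default, so inputs longer than roughly 100 characters would fail unless the limit is lifted with a MAXRECURSION hint. A sketch of the numeric-stripping variant with that hint, using a made-up 300-character input:

```sql
DECLARE @input NVARCHAR(MAX) = REPLICATE(N'a1', 150);   -- 300 characters, half of them digits
DECLARE @output NVARCHAR(MAX) = N'';

WITH cte AS
(
    SELECT CAST(1 AS INT) AS [index]
    UNION ALL
    SELECT [index] + 1
    FROM cte
    WHERE [index] < LEN(@input)
)
SELECT @output = IIF(PATINDEX('%[0-9]%', SUBSTRING(@input, [index], 1)) = 1,
                     @output,
                     @output + SUBSTRING(@input, [index], 1))
FROM cte
OPTION (MAXRECURSION 0);   -- 0 removes the default limit of 100 levels

SELECT LEN(@output) AS LettersKept;   -- 150
```

Because query hints are not allowed inside a function body, this may need a different row source (for example a tally table) when the logic is wrapped in a function.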

Convert comma delimited ID list to Comma delimited prefix List SQL Server

I'm trying to convert a comma-delimited ID list to a comma-delimited prefix list.
BKR394859607,MTP293840284,SPN489620586
My goal is to convert these 3 Ids in my IDString in SQL Server to
BKR,MTP,SPN
This string can have an infinite number of Ids
Declare @ActivityID as varchar(MAX) = 'BKR394859607,MTP293840284,SPN489620586'
Declare @ActivityPrefixes as varchar(MAX)
Set @ActivityPrefixes = (function to Convert to comma delimited prefix list)
As mentioned in comment by Sean Lange, you need to perform three steps
Split the string into individual rows
Strip the required characters
Concatenate the rows back to CSV.
Something like this
DECLARE @ActivityID AS VARCHAR(max) =
'BKR394859607,MTP293840284,SPN489620586,GHY489620586'
SELECT Stuff(split_val, 1, 1, '') as Result
FROM (SELECT ','
+ LEFT(split.a.value('.', 'VARCHAR(100)'), 3)
FROM (SELECT Cast ('<M>' + Replace(@ActivityID, ',', '</M><M>')
+ '</M>' AS XML) AS Data) AS A
CROSS apply data.nodes ('/M') AS Split(a)
FOR xml path(''))a(split_val)
Result : BKR,MTP,SPN,GHY
Use the following answer to implement regular expression functions in T-SQL. Then you can use the following:
First and foremost, read the following articles, they discuss splitting strings (and reasons not to do it), and compare performance of methods to do it if you can't avoid it.
Split strings the right way – or the next best way
Splitting Strings : A Follow-Up
Splitting Strings : Now with less T-SQL
The upshot of these three articles is (in case the links ever become dead):
Avoid delimited lists as strings where possible, if you need to store a list, a table is much better alternative.
If you have to do it, CLR is the most scalable (and accurate) method.
If you can be certain there are no special XML characters to split, then converting the delimited string to XML then using XQuery to get the individual items works well.
Otherwise, building a tally table using cross joining is the best of the rest.
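To illustrate the point about special XML characters: an item such as R&D makes the plain CAST to XML fail, because a bare & is not valid in XML text. One possible workaround, sketched here with a made-up list, is to escape &, < and > before building the fragment.

```sql
DECLARE @List VARCHAR(100) = 'R&D,Sales';   -- hypothetical list containing an ampersand

-- CAST('<X>' + REPLACE(@List, ',', '</X><X>') + '</X>' AS XML) would raise an XML parsing error here.
-- Escaping first (the ampersand must be replaced before the others) avoids that:
SELECT t.n.value('.', 'varchar(100)') AS Item
FROM (
    SELECT CAST('<X>'
                + REPLACE(REPLACE(REPLACE(REPLACE(@List, '&', '&amp;'), '<', '&lt;'), '>', '&gt;'),
                          ',', '</X><X>')
                + '</X>' AS XML) AS x
) AS d
CROSS APPLY d.x.nodes('X') AS t(n);
-- Items: R&D and Sales (value() returns the unescaped text)
```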
The most versatile method is the last, since not everyone can use CLR, and guarantee no special XML characters, so the split method for that is:
CREATE FUNCTION [dbo].[Split]
(
@List NVARCHAR(MAX),
@Delimiter NVARCHAR(255)
)
RETURNS TABLE
WITH SCHEMABINDING AS
RETURN
( WITH N1 AS (SELECT N FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1), (1)) n (N)),
N2(N) AS (SELECT 1 FROM N1 a CROSS JOIN N1 b),
N3(N) AS (SELECT 1 FROM N2 a CROSS JOIN N2 b),
N4(N) AS (SELECT 1 FROM N3 a CROSS JOIN N3 b),
cteTally(N) AS
( SELECT 0 UNION ALL
SELECT TOP (DATALENGTH(ISNULL(@List,1))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM N4
),
cteStart(N1) AS
( SELECT t.N+1
FROM cteTally t
WHERE (SUBSTRING(@List,t.N,1) = @Delimiter OR t.N = 0)
)
SELECT Item = SUBSTRING(@List, s.N1, ISNULL(NULLIF(CHARINDEX(@Delimiter,@List,s.N1),0)-s.N1,8000)),
Position = s.N1,
ItemNumber = ROW_NUMBER() OVER(ORDER BY s.N1)
FROM cteStart s
);
Now that you have your Split function, you can split your first string:
DECLARE @ActivityID AS VARCHAR(MAX) = 'BKR394859607,MTP293840284,SPN489620586';
SELECT Item,
NewID = LEFT(Item, 3)
FROM dbo.Split(@ActivityID, ',');
Which gives you:
Item            NewID
-----------------------------
BKR394859607    BKR
MTP293840284    MTP
SPN489620586    SPN
Then you can concatenate this back up using FOR XML PATH():
DECLARE @ActivityID AS VARCHAR(MAX) = 'BKR394859607,MTP293840284,SPN489620586';
SELECT STUFF(( SELECT ',' + LEFT(Item, 3)
FROM dbo.Split(@ActivityID, ',')
FOR XML PATH(''), TYPE).value('.', 'VARCHAR(MAX)'), 1, 1, '');
For more on how this works see this answer.
The optimal solution would probably be to have a user defined table type to store string lists:
CREATE TYPE dbo.StringList AS TABLE (Value VARCHAR(MAX));
Then rather than building a delimited string, build up a table:
DECLARE @Activity dbo.StringList;
INSERT @Activity (Value)
VALUES ('BKR394859607'), ('MTP293840284'), ('SPN489620586');
Then you avoid a painful split, and can manipulate each individual record much more easily.
If you really did need to get a new delimited string, then you can use the same logic as above:
DECLARE @Activity dbo.StringList;
INSERT @Activity (Value)
VALUES ('BKR394859607'), ('MTP293840284'), ('SPN489620586');
SELECT STUFF(( SELECT ',' + LEFT(Value, 3)
FROM @Activity
FOR XML PATH(''), TYPE).value('.', 'VARCHAR(MAX)'), 1, 1, '');
Try this:
DECLARE @ActivityID AS VARCHAR(MAX) = 'BKR394859607,MTP293840284,SPN489620586';
DECLARE @ActivityPrefixes AS VARCHAR(MAX);
-- Remove every digit 0-9, leaving only the prefixes and the commas
SET @ActivityPrefixes = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
REPLACE(REPLACE(REPLACE(REPLACE(
@ActivityID,'0', ''), '1',''), '2', ''),'3', ''), '4',''), '5', ''),
'6', ''),'7', ''), '8', ''), '9', '');
SELECT @ActivityPrefixes
If you want to get the first 3 characters of each section, then try this:
DECLARE @ActivityID AS VARCHAR(MAX) = 'BKR39A859607,M3TP293840284,SP1N4896GG586,BKR394859607,MTP293840284,SPN489620586';
DECLARE @ActivityPrefixes AS VARCHAR(MAX);
SET @ActivityPrefixes = '';
WHILE 1 = 1
BEGIN
DECLARE @i INT;
-- @i is the length of the current section (up to the next comma)
IF CHARINDEX(',', @ActivityID) = 0
SET @i = 3;
ELSE
SET @i = CHARINDEX(',', @ActivityID) - 1;
-- Append the first 3 characters of the current section
IF @ActivityPrefixes = ''
SET @ActivityPrefixes = SUBSTRING(SUBSTRING(@ActivityID, 1, @i), 1,3);
ELSE
SET @ActivityPrefixes = @ActivityPrefixes + ','
+ SUBSTRING(SUBSTRING(@ActivityID, 1, @i), 1, 3);
-- Stop after the last section; otherwise drop the processed section
IF CHARINDEX(',', @ActivityID) = 0
BREAK;
SET @ActivityID = SUBSTRING(@ActivityID,CHARINDEX(',', @ActivityID) + 1,LEN(@ActivityID));
END;
SELECT @ActivityPrefixes;

Calculated Parameter in WHERE

I am trying to use a carriage-return-separated list of parameters in the IN list of the WHERE clause of my query.
I can turn the list into one comma-separated string in the correct format using the REPLACE function; however, when I put this in the IN list, it returns nothing.
The query below returns the comma-separated list as expected.
declare @VarCodes varchar(max)
set @VarCodes = '123-1
123-10
123-100
61
66
67
75'
(select ''''+replace(replace(REPLACE(@VarCodes,char(13),''''+', '+''''),char(32),''),char(10),'')+'''')
'123-1','123-10','123-100','61','66','67','75'
If I paste this text directly in the query below, it returns data as expected.
select vad_variant_code from variant_detail where vad_variant_code in ('123-1','123-10','123-100','61','66','67','75')
If I put the parameter in the IN list, it returns nothing.
select vad_variant_code from variant_detail where vad_variant_code in ((select ''''+replace(replace(REPLACE(@VarCodes,char(13),''''+', '+''''),char(32),''),char(10),'')+''''))
I am assuming this is because the IN is expecting a comma-separated list of strings, whereas the REPLACE function is returning one long string?
Can this be achieved?
Try this...
declare @VarCodes varchar(max), @Xml XML;
set @VarCodes = '123-1,123-10,123-100,61,66,67,75'
-- Turn the comma-separated list into an XML fragment with one <r> element per code
SET @Xml = N'<root><r>' + replace(@VarCodes, ',','</r><r>') + '</r></root>';
select vad_variant_code from variant_detail
where vad_variant_code in (
select r.value('.','varchar(max)') as item
from @Xml.nodes('//root/r') as records(r)
)
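The assumption in the question is right: IN compares the column against each expression in its list, and a variable or a scalar subquery counts as a single expression, however many quoted values its text contains. A minimal sketch with made-up values:

```sql
DECLARE @Codes VARCHAR(100) = '''61'',''66''';   -- the text '61','66' held in ONE string

SELECT
    -- One list entry whose value is the whole text, so '61' does not equal it
    CASE WHEN '61' IN (@Codes) THEN 'match' ELSE 'no match' END      AS SingleString,  -- no match
    -- Two separate list entries
    CASE WHEN '61' IN ('61', '66') THEN 'match' ELSE 'no match' END  AS LiteralList;   -- match
```

That is why the value has to be split into rows, as in the answers here, before IN or EXISTS can use it.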
I have this code as a TVF based originally on Jeff Moden's code:
CREATE FUNCTION [dbo].[cSplitter] (@Parameter VARCHAR(MAX))
RETURNS @splitResult TABLE (number INT, [value] VARCHAR(100))
AS
BEGIN
-- Wrap the list in delimiters so every item sits between two commas
SET @Parameter = ','+@Parameter +',';
WITH cteTally AS
(
SELECT TOP (LEN(@Parameter))
ROW_NUMBER() OVER (ORDER BY t1.Object_ID) AS N
FROM Master.sys.All_Columns t1
CROSS JOIN Master.sys.All_Columns t2
)
INSERT @splitResult
SELECT ROW_NUMBER() OVER (ORDER BY N) AS Number,
SUBSTRING(@Parameter,N+1,CHARINDEX(',',@Parameter,N+1)-N-1) AS [Value]
FROM cteTally
WHERE N < LEN(@Parameter) AND SUBSTRING(@Parameter,N,1) = ','
RETURN
END
With this TVF in my database, my "IN" queries can accept a comma separated list of values like this:
DECLARE @VarCodes VARCHAR(MAX);
SET @VarCodes = '123-1
123-10
123-100
61
66
67
75';
DECLARE @csv VARCHAR(MAX);
-- Carriage returns become commas; spaces and line feeds are removed
SET @csv = REPLACE(REPLACE(REPLACE(@VarCodes, CHAR(13), ','), CHAR(32), ''),
CHAR(10), '');
SELECT vad_variant_code
FROM variant_detail
WHERE EXISTS ( SELECT *
FROM [cSplitter](@csv) AS [cs]
WHERE [cs].[value] = vad_variant_code );

How to split string and save into an array in T-SQL

I am writing a cursor to populate a new table from a main table that contains data in the following form:
Item    Colors
Shirt   Red,Blue,Green,Yellow
I want to populate the new table by fetching each Item and adding one row for each color it contains:
Item    Color
Shirt   Red
Shirt   Blue
Shirt   Green
Shirt   Yellow
I am stuck on how to:
Delimit/split the "Colors" string
Save it in an array
Use it in a cursor
as I am going to use a nested cursor for this purpose.
Using SQL Server 2005+ and the XML data type, you can have a look at the following:
DECLARE @Table TABLE(
Item VARCHAR(250),
Colors VARCHAR(250)
)
INSERT INTO @Table SELECT 'Shirt','Red,Blue,Green,Yellow'
INSERT INTO @Table SELECT 'Pants','Black,White'
-- Convert each Colors list into an XML fragment with one <d> element per color
;WITH Vals AS (
SELECT Item,
CAST('<d>' + REPLACE(Colors, ',', '</d><d>') + '</d>' AS XML) XmlColumn
FROM @Table
)
-- Return one row per Item and color
SELECT Vals.Item,
C.value('.','varchar(max)') ColumnValue
FROM Vals
CROSS APPLY Vals.XmlColumn.nodes('/d') AS T(C)
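Since the split already produces one row per color, the new table can be filled with a single INSERT ... SELECT, without an array or a nested cursor. A sketch under the assumption that the target has Item and Color columns; the table variables @Source and @Target stand in for the real tables, whose names the question does not give.

```sql
DECLARE @Source TABLE (Item VARCHAR(250), Colors VARCHAR(250));   -- stands in for the main table
DECLARE @Target TABLE (Item VARCHAR(250), Color  VARCHAR(250));   -- stands in for the new table
INSERT INTO @Source SELECT 'Shirt', 'Red,Blue,Green,Yellow';

-- Split each Colors list via XML and insert one row per color in one statement
INSERT INTO @Target (Item, Color)
SELECT s.Item,
       x.c.value('.', 'varchar(250)')
FROM (
    SELECT Item,
           CAST('<d>' + REPLACE(Colors, ',', '</d><d>') + '</d>' AS XML) AS XmlColumn
    FROM @Source
) AS s
CROSS APPLY s.XmlColumn.nodes('/d') AS x(c);

SELECT Item, Color FROM @Target;   -- four rows for Shirt: Red, Blue, Green, Yellow
```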
The article Faking Arrays in Transact SQL details SEVERAL techniques to solve this problem, ranging from using the PARSENAME() function (limit to 5 items) to writing CLR functions.
The XML answer is one of the detailed techniques that can be chosen to a specific scenario.
Combining some of the tips, I solved my string split problem like this:
SET NOCOUNT ON;
DECLARE @p NVARCHAR(1000), @len INT;
SET @p = N'value 1,value 2,value 3,value 4,etc';
SET @p = ',' + @p + ',';
SET @len = LEN(@p);
-- Remove this table variable creation if you have a permanent enumeration table
-- dbo.[Table] and TableKey are placeholders for any table with at least 1000 rows
DECLARE @nums TABLE (n int);
INSERT INTO @nums (n)
SELECT A.n FROM
(SELECT TOP 1000 ROW_NUMBER() OVER (ORDER BY TableKey) as n FROM dbo.[Table]) A
WHERE A.n BETWEEN 1 AND @len;
-- Every comma position marks the start of the next value
SELECT SUBSTRING(@p , n + 1, CHARINDEX( ',', @p, n + 1 ) - n - 1 ) AS "value"
FROM @nums
WHERE SUBSTRING( @p, n, 1 ) = ',' AND n < @len;
Note that, taking 1000 as your string length limit, you must have a table with 1000 or more rows (dbo.Table in the sample T-SQL) to create the table variable @nums in this sample. In the article, they use a permanent enumeration table.
For those who like to keep it simple:
-- Here is the String Array you want to convert to a Table
declare @StringArray varchar(max)
set @StringArray = 'First item,Second item,Third item';
-- Here is the table which is going to contain the rows of each item in the String array
declare @mytable table (EachItem varchar(50))
-- Just create a select statement appending UNION ALL to each one of the item in the array
set @StringArray = 'select ''' + replace(@StringArray, ',', ''' union all select ''') + ''''
-- Push the data into your table
insert into @mytable exec (@StringArray)
-- You now have the data in an array inside a table that you can join to other objects
select * from @mytable
I just accomplished something like this to create staging tables that replicate the source tables, using the INFORMATION_SCHEMA views on a linked server. This is a modified version that creates the results you are looking for. Just remember to remove the last two characters from the Colors column when displaying it.
SELECT
t.Item
, (
SELECT
x.Color + ', ' AS [data()]
FROM
Items x
WHERE
x.Item = t.Item
FOR XML PATH(''), TYPE
).value('.', 'varchar(max)') AS Colors
FROM
Items t
GROUP BY
t.Item
