Calculated Parameter in WHERE - sql-server

I am trying to use a carriage-return-separated list of parameters in the IN list of the WHERE clause of my query.
I can turn the list into one comma separated string in the correct format using the REPLACE function; however, when I put this in the IN list, it returns nothing.
The query below returns the comma separated list as expected.
declare @VarCodes varchar(max)
set @VarCodes = '123-1
123-10
123-100
61
66
67
75'
(select ''''+replace(replace(REPLACE(@VarCodes,char(13),''''+', '+''''),char(32),''),char(10),'')+'''')
'123-1','123-10','123-100','61','66','67','75'
If I paste this text directly in the query below, it returns data as expected.
select vad_variant_code from variant_detail where vad_variant_code in ('123-1','123-10','123-100','61','66','67','75')
If I put the parameter in the IN list, it returns nothing.
select vad_variant_code from variant_detail where vad_variant_code in ((select ''''+replace(replace(REPLACE(@VarCodes,char(13),''''+', '+''''),char(32),''),char(10),'')+''''))
I am assuming this is because IN is expecting a comma separated list of strings, whereas the replace function is returning one long string?
Can this be achieved?
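The assumption in the question can be illustrated outside the database. The sketch below uses Python rather than T-SQL, purely to show the logic; the names `matches`, `in_list_from_expression` and `in_list_of_literals` are made up for this illustration. An IN list built from one expression holds a single long string, so it only matches a row whose value is that entire string.

```python
# One string that merely looks like a quoted, comma separated list.
single_string = "'123-1','123-10','123-100','61','66','67','75'"

# What IN (<expression>) effectively compares against: a one-element list.
in_list_from_expression = [single_string]

# What IN ('123-1','123-10',...) compares against: seven separate values.
in_list_of_literals = [item.strip("'") for item in single_string.split(",")]


def matches(code, in_list):
    """Mimic `code IN (in_list)` with plain equality tests."""
    return any(code == candidate for candidate in in_list)


print(matches("123-10", in_list_from_expression))  # False
print(matches("123-10", in_list_of_literals))      # True
```

This is why the answers split the string into rows first and then compare against those rows.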

Try this...
declare @VarCodes varchar(max), @Xml XML;
set @VarCodes = '123-1,123-10,123-100,61,66,67,75'
SET @Xml = N'<root><r>' + replace(@VarCodes, ',','</r><r>') + '</r></root>';
select vad_variant_code from variant_detail
where vad_variant_code in (
select r.value('.','varchar(max)') as item
from @Xml.nodes('//root/r') as records(r)
)

I have this code as a TVF based originally on Jeff Moden's code:
CREATE FUNCTION [dbo].[cSplitter] (@Parameter VARCHAR(MAX))
RETURNS @splitResult TABLE (number INT, [value] VARCHAR(100))
AS
BEGIN
SET @Parameter = ','+@Parameter +',';
WITH cteTally AS
(
SELECT TOP (LEN(@Parameter))
ROW_NUMBER() OVER (ORDER BY t1.Object_ID) AS N
FROM Master.sys.All_Columns t1
CROSS JOIN Master.sys.All_Columns t2
)
INSERT @splitResult
SELECT ROW_NUMBER() OVER (ORDER BY N) AS Number,
SUBSTRING(@Parameter,N+1,CHARINDEX(',',@Parameter,N+1)-N-1) AS [Value]
FROM cteTally
WHERE N < LEN(@Parameter) AND SUBSTRING(@Parameter,N,1) = ','
RETURN
END
With this TVF in my database, my "IN" queries can accept a comma separated list of values like this:
DECLARE @VarCodes VARCHAR(MAX);
SET @VarCodes = '123-1
123-10
123-100
61
66
67
75';
DECLARE @csv VARCHAR(MAX);
SET @csv = REPLACE(REPLACE(REPLACE(@VarCodes, CHAR(13), ','), CHAR(32), ''),
CHAR(10), '');
SELECT vad_variant_code
FROM variant_detail
WHERE EXISTS ( SELECT *
FROM [cSplitter](@csv) AS [cs]
WHERE [cs].[value] = vad_variant_code );
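For readers who want to follow the tally logic step by step, here is a rough Python model of what the function does (an illustration only; `tally_split` is a hypothetical name, and Python strings are 0-based where T-SQL is 1-based). The string is wrapped in delimiters, every position is scanned, and each position holding a delimiter starts a new item that runs up to the next delimiter.

```python
def tally_split(parameter, delimiter=","):
    """Split by scanning every position, the way the tally CTE does."""
    padded = delimiter + parameter + delimiter
    results = []
    # n is a 1-based position, like N from the tally CTE (WHERE N < LEN).
    for n in range(1, len(padded)):
        if padded[n - 1] == delimiter:
            # SQL: CHARINDEX(',', padded, N + 1), converted back to 1-based.
            next_delim = padded.find(delimiter, n) + 1
            # SQL: SUBSTRING(padded, N + 1, next_delim - N - 1)
            results.append(padded[n:next_delim - 1])
    # Number the items like ROW_NUMBER() OVER (ORDER BY N).
    return list(enumerate(results, start=1))


print(tally_split("123-1,123-10,61"))
# [(1, '123-1'), (2, '123-10'), (3, '61')]
```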

Related

iTVF for splitting string into row of substrings based on a specified separator character breaks when received empty value (TSQL)

I have an inline table-valued function, which splits strings into row of substrings based on a specified separator.
It is as follows:
ALTER FUNCTION [dbo].[SplitString]
(@List NVARCHAR(MAX),
@Delim VARCHAR(255))
RETURNS TABLE
AS
RETURN
(SELECT [Value], idx = RANK() OVER (ORDER BY n)
FROM
(SELECT
n = Number,
[Value] = LTRIM(RTRIM(SUBSTRING(@List, [Number],
CHARINDEX(@Delim, @List + @Delim, [Number]) - [Number])))
FROM
(SELECT Number = ROW_NUMBER() OVER (ORDER BY name)
FROM sys.all_objects) AS x
WHERE Number <= LEN(@List)
AND SUBSTRING(@Delim + @List, [Number], LEN(@Delim)) = @Delim) AS y
);
GO
Usage:
SELECT value
FROM dbo.SplitString('a|b|c', '|')
returns:
value
a
b
c
But when I send an empty value as the first argument, it doesn't return anything.
For example:
SELECT value FROM dbo.SplitString('','|')
This doesn't return anything.
What modification do I need to make to the dbo.SplitString function so that it returns an empty result set when an empty string is passed in as the first argument?
PS: I can't use the inbuilt STRING_SPLIT function because of compatibility issues.
DelimitedSplit8K_LEAD (above) will always return a row and will be faster.
That said, for learning purposes let's fix your function. If you replace a blank value with your delimiter you will get the results you are looking for. You just need to replace every instance of @List with ISNULL(NULLIF(@List,''),@Delim). Now you have:
ALTER FUNCTION [dbo].[SplitString]
(@List NVARCHAR(MAX),
@Delim VARCHAR(255))
RETURNS TABLE
AS
RETURN
(SELECT [Value], idx = RANK() OVER (ORDER BY n)
FROM
(SELECT
n = Number,
[Value] = LTRIM(RTRIM(SUBSTRING(ISNULL(NULLIF(@List,''),@Delim), [Number],
CHARINDEX(@Delim, ISNULL(NULLIF(@List,''),@Delim) + @Delim, [Number]) - [Number])))
FROM
(SELECT Number = ROW_NUMBER() OVER (ORDER BY name)
FROM sys.all_objects) AS x
WHERE Number <= LEN(ISNULL(NULLIF(@List,''),@Delim))
AND SUBSTRING(@Delim + ISNULL(NULLIF(@List,''),@Delim),
[Number], LEN(@Delim)) = @Delim) AS y
);
Now, when you execute:
DECLARE @list VARCHAR(max) = '', @delim VARCHAR(255) = '|'
SELECT *
FROM dbo.SplitString(@list,@delim)
You get:
Value idx
------ ------
1
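To see why the empty string produced no rows, and why substituting the delimiter produces exactly one empty row, the logic can be modelled roughly in Python (a sketch only; `split_string` and `split_string_fixed` are hypothetical names, and the model ignores T-SQL details such as LEN dropping trailing spaces). The candidate positions run from 1 to the length of the list, so an empty list has no positions to test at all.

```python
def split_string(lst, delim):
    """Rough model of dbo.SplitString: one row per position that follows a delimiter."""
    rows = []
    prefixed = delim + lst
    terminated = lst + delim
    # Number runs from 1 to LEN(lst); an empty list yields no positions.
    for number in range(1, len(lst) + 1):
        if prefixed[number - 1:number - 1 + len(delim)] == delim:
            end = terminated.find(delim, number - 1)  # 0-based CHARINDEX
            rows.append(lst[number - 1:end].strip(" "))
    return rows


def split_string_fixed(lst, delim):
    """Same model with ISNULL(NULLIF(lst, ''), delim) applied first."""
    return split_string(lst if lst else delim, delim)


print(split_string("a|b|c", "|"))    # ['a', 'b', 'c']
print(split_string("", "|"))         # []
print(split_string_fixed("", "|"))   # ['']
```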
Thanks @Larnu and @Bernie for all the suggestions.
After a lot of research, I started iterating and got the expected result.
I achieved this with a simple WHILE loop and SQL string functions.
CREATE FUNCTION [SplitString]
(
@ActualString VARCHAR(MAX),
@DelimiterCharacter VARCHAR(10)
)
RETURNS @TableRes TABLE (Id INT IDENTITY(1,1),Value VARCHAR(MAX))
AS
BEGIN
DECLARE @SubStr VARCHAR(MAX)
WHILE (CHARINDEX(@DelimiterCharacter ,@ActualString)<>0)
BEGIN
SET @SubStr=SUBSTRING(@ActualString,1,CHARINDEX(@DelimiterCharacter ,@ActualString)-1)
SET @ActualString= STUFF(@ActualString,1,CHARINDEX(@DelimiterCharacter,@ActualString),'')
INSERT INTO @TableRes
SELECT @SubStr
END
INSERT INTO @TableRes
SELECT @ActualString
RETURN
END
This works for cases such as:
1) When the actual string is an empty string, e.g. select * from [dbo].[SplitString]('',',')
2) When the actual string has an empty string at the end, e.g. select * from [dbo].[SplitString]('a,b,',',')

Split string into two columns with delimiter ->

I have the following given string to split into two columns with given From and To format.
Given string:
DECLARE @String VARCHAR(MAX) = 'A->B->C->D'
Expected Result:
From To
-----------
A B
B C
C D
Tried:
DECLARE @String VARCHAR(MAX) = 'A->B->C->D'
SELECT CASE WHEN item LIKE '%-' THEN REPLACE(item,'-','') END AS [From],
CASE WHEN item NOT LIKE '%-' THEN item END AS [To]
FROM dbo.f_Split(@String,'>')
Try this:
DECLARE @String VARCHAR(MAX) = 'A->B->C->D';
DECLARE @StringXML XML = CAST('<a>' + REPLACE(@String, '->', '</a><a>') + '</a>' AS XML);
WITH DataSource ([RowID], [RowValue]) AS
(
SELECT ROW_NUMBER() OVER (ORDER BY T.c ASC)
,T.c.value('.', 'CHAR(1)')
FROM @StringXML.nodes('a') T(c)
)
SELECT DS1.[RowValue] AS [From]
,DS2.[RowValue] AS [TO]
FROM DataSource DS1
INNER JOIN DataSource DS2
ON DS1.[RowID] + 1 = DS2.[RowID];
The idea is to split the values and order them. Then just perform join to the final row set to itself.
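The same idea, sketched in Python for illustration (the function name `from_to_pairs` is made up): once the values are split and kept in order, pairing each item with its successor is the equivalent of the self-join on RowID + 1.

```python
def from_to_pairs(text, separator="->"):
    """Split the string, then pair every item with the one that follows it."""
    parts = text.split(separator)
    # zip stops at the shorter input, so the last item never appears as a From.
    return list(zip(parts, parts[1:]))


print(from_to_pairs("A->B->C->D"))
# [('A', 'B'), ('B', 'C'), ('C', 'D')]
```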
You can REPLACE the string before processing it and directly apply joins to get the expected output. Considering the dbo.f_Split function returns column item.
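DECLARE @String VARCHAR(MAX) = 'A->B->C->D->E->F->G';
-- The statement before a CTE must be terminated with a semicolon
SET @String = REPLACE(@String, '->', '>');
WITH CTE(RowNumber, RowData) AS
(
SELECT
-- Ordering by item keeps the original sequence only because the sample values are already alphabetical
ROW_NUMBER() OVER (ORDER BY S1.item) AS RowNumber,
S1.item AS RowData
FROM dbo.f_Split(@String,'>') S1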
)
SELECT
C1.RowData AS [From],
C2.RowData AS [To]
FROM CTE C1
INNER JOIN CTE C2 ON C1.RowNumber + 1 = C2.RowNumber
One more solution using the position and +1:
DECLARE @String VARCHAR(MAX) = 'A->B->C->D->E';
DECLARE @YourStringAsXml XML=CAST('<x>' + REPLACE(@String, '->', '</x><x>') + '</x>' AS XML);
--the query
WITH tally(nr) AS
(
SELECT TOP (@YourStringAsXml.value('count(/x)','int')) ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
FROM master..spt_values
)
SELECT @YourStringAsXml.value('/x[sql:column("nr")][1]','varchar(10)') AS FromNode
,@YourStringAsXml.value('/x[sql:column("nr")+1][1]','varchar(10)') AS ToNode
FROM tally;
The idea in short:
We transform the string to XML.
We use a tally-on-the-fly with a computed TOP() clause to get a list of running numbers (a physical numbers table would be better, and is very handy anyway).
Now we can pick the elements by their position (sql:column()) and the neighbour by simply adding 1 to this position.

remove duplicates from comma or pipeline operator string

I have been looking into this for a while now and I cannot find a way to remove duplicate strings from a comma-separated as well as pipe-separated string in SQL Server.
Given the string
test1,test2,test1|test2,test3|test4,test4|test4
does anyone know how would you return test1,test2,test3,test4?
Approach
The following approach can be used to de-duplicate a delimited list of values.
Use the REPLACE() function to convert different delimiters into the same delimiter.
Use the REPLACE() function to inject XML closing and opening tags to create an XML fragment
Use the CAST(expr AS XML) function to convert the above fragment into the XML data type
Use OUTER APPLY to apply the table-valued function nodes() to split the XML fragment into its constituent XML tags. This returns each XML tag on a separate row.
Extract just the value from the XML tag using the value() function, and returns the value using the specified data type.
Append a comma after the above-mentioned value.
Note that these values are returned on separate rows. The usage of the DISTINCT keyword now removes duplicate rows (i.e. values).
Use the FOR XML PATH('') clause to concatenate the values across multiple rows into a single row.
Query
Putting the above approach in query form:
SELECT DISTINCT PivotedTable.PivotedColumn.value('.','nvarchar(max)') + ','
FROM (
-- This query returns the following in the DataXml column:
-- <tag>test1</tag><tag>test2</tag><tag>test1</tag><tag>test2</tag><tag>test3</tag><tag>test4</tag><tag>test4</tag><tag>test4</tag>
-- i.e. it has turned the original delimited data into an XML fragment
SELECT
DataTable.DataColumn AS DataRaw
, CAST(
'<tag>'
-- First replace commas with pipes to have only a single delimiter
-- Then replace the pipe delimiters with a closing and opening tag
+ replace(replace(DataTable.DataColumn, ',','|'), '|','</tag><tag>')
-- Add a final set of closing tags
+ '</tag>'
AS XML) AS DataXml
FROM ( SELECT 'test1,test2,test1|test2,test3|test4,test4|test4' AS DataColumn) AS DataTable
) AS x
OUTER APPLY DataXml.nodes('tag') AS PivotedTable(PivotedColumn)
-- Running the query without the following line will return the data in separate rows
-- Running the query with the following line returns the rows concatenated, i.e. it returns:
-- test1,test2,test3,test4,
FOR XML PATH('')
Input & Result
Given the input:
test1,test2,test1|test2,test3|test4,test4|test4
The above query will return the result:
test1,test2,test3,test4,
Notice the trailing comma at the end. I'll leave it as an exercise to you to remove that.
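For comparison, the steps of the approach can be sketched in a few lines of Python (illustrative only; `dedupe_delimited` is a hypothetical name). Joining the distinct values with the delimiter, rather than appending a comma to each value, avoids the trailing comma. Note one assumption in the sketch: it keeps values in first-seen order, whereas DISTINCT in SQL gives no ordering guarantee.

```python
def dedupe_delimited(text, delimiters=",|", output_delimiter=","):
    """Normalise the delimiters, split, drop duplicates, and join again."""
    primary = delimiters[0]
    for other in delimiters[1:]:
        text = text.replace(other, primary)
    # dict.fromkeys keeps the first occurrence of each value, in input order.
    unique_values = dict.fromkeys(text.split(primary))
    return output_delimiter.join(unique_values)


print(dedupe_delimited("test1,test2,test1|test2,test3|test4,test4|test4"))
# test1,test2,test3,test4
```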
EDIT: Count of Duplicates
OP requested in a comment "how do i get t5he count of duplicates as well? in a seperate column".
The simplest way would be to use the above query but remove the last line, FOR XML PATH(''). Then count all values and distinct values returned by the SELECT expression in the above query (i.e. PivotedTable.PivotedColumn.value('.','nvarchar(max)')). The difference between the count of all values and the count of distinct values is the count of duplicate values.
SELECT
COUNT(PivotedTable.PivotedColumn.value('.','nvarchar(max)')) AS CountOfAllValues
, COUNT(DISTINCT PivotedTable.PivotedColumn.value('.','nvarchar(max)')) AS CountOfUniqueValues
-- The difference of the previous two counts is the number of duplicate values
, COUNT(PivotedTable.PivotedColumn.value('.','nvarchar(max)'))
- COUNT(DISTINCT PivotedTable.PivotedColumn.value('.','nvarchar(max)')) AS CountOfDuplicateValues
FROM (
-- This query returns the following in the DataXml column:
-- <tag>test1</tag><tag>test2</tag><tag>test1</tag><tag>test2</tag><tag>test3</tag><tag>test4</tag><tag>test4</tag><tag>test4</tag>
-- i.e. it has turned the original delimited data into an XML fragment
SELECT
DataTable.DataColumn AS DataRaw
, CAST(
'<tag>'
-- First replace commas with pipes to have only a single delimiter
-- Then replace the pipe delimiters with a closing and opening tag
+ replace(replace(DataTable.DataColumn, ',','|'), '|','</tag><tag>')
-- Add a final set of closing tags
+ '</tag>'
AS XML) AS DataXml
FROM ( SELECT 'test1,test2,test1|test2,test3|test4,test4|test4' AS DataColumn) AS DataTable
) AS x
OUTER APPLY DataXml.nodes('tag') AS PivotedTable(PivotedColumn)
For the same input shown above, the output of this query is:
CountOfAllValues CountOfUniqueValues CountOfDuplicateValues
---------------- ------------------- ----------------------
8 4 4
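The arithmetic behind these three columns is easy to check by hand. A small Python sketch (the helper name `duplicate_counts` is invented for this illustration) reproduces it for the same input:

```python
def duplicate_counts(text):
    """Return (all values, unique values, duplicates) for a ',' or '|' delimited string."""
    values = text.replace(",", "|").split("|")
    all_count = len(values)
    unique_count = len(set(values))
    return all_count, unique_count, all_count - unique_count


print(duplicate_counts("test1,test2,test1|test2,test3|test4,test4|test4"))
# (8, 4, 4)
```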
Solution to your problem is as given below :
DECLARE @Data_String AS VARCHAR(1000), @Result as varchar(1000)=''
SET @Data_String = 'test1,test2,test1|test2,test3|test4,test4|test4'
SET @Data_String = REPLACE(@Data_String,'|',',')
SELECT @Result=@Result+col+',' from(
SELECT DISTINCT t.c.value('.','varchar(100)') col from(
SELECT cast('<A>'+replace(@Data_String,',','</A><A>')+'</A>' as xml)col1)data cross apply col1.nodes('/A') as t(c))Data
SELECT LEFT(@Result,LEN(@Result)-1)
Result
test1,test2,test3,test4
DECLARE @string AS VARCHAR(1000)
SET @string = 'test1,test2,test1|test2,test3|test4,test4|test4'
SET @string = REPLACE(@string,'|',',')
DECLARE @t TABLE (val VARCHAR(MAX))
DECLARE @xml XML
SET @xml = N'<root><r>' + REPLACE(@string, ',', '</r><r>') + '</r></root>'
INSERT INTO @t(val) SELECT r.value('.','VARCHAR(MAX)') as Item FROM @xml.nodes('//root/r') AS RECORDS(r)
;WITH cte
AS (SELECT ROW_NUMBER() OVER (PARTITION BY val ORDER BY val desc) RN
FROM @t)
DELETE FROM cte
WHERE RN > 1
Try Following SQL Script :
declare @List nvarchar(max)='test1,test2,test1|test2,test3|test4,test4|test4';
declare @Delimiter CHAR(1) =','
declare @XML AS XML
declare @result varchar(max)
set @List=Replace(@List,'|',',')
--Select @List
SET @XML = CAST(('<X>'+REPLACE(@List,@Delimiter ,'</X><X>')+'</X>') AS XML)
DECLARE @temp TABLE (Data nvarchar(100))
INSERT INTO @temp
SELECT N.value('.', 'nvarchar(100)') AS Data FROM @XML.nodes('X') AS T(N)
--SELECT distinct * FROM @temp
IF OBJECT_ID('tempdb..#temp') IS NOT NULL DROP TABLE #temp
Select distinct Data into #temp from @temp
SET @result = ''
select @result = @result + Data + ', ' from #temp
select SUBSTRING(@result, 0, LEN(@result))
I just tried the following script and it works perfectly:
declare @List VARCHAR(MAX)='test1,test2,test1|test2,test3|test4,test4|test4'
declare @Delim CHAR=','
DECLARE @ParsedList TABLE
(
Item VARCHAR(MAX)
)
DECLARE @list1 VARCHAR(MAX), @Pos INT, @rList VARCHAR(MAX)
set @List=Replace(@List,'|',',')
SET @list = LTRIM(RTRIM(@list)) + @Delim
SET @pos = CHARINDEX(@delim, @list, 1)
WHILE @pos > 0
BEGIN
SET @list1 = LTRIM(RTRIM(LEFT(@list, @pos - 1)))
IF @list1 <> ''
INSERT INTO @ParsedList VALUES (CAST(@list1 AS VARCHAR(MAX)))
SET @list = SUBSTRING(@list, @pos+1, LEN(@list))
SET @pos = CHARINDEX(@delim, @list, 1)
END
SELECT @rlist = COALESCE(@rlist+',','') + item
FROM (SELECT DISTINCT Item FROM @ParsedList) t
Select @rlist

Remove some characters from string sql [duplicate]

I've got dirty data in a column with variable alpha length. I just want to strip out anything that is not 0-9.
I do not want to run a function or proc. I have a script that is similar that just grabs the numeric value after text, it looks like this:
Update TableName
set ColumntoUpdate=cast(replace(Columnofdirtydata,'Alpha #','') as int)
where Columnofdirtydata like 'Alpha #%'
And ColumntoUpdate is Null
I thought it would work pretty good until I found that some of the data fields I thought would just be in the format Alpha # 12345789 are not.
Examples of data that needs to be stripped
AB ABCDE # 123
ABCDE# 123
AB: ABC# 123
I just want the 123. It is true that all data fields do have the # prior to the number.
I tried substring and PatIndex, but I'm not quite getting the syntax correct or something. Anyone have any advice on the best way to address this?
See this blog post on extracting numbers from strings in SQL Server. Below is a sample using a string in your example:
DECLARE @textval NVARCHAR(30)
SET @textval = 'AB ABCDE # 123'
SELECT LEFT(SUBSTRING(@textval, PATINDEX('%[0-9.-]%', @textval), 8000),
PATINDEX('%[^0-9.-]%', SUBSTRING(@textval, PATINDEX('%[0-9.-]%', @textval), 8000) + 'X') -1)
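What the nested PATINDEX calls do is find the first character that is a digit, period or hyphen, and then keep everything up to the next character outside that set. As an illustration only, the same extraction in Python (`first_numeric_run` is a hypothetical name, and a regular expression stands in for the two PATINDEX patterns):

```python
import re


def first_numeric_run(text):
    """Return the first run of characters from the set 0-9, '.' and '-'."""
    match = re.search(r"[0-9.-]+", text)
    return match.group(0) if match else ""


for sample in ("AB ABCDE # 123", "ABCDE# 123", "AB: ABC# 123"):
    print(first_numeric_run(sample))  # 123 each time
```

Because periods and hyphens belong to the accepted set, a value such as 'AB-CD # 123' would return '-' rather than '123', so the character set may need narrowing to digits only for data like that.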
Here is an elegant solution if your server supports the TRANSLATE function (on sql server it's available on sql server 2017+ and also sql azure).
First, it replaces any non numeric characters with a # character.
Then, it removes all # characters.
You may need to add additional characters that you know may be present in the second parameter of the TRANSLATE call.
select REPLACE(TRANSLATE([Col], 'abcdefghijklmnopqrstuvwxyz+()- ,#+', '##################################'), '#', '')
You can use stuff and patindex.
stuff(Col, 1, patindex('%[0-9]%', Col)-1, '')
This works well for me:
CREATE FUNCTION [dbo].[StripNonNumerics]
(
@Temp varchar(255)
)
RETURNS varchar(255)
AS
Begin
Declare @KeepValues as varchar(50)
Set @KeepValues = '%[^0-9]%'
While PatIndex(@KeepValues, @Temp) > 0
Set @Temp = Stuff(@Temp, PatIndex(@KeepValues, @Temp), 1, '')
Return @Temp
End
Then call the function like so to see the original something next to the sanitized something:
SELECT Something, dbo.StripNonNumerics(Something) FROM TableA
If some characters may appear between the digits (e.g. thousands separators), you may try the following:
declare @table table (DirtyCol varchar(100))
insert into @table values
('AB ABCDE # 123')
,('ABCDE# 123')
,('AB: ABC# 123')
,('AB#')
,('AB # 1 000 000')
,('AB # 1`234`567')
,('AB # (9)(876)(543)')
;with tally as (select top (100) N=row_number() over (order by @@spid) from sys.all_columns),
data as (
select DirtyCol, Col
from @table
cross apply (
select (select C + ''
from (select N, substring(DirtyCol, N, 1) C from tally where N<=datalength(DirtyCol)) [1]
where C between '0' and '9'
order by N
for xml path(''))
) p (Col)
where p.Col is not NULL
)
select DirtyCol, cast(Col as int) IntCol
from data
Output is:
DirtyCol IntCol
--------------------- -------
AB ABCDE # 123 123
ABCDE# 123 123
AB: ABC# 123 123
AB # 1 000 000 1000000
AB # 1`234`567 1234567
AB # (9)(876)(543) 9876543
For update, add ColToUpdate to select list of the data cte:
;with num as (...),
data as (
select ColToUpdate, /*DirtyCol, */Col
from ...
)
update data
set ColToUpdate = cast(Col as int)
CREATE FUNCTION FN_RemoveNonNumeric (@Input NVARCHAR(512))
RETURNS NVARCHAR(512)
AS
BEGIN
DECLARE @Trimmed NVARCHAR(512)
SELECT @Trimmed = @Input
WHILE PATINDEX('%[^0-9]%', @Trimmed) > 0
SELECT @Trimmed = REPLACE(@Trimmed, SUBSTRING(@Trimmed, PATINDEX('%[^0-9]%', @Trimmed), 1), '')
RETURN @Trimmed
END
GO
SELECT dbo.FN_RemoveNonNumeric('ABCDE# 123')
Pretty late to the party, but I found the following, which I thought worked brilliantly, in case anyone is still looking:
SELECT
(SELECT CAST(CAST((
SELECT SUBSTRING(FieldToStrip, Number, 1)
FROM master..spt_values
WHERE Type='p' AND Number <= LEN(FieldToStrip) AND
SUBSTRING(FieldToStrip, Number, 1) LIKE '[0-9]' FOR XML Path(''))
AS xml) AS varchar(MAX)))
FROM
SourceTable
Here's a version which pulls all digits from a string; i.e. given "I'm 35 years old; I was born in 1982. The average family has 2.4 children." this would return 35198224. It's good where you've got numeric data which may have been formatted as a code (e.g. #123,456,789 / 123-00005), but isn't appropriate if you're looking to pull out specific numbers (as opposed to digits / just the numeric characters) from the text. Also, it only handles digits, so it won't return negative signs (-) or periods (.).
declare @table table (id bigint not null identity (1,1), data nvarchar(max))
insert @table (data)
values ('hello 123 its 45613 then') --outputs: 12345613
,('1 some other string 98 example 4') --outputs: 1984
,('AB ABCDE # 123') --outputs: 123
,('ABCDE# 123') --outputs: 123
,('AB: ABC# 123') --outputs: 123
; with NonNumerics as (
select id
, data original
--the below line replaces all digits with blanks
, replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(data,'0',''),'1',''),'2',''),'3',''),'4',''),'5',''),'6',''),'7',''),'8',''),'9','') nonNumeric
from @table
)
--each iteration of the below CTE removes another non-numeric character from the original string, putting the result into the numerics column
, Numerics as (
select id
, replace(original, substring(nonNumeric,1,1), '') numerics
, replace(nonNumeric, substring(nonNumeric,1,1), '') charsToreplace
, len(replace(nonNumeric, substring(nonNumeric,1,1), '')) charsRemaining
from NonNumerics
union all
select id
, replace(numerics, substring(charsToreplace,1,1), '') numerics
, replace(charsToreplace, substring(charsToreplace,1,1), '') charsToreplace
, len(replace(charsToreplace, substring(charsToreplace,1,1), '')) charsRemaining
from Numerics
where charsRemaining > 0
)
--we select only those strings with `charsRemaining=0`; i.e. the rows for which all non-numeric characters have been removed; there should be 1 row returned for every 1 row in the original data set.
select * from Numerics where charsRemaining = 0
This code works by removing all the digits (i.e. the characters we want) from the given strings by replacing them with blanks. Then it goes through the original string (which includes the digits) removing all of the characters that were left (i.e. the non-numeric characters), thus leaving only the digits.
The reason we do this in 2 steps, rather than just removing all non-numeric characters in the first place is there are only 10 digits, whilst there are a huge number of possible characters; so replacing that small list is relatively fast; then gives us a list of those non-numeric characters which actually exist in the string, so we can then replace that small set.
The method makes use of recursive SQL, using common table expressions (CTEs).
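The two-step idea can be sketched procedurally in Python (an illustration, not a translation of the CTE; `digits_only` is a hypothetical name, and Python's replace is case-sensitive whereas the T-SQL REPLACE follows the column's collation). Each pass of the loop corresponds to one level of recursion in the Numerics CTE.

```python
def digits_only(text):
    """Keep only the digits by removing every non-digit character that occurs."""
    # Step 1: strip the ten digits to find out which other characters are present.
    leftovers = text
    for digit in "0123456789":
        leftovers = leftovers.replace(digit, "")
    # Step 2: remove each leftover character from the original string,
    # one distinct character per pass.
    result = text
    while leftovers:
        unwanted = leftovers[0]
        result = result.replace(unwanted, "")
        leftovers = leftovers.replace(unwanted, "")
    return result


print(digits_only("hello 123 its 45613 then"))          # 12345613
print(digits_only("1 some other string 98 example 4"))  # 1984
```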
To add on to Ken's answer, this handles commas and spaces and parentheses
--Handles parentheses, commas, spaces, hyphens..
declare @table table (c varchar(256))
insert into @table
values
('This is a test 111-222-3344'),
('Some Sample Text (111)-222-3344'),
('Hello there 111222 3344 / How are you?'),
('Hello there 111 222 3344 ? How are you?'),
('Hello there 111 222 3344. How are you?')
select
replace(LEFT(SUBSTRING(replace(replace(replace(replace(replace(c,'(',''),')',''),'-',''),' ',''),',',''), PATINDEX('%[0-9.-]%', replace(replace(replace(replace(replace(c,'(',''),')',''),'-',''),' ',''),',','')), 8000),
PATINDEX('%[^0-9.-]%', SUBSTRING(replace(replace(replace(replace(replace(c,'(',''),')',''),'-',''),' ',''),',',''), PATINDEX('%[0-9.-]%', replace(replace(replace(replace(replace(c,'(',''),')',''),'-',''),' ',''),',','')), 8000) + 'X') -1),'.','')
from @table
Create function fn_GetNumbersOnly(@pn varchar(100))
Returns varchar(max)
AS
BEGIN
-- SUBSTRING positions are 1-based, so start at 1
Declare @r varchar(max) ='', @len int ,@c char(1), @x int = 1
Select @len = len(@pn)
while @x <= @len
begin
Select @c = SUBSTRING(@pn,@x,1)
-- ISNUMERIC also accepts characters such as '+', '.', ',' and '$', so test for digits explicitly
if @c LIKE '[0-9]'
Select @r = @r + @c
Select @x = @x +1
end
return @r
End
In your case it seems the number will always be after the # symbol, so using CHARINDEX() with LTRIM() and RTRIM() would probably perform best. But here is an interesting method of getting rid of ANY non-digit. It uses a tally table and a table of digits to limit which characters are accepted, then the XML technique to concatenate back to a single string without the non-numeric characters. The neat thing about this technique is that it could be expanded to include ANY allowed characters and strip out anything that is not allowed.
DECLARE @ExampleData AS TABLE (Col VARCHAR(100))
INSERT INTO @ExampleData (Col) VALUES ('AB ABCDE # 123'),('ABCDE# 123'),('AB: ABC# 123')
DECLARE @Digits AS TABLE (D CHAR(1))
INSERT INTO @Digits (D) VALUES ('0'),('1'),('2'),('3'),('4'),('5'),('6'),('7'),('8'),('9')
;WITH cteTally AS (
SELECT
I = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM
@Digits d10
CROSS APPLY @Digits d100
--add more cross applies to cover longer fields this handles 100
)
SELECT *
FROM
@ExampleData e
OUTER APPLY (
SELECT CleansedPhone = CAST((
SELECT TOP 100
SUBSTRING(e.Col,t.I,1)
FROM
cteTally t
INNER JOIN @Digits d
ON SUBSTRING(e.Col,t.I,1) = d.D
WHERE
I <= LEN(e.Col)
ORDER BY
t.I
FOR XML PATH('')) AS VARCHAR(100))) o
Declare @MainTable table(id int identity(1,1),TextField varchar(100))
INSERT INTO @MainTable (TextField)
VALUES
('6B32E')
declare @i int=1
Declare @originalWord varchar(100)=''
WHile @i<=(Select count(*) from @MainTable)
BEGIN
Select @originalWord=TextField from @MainTable where id=@i
Declare @r varchar(max) ='', @len int ,@c char(1), @x int = 0
Select @len = len(@originalWord)
declare @pn varchar(100)=@originalWord
while @x <= @len
begin
Select @c = SUBSTRING(@pn,@x,1)
if(@c!='')
BEGIN
if ISNUMERIC(@c) = 0 and @c <> '-'
BEGIN
Select @r = cast(@r as varchar) + cast(replace((SELECT ASCII(@c)-64),'-','') as varchar)
end
ELSE
BEGIN
Select @r = @r + @c
END
END
Select @x = @x +1
END
Select @r
Set @i=@i+1
END
I have created a function for this
Create FUNCTION RemoveCharacters (@text varchar(30))
RETURNS VARCHAR(30)
AS
BEGIN
declare @index as int
declare @newtexval as varchar(30)
set @index = (select PATINDEX('%[A-Z.-/?]%', @text))
if (@index =0)
begin
return @text
end
else
begin
set @newtexval = (select STUFF ( @text , @index , 1 , '' ))
return dbo.RemoveCharacters(@newtexval)
end
return 0
END
GO
Here is the answer:
DECLARE @t TABLE (tVal VARCHAR(100))
INSERT INTO @t VALUES('123')
INSERT INTO @t VALUES('123S')
INSERT INTO @t VALUES('A123,123')
INSERT INTO @t VALUES('a123..A123')
;WITH cte (original, tVal, n)
AS
(
SELECT t.tVal AS original,
LOWER(t.tVal) AS tVal,
65 AS n
FROM @t AS t
UNION ALL
SELECT tVal AS original,
CAST(REPLACE(LOWER(tVal), LOWER(CHAR(n)), '') AS VARCHAR(100)),
n + 1
FROM cte
WHERE n <= 90
)
SELECT t1.tVal AS OldVal,
t.tval AS NewVal
FROM (
SELECT original,
tVal,
ROW_NUMBER() OVER(PARTITION BY tVal + original ORDER BY original) AS Sl
FROM cte
WHERE PATINDEX('%[a-z]%', tVal) = 0
) t
INNER JOIN @t t1
ON t.original = t1.tVal
WHERE t.sl = 1
You can create SQL CLR scalar function in order to be able to use regular expressions like replace patterns.
Here you can find example of how to create such function.
Having such function will solve the issue with just the following lines:
SELECT [dbo].[fn_Utils_RegexReplace] ('AB ABCDE # 123', '[^0-9]', '');
SELECT [dbo].[fn_Utils_RegexReplace] ('ABCDE# 123', '[^0-9]', '');
SELECT [dbo].[fn_Utils_RegexReplace] ('AB: ABC# 123', '[^0-9]', '');
More important, you will be able to solve more complex issues as the regular expressions will bring a whole new world of options directly in your T-SQL statements.
Use this:
REPLACE(TRANSLATE(SomeString, REPLACE(TRANSLATE(SomeString, '0123456789', '##########'), '#', ''), REPLICATE('#', LEN(REPLACE(TRANSLATE(SomeString, '0123456789', '##########'), '#', '') + 'x') - 1)), '#', '')
Demo:
DROP TABLE IF EXISTS #MyTempTable;
CREATE TABLE #MyTempTable (SomeString VARCHAR(255));
INSERT INTO #MyTempTable
VALUES ('ssss123ssg99d362sdg')
, ('hey 62q&*^(n43')
, (NULL)
, ('')
, ('hi')
, ('123');
SELECT SomeString
, REPLACE(TRANSLATE(SomeString, REPLACE(TRANSLATE(SomeString, '0123456789', '##########'), '#', ''), REPLICATE('#', LEN(REPLACE(TRANSLATE(SomeString, '0123456789', '##########'), '#', '') + 'x') - 1)), '#', '')
FROM #MyTempTable;
DROP TABLE IF EXISTS #MyTempTable;
Results:
SomeString            (No column name)
--------------------  ----------------
ssss123ssg99d362sdg   12399362
hey 62q&*^(n43        6243
NULL                  NULL
''                    ''
hi                    ''
123                   123
While the OP wanted to "strip out anything that is not 0-9", the post is also tagged with "substring" and "patindex", and the OP mentioned the concern "not quite getting the syntax correct or something". Since the requirements note that "all data fields do have the # prior to the number", and to provide an answer that addresses the challenges with substring/patindex, consider the following:
/* A sample select */
;WITH SampleValues AS
( SELECT 'AB ABCDE # 123' [Columnofdirtydata]
UNION ALL SELECT 'AB2: ABC# 123')
SELECT
s.Columnofdirtydata,
f1.pos1,
'['+ f2.substr +']' [InspectOutput]
FROM
SampleValues s
CROSS APPLY (SELECT PATINDEX('%# %',s.Columnofdirtydata) [pos1]) f1
CROSS APPLY (SELECT SUBSTRING(s.Columnofdirtydata, f1.pos1 + LEN('#-'),LEN(s.Columnofdirtydata)) [substr]) f2
/* Using update scenario from OP */
UPDATE t1
SET t1.Columntoupdate = CAST(f2.substr AS INT)
FROM
TableName t1
CROSS APPLY (SELECT PATINDEX('%# %',t1.Columnofdirtydata) [pos1]) f1
CROSS APPLY (SELECT SUBSTRING(t1.Columnofdirtydata, f1.pos1 + LEN('#-'),LEN(t1.Columnofdirtydata)) [substr]) f2
Note that my syntax advice for patindex/substring is to:
consider using APPLY as a way to temporarily alias results from one function for use as parameters in the next. It's not uncommon (in ETL, for example) to need to parse out parameter/position-based substrings in an updatable column of a staging table. If you need to "debug" and potentially fix some parsing logic, this style will help.
consider using LEN('PatternSample') in your substring logic, to account for reusing this pattern or adjusting it when your source data changes (instead of "+ 1").
remember that SUBSTRING() requires a length parameter, but it can be greater than the length of the string. Therefore, if you are getting "the rest of the string" after the pattern, you can just use the source length.
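The same "find the marker, skip its length, take the rest" pattern can be sketched in Python for illustration (`text_after_marker` is a hypothetical name; returning None when the marker is missing is an assumption of this sketch, not something the T-SQL above does):

```python
def text_after_marker(value, marker="# "):
    """Return everything after the first occurrence of marker, or None if absent."""
    position = value.find(marker)  # like PATINDEX('%# %', value), but 0-based
    if position == -1:
        return None  # marker missing; the caller decides how to handle it
    # Skip the marker by its own length instead of a hard-coded "+ 1".
    return value[position + len(marker):]


print(text_after_marker("AB ABCDE # 123"))      # 123
print(int(text_after_marker("AB2: ABC# 123")))  # 123
```

Using the marker's length keeps the offset correct if the marker later changes, which is the point of the LEN('PatternSample') advice.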
DECLARE @STR VARCHAR(400)
DECLARE @specialchars VARCHAR(50) = '%[~,@,#,$,%,&,*,(,),!^?:]%'
SET @STR = '1, 45 4,3 68.00-'
WHILE PATINDEX( @specialchars, @STR ) > 0
---Remove special characters using Replace function
SET @STR = Replace(Replace(REPLACE( @STR, SUBSTRING( @STR, PATINDEX( @specialchars, @STR ), 1 ),''),'-',''), ' ','')
SELECT @STR
SELECT REGEXP_REPLACE( col, '[^[:digit:]]', '' ) AS new_col FROM my_table

Convert comma delimited ID list to Comma delimited prefix List SQL Server

I'm trying to convert a comma delimited ID list to an comma delimited Prefix list.
BKR394859607,MTP293840284,SPN489620586
My goal is to convert these 3 Ids in my IDString in SQL Server to
BKR,MTP,SPN
This string can have an infinite number of Ids
Declare @ActivityID as varchar(MAX) = 'BKR394859607,MTP293840284,SPN489620586'
Declare @ActivityPrefixes as varchar(MAX)
Set @ActivityPrefixes = (function to Convert to comma delimited prefix list)
As mentioned in comment by Sean Lange, you need to perform three steps
Split the string into individual rows
Strip the required characters
Concatenate the rows back to CSV.
Something like this
DECLARE @ActivityID AS VARCHAR(max) =
'BKR394859607,MTP293840284,SPN489620586,GHY489620586'
SELECT Stuff(split_val, 1, 1, '') as Result
FROM (SELECT ','
+ LEFT(split.a.value('.', 'VARCHAR(100)'), 3)
FROM (SELECT Cast ('<M>' + Replace(@ActivityID, ',', '</M><M>')
+ '</M>' AS XML) AS Data) AS A
CROSS apply data.nodes ('/M') AS Split(a)
FOR xml path(''))a(split_val)
Result : BKR,MTP,SPN,GHY
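The three steps (split, strip, concatenate) are easier to see without the XML machinery. A Python sketch of the same pipeline, for illustration only (`prefix_list` is a made-up name and the prefix length of 3 is taken from the example data):

```python
def prefix_list(id_list, prefix_length=3, delimiter=","):
    """Split the list, keep the leading characters of each item, and join again."""
    return delimiter.join(item[:prefix_length] for item in id_list.split(delimiter))


print(prefix_list("BKR394859607,MTP293840284,SPN489620586,GHY489620586"))
# BKR,MTP,SPN,GHY
```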
Use the following answer to implement regular expression functions in T-SQL. Then you can use the following:
First and foremost, read the following articles, they discuss splitting strings (and reasons not to do it), and compare performance of methods to do it if you can't avoid it.
Split strings the right way – or the next best way
Splitting Strings : A Follow-Up
Splitting Strings : Now with less T-SQL
The upshot of these three articles is (in case the links ever become dead):
Avoid delimited lists as strings where possible; if you need to store a list, a table is a much better alternative.
If you have to do it, CLR is the most scalable (and accurate) method.
If you can be certain there are no special XML characters to split, then converting the delimited string to XML then using XQuery to get the individual items works well.
Otherwise, building a tally table using cross joining is the best of the rest.
The most versatile method is the last, since not everyone can use CLR or guarantee no special XML characters, so the split method for that is:
CREATE FUNCTION [dbo].[Split]
(
@List NVARCHAR(MAX),
@Delimiter NVARCHAR(255)
)
RETURNS TABLE
WITH SCHEMABINDING AS
RETURN
( WITH N1 AS (SELECT N FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1), (1)) n (N)),
N2(N) AS (SELECT 1 FROM N1 a CROSS JOIN N1 b),
N3(N) AS (SELECT 1 FROM N2 a CROSS JOIN N2 b),
N4(N) AS (SELECT 1 FROM N3 a CROSS JOIN N3 b),
cteTally(N) AS
( SELECT 0 UNION ALL
SELECT TOP (DATALENGTH(ISNULL(@List,1))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM n4
),
cteStart(N1) AS
( SELECT t.N+1
FROM cteTally t
WHERE (SUBSTRING(@List,t.N,1) = @Delimiter OR t.N = 0)
)
SELECT Item = SUBSTRING(@List, s.N1, ISNULL(NULLIF(CHARINDEX(@Delimiter,@List,s.N1),0)-s.N1,8000)),
Position = s.N1,
ItemNumber = ROW_NUMBER() OVER(ORDER BY s.N1)
FROM cteStart s
);
Now that you have your Split function, you can split your first string:
DECLARE @ActivityID AS VARCHAR(MAX) = 'BKR394859607,MTP293840284,SPN489620586';
SELECT Item,
NewID = LEFT(Item, 3)
FROM dbo.Split(@ActivityID, ',');
Which gives you:
Item NewID
-----------------------------
BKR394859607 BKR
MTP293840284 MTP
SPN489620586 SPN
Then you can concatenate this back up using FOR XML PATH():
DECLARE @ActivityID AS VARCHAR(MAX) = 'BKR394859607,MTP293840284,SPN489620586';
SELECT STUFF(( SELECT ',' + LEFT(Item, 3)
FROM dbo.Split(@ActivityID, ',')
FOR XML PATH(''), TYPE).value('.', 'VARCHAR(MAX)'), 1, 1, '');
For more on how this works see this answer.
The optimal solution would probably be to have a user defined table type to store string lists:
CREATE TYPE dbo.StringList AS TABLE (Value VARCHAR(MAX));
Then rather than building a delimited string, build up a table:
DECLARE @Activity dbo.StringList;
INSERT @Activity (Value)
VALUES ('BKR394859607'), ('MTP293840284'), ('SPN489620586');
Then you avoid a painful split, and can manipulate each individual record much more easily.
If you really did need to get a new delimited string, then you can use the same logic as above:
DECLARE @Activity dbo.StringList;
INSERT @Activity (Value)
VALUES ('BKR394859607'), ('MTP293840284'), ('SPN489620586');
SELECT STUFF(( SELECT ',' + LEFT(Value, 3)
FROM @Activity
FOR XML PATH(''), TYPE).value('.', 'VARCHAR(MAX)'), 1, 1, '');
Try this:
DECLARE @ActivityID AS VARCHAR(MAX) = 'BKR394859607,MTP293840284,SPN489620586';
DECLARE @ActivityPrefixes AS VARCHAR(MAX);
SET @ActivityPrefixes = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
REPLACE(REPLACE(REPLACE(REPLACE(
@ActivityID,'0', ''), '1',''), '2', ''),'3', ''), '4',''), '5', ''),
'6', ''),'7', ''), '8', ''), '9', '');
SELECT @ActivityPrefixes
If you want to get the first 3 characters of each section, then try this:
DECLARE @ActivityID AS VARCHAR(MAX) = 'BKR39A859607,M3TP293840284,SP1N4896GG586,BKR394859607,MTP293840284,SPN489620586';
DECLARE @ActivityPrefixes AS VARCHAR(MAX);
SET @ActivityPrefixes = '';
WHILE 1 = 1
BEGIN
DECLARE @i INT;
IF CHARINDEX(',', @ActivityID) = 0
SET @i = 3;
ELSE
SET @i = CHARINDEX(',', @ActivityID) - 1;
IF @ActivityPrefixes = ''
SET @ActivityPrefixes = SUBSTRING(SUBSTRING(@ActivityID, 1, @i), 1,3);
ELSE
SET @ActivityPrefixes = @ActivityPrefixes + ','
+ SUBSTRING(SUBSTRING(@ActivityID, 1, @i), 1, 3);
IF CHARINDEX(',', @ActivityID) = 0
BREAK;
SET @ActivityID = SUBSTRING(@ActivityID,CHARINDEX(',', @ActivityID) + 1,LEN(@ActivityID));
END;
SELECT @ActivityPrefixes;
