How to separate Arabic and English text in a string value? - sql-server

I am getting data from a source like this:
Air Passage - First Time Joining تذاكر سفر حضور لأول مره
I need to split this kind of data into two columns, English text should go into one column and Arabic text should go into the other column.
Can anyone help me with this, please?

One easy solution, if you can control the source format, would be to deliver the data like this:
Air Passage - First Time Joining | تذاكر سفر حضور لأول مره
Then you just need to split on "|".
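A minimal T-SQL sketch of that split, assuming the source can be changed to emit a single "|" between the two parts. The variable name and the output column names are illustrative and not part of the original data:

```sql
-- Hypothetical input: the same value with a "|" separator added by the source.
DECLARE @Value nvarchar(4000) = N'Air Passage - First Time Joining | تذاكر سفر حضور لأول مره';
DECLARE @Sep int = CHARINDEX(N'|', @Value);  -- 0 when no separator is present

SELECT
    -- Everything before the separator; the whole value when there is none.
    CASE WHEN @Sep > 0 THEN RTRIM(LEFT(@Value, @Sep - 1)) ELSE @Value END AS EnglishText,
    -- Everything after the separator; NULL when there is none.
    CASE WHEN @Sep > 0 THEN LTRIM(SUBSTRING(@Value, @Sep + 1, LEN(@Value))) END AS ArabicText;
```

The CASE guards matter because LEFT raises an error when given a negative length, which is what @Sep - 1 would be for a value without a separator.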

REGEX
(?P<en>[a-zA-Z-\s]+) (?P<ar>[\w\s]+)
Kiki is a nice tool for testing this against multiple cases (you may need to add more characters to the ranges).
I removed ^ and $ to cover the more general case.

USE [HRData]
GO
/* Object: UserDefinedFunction [dbo].[StripVenNameAR] Script Date: 1/14/2014 8:50:31 AM */
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER OFF
GO
-- Returns the 1-based position of the first Arabic character (U+0600 to U+06FF), or 0 if there is none.
CREATE FUNCTION [dbo].[StripVenNameAR] (@InString AS NVARCHAR(4000))
RETURNS smallint AS
BEGIN
    DECLARE @ReturnVal AS smallint
    DECLARE @Pos AS smallint
    DECLARE @CurChar AS NVARCHAR(1)
    SET @ReturnVal = 0
    SET @Pos = 1
    WHILE (@Pos <= LEN(@InString))
    BEGIN
        SET @CurChar = SUBSTRING(@InString, @Pos, 1)
        IF UNICODE(@CurChar) BETWEEN 1536 AND 1791
        BEGIN
            SET @ReturnVal = @Pos
            BREAK
        END
        SET @Pos = @Pos + 1
    END
    RETURN @ReturnVal
END
GO
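The function above only returns a position. One way to turn that position into two columns is sketched below. It assumes the English part always comes first and the Arabic part runs to the end of the string; the variable and column names are made up for illustration:

```sql
DECLARE @Value nvarchar(4000) = N'Air Passage - First Time Joining تذاكر سفر حضور لأول مره';
-- Position of the first Arabic character, 0 if the value has none.
DECLARE @Pos smallint = dbo.StripVenNameAR(@Value);

SELECT
    -- Text before the first Arabic character; the whole value when none is found.
    CASE WHEN @Pos > 0 THEN RTRIM(LEFT(@Value, @Pos - 1)) ELSE @Value END AS EnglishText,
    -- Text from the first Arabic character onward; NULL when none is found.
    CASE WHEN @Pos > 0 THEN SUBSTRING(@Value, @Pos, LEN(@Value)) END AS ArabicText;
```

Values that mix the two scripts in any other order would need a character-by-character pass instead.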

Related

How do I divide a block of text into individual sentences in SQL server?

I have a block of text with multiple sentences and I want to split them up and show each sentence on a new line. I have tried using CHARINDEX and SUBSTRING, but once you get past the second sentence the code becomes very complex and difficult to repeat. Here I got to two and gave up, realising that the code was going to snowball rapidly:
DECLARE @TEXT NVARCHAR(MAX) = 'Has many applications. The price is low. The quality is good. Availability is widespread.'
DECLARE @TEXTLine1 NVARCHAR(MAX) = LEFT(@TEXT,CHARINDEX('.',@TEXT))
DECLARE @TEXTLine2 NVARCHAR(MAX) = SUBSTRING(@TEXT,CHARINDEX('.',@TEXT)+2,CHARINDEX('.',SUBSTRING(@TEXT,CHARINDEX('.',@TEXT)+2,50)))
PRINT @TEXTLine1
PRINT @TEXTLine2
As you can see, I am splitting the sentences based on the full stop. Is there a way to tell SUBSTRING to find the 'nth' instance of a character? This would make the task simple.
Use one of the split string functions from here.
Then it is easy to do, like this:
DECLARE @TEXT NVARCHAR(MAX) = 'Has many applications. The price is low. The quality is good. Availability is widespread.'
SELECT * FROM
[dbo].[SplitStrings_Numbers](@TEXT,'.')
Output:
Item
Has many applications
The price is low
The quality is good
Availability is widespread
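If you are on SQL Server 2016 or later (database compatibility level 130 or higher), the built-in STRING_SPLIT function can stand in for a custom split function. A minimal sketch follows; note that STRING_SPLIT accepts only a single-character separator and, unless you use the ordinal option added in SQL Server 2022, does not guarantee the order of the returned rows:

```sql
DECLARE @Text nvarchar(max) = N'Has many applications. The price is low. The quality is good. Availability is widespread.';

SELECT LTRIM(value) AS Item          -- strip the space that followed each full stop
FROM STRING_SPLIT(@Text, N'.')
WHERE LTRIM(value) <> N'';           -- drop the empty piece after the final full stop
```

Like the plain '.' split above, this would also break a value such as 15.25 in two, so it only suits text without decimals.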
With the help of a parsing function:
Declare @TEXT VarChar(max) = 'Price if $15.25 is NOT split. Has many applications. The price is low. The quality is good. Availability is widespread.'
Select * from [dbo].[udf-Str-Parse](@TEXT,'. ')
Returns
Key_PS Key_Value
1 Price if $15.25 is NOT split
2 Has many applications
3 The price is low
4 The quality is good
5 Availability is widespread.
The UDF
CREATE FUNCTION [dbo].[udf-Str-Parse] (@String varchar(max), @delimeter varchar(10))
--Usage: Select * from [dbo].[udf-Str-Parse]('Dog,Cat,House,Car',',')
--       Select * from [dbo].[udf-Str-Parse]('John Cappelletti was here',' ')
--       Select * from [dbo].[udf-Str-Parse]('id26,id46|id658,id967','|')
Returns @ReturnTable Table (Key_PS int IDENTITY(1,1) NOT NULL, Key_Value varchar(max))
As
Begin
    Declare @IntPos int, @SubStr varchar(max)
    Set @IntPos = CharIndex(@delimeter, @String)
    Set @String = Replace(@String, @delimeter + @delimeter, @delimeter)
    While @IntPos > 0
    Begin
        Set @SubStr = Substring(@String, 0, @IntPos)
        Insert into @ReturnTable (Key_Value) values (@SubStr)
        Set @String = Replace(@String, @SubStr + @delimeter, '')
        Set @IntPos = CharIndex(@delimeter, @String)
    End
    Insert into @ReturnTable (Key_Value) values (@String)
    Return
End
While considering my question I came up with this. It seems a bit overkill, but I can't think of any other way:
DECLARE @text NVARCHAR(MAX) = 'Has many applications. The price is low. The quality is good. Availability is widespread.'; --Set text.
DECLARE @text2 NVARCHAR(MAX) = LEFT(@text,CHARINDEX('.',@text)); --Extract first sentence.
PRINT @text2; --Print first sentence.
SET @text = RIGHT(@text,LEN(@text)-LEN(@text2)); --Remove @text2 from @text; the remainder still starts with the space that followed the full stop.
WHILE LEN(@text) > 0
BEGIN
    SET @text = RIGHT(@text, LEN(@text)-1); --Take off the space left after the full stop in the previous iteration.
    SET @text2 = LEFT(@text,CHARINDEX('.',@text)); --Extract the 'new' first sentence.
    PRINT @text2;
    SET @text = RIGHT(@text,LEN(@text)-LEN(@text2)); --Remove @text2 from @text, as before.
END;
Other suggestions are welcome.
EDIT - John Cappelletti's excellent answer motivated me to improve my own. It is no longer caught out by decimals and also recognises carriage returns. It will have a different application, but I thought it might be useful to include for anyone looking for either solution.
DECLARE @text NVARCHAR(MAX) =
'I would like to pay £22.99. Has many applications.
£45.00 is good value. The price is low. The quality is good.
Availability is widespread. Good value at £5.00.' --Set text.
DECLARE @text2 NVARCHAR(MAX) = LEFT(@text,CHARINDEX('. ',@text)) --Extract first sentence.
PRINT @text2 --Print first sentence.
SET @text = RIGHT(@text,LEN(@text)-LEN(@text2)) --Remove @text2 from @text; the remainder still starts with the space that followed the full stop.
WHILE LEN(@text) > 0
BEGIN
    SET @text = RIGHT(@text, LEN(@text)-1) --Take off the space left after the full stop in the previous iteration.
    SET @text2 = IIF(LEN(LEFT(@text,CHARINDEX('. ',@text)))=0, LEFT(@text,CHARINDEX('.',@text)+LEN(RIGHT(@text, LEN(@text)-LEN(LEFT(@text,CHARINDEX('.',@text)))))),LEFT(@text,CHARINDEX('. ',@text))) --Extract the new first sentence.
    PRINT @text2
    SET @text = RIGHT(@text,LEN(@text)-LEN(@text2)) --Remove @text2 from @text, as before.
END

Using Wildcards in SQL to delete part of a string [duplicate]

SELECT REPLACE('<strong>100</strong><b>.00 GB', '%^(^-?\d*\.{0,1}\d+$)%', '');
I want to remove any markup between the two parts of the number with the above regex, but it does not seem to work. I'm not sure whether the regex syntax is wrong, because I tried a simpler one such as '%[^0-9]%' just to test and it didn't work either. Does anyone know how I can achieve this?
You can use PATINDEX to find the index of the first occurrence of the pattern in the string, then use STUFF to put another string in place of the matched character.
Loop through each row and replace each illegal character with what you want; in your case, replace non-numeric characters with blank. The inner loop handles cells that contain more than one illegal character.
DECLARE @counter int
SET @counter = 0
-- <= so that the row with the highest ID is processed too
WHILE (@counter <= (SELECT MAX(ID_COLUMN) FROM Table))
BEGIN
    WHILE 1 = 1
    BEGIN
        DECLARE @RetVal varchar(50)
        -- STUFF returns NULL when PATINDEX finds nothing (start position 0), which ends the inner loop
        SET @RetVal = (SELECT Column = STUFF(Column, PATINDEX('%[^0-9.]%', Column), 1, '')
                       FROM Table
                       WHERE ID_COLUMN = @counter)
        IF (@RetVal IS NOT NULL)
            UPDATE Table SET
                Column = @RetVal
            WHERE ID_COLUMN = @counter
        ELSE
            BREAK
    END
    SET @counter = @counter + 1
END
Caution: This is slow! A varchar column may affect performance, so using LTRIM and RTRIM may help a bit. Regardless, it is slow.
Credit goes to this Stack Overflow answer.
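To see the PATINDEX and STUFF mechanics without a table, here is a small sketch that applies the same idea to a single variable, using the string from the question. The variable names are illustrative:

```sql
DECLARE @Value varchar(100) = '<strong>100</strong><b>.00 GB';
DECLARE @Pos int = PATINDEX('%[^0-9.]%', @Value);  -- first character that is not a digit or a dot

WHILE @Pos > 0
BEGIN
    SET @Value = STUFF(@Value, @Pos, 1, '');        -- delete that one character
    SET @Pos = PATINDEX('%[^0-9.]%', @Value);       -- look for the next one
END;

SELECT @Value AS Cleaned;  -- 100.00
```

Each pass removes exactly one character, so the loop runs once per unwanted character in the value.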
EDIT
Credit also goes to @srutzky
Edit (by @Tmdean)
Instead of doing one row at a time, this answer can be adapted to a more set-based solution. It still iterates the max of the number of non-numeric characters in a single row, so it's not ideal, but I think it should be acceptable in most situations.
WHILE 1 = 1 BEGIN
    WITH q AS
        (SELECT ID_Column, PATINDEX('%[^0-9.]%', Column) AS n
         FROM Table)
    UPDATE Table
    SET Column = STUFF(Column, q.n, 1, '')
    FROM q
    WHERE Table.ID_Column = q.ID_Column AND q.n != 0;
    IF @@ROWCOUNT = 0 BREAK;
END;
You can also improve efficiency quite a lot if you maintain a bit column in the table that indicates whether the field has been scrubbed yet. (NULL represents "Unknown" in my example and should be the column default.)
DECLARE @done bit = 0;
WHILE @done = 0 BEGIN
    WITH q AS
        (SELECT ID_Column, PATINDEX('%[^0-9.]%', Column) AS n
         FROM Table
         WHERE COALESCE(Scrubbed_Column, 0) = 0)
    UPDATE Table
    SET Column = STUFF(Column, q.n, 1, ''),
        Scrubbed_Column = 0
    FROM q
    WHERE Table.ID_Column = q.ID_Column AND q.n != 0;
    IF @@ROWCOUNT = 0 SET @done = 1;
    -- if Scrubbed_Column is still NULL, then the PATINDEX
    -- must have given 0
    UPDATE Table
    SET Scrubbed_Column = CASE
        WHEN Scrubbed_Column IS NULL THEN 1
        ELSE NULLIF(Scrubbed_Column, 0)
    END;
END;
If you don't want to change your schema, this is easy to adapt to store intermediate results in a table valued variable which gets applied to the actual table at the end.
Instead of stripping out the found character by its sole position, using Replace(Column, BadFoundCharacter, '') could be substantially faster. Additionally, instead of just replacing the one bad character found next in each column, this replaces all those found.
WHILE 1 = 1 BEGIN
    UPDATE dbo.YourTable
    SET Column = Replace(Column, Substring(Column, PatIndex('%[^0-9.-]%', Column), 1), '')
    WHERE Column LIKE '%[^0-9.-]%'
    IF @@RowCount = 0 BREAK;
END;
I am convinced this will work better than the accepted answer, if only because it does fewer operations. There are other ways that might also be faster, but I don't have time to explore those right now.
In a general sense, SQL Server does not support regular expressions and you cannot use them in the native T-SQL code.
You could write a CLR function to do that. See here, for example.
For those looking for a performant and easy solution and are willing to enable CLR:
CREATE database TestSQLFunctions
go
use TestSQLFunctions
go
ALTER database TestSQLFunctions set trustworthy on
EXEC sp_configure 'clr enabled', 1
RECONFIGURE WITH OVERRIDE
go
CREATE ASSEMBLY [SQLFunctions]
AUTHORIZATION [dbo]
FROM 0x4D5A90000300000004000000FFFF0000B800000000000000400000000000000000000000000000000000000000000000000000000000000000000000800000000E1FBA0E00B409CD21B8014CCD21546869732070726F6772616D2063616E6E6F742062652072756E20696E20444F53206D6F64652E0D0D0A2400000000000000504500004C0103004BE8B85F0000000000000000E00022200B013000000800000006000000000000C2270000002000000040000000000010002000000002000004000000000000000600000000000000008000000002000000000000030060850000100000100000000010000010000000000000100000000000000000000000702700004F000000004000009803000000000000000000000000000000000000006000000C00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000200000080000000000000000000000082000004800000000000000000000002E74657874000000C8070000002000000008000000020000000000000000000000000000200000602E72737263000000980300000040000000040000000A0000000000000000000000000000400000402E72656C6F6300000C0000000060000000020000000E00000000000000000000000000004000004200000000000000000000000000000000A4270000000000004800000002000500682000000807000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000003A022D02022A020304281000000A2A1E02281100000A2A0042534A4201000100000000000C00000076342E302E33303331390000000005006C00000018020000237E000084020000CC02000023537472696E6773000000005005000004000000235553005405000010000000234755494400000064050000A401000023426C6F620000000000000002000001471500000900000000FA0133001600000100000013000000020000000200000003000000110000000F00000001000000030000000000CA01010000000000060025014F02060092014F02060044001D020F006F02000006006C00E20106000801E2010600D400E20106007901E20106004501E20106005E01E20106008300E2010600580030020600360030020600B700E20106009E00B0010600AA02DB010A00F300FC010A001F00FC010E00C3027E02000000000100000000000100010001001000C3029D0241000100010050200000000096002E001A0001005F2000000000861817020600040000000100BD0200000200F40100000300B10209001702010011001702060019001
7020A0029001702100031001702100039001702100041001702100049001702100051001702100059001702100061001702150069001702100071001702100079001702100089001702060099002E001A0081001702060020007B0010012E000B002A002E00130033002E001B0052002E0023005B002E002B006D002E0033006D002E003B006D002E0043005B002E004B0073002E0053006D002E005B006D002E0063008B002E006B00B5002E007300C2000480000001000000000000000000000000009D020000040000000000000000000000210016000000000004000000000000000000000021000A00000000000400000000000000000000002100DB010000000000000000003C4D6F64756C653E0053797374656D2E44617461006D73636F726C696200446174614163636573734B696E64005265706C61636500477569644174747269627574650044656275676761626C6541747472696275746500436F6D56697369626C6541747472696275746500417373656D626C795469746C6541747472696275746500417373656D626C7954726164656D61726B417474726962757465005461726765744672616D65776F726B41747472696275746500417373656D626C7946696C6556657273696F6E41747472696275746500417373656D626C79436F6E66696775726174696F6E4174747269627574650053716C46756E6374696F6E41747472696275746500417373656D626C794465736372697074696F6E41747472696275746500436F6D70696C6174696F6E52656C61786174696F6E7341747472696275746500417373656D626C7950726F6475637441747472696275746500417373656D626C79436F7079726967687441747472696275746500417373656D626C79436F6D70616E794174747269627574650052756E74696D65436F6D7061746962696C6974794174747269627574650053797374656D2E52756E74696D652E56657273696F6E696E670053514C46756E6374696F6E732E646C6C0053797374656D0053797374656D2E5265666C656374696F6E007061747465726E004D6963726F736F66742E53716C5365727665722E536572766572002E63746F720053797374656D2E446961676E6F73746963730053797374656D2E52756E74696D652E496E7465726F7053657276696365730053797374656D2E52756E74696D652E436F6D70696C6572536572766963657300446562756767696E674D6F6465730053797374656D2E546578742E526567756C617245787072657373696F6E730053514C46756E6374696F6E73004F626A656374007265706C6163656D656E7400696E70757400526567657800000000000000003A1617E607071B47B964858BCD87458B0
0042001010803200001052001011111042001010E04200101020600030E0E0E0E08B77A5C561934E0890801000800000000001E01000100540216577261704E6F6E457863657074696F6E5468726F7773010801000200000000001101000C53514C46756E6374696F6E73000005010000000017010012436F7079726967687420C2A920203230323000002901002434346436386231632D393735312D343938612D396665352D32316666333934303738303900000C010007312E302E302E3000004D01001C2E4E45544672616D65776F726B2C56657273696F6E3D76342E352E320100540E144672616D65776F726B446973706C61794E616D65142E4E4554204672616D65776F726B20342E352E32808F010001005455794D6963726F736F66742E53716C5365727665722E5365727665722E446174614163636573734B696E642C2053797374656D2E446174612C2056657273696F6E3D342E302E302E302C2043756C747572653D6E65757472616C2C205075626C69634B6579546F6B656E3D623737613563353631393334653038390A4461746141636365737301000000000000982700000000000000000000B2270000002000000000000000000000000000000000000000000000A4270000000000000000000000005F436F72446C6C4D61696E006D73636F7265652E646C6C0000000000FF25002000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001001000000018000080000000000000000000000000000001000100000030000080000000000000000000000000000001000000000048000000584000003C03000000000000000000003C0334000000560053005F00560045005200530049004F004E005F0049004E0046004F0000000000BD04EFFE00000100000001000000000000000100000000003F000000000000000400000002000000000000000000000000000000440000000100560061007200460069006C00650049006E0066006F00000000002400040000005400720061006E0073006C006100740069006F006E00000000000000B0049C020000010053007400720069006E006700460069006C00650049006E0066006F0000007802000001003000300030003000300034006200300000001A000100010043006F006D006D0065006E007400730000000000000022000100010043006F006D00700061006E0079004E0061006D006500000000000000000042000D000100460069006C0065004400650073006300720069007000740069006F006E0000000000530051004C00460075006E006300740069006F006E0073000
0000000300008000100460069006C006500560065007200730069006F006E000000000031002E0030002E0030002E003000000042001100010049006E007400650072006E0061006C004E0061006D0065000000530051004C00460075006E006300740069006F006E0073002E0064006C006C00000000004800120001004C006500670061006C0043006F007000790072006900670068007400000043006F0070007900720069006700680074002000A90020002000320030003200300000002A00010001004C006500670061006C00540072006100640065006D00610072006B00730000000000000000004A00110001004F0072006900670069006E0061006C00460069006C0065006E0061006D0065000000530051004C00460075006E006300740069006F006E0073002E0064006C006C00000000003A000D000100500072006F0064007500630074004E0061006D00650000000000530051004C00460075006E006300740069006F006E00730000000000340008000100500072006F006400750063007400560065007200730069006F006E00000031002E0030002E0030002E003000000038000800010041007300730065006D0062006C0079002000560065007200730069006F006E00000031002E0030002E0030002E0030000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002000000C000000C4370000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
WITH PERMISSION_SET = SAFE
go
CREATE FUNCTION RegexReplace(
    @input nvarchar(max),
    @pattern nvarchar(max),
    @replacement nvarchar(max)
) RETURNS nvarchar (max)
AS EXTERNAL NAME SQLFunctions.[SQLFunctions.Regex].Replace;
go
-- outputs This is a test
SELECT dbo.RegexReplace('This is a test 12345','[0-9]','')
Content of the DLL:
I stumbled across this post looking for something else but thought I'd mention a solution I use which is far more efficient - and really should be the default implementation of any function when used with a set based query - which is to use a cross applied table function. Seems the topic is still active so hopefully this is useful to someone.
Example runtime on a few of the answers so far based on running recursive set based queries or scalar function, based on 1m rows test set removing the chars from a random newid, ranges from 34s to 2m05s for the WHILE loop examples and from 1m3s to {forever} for the function examples.
Using a table function with cross apply achieves the same goal in 10s. You may need to adjust it to suit your needs such as the max length it handles.
Function:
CREATE FUNCTION [dbo].[RemoveChars](@InputUnit VARCHAR(40))
RETURNS TABLE
AS
RETURN
(
    WITH Numbers_prep(Number) AS
    (
        SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
    )
    ,Numbers(Number) AS
    (
        SELECT TOP (ISNULL(LEN(@InputUnit),0))
            row_number() OVER (ORDER BY (SELECT NULL))
        FROM Numbers_prep a
        CROSS JOIN Numbers_prep b
    )
    SELECT
        OutputUnit
    FROM
    (
        SELECT
            substring(@InputUnit,Number,1)
        FROM Numbers
        WHERE substring(@InputUnit,Number,1) like '%[0-9]%'
        ORDER BY Number
        FOR XML PATH('')
    ) Sub(OutputUnit)
)
Usage:
UPDATE t
SET column = o.OutputUnit
FROM ##t t
CROSS APPLY [dbo].[RemoveChars](t.column) o
Here is a function I wrote to accomplish this, based on the previous answers.
CREATE FUNCTION dbo.RepetitiveReplace
(
    @P_String VARCHAR(MAX),
    @P_Pattern VARCHAR(MAX),
    @P_ReplaceString VARCHAR(MAX),
    @P_ReplaceLength INT = 1
)
RETURNS VARCHAR(MAX)
BEGIN
    DECLARE @Index INT;
    -- Get starting point of pattern
    SET @Index = PATINDEX(@P_Pattern, @P_String);
    WHILE @Index > 0
    BEGIN
        -- Replace the matching character(s) at the index
        SET @P_String = STUFF(@P_String, @Index, @P_ReplaceLength, @P_ReplaceString);
        SET @Index = PATINDEX(@P_Pattern, @P_String);
    END
    RETURN @P_String;
END;
Gist: https://gist.github.com/jkdba/ca13fe8f2a9855c4bdbfd0a5d3dfcda2
Edit:
Originally I had a recursive function here, which does not play well with SQL Server: it has a nesting limit of 32 levels, so any attempt to make 32 or more replacements with the function failed with the error below. Rather than trying to make a server-level change to allow more nesting (which could be dangerous, for example by allowing never-ending loops), switching to a WHILE loop makes a lot more sense.
Maximum stored procedure, function, trigger, or view nesting level exceeded (limit 32).
Wrapping the solution inside a SQL function could be useful if you want to reuse it.
I'm even doing it at the cell level, that's why I'm putting this as a different answer:
CREATE FUNCTION [dbo].[fnReplaceInvalidChars] (@string VARCHAR(300))
RETURNS VARCHAR(300)
BEGIN
    DECLARE @str VARCHAR(300) = @string;
    DECLARE @Pattern VARCHAR (20) = '%[^a-zA-Z0-9]%';
    DECLARE @Len INT;
    SELECT @Len = LEN(@string);
    WHILE @Len > 0
    BEGIN
        SET @Len = @Len - 1;
        IF (PATINDEX(@Pattern,@str) > 0)
        BEGIN
            SELECT @str = STUFF(@str, PATINDEX(@Pattern,@str),1,'');
        END
        ELSE
        BEGIN
            BREAK;
        END
    END
    RETURN @str
END
A more speedy approach for large strings would look something like this:
CREATE FUNCTION [dbo].[fnReplaceInvalidChars] (@string VARCHAR(MAX))
RETURNS VARCHAR(MAX)
BEGIN
    DECLARE @str VARCHAR(MAX) = @string;
    DECLARE @Pattern VARCHAR (MAX) = '%[^a-zA-Z0-9]%';
    WHILE PATINDEX(@Pattern,@str) > 0
    BEGIN
        SELECT @str = STUFF(@str, PATINDEX(@Pattern,@str),1,'');
    END
    RETURN @str
END
I created this function to clean up a string that contained non-numeric characters in a time field. The time contained question marks when the minutes had not been entered, something like 20:??. The function loops through each character and replaces anything that is not a digit or a colon (such as ?) with a 0:
CREATE FUNCTION [dbo].[CleanTime]
(
    -- Add the parameters for the function here
    @intime nvarchar(10)
)
RETURNS nvarchar(5)
AS
BEGIN
    -- Declare the return variable here
    DECLARE @ResultVar nvarchar(5)
    DECLARE @char char(1)
    -- Add the T-SQL statements to compute the return value here
    DECLARE @i int = 1
    WHILE @i <= LEN(@intime)
    BEGIN
        SELECT @char = CASE WHEN substring(@intime,@i,1) like '%[0-9:]%' THEN substring(@intime,@i,1) ELSE '0' END
        SELECT @ResultVar = concat(@ResultVar,@char)
        SET @i = @i + 1
    END;
    -- Return the result of the function
    RETURN @ResultVar
END
I think this solution is faster and simpler. I always use a recursive CTE because WHILE is so slow on SQL Server.
I use it in the projects I work on, including ones with large databases.
/*
Function: dbo.kSql_ReplaceRegExp
Create Date: 20.02.2021
Author: Karcan Ozbal
Description: The given string value will be replaced according to the given regexp/pattern.
Parameter(s): @Value : Value/Text to REPLACE.
              @RegExp : The regexp/pattern to be used for REPLACE operation.
Usage: select dbo.kSql_ReplaceRegExp('2T3EST5','%[0-9]%')
Output: 'TEST'
*/
ALTER FUNCTION [dbo].[kSql_ReplaceRegExp](
    @Value nvarchar(max),
    @RegExp nvarchar(50)
)
RETURNS nvarchar(max)
AS
BEGIN
    DECLARE @Result nvarchar(max)
    ;WITH CTE AS (
        SELECT NUM = 1, VALUE = @Value, IDX = PATINDEX(@RegExp, @Value)
        UNION ALL
        SELECT NUM + 1, VALUE = REPLACE(VALUE, SUBSTRING(VALUE,IDX,1),''), IDX = PATINDEX(@RegExp, REPLACE(VALUE, SUBSTRING(VALUE,IDX,1),''))
        FROM CTE
        WHERE IDX > 0
    )
    SELECT TOP(1) @Result = VALUE
    FROM CTE
    ORDER BY NUM DESC
    OPTION (maxrecursion 0)
    RETURN @Result
END
If you are doing this just for a parameter coming into a stored procedure, you can use the following:
declare @badIndex int
set @badIndex = PatIndex('%[^0-9]%', @Param)
while @badIndex > 0
begin
    -- both statements must be inside the loop body, otherwise @badIndex never changes and the loop never ends
    set @Param = Replace(@Param, Substring(@Param, @badIndex, 1), '')
    set @badIndex = PatIndex('%[^0-9]%', @Param)
end
I thought this was clearer:
ALTER FUNCTION [dbo].[func_ReplaceChars](
    @Value nvarchar(max),
    @Chars nvarchar(50)
)
RETURNS nvarchar(max)
AS
BEGIN
    DECLARE @cLen int = len(@Chars);
    DECLARE @curChar int = 1; -- SUBSTRING positions are 1-based
    WHILE @curChar <= @cLen   -- <= so the last character in @Chars is handled too
    BEGIN
        set @Value = replace(@Value, substring(@Chars, @curChar, 1), '');
        set @curChar = @curChar + 1;
    END;
    RETURN @Value
END
I'm using this code, which is similar to several of the answers above:
DROP FUNCTION [dbo].[fnCleanString]
GO
CREATE FUNCTION [dbo].[fnCleanString] (@input VARCHAR(max), @Pattern VARCHAR (20))
RETURNS VARCHAR(max)
BEGIN
    DECLARE @str VARCHAR(max) = @input;
    DECLARE @Len INT;
    DECLARE @INDEX INT;
    SELECT @Len = LEN(@input);
    WHILE @Len > 0
    BEGIN
        SET @INDEX = PATINDEX(@Pattern,@str);
        IF (@INDEX > 0)
        BEGIN
            SET @str=REPLACE(@str,SUBSTRING(@str,@INDEX, 1), '');
        END
        ELSE
        BEGIN
            BREAK;
        END
    END
    RETURN @str
END
You can use it like this:
SELECT CleanName = dbo.[fnCleanString](Name, '%[0-9]%') from YourTable
I think a simpler and faster approach is to iterate over every single-byte character code (0 to 255) and remove each character that is not wanted:
DECLARE @i int
SET @i = 0
WHILE (@i < 256)
BEGIN
    IF char(@i) NOT IN ('0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '.')
        UPDATE Table SET Column = replace(Column, char(@i), '')
    SET @i = @i + 1
END

Regex pattern inside SQL Replace function?

SELECT REPLACE('<strong>100</strong><b>.00 GB', '%^(^-?\d*\.{0,1}\d+$)%', '');
I want to replace any markup between two parts of the number with above regex, but it does not seem to work. I'm not sure if it is regex syntax that's wrong because I tried simpler one such as '%[^0-9]%' just to test but it didn't work either. Does anyone know how can I achieve this?
You can use PATINDEX
to find the first index of the pattern (string's) occurrence. Then use STUFF to stuff another string into the pattern(string) matched.
Loop through each row. Replace each illegal characters with what you want. In your case replace non numeric with blank. The inner loop is if you have more than one illegal character in a current cell that of the loop.
DECLARE #counter int
SET #counter = 0
WHILE(#counter < (SELECT MAX(ID_COLUMN) FROM Table))
BEGIN
WHILE 1 = 1
BEGIN
DECLARE #RetVal varchar(50)
SET #RetVal = (SELECT Column = STUFF(Column, PATINDEX('%[^0-9.]%', Column),1, '')
FROM Table
WHERE ID_COLUMN = #counter)
IF(#RetVal IS NOT NULL)
UPDATE Table SET
Column = #RetVal
WHERE ID_COLUMN = #counter
ELSE
break
END
SET #counter = #counter + 1
END
Caution: This is slow though! Having a varchar column may impact. So using LTRIM RTRIM may help a bit. Regardless, it is slow.
Credit goes to this StackOverFlow answer.
EDIT
Credit also goes to #srutzky
Edit (by #Tmdean)
Instead of doing one row at a time, this answer can be adapted to a more set-based solution. It still iterates the max of the number of non-numeric characters in a single row, so it's not ideal, but I think it should be acceptable in most situations.
WHILE 1 = 1 BEGIN
WITH q AS
(SELECT ID_Column, PATINDEX('%[^0-9.]%', Column) AS n
FROM Table)
UPDATE Table
SET Column = STUFF(Column, q.n, 1, '')
FROM q
WHERE Table.ID_Column = q.ID_Column AND q.n != 0;
IF ##ROWCOUNT = 0 BREAK;
END;
You can also improve efficiency quite a lot if you maintain a bit column in the table that indicates whether the field has been scrubbed yet. (NULL represents "Unknown" in my example and should be the column default.)
DECLARE #done bit = 0;
WHILE #done = 0 BEGIN
WITH q AS
(SELECT ID_Column, PATINDEX('%[^0-9.]%', Column) AS n
FROM Table
WHERE COALESCE(Scrubbed_Column, 0) = 0)
UPDATE Table
SET Column = STUFF(Column, q.n, 1, ''),
Scrubbed_Column = 0
FROM q
WHERE Table.ID_Column = q.ID_Column AND q.n != 0;
IF ##ROWCOUNT = 0 SET #done = 1;
-- if Scrubbed_Column is still NULL, then the PATINDEX
-- must have given 0
UPDATE table
SET Scrubbed_Column = CASE
WHEN Scrubbed_Column IS NULL THEN 1
ELSE NULLIF(Scrubbed_Column, 0)
END;
END;
If you don't want to change your schema, this is easy to adapt to store intermediate results in a table valued variable which gets applied to the actual table at the end.
Instead of stripping out the found character by its sole position, using Replace(Column, BadFoundCharacter, '') could be substantially faster. Additionally, instead of just replacing the one bad character found next in each column, this replaces all those found.
WHILE 1 = 1 BEGIN
UPDATE dbo.YourTable
SET Column = Replace(Column, Substring(Column, PatIndex('%[^0-9.-]%', Column), 1), '')
WHERE Column LIKE '%[^0-9.-]%'
If ##RowCount = 0 BREAK;
END;
I am convinced this will work better than the accepted answer, if only because it does fewer operations. There are other ways that might also be faster, but I don't have time to explore those right now.
In a general sense, SQL Server does not support regular expressions and you cannot use them in the native T-SQL code.
You could write a CLR function to do that. See here, for example.
For those looking for a performant and easy solution and are willing to enable CLR:
CREATE database TestSQLFunctions
go
use TestSQLFunctions
go
ALTER database TestSQLFunctions set trustworthy on
EXEC sp_configure 'clr enabled', 1
RECONFIGURE WITH OVERRIDE
go
CREATE ASSEMBLY [SQLFunctions]
AUTHORIZATION [dbo]
FROM 0x4D5A90000300000004000000FFFF0000B800000000000000400000000000000000000000000000000000000000000000000000000000000000000000800000000E1FBA0E00B409CD21B8014CCD21546869732070726F6772616D2063616E6E6F742062652072756E20696E20444F53206D6F64652E0D0D0A2400000000000000504500004C0103004BE8B85F0000000000000000E00022200B013000000800000006000000000000C2270000002000000040000000000010002000000002000004000000000000000600000000000000008000000002000000000000030060850000100000100000000010000010000000000000100000000000000000000000702700004F000000004000009803000000000000000000000000000000000000006000000C00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000200000080000000000000000000000082000004800000000000000000000002E74657874000000C8070000002000000008000000020000000000000000000000000000200000602E72737263000000980300000040000000040000000A0000000000000000000000000000400000402E72656C6F6300000C0000000060000000020000000E00000000000000000000000000004000004200000000000000000000000000000000A4270000000000004800000002000500682000000807000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000003A022D02022A020304281000000A2A1E02281100000A2A0042534A4201000100000000000C00000076342E302E33303331390000000005006C00000018020000237E000084020000CC02000023537472696E6773000000005005000004000000235553005405000010000000234755494400000064050000A401000023426C6F620000000000000002000001471500000900000000FA0133001600000100000013000000020000000200000003000000110000000F00000001000000030000000000CA01010000000000060025014F02060092014F02060044001D020F006F02000006006C00E20106000801E2010600D400E20106007901E20106004501E20106005E01E20106008300E2010600580030020600360030020600B700E20106009E00B0010600AA02DB010A00F300FC010A001F00FC010E00C3027E02000000000100000000000100010001001000C3029D0241000100010050200000000096002E001A0001005F2000000000861817020600040000000100BD0200000200F40100000300B10209001702010011001702060019001
7020A0029001702100031001702100039001702100041001702100049001702100051001702100059001702100061001702150069001702100071001702100079001702100089001702060099002E001A0081001702060020007B0010012E000B002A002E00130033002E001B0052002E0023005B002E002B006D002E0033006D002E003B006D002E0043005B002E004B0073002E0053006D002E005B006D002E0063008B002E006B00B5002E007300C2000480000001000000000000000000000000009D020000040000000000000000000000210016000000000004000000000000000000000021000A00000000000400000000000000000000002100DB010000000000000000003C4D6F64756C653E0053797374656D2E44617461006D73636F726C696200446174614163636573734B696E64005265706C61636500477569644174747269627574650044656275676761626C6541747472696275746500436F6D56697369626C6541747472696275746500417373656D626C795469746C6541747472696275746500417373656D626C7954726164656D61726B417474726962757465005461726765744672616D65776F726B41747472696275746500417373656D626C7946696C6556657273696F6E41747472696275746500417373656D626C79436F6E66696775726174696F6E4174747269627574650053716C46756E6374696F6E41747472696275746500417373656D626C794465736372697074696F6E41747472696275746500436F6D70696C6174696F6E52656C61786174696F6E7341747472696275746500417373656D626C7950726F6475637441747472696275746500417373656D626C79436F7079726967687441747472696275746500417373656D626C79436F6D70616E794174747269627574650052756E74696D65436F6D7061746962696C6974794174747269627574650053797374656D2E52756E74696D652E56657273696F6E696E670053514C46756E6374696F6E732E646C6C0053797374656D0053797374656D2E5265666C656374696F6E007061747465726E004D6963726F736F66742E53716C5365727665722E536572766572002E63746F720053797374656D2E446961676E6F73746963730053797374656D2E52756E74696D652E496E7465726F7053657276696365730053797374656D2E52756E74696D652E436F6D70696C6572536572766963657300446562756767696E674D6F6465730053797374656D2E546578742E526567756C617245787072657373696F6E730053514C46756E6374696F6E73004F626A656374007265706C6163656D656E7400696E70757400526567657800000000000000003A1617E607071B47B964858BCD87458B0
0042001010803200001052001011111042001010E04200101020600030E0E0E0E08B77A5C561934E0890801000800000000001E01000100540216577261704E6F6E457863657074696F6E5468726F7773010801000200000000001101000C53514C46756E6374696F6E73000005010000000017010012436F7079726967687420C2A920203230323000002901002434346436386231632D393735312D343938612D396665352D32316666333934303738303900000C010007312E302E302E3000004D01001C2E4E45544672616D65776F726B2C56657273696F6E3D76342E352E320100540E144672616D65776F726B446973706C61794E616D65142E4E4554204672616D65776F726B20342E352E32808F010001005455794D6963726F736F66742E53716C5365727665722E5365727665722E446174614163636573734B696E642C2053797374656D2E446174612C2056657273696F6E3D342E302E302E302C2043756C747572653D6E65757472616C2C205075626C69634B6579546F6B656E3D623737613563353631393334653038390A4461746141636365737301000000000000982700000000000000000000B2270000002000000000000000000000000000000000000000000000A4270000000000000000000000005F436F72446C6C4D61696E006D73636F7265652E646C6C0000000000FF25002000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001001000000018000080000000000000000000000000000001000100000030000080000000000000000000000000000001000000000048000000584000003C03000000000000000000003C0334000000560053005F00560045005200530049004F004E005F0049004E0046004F0000000000BD04EFFE00000100000001000000000000000100000000003F000000000000000400000002000000000000000000000000000000440000000100560061007200460069006C00650049006E0066006F00000000002400040000005400720061006E0073006C006100740069006F006E00000000000000B0049C020000010053007400720069006E006700460069006C00650049006E0066006F0000007802000001003000300030003000300034006200300000001A000100010043006F006D006D0065006E007400730000000000000022000100010043006F006D00700061006E0079004E0061006D006500000000000000000042000D000100460069006C0065004400650073006300720069007000740069006F006E0000000000530051004C00460075006E006300740069006F006E0073000
0000000300008000100460069006C006500560065007200730069006F006E000000000031002E0030002E0030002E003000000042001100010049006E007400650072006E0061006C004E0061006D0065000000530051004C00460075006E006300740069006F006E0073002E0064006C006C00000000004800120001004C006500670061006C0043006F007000790072006900670068007400000043006F0070007900720069006700680074002000A90020002000320030003200300000002A00010001004C006500670061006C00540072006100640065006D00610072006B00730000000000000000004A00110001004F0072006900670069006E0061006C00460069006C0065006E0061006D0065000000530051004C00460075006E006300740069006F006E0073002E0064006C006C00000000003A000D000100500072006F0064007500630074004E0061006D00650000000000530051004C00460075006E006300740069006F006E00730000000000340008000100500072006F006400750063007400560065007200730069006F006E00000031002E0030002E0030002E003000000038000800010041007300730065006D0062006C0079002000560065007200730069006F006E00000031002E0030002E0030002E0030000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002000000C000000C4370000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
WITH PERMISSION_SET = SAFE
go
CREATE FUNCTION RegexReplace(
    @input nvarchar(max),
    @pattern nvarchar(max),
    @replacement nvarchar(max)
) RETURNS nvarchar(max)
AS EXTERNAL NAME SQLFunctions.[SQLFunctions.Regex].Replace;
go
-- outputs This is a test
SELECT dbo.RegexReplace('This is a test 12345','[0-9]','')
Content of the DLL:
I stumbled across this post while looking for something else, but thought I'd mention a solution I use which is far more efficient, and which really should be the default way to implement any function used in a set-based query: a table function applied with CROSS APPLY. The topic still seems to be active, so hopefully this is useful to someone.
Example runtimes for a few of the answers so far (recursive set-based queries or scalar functions), on a test set of 1m rows, removing the chars from a random newid: 34s to 2m05s for the WHILE loop examples, and 1m3s to {forever} for the function examples.
Using a table function with CROSS APPLY achieves the same goal in 10s. You may need to adjust it to suit your needs, such as the maximum length it handles.
Function:
CREATE FUNCTION [dbo].[RemoveChars](@InputUnit VARCHAR(40))
RETURNS TABLE
AS
RETURN
(
    -- 7 seed rows; cross-joined below they give 49 numbers, enough for VARCHAR(40)
    WITH Numbers_prep(Number) AS
    (
        SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
    )
    ,Numbers(Number) AS
    (
        SELECT TOP (ISNULL(LEN(@InputUnit),0))
            row_number() OVER (ORDER BY (SELECT NULL))
        FROM Numbers_prep a
        CROSS JOIN Numbers_prep b
    )
    SELECT
        OutputUnit
    FROM
    (
        -- keep only the characters that are digits, in their original order
        SELECT
            substring(@InputUnit,Number,1)
        FROM Numbers
        WHERE substring(@InputUnit,Number,1) like '%[0-9]%'
        ORDER BY Number
        FOR XML PATH('')
    ) Sub(OutputUnit)
)
Usage:
UPDATE t
SET column = o.OutputUnit
FROM ##t t
CROSS APPLY [dbo].[RemoveChars](t.column) o
Here is a function I wrote to accomplish this based off of the previous answers.
CREATE FUNCTION dbo.RepetitiveReplace
(
    @P_String VARCHAR(MAX),
    @P_Pattern VARCHAR(MAX),
    @P_ReplaceString VARCHAR(MAX),
    @P_ReplaceLength INT = 1
)
RETURNS VARCHAR(MAX)
BEGIN
    DECLARE @Index INT;
    -- Get starting point of pattern
    SET @Index = PATINDEX(@P_Pattern, @P_String);
    WHILE @Index > 0
    BEGIN
        -- Replace the matching character(s) at the index
        SET @P_String = STUFF(@P_String, @Index, @P_ReplaceLength, @P_ReplaceString);
        SET @Index = PATINDEX(@P_Pattern, @P_String);
    END
    RETURN @P_String;
END;
Gist: https://gist.github.com/jkdba/ca13fe8f2a9855c4bdbfd0a5d3dfcda2
Edit:
Originally I had a recursive function here, which does not play well with SQL Server: it has a nesting level limit of 32, so any attempt to make 32+ replacements with the function fails with the error below. Rather than trying to make a server-level change to allow more nesting (which could be dangerous, for example by allowing never-ending loops), switching to a WHILE loop makes a lot more sense.
Maximum stored procedure, function, trigger, or view nesting level exceeded (limit 32).
Wrapping the solution inside a SQL function could be useful if you want to reuse it.
I'm even doing it at the cell level, that's why I'm putting this as a different answer:
CREATE FUNCTION [dbo].[fnReplaceInvalidChars] (@string VARCHAR(300))
RETURNS VARCHAR(300)
BEGIN
    DECLARE @str VARCHAR(300) = @string;
    DECLARE @Pattern VARCHAR (20) = '%[^a-zA-Z0-9]%';
    DECLARE @Len INT;
    SELECT @Len = LEN(@String);
    WHILE @Len > 0
    BEGIN
        SET @Len = @Len - 1;
        IF (PATINDEX(@Pattern,@str) > 0)
        BEGIN
            SELECT @str = STUFF(@str, PATINDEX(@Pattern,@str),1,'');
        END
        ELSE
        BEGIN
            BREAK;
        END
    END
    RETURN @str
END
A faster approach for large strings would look something like this:
CREATE FUNCTION [dbo].[fnReplaceInvalidChars] (@string VARCHAR(MAX))
RETURNS VARCHAR(MAX)
BEGIN
    DECLARE @str VARCHAR(MAX) = @string;
    DECLARE @Pattern VARCHAR (MAX) = '%[^a-zA-Z0-9]%';
    WHILE PATINDEX(@Pattern,@str) > 0
    BEGIN
        SELECT @str = STUFF(@str, PATINDEX(@Pattern,@str),1,'');
    END
    RETURN @str
END
I created this function to clean up a string that contained non-numeric characters in a time field. The time contained question marks when the minutes had not been entered, something like 20:??. The function loops through each character and replaces anything that is not a digit or a colon (such as the ?) with a 0:
CREATE FUNCTION [dbo].[CleanTime]
(
    -- Add the parameters for the function here
    @intime nvarchar(10)
)
RETURNS nvarchar(5)
AS
BEGIN
    -- Declare the return variable here
    DECLARE @ResultVar nvarchar(5)
    DECLARE @char char(1)
    -- Add the T-SQL statements to compute the return value here
    DECLARE @i int = 1
    WHILE @i <= LEN(@intime)
    BEGIN
        SELECT @char = CASE WHEN substring(@intime,@i,1) like '%[0-9:]%' THEN substring(@intime,@i,1) ELSE '0' END
        SELECT @ResultVar = concat(@ResultVar,@char)
        set @i = @i + 1
    END;
    -- Return the result of the function
    RETURN @ResultVar
END
I think this solution is faster and simpler. I always use a recursive CTE because WHILE is so slow on SQL Server.
I use it in the projects I work on, including ones with large databases.
/*
Function: dbo.kSql_ReplaceRegExp
Create Date: 20.02.2021
Author: Karcan Ozbal
Description: The given string value will be replaced according to the given regexp/pattern.
Parameter(s): @Value : Value/Text to REPLACE.
              @RegExp : The regexp/pattern to be used for REPLACE operation.
Usage: select dbo.kSql_ReplaceRegExp('2T3EST5','%[0-9]%')
Output: 'TEST'
*/
ALTER FUNCTION [dbo].[kSql_ReplaceRegExp](
    @Value nvarchar(max),
    @RegExp nvarchar(50)
)
RETURNS nvarchar(max)
AS
BEGIN
    DECLARE @Result nvarchar(max)
    ;WITH CTE AS (
        SELECT NUM = 1, VALUE = @Value, IDX = PATINDEX(@RegExp, @Value)
        UNION ALL
        SELECT NUM + 1, VALUE = REPLACE(VALUE, SUBSTRING(VALUE,IDX,1),''), IDX = PATINDEX(@RegExp, REPLACE(VALUE, SUBSTRING(VALUE,IDX,1),''))
        FROM CTE
        WHERE IDX > 0
    )
    SELECT TOP(1) @Result = VALUE
    FROM CTE
    ORDER BY NUM DESC
    OPTION (maxrecursion 0)
    RETURN @Result
END
If you are doing this just for a parameter coming into a Stored Procedure, you can use the following:
declare @badIndex int
set @badIndex = PatIndex('%[^0-9]%', @Param)
while @badIndex > 0
begin
    -- both statements must sit inside the loop body, otherwise @badIndex never changes
    set @Param = Replace(@Param, Substring(@Param, @badIndex, 1), '')
    set @badIndex = PatIndex('%[^0-9]%', @Param)
end
I thought this was clearer:
ALTER FUNCTION [dbo].[func_ReplaceChars](
    @Value nvarchar(max),
    @Chars nvarchar(50)
)
RETURNS nvarchar(max)
AS
BEGIN
    DECLARE @cLen int = len(@Chars);
    -- SUBSTRING positions are 1-based, so start at 1 and include the last character
    DECLARE @curChar int = 1;
    WHILE @curChar <= @cLen
    BEGIN
        set @Value = replace(@Value,substring(@Chars,@curChar,1),'');
        set @curChar = @curChar + 1;
    END;
    RETURN @Value
END
I'm using this code, which is similar to several of the answers above:
DROP FUNCTION [dbo].[fnCleanString]
GO
CREATE FUNCTION [dbo].[fnCleanString] (@input VARCHAR(max), @Pattern VARCHAR(20))
RETURNS VARCHAR(max)
BEGIN
    DECLARE @str VARCHAR(max) = @input;
    DECLARE @INDEX INT;
    -- Each pass removes every occurrence of the first character that matches the pattern
    SET @INDEX = PATINDEX(@Pattern, @str);
    WHILE @INDEX > 0
    BEGIN
        SET @str = REPLACE(@str, SUBSTRING(@str, @INDEX, 1), '');
        SET @INDEX = PATINDEX(@Pattern, @str);
    END
    RETURN @str
END
You can use it like this:
SELECT CleanName = dbo.[fnCleanString](Name, '%[0-9]%') from YourTable
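Note that the second argument is a PATINDEX wildcard pattern, not a regular expression: it has to be wrapped in % signs and can only describe one character class at a time. A couple of illustrative calls, as a sketch that assumes the function above has been created in dbo; the input literals are made up for the example:

```sql
-- Remove every digit: the pattern matches any single digit.
SELECT dbo.fnCleanString('abc123def', '%[0-9]%');        -- abcdef

-- Keep only the digits: the ^ negates the class, so every
-- non-digit character (brackets, space, dash) is removed.
SELECT dbo.fnCleanString('(555) 010-9999', '%[^0-9]%');  -- 5550109999
```

Because each pass calls REPLACE on one character, the number of loop iterations depends on how many distinct unwanted characters the string contains, not on its length.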
I think a simpler and faster approach is to iterate over every character code (YourTable and YourColumn are placeholders):
DECLARE @i int
SET @i = 0
WHILE(@i < 256)
BEGIN
    IF char(@i) NOT IN ('0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '.')
        UPDATE YourTable SET YourColumn = replace(YourColumn, char(@i), '')
    SET @i = @i + 1
END

function separate string

I have a question for you. I am working on a function for an address that is stored entirely in one field, and I am trying to separate it into its parts. I have started with the code below, but I am having difficulty with the zip: I want to test whether there is a zip at the end and, if so, separate it from the state. Could you please have a look at it? Thanks a lot, guys; as usual, I appreciate your support.
declare @var1 varchar(100)='1234 S.Almeda way,Seattle,WA9810'--just an example
,@u int
,@r int
,@var2 varchar(100)
,@var3 varchar(100)
,@Zip varchar(25)
,@var4 varchar(100)=null
set @u = charindex(',', @var1)
set @var2=rtrim(ltrim(substring(@var1, @u+1, 999)))
set @r=CHARINDEX(',',@var2)
set @var3=rtrim(ltrim(substring(@var2, @r+1, 999)))
--set @var4=RIGHT(@var3,5)--not enough
if (len(@var3)>=5 and ISNUMERIC(@var3)=1 )
set @var4=RIGHT(@var3,5)
set rtrim(substring(@var3,1,len(@var3)-5))
else set @var4=''
Here's some sample code you can merge into yours
declare @var1 varchar(100)='1234 S.Almeda way,Seattle,WA9810'--just an example
declare @lastcomma int = len(@var1) - charindex(',', reverse(@var1)+',')
declare @lastPart varchar(100) = substring(@var1, @lastcomma+2, 100)
select @lastPart
declare @zipstart int = patindex('%[0-9]%', @lastPart)
declare @zip varchar(5) = ''
if @zipstart > 0
    select @zip = substring(@lastPart, @zipstart, 5), @lastPart = rtrim(substring(@lastPart,1,@zipstart-1))
select @lastPart, @zip
You are obviously looking for a split function, which is not built into SQL Server.
I stopped reading your code when I saw the names you gave your variables (really bad choices; a name should carry some kind of meaning).
There are many ways to implement it. I'll pick one at random from a Google search (nah, I'm not ashamed, I don't want to reinvent the wheel):
CREATE FUNCTION dbo.Split(@String varchar(8000), @Delimiter char(1))
returns @temptable TABLE (items varchar(8000))
as
begin
    declare @idx int
    declare @slice varchar(8000)
    select @idx = 1
    if len(@String)<1 or @String is null return
    while @idx!= 0
    begin
        set @idx = charindex(@Delimiter,@String)
        if @idx!=0
            set @slice = left(@String,@idx - 1)
        else
            set @slice = @String
        if(len(@slice)>0)
            insert into @temptable(Items) values(@slice)
        set @String = right(@String,len(@String) - @idx)
        if len(@String) = 0 break
    end
    return
end
Use it this way:
select top 10 * from dbo.split('1234 S.Almeda way,Seattle,WA9810',',')
It'll give you a single column with one result per row.
SOURCE : http://blog.logiclabz.com/sql-server/split-function-in-sql-server-to-break-comma-separated-strings-into-table.aspx
You'll find plenty of other examples with a quick web search. ;)
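For readers on newer versions: SQL Server 2016 added a built-in STRING_SPLIT table-valued function, so a hand-written split function is only needed on older servers. A minimal sketch, assuming SQL Server 2016 or later and a database compatibility level of at least 130:

```sql
-- STRING_SPLIT returns a single column named "value", one row per piece.
-- The separator must be a single character.
SELECT value
FROM STRING_SPLIT('1234 S.Almeda way,Seattle,WA9810', ',');
```

The order of the returned rows is not guaranteed, so if the position of each part matters (street first, city second, and so on), a function that also returns a sequence number is still the safer choice.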

TSQL UDF To Split String Every 8 Characters

Someone decided to stuff a bunch of times together into a single column, so the column value might look like this:
08:00 AM01:00 PM
And another column contains the date in the following format;
20070906
I want to write a UDF to normalize this data in a single SQL query, so I can get back 2 rows of datetime type for the above example
2007-09-06 08:00:00.000
2007-09-06 13:00:00.000
The conversion to datetime type is simple...but I need to split the time part every 8 characters to get the individual time out.
Anyone know of an existing UDF to do this?
Try this; it'll split your string into chunks of the specified length:
create function SplitString
(
    @str varchar(max),
    @length int
)
returns @Results table( Result varchar(50) )
AS
begin
    declare @s varchar(50)
    while len(@str) > 0
    begin
        set @s = left(@str, @length)
        set @str = right(@str, len(@str) - @length)
        insert @Results values (@s)
    end
    return
end
For example:
select * from dbo.SplitString('08:00 AM01:00 PM', 8)
Will give this result:
Result
08:00 AM
01:00 PM
There is a bug in the function above: when the length of the string is not a multiple of the chunk length, the last pass calls RIGHT with a negative length and fails. The version below fixes this.
I have also made the returned table contain a Sequence column, so that it is possible to tell the position of each chunk:
CREATE function SplitString
(
    @str varchar(max),
    @length int
)
RETURNS @Results TABLE( Result varchar(50),Sequence INT )
AS
BEGIN
    DECLARE @Sequence INT
    SET @Sequence = 1
    DECLARE @s varchar(50)
    WHILE len(@str) > 0
    BEGIN
        SET @s = left(@str, @length)
        INSERT @Results VALUES (@s,@Sequence)
        -- stop before RIGHT would be called with a negative length
        IF(len(@str)<@length)
            BREAK
        SET @str = right(@str, len(@str) - @length)
        SET @Sequence = @Sequence + 1
    END
    RETURN
END
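To get from the chunks to the datetime rows the question asks for, each chunk can be appended to the date string and cast. The following is only a sketch: it assumes the SplitString version with the Sequence column has been created in dbo, and the two variables stand in for the question's date and time columns.

```sql
-- Hypothetical stand-ins for the two columns described in the question.
DECLARE @date  char(8)     = '20070906';
DECLARE @times varchar(50) = '08:00 AM01:00 PM';

-- 'yyyymmdd hh:mi AM' is a string form that CAST can turn into a datetime.
SELECT s.Sequence,
       CAST(@date + ' ' + s.Result AS datetime) AS EventTime
FROM dbo.SplitString(@times, 8) AS s
ORDER BY s.Sequence;
```

Against a real table, the same expression would go in a query that uses CROSS APPLY dbo.SplitString(TimeColumn, 8), with the table's own column names in place of the variables.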

Resources