SQL Server: extracting a number from a string

I have the following SQL command to extract only the numbers from a string:
UPDATE Oesskattings
SET alfasorteer1 = CASE
WHEN CHARINDEX('-', blokno) > 0
THEN SUBSTRING(blokno + '-', 0, CHARINDEX('-', blokno))
ELSE SUBSTRING(blokno, PATINDEX('%[0-9]%', blokno), LEN(blokno))
END
My problem is a record where blokno is, e.g., 1B: the conversion fails when the number is followed by a character.
How can I improve my code?

This pulls out all the digits from a string, regardless of the other characters and their order.
It uses a recursive CTE to find the digits in order, then concatenates them again with STUFF and FOR XML PATH.
IF OBJECT_ID('tempdb..#TMP') IS NOT NULL DROP TABLE #TMP;
CREATE TABLE #TMP(ID INT IDENTITY(1,1),txt VARCHAR(20))
INSERT INTO #TMP VALUES
('q12w--e32w')
,('vfr45tgbnhy67')
,('12wq3&&r5f5')
,('1qw%%23er45t')
,('de32()ws2')
,('desfghj')
-- A: one row per digit; CHR is the digit, REM is the text after it.
-- The pattern is [0-9], not [1-9], so zeros are kept.
;WITH A
AS (
SELECT ID,1 POS
,txt
,SUBSTRING(txt,PATINDEX('%[0-9]%',txt),1) CHR
,RIGHT(txt,LEN(txt)-PATINDEX('%[0-9]%',txt)) REM
FROM #TMP
WHERE PATINDEX('%[0-9]%',txt) > 0
UNION ALL
SELECT ID,POS + 1
,txt
,SUBSTRING(REM,PATINDEX('%[0-9]%',REM),1) CHR
,RIGHT(REM,LEN(REM)-PATINDEX('%[0-9]%',REM)) REM
FROM
A
WHERE
PATINDEX('%[0-9]%',REM) > 0
)
-- C: glue the digits of each ID back together in POS order
,c AS
(
SELECT
ID,txt
,STUFF(
(SELECT ''+b.chr
FROM a b
WHERE a.ID = b.id
ORDER BY POS
FOR XML PATH('')),1,0,'') AS chrs
FROM
A
)
SELECT DISTINCT
*
FROM
C
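A quick way to sanity-check the expected output is a Python model of the same walk. It is illustrative only and is not T-SQL; `all_digits` is a hypothetical helper. Each loop iteration plays the role of one recursion level: it finds the first digit in the remainder, keeps it, and continues with the text after it. It matches the full `[0-9]` class, so zeros are kept.

```python
import re


def all_digits(txt: str) -> str:
    """Concatenate every digit of txt in left-to-right order."""
    digits = []
    rem = txt
    while True:
        # Like PATINDEX('%[0-9]%', REM): locate the first digit in what is left.
        m = re.search(r"[0-9]", rem)
        if m is None:
            break
        digits.append(m.group())   # CHR for this recursion level
        rem = rem[m.end():]        # REM: everything after that digit
    return "".join(digits)


for sample in ["q12w--e32w", "vfr45tgbnhy67", "12wq3&&r5f5", "desfghj"]:
    print(sample, "->", repr(all_digits(sample)))
```

The model differs in one respect. The CTE's anchor member skips strings that contain no digits, so 'desfghj' produces no row at all, while the model returns an empty string.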

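What worked for me was to keep only the first run of digits in the string, using PATINDEX to find where that run starts and ends:
set alfasorteer1 = substring(blokno,
    patindex('%[0-9]%', blokno),                  -- position of the first digit
    1 + patindex('%[0-9][^0-9]%', blokno + 'x')   -- position of the last digit of that run
      - patindex('%[0-9]%', blokno))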
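The positions are easier to follow in a small Python model of that expression. It is illustrative only, and the helper names are hypothetical. `sql_substring` mimics T-SQL SUBSTRING, where a start position before 1 still consumes part of the length. That is why a string with no digits yields an empty result rather than an error. For '1B', both PATINDEX calls return 1, so the length is 1 + 1 - 1 = 1 and the result is '1'. The appended 'x' guarantees that a number at the very end of the string is still followed by a non-digit.

```python
DIGITS = "0123456789"


def patindex_digit(s: str) -> int:
    """PATINDEX('%[0-9]%', s): 1-based position of the first digit, 0 if none."""
    for i, ch in enumerate(s, start=1):
        if ch in DIGITS:
            return i
    return 0


def patindex_digit_nondigit(s: str) -> int:
    """PATINDEX('%[0-9][^0-9]%', s): 1-based position of a digit followed by a non-digit."""
    for i in range(len(s) - 1):
        if s[i] in DIGITS and s[i + 1] not in DIGITS:
            return i + 1
    return 0


def sql_substring(s: str, start: int, length: int) -> str:
    """T-SQL SUBSTRING: characters from start (1-based) up to start + length - 1."""
    end = start + length        # exclusive end, 1-based
    start = max(start, 1)       # positions before 1 are empty but still count
    return s[start - 1:end - 1] if end > start else ""


def first_digit_run(blokno: str) -> str:
    p1 = patindex_digit(blokno)
    p2 = patindex_digit_nondigit(blokno + "x")   # 'x' terminates a trailing run
    return sql_substring(blokno, p1, 1 + p2 - p1)


for value in ["1B", "123", "A12B3", "ABC"]:
    print(value, "->", repr(first_digit_run(value)))
```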

Related

Regex in SQL Server Replace function

I have a variable with random text, let's say
DECLARE @sNumberFormat NVARCHAR(200) = 'rand{text.here,{999}also-Random9He8re'
I want to replace each 9 in {999} with [0-9]. So in this example I would like to get
'rand{text.here,[0-9][0-9][0-9]also-Random9He8re'
The problem is that I never know how many 9s will be inside the braces; there can be {99}, {9999}, and so on. I also need to validate the content: if there is any invalid character (not 9), nothing should be replaced.
I have tried some combinations of REPLACE and PATINDEX functions, but I could not achieve that.
Sans robust regex support, SQL Server's native functions do not give much help here. One approach, a bit hackish, would be to separate the input string into three components:
rand{text.here,
{999}
also-Random9He8re
Next, replace the 9 in the middle target substring with #, or some other character which you don't expect to appear anywhere else in your input string:
rand{text.here,
{###}
also-Random9He8re
Finally, replace the # in the middle substring with [0-9] and then concatenate together to get the final result:
DECLARE @val NVARCHAR(200) = 'rand{text.here,{999}also-Random9He8re'
SELECT REPLACE(
-- text before '{'
SUBSTRING(@val, 1, CHARINDEX('{9', @val) - 1) +
-- the 9s between the braces, turned into #
REPLACE(SUBSTRING(@val,
CHARINDEX('{9', @val) + 1,
CHARINDEX('9}', @val) - CHARINDEX('{9', @val)), '9', '#') +
-- text after '}'
SUBSTRING(@val, CHARINDEX('9}', @val) + 2, LEN(@val) - CHARINDEX('9}', @val)),
'#', '[0-9]');
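The same three-part split can be modelled in Python to check it against other inputs. It is illustrative only, and `replace_first_nines` is a hypothetical name. Python's `str.find` returns a 0-based index, or -1 when nothing is found. CHARINDEX is 1-based and returns 0, so the indices below are shifted by one.

```python
def replace_first_nines(val: str) -> str:
    """Model of the three-part SUBSTRING/REPLACE approach (first {9...9} only)."""
    start = val.find("{9")   # CHARINDEX('{9', val) - 1
    end = val.find("9}")     # CHARINDEX('9}', val) - 1
    if start < 0 or end < start:
        # The T-SQL version would hit a negative SUBSTRING length and raise an error here.
        return val
    head = val[:start]                                  # text before '{'
    middle = val[start + 1:end + 1].replace("9", "#")   # the 9s, braces dropped
    tail = val[end + 2:]                                # text after '}'
    return (head + middle + tail).replace("#", "[0-9]")


print(replace_first_nines("rand{text.here,{999}also-Random9He8re"))
```

Like the T-SQL, this handles only the first {9...9} group. It also does not check that only 9s sit between the braces, and it relies on '#' not appearing anywhere else in the input.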
So the lazy dev in me suggests this:
SELECT Replace(
Replace(
Replace(
Replace(@input, '{9999}', '[0-9][0-9][0-9][0-9]')
, '{999}', '[0-9][0-9][0-9]')
, '{99}', '[0-9][0-9]')
, '{9}', '[0-9]') AS result
;
You can keep extending this as far as you like for your (one-off?) replacements.
Quick. Simple. Extensible. Hacky.
Sometimes lazy is good enough.
This can be done with a series of CTEs. It works with an arbitrary number of 9s inside curly braces.
Declare @str varchar(max) = 'rand{text.here,{999}also-Random9He8re';
With A As
(Select 1 As Pos
Union All
Select Pos+1 As Pos From A Where Pos < LEN(@str)
),
B As (
-- WITHIN GROUP keeps the characters in their original order
Select STRING_AGG(Case When Chr Like '[{9}]' Then Chr Else ' ' End, '') Within Group (Order By A.Pos) As Chr
From A Cross Apply (Select SUBSTRING(@str,A.Pos,1 )) As T(chr)
),
C As (
Select [value] As pattern,
REPLACE(REPLACE(REPLACE([value], '9', '[0-9]'),'{',''),'}','') As replacement,
ROW_NUMBER() Over (ORDER BY (SELECT NULL)) As Num,
COUNT(*) OVER (ORDER BY (SELECT NULL)) As Cnt
From B Cross Apply STRING_SPLIT(Chr,' ')
Where [value] Like '{%}' And [value] Like '%9%'
),
D As (
Select @str As Result, 1 As Num
Union All
select REPLACE(Result, C.pattern, C.replacement) As Res , D.Num+1 As Num
From D Inner Join C On (D.Num=C.Num)
Where D.Num<=C.Cnt)
Select Top 1 Result
From D
Order by Num Desc
Option (MaxRecursion 0) -- allow inputs longer than 100 characters
A - Get the list of character positions in the text
B - Get the text with every character other than '9', '{' and '}' replaced by a space
C - Get the search patterns and their corresponding replacement values
D - Get the result by applying REPLACE once per pattern
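Outside SQL Server, a single regular expression captures the whole requirement. That includes the rule that braces containing anything other than 9 are left alone. The Python sketch below is only a reference model for checking the CTE's output, since the question is about solving this without regex support. `expand_nines` is a hypothetical name.

```python
import re

# One or more 9s, and nothing else, between a pair of braces.
NINES = re.compile(r"\{(9+)\}")


def expand_nines(s: str) -> str:
    """Replace every {9...9} with one [0-9] per 9, dropping the braces."""
    return NINES.sub(lambda m: "[0-9]" * len(m.group(1)), s)


print(expand_nines("rand{text.here,{999}also-Random9He8re"))
```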

TSQL: How insert separator between each character in a string

I have a string like this:
Apple
I want to include a separator after each character so the end result will turn out like this:
A,p,p,l,e
In C#, there is a one-liner for this: Regex.Replace("Apple", ".{1}", "$0,"); (it leaves a trailing comma).
I can only think of looping over each character with CHARINDEX to append the separator, but that seems a little complicated. Is there a simpler, more elegant way to achieve this?
Thanks, HABO, for the suggestions. I was able to generate the result I want with your code, but it took me a while to understand how it works.
After some searching, I found a useful article on inserting spaces between characters, which is easier for me to understand.
I modified the code slightly so that the separator is configurable instead of being fixed to a space:
DECLARE @pos INT = 2 -- position of the first separator
DECLARE @result VARCHAR(100) = 'Apple'
DECLARE @separator nvarchar(5) = ','
WHILE @pos < LEN(@result)+1
BEGIN
SET @result = STUFF(@result, @pos, 0, @separator);
SET @pos = @pos+2; -- skip the separator and the next character (assumes a one-character separator)
END
select @result; -- Output: A,p,p,l,e
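A Python model of the loop shows why the position advances by 2. It is illustrative only, and `insert_separator` is a hypothetical name. Each STUFF call inserts the separator without removing anything. The next insertion point is therefore one separator plus one character further on. The model writes that step as `1 + len(sep)`, which equals the T-SQL `+2` for a one-character separator.

```python
def insert_separator(s: str, sep: str = ",") -> str:
    """Model of the WHILE/STUFF loop: insert sep before every character but the first."""
    pos = 2                          # 1-based position of the first insertion
    result = s
    while pos < len(result) + 1:
        # STUFF(result, pos, 0, sep): insert sep before the character at pos
        result = result[:pos - 1] + sep + result[pos - 1:]
        pos += 1 + len(sep)          # skip the separator and the next character
    return result


print(insert_separator("Apple"))
```

Be careful if you generalise the T-SQL in the same way. LEN ignores trailing spaces, so LEN(' ') is 0, and a space separator would not be measured correctly.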
In the following SQL scripts, I get each character with the SUBSTRING() function and a numbers table (I used the spt_values view here for simplicity; its type 'P' rows cover 0 to 2047). I then concatenate the characters with one of two methods, whichever you prefer.
If you are using SQL Server 2017 or later, there is a new string aggregation function.
The first script uses STRING_AGG:
declare @str nvarchar(max) = 'Apple'
SELECT
string_agg( substring(@str,number,1) , ',') Within Group (Order By number)
FROM master..spt_values n
WHERE
Type = 'P' and
Number between 1 and len(@str)
If you are working with an earlier version, you can concatenate with FOR XML PATH and the STUFF function as follows:
declare @str nvarchar(max) = 'Apple'
; with cte as (
SELECT
number,
substring(@str,number,1) as L
FROM master..spt_values n
WHERE
Type = 'P' and
Number between 1 and len(@str)
)
SELECT
STUFF(
(
SELECT
',' + L
FROM cte
order by number
FOR XML PATH('')
), 1, 1, ''
)
Both solutions yield the same result. I hope it helps.
If you have SQL Server 2017 (or later) and a copy of ngrams8k, it's ultra simple:
declare @word varchar(100) = 'apple';
select newString = string_agg(token, ',') within group (order by position)
from dbo.ngrams8k(@word,1);
For pre-2017 systems it's almost as simple:
declare @word varchar(100) = 'apple';
select newstring =
( select token + case len(@word)+1-position when 1 then '' else ',' end
from dbo.ngrams8k(@word,1)
order by position
for xml path(''))
One ugly way to do it is to split the string into characters, ideally using a numbers table, and reassemble it with the desired separator.
A less efficient implementation uses recursion in a CTE to split the characters and insert the separator between pairs of characters as it goes:
declare @Sample as VarChar(20) = 'Apple';
declare @Separator as Char = ',';
with Characters as (
select 1 as Position, Substring( @Sample, 1, 1 ) as Character
union all
select Position + 1,
-- odd positions hold characters, even positions hold the separator
case when Position & 1 = 1 then @Separator else Substring( @Sample, Position / 2 + 1, 1 ) end
from Characters
where Position < 2 * Len( @Sample ) - 1 )
select Stuff( ( select Character + '' from Characters order by Position for XML Path( '' ) ), 1, 0, '' ) as Result;
You can replace the select Stuff... line with select * from Characters; to see what's going on.
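The position arithmetic can be checked with a Python model. It is illustrative only, and `interleave` is a hypothetical name. The output has 2 * len - 1 slots. Even slots get the separator, and odd slot k gets character number (k + 1) / 2. The recursive member computes the same thing from the previous row: an odd previous Position means the new row is a separator. Otherwise the new row takes character Position / 2 + 1.

```python
def interleave(sample: str, sep: str = ",") -> str:
    """Odd positions hold characters, even positions hold sep (1-based positions)."""
    out = []
    for position in range(1, 2 * len(sample)):          # 1 .. 2*len - 1
        if position % 2 == 0:
            out.append(sep)
        else:
            out.append(sample[(position - 1) // 2])      # character number (position + 1) / 2
    return "".join(out)


print(interleave("Apple"))
```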
Try this
declare @var varchar(50) ='Apple'
;WITH CTE
AS
(
SELECT
SeqNo = 1,
MyStr = @var,
OpStr = CAST('' AS VARCHAR(50))
UNION ALL
SELECT
SeqNo = SeqNo+1,
MyStr = MyStR,
OpStr = CAST(ISNULL(OpStr,'')+SUBSTRING(MyStR,SeqNo,1)+',' AS VARCHAR(50))
FROM CTE
WHERE SeqNo <= LEN(@var)
)
SELECT
OpStr = LEFT(OpStr,LEN(OpStr)-1)
FROM CTE
WHERE SeqNo = LEN(@var)+1

SQL string before and after certain characters

SELECT NAME
FROM SERVERS
returns:
SDACR.hello.com
SDACR
SDACR\AIR
SDACR.hello.com\WATER
I need a SELECT query that returns the following:
SDACR
SDACR
SDACR\AIR
SDACR\WATER
I tried using the LEFT and RIGHT functions as below, but could not get the combined output right:
SELECT
LEFT(Name, CHARINDEX('.', Name) - 1)
FROM
SERVERS
SELECT
RIGHT(Name, LEN(Name) - CHARINDEX('\', Name))
FROM
SERVERS
It looks like you're just trying to REPLACE a substring of characters in your column. You should try this:
SELECT REPLACE(Name,'.hello.com','') AS ReplacementName
FROM SERVERS
In T-SQL, you can concatenate values with CONCAT(), or you can simply add strings together with +.
SELECT LEFT(Name, CHARINDEX('.',Name)-1) + RIGHT(Name,LEN(Name)-CHARINDEX('\',Name)+1) from SERVERS
Also, be careful with arithmetic on CHARINDEX(). It returns 0 when the character is not found, so a value without a '.' makes LEFT() fail with an invalid length error, and a value without a '\' gets the whole name appended again.
You can use LEFT to select everything up to the first period (dot) and then append everything from the last \ onward:
declare @servers table ([NAME] varchar(64))
insert into @servers
values
('SDACR.hello.com '),
('SDACR'),
('SDACR\AIR'),
('SDACR.hello.com\WATER')
select
left([NAME],case when charindex('.',[NAME]) = 0 then len([NAME]) else charindex('.',[NAME]) -1 end) +
case when charindex('\',left([NAME],case when charindex('.',[NAME]) = 0 then len([NAME]) else charindex('.',[NAME]) -1 end)) = 0 then right([NAME],charindex('\',reverse([NAME]))) else '' end
from @servers
Throwing my hat in. This shows how to use VALUES and APPLY for cleaner code.
-- sample data in an easily consumable format
declare @yourdata table (txt varchar(100));
insert @yourdata values
('SDACR.hello.com'),
('SDACR'),
('SDACR\AIR'),
('SDACR.hello.com\WATER');
-- solution
select
txt,
newTxt =
case
when loc.dot = 0 then txt
when loc.dot > 0 and loc.slash = 0 then substring(txt, 1, loc.dot-1)
else substring(txt, 1, loc.dot-1) + substring(txt, loc.slash, 100)
end
from @yourdata
cross apply (values (charindex('.',txt), (charindex('\',txt)))) loc(dot,slash);
Results
txt newTxt
------------------------------ --------------------
SDACR.hello.com SDACR
SDACR SDACR
SDACR\AIR SDACR\AIR
SDACR.hello.com\WATER SDACR\WATER
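The CASE expression can be traced with a Python model. It is illustrative only, and `strip_domain` is a hypothetical name. `str.find` returns -1 when nothing is found, so adding 1 reproduces CHARINDEX's 1-based positions with 0 for "not found". Like the T-SQL, the model assumes that any backslash comes after the domain part.

```python
def strip_domain(txt: str) -> str:
    """Drop everything from the first '.' up to the first backslash."""
    dot = txt.find(".") + 1       # CHARINDEX('.', txt)
    slash = txt.find("\\") + 1    # CHARINDEX('\', txt)
    if dot == 0:
        return txt
    if slash == 0:
        return txt[:dot - 1]
    return txt[:dot - 1] + txt[slash - 1:]


for name in ["SDACR.hello.com", "SDACR", "SDACR\\AIR", "SDACR.hello.com\\WATER"]:
    print(name, "->", strip_domain(name))
```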

Show/extract only the matching character(s) in a SQL Server query

How can I extract and show only certain characters from a string column in SQL Server when the position of those characters varies within the string?
Example input:
Mich%ael#
#Scott
Ran%dy
A#nder%son
Output:
%#
#
%
#%
I can only think of a query like
select
columnname
from
dbo.tablename with (noLock)
where
columnname like '%[%#]%'
but this does not strip the string down to just the characters I want. I looked at the SUBSTRING() function, but it requires knowing the position of the character to extract.
If you don't want to (or can't) use a UDF, consider the following:
Declare @YourTable table (SomeField varchar(50))
Insert Into @YourTable values
('Mich%ael#'),
('#Scott'),
('Ran%dy'),
('A#nder%son')
Select A.*
,Stripped = max(B.Value)
From @YourTable A
Cross Apply (
Select Value=Stuff((Select '' + String
From (
Select String = Substring(a.b, v.number+1, 1), Pos = v.number
From (select A.SomeField b) a
Join master..spt_values v on v.number < len(a.b)
Where v.type = 'P'
) A
Where String in ('%','#') --<<<< This is Your Key Filter
Order By Pos -- keep the characters in their original order
For XML Path ('')),1,0,'')
) B
Group By SomeField
Returns
SomeField Stripped
#Scott #
A#nder%son #%
Mich%ael# %#
Ran%dy %
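The query does three things: it splits the value into characters with a numbers table, filters on the key set, and concatenates the survivors in order. The same three steps fit in one line of Python. This is an illustrative model, and `keep_only` is a hypothetical name.

```python
def keep_only(value: str, wanted: str = "%#") -> str:
    """Keep only the characters listed in `wanted`, in their original order."""
    # Split into characters, filter on the key set, then concatenate again.
    return "".join(ch for ch in value if ch in wanted)


for value in ["Mich%ael#", "#Scott", "Ran%dy", "A#nder%son"]:
    print(value, "->", keep_only(value))
```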

SQL Server: replace sequence of same characters inside Text Field (TSQL only)

I have a text column varchar(4000) with text:
'aaabbaaacbaaaccc'
and I need to collapse each run of repeated characters so that only one character from each run is left:
'abacbac'
It should not be a function, procedure, or CLR regex solution; only a plain SQL SELECT.
Currently I am thinking about a recursive WITH clause that replaces 'aa'->'a', 'bb'->'b', 'cc'->'c', recursing until every run of repeated characters has been replaced.
Do you have another solution, perhaps a more performant one?
PS: I searched this site for different REPLACE examples, but they didn't suit this case.
Assuming a table definition of
CREATE TABLE myTable(rowID INT IDENTITY(1,1), dupedchars NVARCHAR(4000))
and data:
INSERT INTO myTable
SELECT 'aaabbaaacbaaaccc'
UNION
SELECT 'abcdeeeeeffgghhaaabbbjdduuueueu999whwhwwwwwww'
this query meets your criteria
WITH Numbers(n)
AS
( SELECT 1 AS n
UNION ALL
SELECT (n + 1) AS n
FROM Numbers
WHERE n < 4000
)
SELECT rowid,
( SELECT CASE
WHEN SUBSTRING(dupedchars,n2.n,1) = SUBSTRING(dupedchars+' ',n2.n+1,1) THEN ''
ELSE SUBSTRING(dupedchars,n2.n,1)
END AS [text()]
FROM myTable t2,numbers n2
WHERE n2.n <= LEN(dupedchars)
AND t.rowid = t2.rowid
FOR XML path('')
) AS deduped
FROM myTable t
OPTION(MAXRECURSION 4000)
Output
rowid deduped
1 abacbac
2 abcdefghabjdueueu9whwhw
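The query keeps a character only when it differs from the one after it. Appending a space gives the last character something to compare with. Here is a Python model of that rule. It is illustrative only, and `collapse_runs` is a hypothetical name.

```python
def collapse_runs(text: str) -> str:
    """Keep each character only when it differs from the character after it."""
    padded = text + " "   # like SUBSTRING(dupedchars + ' ', n + 1, 1) at the last position
    return "".join(
        padded[i] for i in range(len(text)) if padded[i] != padded[i + 1]
    )


print(collapse_runs("aaabbaaacbaaaccc"))
```

Python compares characters exactly, whereas the T-SQL comparison follows the column's collation. Under a case-insensitive collation, 'aA' would therefore also collapse to a single character.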
