T-SQL string manipulation, replacement, comparing, pattern matching, regular expressions

I have a short string of alphanumeric characters (A-Z and 0-9). Every string contains both letters and digits.
I want to strip spaces and compare each string against a set of patterns, of which it will match exactly one. The patterns use A to denote any letter A-Z and 9 to denote any digit 0-9.
The 6 patterns are:
A99AA
A999AA
A9A9AA
AA99AA
AA999AA
AA9A9AA
I have these in a table with another column, with the correct space in place:
Pattern      PatternTrimmed
A9 9AA       A99AA
A99 9AA      A999AA
A9A 9AA      A9A9AA
AA9 9AA      AA99AA
AA99 9AA     AA999AA
AA9A 9AA     AA9A9AA
I am using SQL Server 2005, and I don't want to have 34 replace statements changing each of the characters and numbers to A's and 9's.
Suggestions on how I can achieve this in a short succinct way, please.
Here's what I want to avoid :-
update postcodes set Pattern = replace (Pattern, 'B', 'A')
update postcodes set Pattern = replace (Pattern, 'C', 'A')
update postcodes set Pattern = replace (Pattern, 'D', 'A')
update postcodes set Pattern = replace (Pattern, 'E', 'A')
etc.
and
update postcodes set Pattern = replace (Pattern, '0', '9')
update postcodes set Pattern = replace (Pattern, '1', '9')
update postcodes set Pattern = replace (Pattern, '2', '9')
etc
Basically, I am trying to take a UK postcode typed in at a call centre by an imbecile, and pattern match the entered postcode against one of the 6 above patterns, and work out where to insert the space.

What about something like this:
Declare @table table
(
ColumnToCompare varchar(20),
AmendedValue varchar(20)
)
Declare @patterns table
(
Pattern varchar(20),
TrimmedPattern varchar(20)
)
-- Sample input: only the last two rows are shaped like a postcode
Insert Into @table (ColumnToCompare)
Select 'BBB87 BBB'
Union all
Select 'J97B B'
union all
select '282 8289'
union all
select 'UW83 7YY'
union all
select 'UW83 7Y0'
Insert Into @patterns
Select 'A9 9AA', 'A99AA'
union all
Select 'A99 9AA', 'A999AA'
union all
Select 'A9A 9AA', 'A9A9AA'
union all
Select 'AA9 9AA', 'AA99AA'
union all
Select 'AA99 9AA', 'AA999AA'
union all
Select 'AA9A 9AA', 'AA9A9AA'
-- Rows that match no pattern keep a NULL AmendedValue
Update t
Set AmendedValue = Left(Replace(ColumnToCompare, ' ',''), (CharIndex(' ', Pattern)-1)) + space(1) +
SubString(Replace(ColumnToCompare, ' ',''), (CharIndex(' ', Pattern)), (Len(ColumnToCompare) - (CharIndex(' ', Pattern)-1)))
From @table t
Cross Join @patterns p
Where PatIndex(Replace((Replace(TrimmedPattern, 'A','[A-Z]')), '9','[0-9]'), Replace(ColumnToCompare, ' ' ,'')) > 0
select * From @table
This part
Left(Replace(ColumnToCompare, ' ',''), (CharIndex(' ', Pattern)-1))
finds the space in the matched pattern and takes the left-hand portion of the string being compared.
It then adds a space
+ space(1) +
then this part
SubString(Replace(ColumnToCompare, ' ',''), (CharIndex(' ', Pattern)), (Len(ColumnToCompare) - (CharIndex(' ', Pattern)-1)))
appends the remainder of the string to the new value.
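As a smaller illustration of the same idea, the sketch below checks a single value against a single pattern. The variable names are made up for this example, and it inserts the space with STUFF rather than the Left + SubString combination used above, so treat it as a variation and not as the answer's own method. Whether [A-Z] also matches lower-case letters depends on the collation in use.

```sql
-- Hypothetical single-value walk-through; all variable names are illustrative
DECLARE @postcode varchar(20), @pattern varchar(20),
        @trimmedPattern varchar(20), @likePattern varchar(100)

SET @postcode       = REPLACE('UW8 37YY', ' ', '')   -- space typed in the wrong place
SET @pattern        = 'AA99 9AA'
SET @trimmedPattern = REPLACE(@pattern, ' ', '')     -- 'AA999AA'
-- 'AA999AA' becomes '[A-Z][A-Z][0-9][0-9][0-9][A-Z][A-Z]'
SET @likePattern    = REPLACE(REPLACE(@trimmedPattern, 'A', '[A-Z]'), '9', '[0-9]')

SELECT @likePattern AS LikePattern,
       CASE WHEN PATINDEX(@likePattern, @postcode) > 0
            -- a length of 0 makes STUFF insert without deleting anything
            THEN STUFF(@postcode, CHARINDEX(' ', @pattern), 0, ' ')
       END AS AmendedValue   -- NULL when the value does not match the pattern
```

With the values above, AmendedValue should come back as UW83 7YY. Because the pattern passed to PATINDEX has no % wildcards, it has to match the whole value, which is why each postcode can match only one of the six patterns.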

Related

Regex in SQL Server Replace function

I have a variable with random text, let's say
DECLARE @sNumberFormat NVARCHAR(200) = 'rand{text.here,{999}also-Random9He8re'
I want to replace each 9 in {999} by [0-9]. So in this example I would like to get
'rand{text.here,[0-9][0-9][0-9]also-Random9He8re'
The problem is that I never know how many 9s will appear between the braces, so it could be {99}, {9999} and so on. I also need to validate the content: if the braces contain any character other than 9, nothing should be replaced.
I have tried some combinations of REPLACE and PATINDEX functions, but I could not achieve that.

Sans robust regex support, SQL Server's native functions do not give much help here. One approach, a bit hackish, would be to separate the input string into three components:
rand{text.here,
{999}
also-Random9He8re
Next, replace the 9 in the middle target substring with #, or some other character which you don't expect to appear anywhere else in your input string:
rand{text.here,
{###}
also-Random9He8re
Finally, replace the # in the middle substring with [0-9] and then concatenate together to get the final result:
DECLARE @val NVARCHAR(200) = 'rand{text.here,{999}also-Random9He8re'
SELECT REPLACE(
SUBSTRING(@val, 1, CHARINDEX('{9', @val) - 1) +
REPLACE(SUBSTRING(@val,
CHARINDEX('{9', @val) + 1,
CHARINDEX('9}', @val) - CHARINDEX('{9', @val)), '9', '#') +
SUBSTRING(@val, CHARINDEX('9}', @val) + 2, LEN(@val) - CHARINDEX('9}', @val)),
'#', '[0-9]');
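The question also asks that nothing be replaced when the braces hold anything other than 9s, which the query above does not check. A minimal sketch of one way to add that check follows. The variable names (@open, @close, @digits, @result) are invented for the example, REPLICATE is swapped in for the # placeholder step, and only the first {9...} group in the string is handled.

```sql
DECLARE @val nvarchar(200) = N'rand{text.here,{999}also-Random9He8re';
DECLARE @open int  = CHARINDEX(N'{9', @val);         -- position of the opening brace, 0 if absent
DECLARE @close int = CHARINDEX(N'9}', @val, @open);  -- position of the last 9 before the closing brace
DECLARE @result nvarchar(max) = @val;                -- default: return the input untouched

IF @open > 0 AND @close > @open
BEGIN
    -- Everything between the braces
    DECLARE @digits nvarchar(200) = SUBSTRING(@val, @open + 1, @close - @open);

    -- Rewrite only when the braces contain nothing but 9s
    IF @digits NOT LIKE N'%[^9]%'
        SET @result = LEFT(@val, @open - 1)
                    + REPLICATE(N'[0-9]', LEN(@digits))
                    + SUBSTRING(@val, @close + 2, LEN(@val));
END

SELECT @result AS Result;
```

For the sample value this should return rand{text.here,[0-9][0-9][0-9]also-Random9He8re, while an input such as {9a9} is left as it was.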

So the lazy dev in me suggests this:
SELECT Replace(
Replace(
Replace(
Replace(@input, '{9999}', '[0-9][0-9][0-9][0-9]')
, '{999}', '[0-9][0-9][0-9]')
, '{99}', '[0-9][0-9]')
, '{9}', '[0-9]') AS result
;
You can keep extending as long as you like to perform your (one off?) replacements.
Quick. Simple. Extensible. Hacky.
Sometimes lazy is good enough.

This could be done with a series of CTEs. It works with an arbitrary number of "9" values in curly braces.
Declare @str varchar(max) = 'rand{text.here,{999}also-Random9He8re';
With A As
(Select 1 As Pos
Union All
Select Pos+1 As Pos From A Where Pos < LEN(@str)
),
B As (
-- Within Group keeps the characters in their original order
Select STRING_AGG(Case When Chr Like '[{9}]' Then Chr Else ' ' End, '') Within Group (Order By A.Pos) As Chr
From A Cross Apply (Select SUBSTRING(@str,A.Pos,1 )) As T(chr)
),
C As (
Select [value] As pattern,
REPLACE(REPLACE(REPLACE([value], '9', '[0-9]'),'{',''),'}','') As replacement,
ROW_NUMBER() Over (ORDER BY (SELECT NULL)) As Num,
COUNT(*) OVER (ORDER BY (SELECT NULL)) As Cnt
From B Cross Apply STRING_SPLIT(Chr,' ')
Where [value] Like '{%}' And [value] Like '%9%'
),
D As (
Select @str As Result, 1 As Num
Union All
select REPLACE(Result, C.pattern, C.replacement) As Res , D.Num+1 As Num
From D Inner Join C On (D.Num=C.Num)
Where D.Num<=C.Cnt)
Select Top 1 Result
From D
Order by Num Desc
Option (MaxRecursion 0) -- CTE A recurses once per character; the default limit is 100
A - builds the list of character positions in the text
B - rebuilds the text with spaces in place of every character other than '9', '{' and '}'
C - extracts the patterns and their corresponding replacement values
D - produces the result by applying REPLACE once per pattern

How to replace right most string in SQL

I want to remove the rightmost run of identical characters from a string.
For example, the string has this format:
"GGENXG00126""XLOXXXXX"
but the number of trailing X characters is not fixed.
What I have already tried in SQL is
select REPLACE('"GGENXG00126""XLOXXXXX"', 'X', '')
but with this every 'X' is removed. I want to remove only the trailing run of identical characters, so the expected output is "GGENXG00126""XLO".

You can replace every X with a space, RTRIM, then undo the replacement (this assumes the string contains no spaces of its own):
SELECT '"' + REPLACE(RTRIM(REPLACE(SUBSTRING(str, 2, LEN(str) - 2), 'X', ' ')), ' ', 'X') + '"'
FROM (VALUES
('"GGENXG00126""XLOXXXXX"')
) v(str)
-- "GGENXG00126""XLO"

An alternative idea using PATINDEX and REVERSE, to find the first character that isn't the final character in the string. (Assumes all strings are quoted):
SELECT REVERSE(STUFF(R.ReverseString,1,PATINDEX('%[^' + LEFT(R.ReverseString,1) + ']%',R.ReverseString)-1,'')) + '"'
FROM (VALUES('"GGENXG00126""XLOXXXXX"'))V(YourString)
CROSS APPLY (VALUES(STUFF(REVERSE(V.YourString),1,1,''))) R(ReverseString);

You can try the option below:
DECLARE @InputString VARCHAR(200) = 'GGENXG00126""XLOXXXXX'
SELECT
LEFT(
@InputString,
LEN(@InputString)+
1-
PATINDEX(
'%[^X]%',
REVERSE(@InputString)
)
)
The output is:
GGENXG00126""XLO
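The arithmetic in that query can be checked by selecting the intermediate values. This is only a sketch: @firstNonX is a name introduced here, and the CASE branch for a string made up entirely of X characters is an assumption about the desired behaviour (the query above would return such a string unchanged, because PATINDEX returns 0).

```sql
DECLARE @InputString varchar(200) = 'GGENXG00126""XLOXXXXX';
-- Position of the first non-X character, counted from the right-hand end
DECLARE @firstNonX int = PATINDEX('%[^X]%', REVERSE(@InputString));

SELECT LEN(@InputString)                  AS TotalLength,        -- 21
       @firstNonX                         AS FirstNonXFromRight, -- 6
       LEN(@InputString) + 1 - @firstNonX AS CharsToKeep,        -- 16
       CASE WHEN @firstNonX = 0 THEN ''   -- assumption: an all-X string trims to empty
            ELSE LEFT(@InputString, LEN(@InputString) + 1 - @firstNonX)
       END                                AS Trimmed;            -- GGENXG00126""XLO
```

Under a case-insensitive collation, [^X] treats a lower-case x as X as well, so trailing x characters would also be trimmed.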

T-SQL: How to replace spaces in a string except if they are after a specific character

The situation is as follows:
We have action logs in our database, triggered by user events, that store the events as varchar but in XML format. In some cases the attribute names contain spaces, as in this one:
<UNITDETAILUPDATE NEWUNIT TYPE="DUW 30 01" OLDFAULT_CIRC="HWS" NEWFAULT_CIRC="HWS" OLDOUTGOING R-STATE="R3C" />
I would like to eliminate the spaces from the attribute names before parsing the string as XML (because as it stands, parsing is of course not possible :))
As you can see, there are multiple occurrences in the string. A great solution would be something that only replaces the spaces that have no " character before them, but I have no idea how to achieve this.
Any ideas?
Thank you :)

For a high-performing set-based solution you can grab a copy of ngrams8k and do this:
DECLARE @string varchar(1000) = '<UNITDETAILUPDATE NEWUNIT TYPE="DUW 30 01" OLDFAULT_CIRC="HWS" NEWFAULT_CIRC="HWS" OLDOUTGOING R-STATE="R3C" />';
select newString =
(
select
case when token = ' ' and position > space1 and isQuoted = 0 and p.c <> '"'
then '' else token end
from
(
select ng.*, sum(case when token = '"' then 1 else 0 end) over (order by position)%2
from dbo.ngrams8k(@string, 1) ng
) x(position, token, isQuoted)
cross join (values (charindex(' ', @string))) v(space1)
cross apply (values (substring(@string, position-1,1))) p(c)
order by position
for xml path(''), type
).value('(text())[1]', 'varchar(8000)');
Results
<UNITDETAILUPDATE NEWUNITTYPE="DUW 30 01" OLDFAULT_CIRC="HWS" NEWFAULT_CIRC="HWS" OLDOUTGOINGR-STATE="R3C" />
If you have SQL Server 2017 or later, you can use string_agg with ngrams8k like this:
select newString = string_agg(
case when token = ' ' and position > space1 and isQuoted = 0
and substring(@string, position-1,1) <> '"' then '' else token end,'')
within group (order by position) -- keep the characters in their original order
from
(
select ng.*, sum(case when token = '"' then 1 else 0 end) over (order by position)%2
from dbo.ngrams8k(@string, 1) ng
) x(position, token, isQuoted)
cross join (values (charindex(' ', @string))) v(space1)
cross apply (values (substring(@string, position-1,1))) p(c);

You could search for good spaces and save them with a placeholder
Declare @var varchar(200) = '<UNITDETAILUPDATE NEWUNIT TYPE="DUW 30 01" OLDFAULT_CIRC="HWS" NEWFAULT_CIRC="HWS" OLDOUTGOING R-STATE="R3C" />'
Select @var = replace(@var,'" ','"|')
Then remove the spaces
Select @var = replace(@var,' ','_')
Then put the good spaces back
Select @var = replace(replace(@var,'|',' '),'UNITDETAILUPDATE_','UNITDETAILUPDATE ')
This could be combined into one ugly replace so that it could be selected across a table. You would probably also need a placeholder for the spaces inside the quotation marks. Regex is not supported in SQL Server, but LIKE patterns can sometimes stand in for it.
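A sketch of what that single combined expression might look like, using the same placeholder steps. It assumes the | character never appears in the data, and it keeps the limitation already mentioned: spaces inside quoted values are replaced too, so "DUW 30 01" comes out as "DUW_30_01".

```sql
DECLARE @var varchar(200) = '<UNITDETAILUPDATE NEWUNIT TYPE="DUW 30 01" OLDFAULT_CIRC="HWS" NEWFAULT_CIRC="HWS" OLDOUTGOING R-STATE="R3C" />';

SELECT REPLACE(
         REPLACE(
           REPLACE(
             REPLACE(@var, '" ', '"|'),   -- 1. protect the space after each closing quote
             ' ', '_'),                   -- 2. replace every remaining space
           '|', ' '),                     -- 3. restore the protected spaces
         'UNITDETAILUPDATE_', 'UNITDETAILUPDATE ') AS Cleaned;  -- 4. restore the space after the element name
```

To run it across a table, the variable would be replaced by the column that holds the log text.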

This "Xml" is awfully bad...
The following approach won't be fast. If you need this more often, you might use another language or tool.
This solution uses a recursive CTE, which is a hidden RBAR, to build up the string again, character by character, checking for "within quotes":
DECLARE @BadXml NVARCHAR(MAX)='<UNITDETAILUPDATE NEWUNIT TYPE="DUW 30 01" OLDFAULT_CIRC="HWS" NEWFAULT_CIRC="HWS" OLDOUTGOING R-STATE="R3C" />';
WITH recCTE
AS
(
SELECT LTRIM(RTRIM(REPLACE(@BadXml,'" ','"$'))) AS TheString
,1 AS CurrentPos
,CAST('<' AS NVARCHAR(MAX)) AS BuildNew
,-1 AS IsFirstBlank
,-1 AS QuotOpen
UNION ALL
SELECT r.TheString
,r.CurrentPos+1
,r.BuildNew + CASE WHEN chr=' ' AND r.IsFirstBlank=1 AND r.QuotOpen=-1 THEN '_' ELSE chr END
,CASE WHEN r.IsFirstBlank=-1 AND chr=' ' THEN 1 ELSE r.IsFirstBlank END
,CASE WHEN chr='"' THEN r.QuotOpen * (-1) ELSE r.QuotOpen END
FROM recCTE AS r
CROSS APPLY(SELECT SUBSTRING(r.TheString,r.CurrentPos+1,1)) AS A(chr)
WHERE r.CurrentPos<LEN(r.TheString)
)
SELECT TOP 1 IsFirstBlank,QuotOpen, CAST(REPLACE(BuildNew,'"$','" ') AS XML) AS TheXml
FROM recCTE
ORDER BY LEN(BuildNew) DESC
OPTION (MAXRECURSION 1000)
The result
IsFirstBlank QuotOpen TheXml
1 -1 <UNITDETAILUPDATE NEWUNIT_TYPE="DUW 30 01" OLDFAULT_CIRC="HWS" NEWFAULT_CIRC="HWS" OLDOUTGOING_R-STATE="R3C" />
Take away the CAST to xml, the TOP 1 and the ORDER BY to see how it works.

SQL String: Counting Words inside a String

I searched through many of the questions here, but all the ones I found with a decent answer are for other languages such as JavaScript.
I have a simple task in SQL that I can't seem to find a simple way to do.
I just need to count the number of "words" inside a SQL string (a sentence). You can see why "words" is in quotes in my examples. The "words" are delimited by white space.
Sample sentences:
1. I am not your father.
2. Where are your brother,sister,mother?
3. Where are your brother, sister and mother?
4. Who are you?
Desired answer:
1. 5
2. 4
3. 7
4. 3
As you can see, I need to count the "words" disregarding the symbols (I have to treat them as part of the word). So in sample no. 2:
(1)Where (2)are (3)your (4)brother,sister,mother? = 4
I can handle the multiple whitespaces by doing a replace like this:
REPLACE(string, '  ', ' ') -> 2 whitespaces to 1
REPLACE(string, '   ', ' ') -> 3 whitespaces to 1 and so on..
What SQL function can I use to do this? I use SQL Server 2012 but need a function that works in SQL Server 2008 as well.

Here is one way to do it:
Create and populate a sample table (please save us this step in your future questions):
DECLARE @T AS TABLE
(
id int identity(1,1),
string varchar(100)
)
INSERT INTO @T VALUES
('I am not your father.'),
('Where are your brother,sister,mother?'),
('Where are your brother, sister and mother?'),
('Who are you?')
Use a CTE to collapse multiple spaces into a single space (thanks to Gordon Linoff's answer here):
;WITH CTE AS
(
SELECT Id,
REPLACE(REPLACE(REPLACE(string, ' ', '><' -- every space becomes '><'
), '<>', '' -- adjacent pairs cancel, leaving one '><' per run of spaces
), '><', ' ' -- each remaining '><' becomes a single space
) as string
FROM @T
)
Query the CTE - length of the string - length of the string without spaces + 1:
SELECT id, LEN(string) - LEN(REPLACE(string, ' ', '')) + 1 as CountWords
FROM CTE
Results:
id CountWords
1 5
2 4
3 7
4 3
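To see why the three nested REPLACE calls collapse runs of spaces, here is a minimal sketch on a single variable with extra spaces. The sample value and the names @s and @collapsed are made up for the illustration, and the trick assumes the text itself contains no < or > characters and no leading or trailing spaces.

```sql
DECLARE @s varchar(100) = 'Who   are  you?';   -- three spaces, then two

-- 'Who><><><are><><you?'  after replacing each space with '><'
-- 'Who><are><you?'        after removing every '<>'
-- 'Who are you?'          after turning '><' back into a space
DECLARE @collapsed varchar(100) =
    REPLACE(REPLACE(REPLACE(@s, ' ', '><'), '<>', ''), '><', ' ');

SELECT @collapsed AS Collapsed,
       -- one more word than there are single spaces
       LEN(@collapsed) - LEN(REPLACE(@collapsed, ' ', '')) + 1 AS CountWords;   -- 3
```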

This is a minor improvement of @ZoharPeled's answer. This can also handle 0 length values:
DECLARE @t AS TABLE(id int identity(1,1), string varchar(100))
INSERT INTO @t VALUES
('I am not your father.'),
('Where are your brother,sister,mother?'),
('Where are your brother, sister and mother?'),
('Who are you?'),
('')
;WITH CTE AS
(
SELECT
Id,
REPLACE(REPLACE(string,' ', '><'), '<>', '') string
FROM @t
)
SELECT
id,
LEN(' '+string)-LEN(REPLACE(string, '><', ' ')) CountWords
FROM CTE

To handle multiple spaces too, use the method shown here:
Declare @s varchar(100)
set @s='Who are you?'
set @s=ltrim(rtrim(@s))
while charindex('  ',@s)>0 -- two spaces
Begin
set @s=replace(@s,'  ',' ') -- two spaces become one
end
select len(@s)-len(replace(@s,' ',''))+1 as word_count
https://exploresql.com/2018/07/31/how-to-count-number-of-words-in-a-sentence/

I found this query more useful than the first. It leaves out stray characters, numbers and symbols, so it counts just the words within a passage:
drop table if exists #t
create table #t (id int identity(1,1), c1 varchar(2000))
insert into #t (c1)
values
('Alireza Sattarzadeh Farkoush '),
('yes it is the   best .'),
('abc def ghja a the . asw'),
('?>< 123 ...!  z a b'),
('Wallex is   the greatest exchange in the .. world a after binance ...!')
select c1 , Count(*)
from (
select id, c1, value  
from #t t
cross apply (
select rtrim(ltrim(value)) as value from string_split(c1,' ')) a
where len(value) > 1 and value like '%[a-Z]%'
) Final
group by c1

Keep only desired characters and separate with semicolon in T-SQL

The problem:
I have text data imported into the db with a lot of unwanted characters. I need to keep only 4 capital letter strings within the imported text string. Example:
1447;#MIBD (This is a nice name);#2056;#LKRE (Very nice name indeed)
this could be in one column in one row of my table. What I need to extract from the string is:
MIBD and LKRE
And the result should preferably be the desired strings separated with semicolons.
It should be applied to the whole column and I cannot know how many of these 4 upper case letter strings might appear in one row.
I went through all sorts of functions such as PATINDEX, but I really do not know how to approach it. Thanks for any help!

Try this. It assumes that the four-character code is always preceded by ;# . As PATINDEX is case insensitive under the default collation, I have added an additional check to verify that all four characters are capitals.
DECLARE @MyTable Table( ID INT, MyString VARCHAR(8000))
INSERT INTO @MyTable
VALUES
(1, '1447;#MIBD (This is a nice name);#2056;#LKRE (Very nice name indeed)')
,(2, ';#DBCC (This is a nice name);#2056;#LLC (Very nice name indeed) ;#ABCD')
,(3, ';#AaaA;#OPQR;1234 (and) ;#WXYZ')
,(4, ';#abc this empty string without any code')
;WITH CTE AS
(
-- Anchor: first ;#XXXX candidate in each row, plus the text that follows it
SELECT ID
,SUBSTRING(MyString, PATINDEX('%;#[A-Z][A-Z][A-Z][A-Z]%',MyString)+2, 4) AS NewString
-- +5 removes everything up to the end of the code, so a ;# that follows immediately is kept
,STUFF(MyString, 1, PATINDEX('%;#[A-Z][A-Z][A-Z][A-Z]%',MyString)+5, '') AS MyString
FROM @MyTable m
WHERE PATINDEX('%;#[A-Z][A-Z][A-Z][A-Z]%',MyString) > 0
UNION ALL
-- Recursion: repeat on the remaining text until no candidate is left
SELECT ID
,SUBSTRING(MyString, PATINDEX('%;#[A-Z][A-Z][A-Z][A-Z]%',MyString)+2, 4) AS NewString
,STUFF(MyString, 1, PATINDEX('%;#[A-Z][A-Z][A-Z][A-Z]%',MyString)+5, '') AS MyString
FROM CTE c
WHERE PATINDEX('%;#[A-Z][A-Z][A-Z][A-Z]%',MyString) > 0
)
SELECT c.ID,
STUFF(( SELECT '; ' + NewString
FROM CTE c1
WHERE c1.ID = c.ID
AND ASCII(SUBSTRING(NewString, 1, 1)) BETWEEN ASCII('A') AND ASCII('Z') -- first char
AND ASCII(SUBSTRING(NewString, 2, 1)) BETWEEN ASCII('A') AND ASCII('Z') -- second char
AND ASCII(SUBSTRING(NewString, 3, 1)) BETWEEN ASCII('A') AND ASCII('Z') -- third char
AND ASCII(SUBSTRING(NewString, 4, 1)) BETWEEN ASCII('A') AND ASCII('Z') -- fourth char
FOR XML PATH(''), TYPE).value('.', 'VARCHAR(MAX)') -- use the value clause to handle XML character issues like &,",>,<
,1,2,'') AS CodeList -- strip the leading '; '
FROM CTE c
GROUP BY ID
OPTION (MAXRECURSION 0);

I came to something like this so far:
ALTER FUNCTION CleanData
(
-- Parameters here
@Text AS VARCHAR(4000)
)
RETURNS VARCHAR(4000)
AS
BEGIN
-- Remove digits, #, ; and parentheses one character at a time
WHILE PATINDEX('%[0-9#;()]%', @Text) > 0
BEGIN
SET @Text = STUFF(@Text, PATINDEX('%[0-9#;()]%', @Text), 1, '')
END
RETURN @Text
END
But what I get is the initials plus the characters in parentheses, as PATINDEX cannot distinguish between upper and lower case. Maybe it will be helpful for somebody else.
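One way around the case problem, offered as a sketch and not as part of either answer above, is to apply a binary collation to the value being searched so that [A-Z] matches upper-case letters only. Latin1_General_BIN is used here as an example collation, and the two sample values are invented for the illustration.

```sql
SELECT v.s,
       -- Under a binary collation the range [A-Z] covers only the code points of A through Z
       PATINDEX('%[A-Z][A-Z][A-Z][A-Z]%', v.s COLLATE Latin1_General_BIN) AS BinaryMatchPos
FROM (VALUES (';#AaaA'), (';#MIBD')) AS v(s);
```

The AaaA row should return 0 and the MIBD row 3, which would make the separate ASCII range checks unnecessary.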
