Regular Expressions in SQL Server - sql-server

I have a table with a list of users as below.
christopher.j.sansom
vinay.prabhakar
Guillaume.de.Miribel (Stage 2B); jean-marie.pierron (Stage 3B)
ian.notley; pavan.sethi
Ron.M.Barbeau
jason.angelos
jonathan.l.lange, ramesh.t.murti,
nicole.f.cohen
Can we get the records as below? Each comma- or semicolon-separated entry should be returned as a new row, without the stage information.
christopher.j.sansom
vinay.prabhakar
Guillaume.de.Miribel
jean-marie.pierron
ian.notley
pavan.sethi
Ron.M.Barbeau
jason.angelos
jonathan.l.lange
ramesh.t.murti
nicole.f.cohen

See Regex here: https://regex101.com/r/hD2mQ8/1
You can use this pattern:
/(^[\w.-]+)|(?<=; |, )[\w.-]+/ with global and multi-line modifiers to capture the text that you need, but I'm not sure how you would return each one to a new line without seeing your current code.

To do that you need a string-splitter query or function.
This is one example; there are other ways to do it.
With Normalize AS (
SELECT REPLACE(CONCAT(REPLACE(names, ',', ';'), ';'), ';;', ';') Names
FROM Table1
), Splitter AS (
Select names String
, WordCounter = 0
, NWordStart = 1
, NWordEnd = CHARINDEX(';', names)
, Word = CAST('' as nvarchar(255))
, WordNumber = LEN(names) - LEN(REPLACE(names, ';', '')) + 1
FROM Normalize
UNION ALL
SELECT s.String
, WordCounter = s.WordCounter + 1
, NWordStart = s.NWordEnd + 1
, NWordEnd = COALESCE(NULLIF(CHARINDEX(';', s.String, NWordEnd + 1), 0)
, LEN(s.String) + 1)
, Word = LTRIM(Cast(SubString(String, s.NWordStart, s.NWordEnd - s.NWordStart)
AS nvarchar(255)))
, WordNumber = s.WordNumber
FROM Splitter s
WHERE s.WordCounter + 1 <= s.WordNumber
)
SELECT LEFT(WORD , CHARINDEX(' ', CONCAT(Word, ' ')) - 1) Word
FROM Splitter
WHERE Word <> '';
The CTE Normalize changes every separator character to ; so that the split has a single separator to deal with.
The CTE Splitter splits the string into chunks, using ; as the separator.
The main query removes the stage information by cutting each chunk at the space between the name and the opening parenthesis.
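If you are on SQL Server 2016 or later, the same three steps (normalize the separators, split, cut off the stage information) could also be sketched with the built-in STRING_SPLIT function. This is only a sketch under assumptions: the table variable @Users and its sample rows are hypothetical stand-ins for Table1, and STRING_SPLIT does not guarantee the order of the rows it returns.

```sql
-- Hypothetical stand-in for Table1; requires SQL Server 2016+ (compatibility level 130).
DECLARE @Users TABLE (names nvarchar(255));
INSERT INTO @Users (names) VALUES
    (N'christopher.j.sansom'),
    (N'Guillaume.de.Miribel (Stage 2B); jean-marie.pierron (Stage 3B)'),
    (N'jonathan.l.lange, ramesh.t.murti,');

SELECT LEFT(t.item, CHARINDEX(' ', t.item + ' ') - 1) AS Word  -- cut at the first space
FROM @Users AS u
CROSS APPLY STRING_SPLIT(REPLACE(u.names, ',', ';'), ';') AS s  -- one separator, then split
CROSS APPLY (VALUES (LTRIM(s.value))) AS t(item)                -- drop the space after the separator
WHERE t.item <> '';                                             -- skip the empty chunk after a trailing separator
```

Compared with the recursive CTE, this avoids the recursion limit, at the cost of losing any guarantee about row order.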

Related

Regex in SQL Server Replace function

I have a variable with random text, let's say
DECLARE @sNumberFormat NVARCHAR(200) = 'rand{text.here,{999}also-Random9He8re'
I want to replace each 9 in {999} by [0-9]. So in this example I would like to get
'rand{text.here,[0-9][0-9][0-9]also-Random9He8re'
The problem is that I never know how many 9s will appear between the braces, so there can be {99}, {9999}, and so on. I also need to validate the content: if the braces contain any character other than 9, nothing should be replaced.
I have tried some combinations of REPLACE and PATINDEX functions, but I could not achieve that.
Sans robust regex support, SQL Server's native functions do not give much help here. One approach, a bit hackish, would be to separate the input string into three components:
rand{text.here,
{999}
also-Random9He8re
Next, replace the 9 in the middle target substring with #, or some other character which you don't expect to appear anywhere else in your input string:
rand{text.here,
{###}
also-Random9He8re
Finally, replace the # in the middle substring with [0-9] and then concatenate together to get the final result:
DECLARE @val NVARCHAR(200) = 'rand{text.here,{999}also-Random9He8re'
SELECT REPLACE(
SUBSTRING(@val, 1, CHARINDEX('{9', @val) - 1) +
REPLACE(SUBSTRING(@val,
CHARINDEX('{9', @val) + 1,
CHARINDEX('9}', @val) - CHARINDEX('{9', @val)), '9', '#') +
SUBSTRING(@val, CHARINDEX('9}', @val) + 2, LEN(@val) - CHARINDEX('9}', @val)),
'#', '[0-9]');
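The question also asks that nothing be replaced when the braces contain anything other than 9. A minimal sketch of that check, assuming only the first {9...} group needs handling; the variable names @open, @close and @inner are made up for illustration:

```sql
DECLARE @val nvarchar(200) = N'rand{text.here,{999}also-Random9He8re';

-- Position of the first "{9" and of the first "}" that follows it.
DECLARE @open int = CHARINDEX('{9', @val);
DECLARE @close int = CASE WHEN @open > 0 THEN CHARINDEX('}', @val, @open) ELSE 0 END;

-- Text between the braces, or NULL when no complete group was found.
DECLARE @inner nvarchar(200) =
    CASE WHEN @close > @open THEN SUBSTRING(@val, @open + 1, @close - @open - 1) END;

SELECT CASE
           -- '%[^9]%' matches any character that is not 9, so NOT LIKE means "only 9s".
           WHEN @inner IS NOT NULL AND @inner NOT LIKE '%[^9]%'
           THEN STUFF(@val, @open, @close - @open + 1,
                      REPLICATE(N'[0-9]', LEN(@inner)))
           ELSE @val   -- invalid or missing group: leave the input untouched
       END AS result;
```

For the sample value this returns rand{text.here,[0-9][0-9][0-9]also-Random9He8re, while an input such as {9a9} comes back unchanged.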
So the lazy dev in me suggests this:
SELECT Replace(
Replace(
Replace(
Replace(@input, '{9999}', '[0-9][0-9][0-9][0-9]')
, '{999}', '[0-9][0-9][0-9]')
, '{99}', '[0-9][0-9]')
, '{9}', '[0-9]') AS result
;
You can keep extending as long as you like to perform your (one off?) replacements.
Quick. Simple. Extensible. Hacky.
Sometimes lazy is good enough.
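This could be done with a series of CTEs. It works with an arbitrary number of "9" values in curly braces. Note that the recursive CTE A hits the default recursion limit of 100 for inputs longer than about 100 characters unless a MAXRECURSION option is added.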
Declare @str varchar(max) = 'rand{text.here,{999}also-Random9He8re';
With A As
(Select 1 As Pos
Union All
Select Pos+1 As Pos From A Where Pos < LEN(@str)
),
B As (
-- Within Group keeps the characters in their original order; without it the order is not guaranteed
Select STRING_AGG(Case When Chr Like '[{9}]' Then Chr Else ' ' End, '') Within Group (Order By A.Pos) As Chr
From A Cross Apply (Select SUBSTRING(@str,A.Pos,1 )) As T(chr)
),
C As (
Select [value] As pattern,
REPLACE(REPLACE(REPLACE([value], '9', '[0-9]'),'{',''),'}','') As replacement,
ROW_NUMBER() Over (ORDER BY (SELECT NULL)) As Num,
COUNT(*) OVER (ORDER BY (SELECT NULL)) As Cnt
From B Cross Apply STRING_SPLIT(Chr,' ')
Where [value] Like '{%}' And [value] Like '%9%'
),
D As (
Select @str As Result, 1 As Num
Union All
select REPLACE(Result, C.pattern, C.replacement) As Res , D.Num+1 As Num
From D Inner Join C On (D.Num=C.Num)
Where D.Num<=C.Cnt)
Select Top 1 Result
From D
Order by Num Desc
A - Gets a list of character positions in the text
B - Gets the text with spaces in place of all characters other than '9', '{', '}'
C - Gets the patterns and their corresponding replacement values
D - Produces the result using the REPLACE function

How to replace right most string in SQL

I want to remove the rightmost run of identical characters from a string.
For example, the string has this format
"GGENXG00126""XLOXXXXX"
in SQL, but the length of the trailing run of X characters is not fixed.
What I have already tried in SQL is
select REPLACE('"GGENXG00126""XLOXXXXX"', 'X', '')
but this removes every 'X'. I want to remove only the rightmost run of identical characters; the expected output is "GGENXG00126""XLO".
You can replace all X with spaces, RTRIM, then undo the replacements (this assumes the string contains no spaces of its own):
SELECT '"' + REPLACE(RTRIM(REPLACE(SUBSTRING(str, 2, LEN(str) - 2), 'X', ' ')), ' ', 'X') + '"'
FROM (VALUES
('"GGENXG00126""XLOXXXXX"')
) v(str)
-- "GGENXG00126""XLO"
An alternative idea using PATINDEX and REVERSE, to find the first character that isn't the final character in the string. (Assumes all strings are quoted):
SELECT REVERSE(STUFF(R.ReverseString,1,PATINDEX('%[^' + LEFT(R.ReverseString,1) + ']%',R.ReverseString)-1,'')) + '"'
FROM (VALUES('"GGENXG00126""XLOXXXXX"'))V(YourString)
CROSS APPLY (VALUES(STUFF(REVERSE(V.YourString),1,1,''))) R(ReverseString);
You can try the option below:
DECLARE @InputString VARCHAR(200) = 'GGENXG00126""XLOXXXXX'
SELECT
LEFT(
@InputString,
LEN(@InputString)+
1-
PATINDEX(
'%[^X]%',
REVERSE(@InputString)
)
)
Output:
GGENXG00126""XLO
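One edge case worth noting for the PATINDEX/REVERSE variant: when the string consists only of X characters, PATINDEX returns 0 and LEFT hands back the whole string unchanged. A sketch that guards against this, using hypothetical sample values and assuming the strings carry no trailing spaces (LEN ignores them):

```sql
SELECT s.str,
       CASE WHEN p.pos = 0 THEN ''                         -- nothing but X: everything is trimmed
            ELSE LEFT(s.str, LEN(s.str) + 1 - p.pos)       -- keep everything up to the last non-X
       END AS trimmed
FROM (VALUES ('GGENXG00126""XLOXXXXX'), ('XXXX'), ('ABC')) AS s(str)
-- Position of the first non-X character counted from the right-hand end.
CROSS APPLY (VALUES (PATINDEX('%[^X]%', REVERSE(s.str)))) AS p(pos);
```

The three rows come back as GGENXG00126""XLO, an empty string, and ABC.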

Order by Empty String Last on Concatenated Column

I am trying to order a table alphabetically, ascending, with nulls last but am having problems.
The code below produces the following error:
ORDER BY items must appear in the select list if SELECT DISTINCT is specified.
select distinct
'item' = othertab..item,
'stockedFor' = tab..stocked_for
+ ', ' + tab..stockedFor2
+ ', '+ tab..stockedFor3
from tab
order by case when stockedFor is null then 1 else 0 end, stockedFor
How can I return stockedFor alphabetically and nulls last?
Just wrap it in another select statement:
select stockedFor
from (
select distinct
'stockedFor' = tab..stocked_for
+ ', ' + tab..stockedFor2
+ ', '+ tab..stockedFor3
from tab
) x
order by case when stockedFor is null then 1 else 0 end, stockedFor
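To try the derived-table approach in isolation, here is a self-contained sketch. The table variable @tab and its rows are invented for illustration, and the result assumes the default CONCAT_NULL_YIELDS_NULL setting, under which concatenating a NULL with + yields NULL:

```sql
DECLARE @tab TABLE (stocked_for varchar(20), stockedFor2 varchar(20), stockedFor3 varchar(20));
INSERT INTO @tab (stocked_for, stockedFor2, stockedFor3) VALUES
    ('B', 'C', 'D'),
    ('A', 'B', 'C'),
    ('A', 'B', 'C'),    -- duplicate, removed by DISTINCT
    ('X', NULL, 'Z');   -- one NULL part makes the whole concatenation NULL

SELECT x.stockedFor
FROM (
    SELECT DISTINCT stocked_for + ', ' + stockedFor2 + ', ' + stockedFor3 AS stockedFor
    FROM @tab
) AS x
-- The outer query has no DISTINCT, so ORDER BY may use any expression over x.
ORDER BY CASE WHEN x.stockedFor IS NULL THEN 1 ELSE 0 END, x.stockedFor;
```

This returns 'A, B, C', then 'B, C, D', then NULL.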
Since you are removing duplicates, a workaround is to use GROUP BY instead of DISTINCT. The question has since changed, but the method still applies as long as every column in the SELECT list also appears in the GROUP BY.
For example:
select
'item' = othertab..item,
'stockedFor' = tab..stocked_for
+ ', ' + tab..stockedFor2
+ ', '+ tab..stockedFor3
from tab
GROUP BY othertab..item,
tab..stocked_for
+ ', ' + tab..stockedFor2
+ ', '+ tab..stockedFor3
order by case when stockedFor is null then 1 else 0 end, stockedFor

T-SQL: How to replace spaces in a string except if they are after a specific character

The situation is as follows:
We have action logs in our database, triggered by user events, that store the events as XML in a varchar column. In some cases the attribute names contain spaces, as in this one:
<UNITDETAILUPDATE NEWUNIT TYPE="DUW 30 01" OLDFAULT_CIRC="HWS" NEWFAULT_CIRC="HWS" OLDOUTGOING R-STATE="R3C" />
I would like to remove the spaces from the attribute names before parsing the string as XML (as it stands, parsing is of course not possible :)).
As you can see, there are multiple occurrences in the string. A good solution would be something like replacing only the spaces that do not have a " character before them, but I have no idea how to achieve this.
Any ideas?
Thank you :)
For a high-performing set-based solution you can grab a copy of ngrams8k and do this:
DECLARE @string varchar(1000) = '<UNITDETAILUPDATE NEWUNIT TYPE="DUW 30 01" OLDFAULT_CIRC="HWS" NEWFAULT_CIRC="HWS" OLDOUTGOING R-STATE="R3C" />';
select newString =
(
select
case when token = ' ' and position > space1 and isQuoted = 0 and p.c <> '"'
then '' else token end
from
(
select ng.*, sum(case when token = '"' then 1 else 0 end) over (order by position)%2
from dbo.ngrams8k(@string, 1) ng
) x(position, token, isQuoted)
cross join (values (charindex(' ', @string))) v(space1)
cross apply (values (substring(@string, position-1,1))) p(c)
order by position
for xml path(''), type
).value('(text())[1]', 'varchar(8000)');
Results
<UNITDETAILUPDATE NEWUNITTYPE="DUW 30 01" OLDFAULT_CIRC="HWS" NEWFAULT_CIRC="HWS" OLDOUTGOINGR-STATE="R3C" />
If you have SQL Server 2017 or later, you can use string_agg with ngrams8k like this:
select newString = string_agg(
case when token = ' ' and position > space1 and isQuoted = 0
and substring(@string, position-1,1) <> '"' then '' else token end,'')
within group (order by position) -- keep the tokens in their original order
from
(
select ng.*, sum(case when token = '"' then 1 else 0 end) over (order by position)%2
from dbo.ngrams8k(@string, 1) ng
) x(position, token, isQuoted)
cross join (values (charindex(' ', @string))) v(space1)
cross apply (values (substring(@string, position-1,1))) p(c);
You could search for the good spaces and save them with a placeholder:
Declare @var varchar(200) = '<UNITDETAILUPDATE NEWUNIT TYPE="DUW 30 01" OLDFAULT_CIRC="HWS" NEWFAULT_CIRC="HWS" OLDOUTGOING R-STATE="R3C" />'
Select @var = replace(@var,'" ','"|')
Then replace the remaining spaces:
Select @var = replace(@var,' ','_')
Then put the good spaces back:
Select @var = replace(replace(@var,'|',' '),'UNITDETAILUPDATE_','UNITDETAILUPDATE ')
This could be combined into one ugly nested replace so that it can be selected across a table. You would probably also need placeholders for the spaces inside the quotations. Regex is not supported here, but a LIKE pattern can sometimes be used instead.
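The "one ugly replace" mentioned above might look roughly like this. It is only a sketch of the same three steps nested into a single expression, and it assumes the | character never occurs in the data:

```sql
DECLARE @var varchar(200) = '<UNITDETAILUPDATE NEWUNIT TYPE="DUW 30 01" OLDFAULT_CIRC="HWS" NEWFAULT_CIRC="HWS" OLDOUTGOING R-STATE="R3C" />';

SELECT REPLACE(
           REPLACE(
               REPLACE(
                   REPLACE(@var, '" ', '"|'),   -- 1. protect the space after each closing quote
                   ' ', '_'),                   -- 2. replace every remaining space
               '|', ' '),                       -- 3. restore the protected spaces
           'UNITDETAILUPDATE_', 'UNITDETAILUPDATE ') AS cleaned;  -- 4. restore the space after the element name
-- Note: spaces inside quoted values are replaced too ("DUW 30 01" becomes "DUW_30_01").
```

Because it is a single expression, @var can be swapped for a column name to run it across a table.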
This "XML" is awfully bad...
The following approach won't be fast. If you need this more often, you might use another language or tool.
This solution uses a recursive CTE, which is hidden RBAR (row-by-agonizing-row processing), to rebuild the string character by character while tracking whether the current position is within quotes:
DECLARE @BadXml NVARCHAR(MAX)='<UNITDETAILUPDATE NEWUNIT TYPE="DUW 30 01" OLDFAULT_CIRC="HWS" NEWFAULT_CIRC="HWS" OLDOUTGOING R-STATE="R3C" />';
WITH recCTE
AS
(
SELECT LTRIM(RTRIM(REPLACE(@BadXml,'" ','"$'))) AS TheString
,1 AS CurrentPos
,CAST('<' AS NVARCHAR(MAX)) AS BuildNew
,-1 AS IsFirstBlank
,-1 AS QuotOpen
UNION ALL
SELECT r.TheString
,r.CurrentPos+1
,r.BuildNew + CASE WHEN chr=' ' AND r.IsFirstBlank=1 AND r.QuotOpen=-1 THEN '_' ELSE chr END
,CASE WHEN r.IsFirstBlank=-1 AND chr=' ' THEN 1 ELSE r.IsFirstBlank END
,CASE WHEN chr='"' THEN r.QuotOpen * (-1) ELSE r.QuotOpen END
FROM recCTE AS r
CROSS APPLY(SELECT SUBSTRING(r.TheString,r.CurrentPos+1,1)) AS A(chr)
WHERE r.CurrentPos<LEN(r.TheString)
)
SELECT TOP 1 IsFirstBlank,QuotOpen, CAST(REPLACE(BuildNew,'"$','" ') AS XML) AS TheXml
FROM recCTE
ORDER BY LEN(BuildNew) DESC
OPTION (MAXRECURSION 1000)
The result
IsFirstBlank QuotOpen TheXml
1 -1 <UNITDETAILUPDATE NEWUNIT_TYPE="DUW 30 01" OLDFAULT_CIRC="HWS" NEWFAULT_CIRC="HWS" OLDOUTGOING_R-STATE="R3C" />
Take away the CAST to xml, the TOP 1 and the ORDER BY to see how it works.

How to remove spaces between comma or numbers in T-SQL?

SELECT REPLACE('10,6 7 7,900 11,027,900', ' ', '')
SELECT REPLACE('10,2 27,900 10,6 7 7,900 11,027,900', ' ', '')
Bad Result:
10,677,90011,027,900
10,227,90010,677,90011,027,900
Good Result:
10,677,900 11,027,900
10,227,900 10,677,900 11,027,900
This is an odd requirement. Before this goes downhill, I suggest you normalize your table properly. Anyway, if you're stuck with what you have for now, here is a way to solve your problem.
First, you need a string splitter, to split strings by comma. I use DelimitedSplit8K, written by Jeff Moden and improved by the members of SQL Server Central community.
After splitting the string, check if the value of each item after the space is removed has a length of 3. If yes, concatenate the new string (space removed). Else, concatenate the original item.
WITH Tbl(OriginalString) AS(
SELECT '10,6 7 7,900 11,027,900' UNION ALL
SELECT '10,2 27,900 10,6 7 7,900 11,027,900'
),
TblSplitted(originalString, ItemNumber, Item) AS (
SELECT *
FROM Tbl t
CROSS APPLY dbo.DelimitedSplit8K(t.OriginalString, ',')
)
SELECT *
FROM Tbl t
CROSS APPLY(
SELECT STUFF((
SELECT ',' +
CASE
WHEN LEN(REPLACE(s.Item, ' ', '')) = 3 THEN REPLACE(s.Item, ' ', '')
ELSE s.Item
END
FROM TblSplitted s
WHERE s.originalString = t.OriginalString
ORDER BY s.ItemNumber
FOR XML PATH('')
), 1, 1, '')
) x(NewString);
