I am cleaning up some data and would like to create a PATINDEX pattern that rejects any string containing a character other than A-Za-z0-9./'-# and a space.
This rejects the special chars which should be allowed:
patindex ( '%[^A-Z0-9a-z./'-# ]%',stringtobetested )
Should I be masking the special chars? The bad and/or good chars can appear multiple times in a given string.
For example, a stringtobetested of abc#D-EF should pass, but abc*def should fail.
This should work... it just uses replace to get around your escaping problems.
declare @stringtobetested1 varchar(64) = 'abc#D-EF'
declare @stringtobetested2 varchar(64) = 'abc*def '
select
@stringtobetested1 string1
,replace(replace(replace(replace(@stringtobetested1,'''','#'),' ','#'),'/','#'),'.','#') string1changed
,@stringtobetested2 string2
,replace(replace(replace(replace(@stringtobetested2,'''','#'),' ','#'),'/','#'),'.','#') string2changed
,patindex('%[^A-Z0-9a-z#-]%',replace(replace(replace(replace(@stringtobetested1,'''','#'),' ','#'),'/','#'),'.','#'))
,patindex('%[^A-Z0-9a-z#-]%',replace(replace(replace(replace(@stringtobetested2,'''','#'),' ','#'),'/','#'),'.','#'))
This can all be done with a PATINDEX, you just need some syntax help:
WITH test AS (
SELECT val
FROM (VALUES ('abc#D-EF./ -'''), ('abc*def')) AS t (val)
)
SELECT
input = t.val,
result = IIF(PATINDEX('%[^A-Z0-9a-z./''# -]%', t.val) > 0, 'fails', 'passes')
FROM test t;
First of all, the pattern itself is a string, and strings in T-SQL escape ' by doubling it to ''. Secondly, inside a [^ ] wildcard in a pattern, the - is used to define a character range when it occurs between two characters. By moving it to an end of the wildcard pattern, it is treated literally.
Other escape sequences specific to pattern wildcards can be found in this docs page: Pattern Matching in Search Conditions
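As a rough illustration of the point about the hyphen, the sketch below runs the two patterns side by side against a string that should pass. The column aliases are made up for this example, and the exact position reported by the first pattern may depend on the collation. A result of 0 means no disallowed character was found.

```sql
-- Hypothetical side-by-side check; the aliases are illustrative only.
SELECT
    -- '-' sits between ' and #, so it is read as a range operator
    hyphen_in_middle = PATINDEX('%[^A-Z0-9a-z./''-# ]%', 'abc#D-EF'),
    -- '-' is the last character in the set, so it is a literal hyphen
    hyphen_at_end    = PATINDEX('%[^A-Z0-9a-z./''# -]%', 'abc#D-EF');
```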
Please test this query further, and adjust the maximum string length to fit your requirements.
DECLARE @TestString varchar(64) = 'abc#D-E/*F'
, @MaxStringLen INT = 20
;WITH cte AS(SELECT 1 number
UNION ALL
SELECT number + 1
FROM cte
WHERE number < @MaxStringLen
)
SELECT @TestString AS OriginalString
-- '-' is placed last in the character set so it is treated as a literal, not a range
, CAST(CAST((SELECT SUBSTRING(@TestString, Number, 1)
FROM cte
WHERE Number <= LEN(@TestString) AND
SUBSTRING(@TestString, Number, 1) LIKE '%[A-Z0-9a-z# ./''-]%'
ORDER BY Number FOR XML Path(''))
AS xml) AS varchar(MAX)) AS ConvertedString
, CASE WHEN @TestString = CAST(CAST((SELECT SUBSTRING(@TestString, Number, 1)
FROM cte
WHERE Number <= LEN(@TestString) AND
SUBSTRING(@TestString, Number, 1) LIKE '%[A-Z0-9a-z# ./''-]%'
ORDER BY Number FOR XML Path(''))
AS xml) AS varchar(MAX))
THEN 1 ELSE 0 END IsAllowed
Related
I have a variable with random text, let's say
DECLARE @sNumberFormat NVARCHAR(200) = 'rand{text.here,{999}also-Random9He8re'
I want to replace each 9 in {999} by [0-9]. So in this example I would like to get
'rand{text.here,[0-9][0-9][0-9]also-Random9He8re'
The problem is that I never know how many 9s will be placed in the braces, so there can be {99}, {9999}, and so on. I also need to validate the contents: if the braces contain any invalid character (anything other than 9), nothing should be replaced.
I have tried some combinations of REPLACE and PATINDEX functions, but I could not achieve that.
Sans robust regex support, SQL Server's native functions do not give much help here. One approach, a bit hackish, would be to separate the input string into three components:
rand{text.here,
{999}
also-Random9He8re
Next, replace the 9 in the middle target substring with #, or some other character which you don't expect to appear anywhere else in your input string:
rand{text.here,
{###}
also-Random9He8re
Finally, replace the # in the middle substring with [0-9] and then concatenate together to get the final result:
DECLARE @val NVARCHAR(200) = 'rand{text.here,{999}also-Random9He8re'
SELECT REPLACE(
SUBSTRING(@val, 1, CHARINDEX('{9', @val) - 1) +
REPLACE(SUBSTRING(@val,
CHARINDEX('{9', @val) + 1,
CHARINDEX('9}', @val) - CHARINDEX('{9', @val)), '9', '#') +
SUBSTRING(@val, CHARINDEX('9}', @val) + 2, LEN(@val) - CHARINDEX('9}', @val)),
'#', '[0-9]');
So the lazy dev in me suggests this:
SELECT Replace(
Replace(
Replace(
Replace(@input, '{9999}', '[0-9][0-9][0-9][0-9]')
, '{999}', '[0-9][0-9][0-9]')
, '{99}', '[0-9][0-9]')
, '{9}', '[0-9]') AS result
;
You can keep extending as long as you like to perform your (one off?) replacements.
Quick. Simple. Extensible. Hacky.
Sometimes lazy is good enough.
This can be done with a series of CTEs. It works with an arbitrary number of "9" values in curly braces.
Declare @str varchar(max) = 'rand{text.here,{999}also-Random9He8re';
With A As
(Select 1 As Pos
Union All
Select Pos+1 As Pos From A Where Pos < LEN(@str)
),
B As (
Select STRING_AGG(Case When Chr Like '[{9}]' Then Chr Else ' ' End, '') As Chr
From A Cross Apply (Select SUBSTRING(@str,A.Pos,1 )) As T(chr)
),
C As (
Select [value] As pattern,
REPLACE(REPLACE(REPLACE([value], '9', '[0-9]'),'{',''),'}','') As replacement,
ROW_NUMBER() Over (ORDER BY (SELECT NULL)) As Num,
COUNT(*) OVER (ORDER BY (SELECT NULL)) As Cnt
From B Cross Apply STRING_SPLIT(Chr,' ')
Where [value] Like '{%}' And [value] Like '%9%'
),
D As (
Select @str As Result, 1 As Num
Union All
select REPLACE(Result, C.pattern, C.replacement) As Res , D.Num+1 As Num
From D Inner Join C On (D.Num=C.Num)
Where D.Num<=C.Cnt)
Select Top 1 Result
From D
Order by Num Desc
A - Getting a list of character positions in the text
B - Getting the text with spaces in place of every character other than '9', '{', '}'
C - Getting the patterns and their corresponding replacement values
D - Getting the result using the REPLACE function
I am looking for a function that selects English numbers and letters only:
Example:
TEKA תנור ביל דין in HLB-840 P-WH לבן
I want to run a function and get the following result:
TEKA HLB-840 P-WH
I'm using MS SQL Server 2012
What you really need here is regex replacement, which SQL Server does not support. Broadly speaking, you would want to find [^A-Za-z0-9 -]+\s* and then replace with empty string. Here is a demo showing that this works as expected:
Demo
This would output TEKA in HLB-840 P-WH for the input you provided. You might be able to do this in SQL Server using a regex package or UDF. Or, you could do this replacement outside of SQL using any number of tools which support regex (e.g. C#).
SQL-Server is not the right tool for this.
The following might work for you, but there is no guarantee:
declare @yourString NVARCHAR(MAX)=N'TEKA תנור ביל דין in HLB-840 P-WH לבן';
SELECT REPLACE(REPLACE(REPLACE(REPLACE(CAST(@yourString AS VARCHAR(MAX)),'?',''),' ','|~'),'~|',''),'|~',' ');
The idea in short:
Casting NVARCHAR to VARCHAR returns every character that is not known in the given collation as a question mark. The rest is replacing the question marks and collapsing runs of blanks.
If your string can include a question mark, you can first replace it with an unused character, which you replace back at the end.
If your string might include either | or ~, you should use other characters for collapsing the blanks.
You can influence this approach by specifying a particular collation, if some characters slip through...
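A minimal sketch of the question-mark protection described above, assuming '^' never occurs in the data (the placeholder and the trailing '?' added to the sample are assumptions for illustration). The blank-collapsing step is left out to keep the example short.

```sql
DECLARE @yourString NVARCHAR(MAX) = N'TEKA תנור ביל דין in HLB-840 P-WH לבן?';

-- '^' is an assumed placeholder that never occurs in the data.
SELECT REPLACE(
           REPLACE(
               -- protect genuine question marks, then cast to VARCHAR
               CAST(REPLACE(@yourString, N'?', N'^') AS VARCHAR(MAX)),
               '?', ''),   -- drop the characters the cast could not represent
           '^', '?')       -- restore the genuine question marks
       AS cleaned;
```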
There is no built-in function for this purpose, but you can create your own. It should look something like this:
--create function (split the string, then concatenate the required characters)
CREATE FUNCTION dbo.CleanStringZZZ ( @string VARCHAR(100))
RETURNS VARCHAR(100)
BEGIN
DECLARE @B VARCHAR(100) = '';
WITH t --recursive part to create the sequence 1,2,3...; it would be better to use an existing numbers table with an index
AS
(
SELECT n = 1
UNION ALL
SELECT n = n+1 --
FROM t
WHERE n < LEN(@string) --stop at the last character position
)
SELECT @B = @B+SUBSTRING(@string, t.n, 1)
FROM t
WHERE SUBSTRING(@string, t.n, 1) != '?' --this is just an example...
--WHERE ASCII(SUBSTRING(@string, t.n, 1)) BETWEEN 32 AND 127 --you can use something like this
ORDER BY t.n;
RETURN @B;
END;
and then you can use this function in your select statement:
SELECT dbo.CleanStringZZZ('TEKA תנור ביל דין in HLB-840 P-WH לבן');
create function dbo.AlphaNumericOnly(@string varchar(max))
returns varchar(max)
begin
While PatIndex('%[^a-z0-9]%', @string) > 0
Set @string = Stuff(@string, PatIndex('%[^a-z0-9]%', @string), 1, '')
return @string
end
I have a string like this:
Apple
I want to include a separator after each character so the end result will turn out like this:
A,p,p,l,e
In C#, we have one liner method to achieve the above with Regex.Replace('Apple', ".{1}", "$0,");
I can only think of looping over each character with CHARINDEX to append the separator, but that seems a little complicated. Is there a simpler, more elegant way to achieve this?
Thanks, HABO, for the suggestions. I was able to generate the result I want using that code, but it took me a while to really understand how it works.
After some searching, I managed to find a useful article on inserting spaces between each character, which was easier for me to understand.
I modified the code a little so that the separator is configurable instead of being fixed to a space:
DECLARE @pos INT = 2 -- location where we want the first separator
DECLARE @result VARCHAR(100) = 'Apple'
DECLARE @separator nvarchar(5) = ','
WHILE @pos < LEN(@result)+1
BEGIN
SET @result = STUFF(@result, @pos, 0, @separator);
SET @pos = @pos+2; -- skip the inserted separator and the next character (assumes a one-character separator)
END
select @result; -- Output: A,p,p,l,e
Reference
In the following SQL scripts, I get each character with the SUBSTRING() function and a numbers table (I used the spt_values view here for simplicity), then concatenate them using one of two methods; you can choose either.
If you are using SQL Server 2017, there is a new string aggregation function, STRING_AGG.
The first script uses the string_agg function:
declare @str nvarchar(max) = 'Apple'
SELECT
string_agg( substring(@str,number,1) , ',') Within Group (Order By number)
FROM master..spt_values n
WHERE
Type = 'P' and
Number between 1 and len(@str)
If you are working with a previous version, you can use string concatenation using FOR XML Path and SQL Stuff function as follows
declare @str nvarchar(max) = 'Apple'
; with cte as (
SELECT
number,
substring(@str,number,1) as L
FROM master..spt_values n
WHERE
Type = 'P' and
Number between 1 and len(@str)
)
SELECT
STUFF(
(
SELECT
',' + L
FROM cte
order by number
FOR XML PATH('')
), 1, 1, ''
)
Both solutions yield the same result. I hope it helps.
If you have SQL Server 2017 and a copy of ngrams8k it's ultra simple:
declare @word varchar(100) = 'apple';
select newString = string_agg(token, ',') within group (order by position)
from dbo.ngrams8k(@word,1);
For pre-2017 systems it's almost as simple:
declare @word varchar(100) = 'apple';
select newstring =
( select token + case len(@word)+1-position when 1 then '' else ',' end
from dbo.ngrams8k(@word,1)
order by position
for xml path(''))
One ugly way to do it is to split the string into characters, ideally using a numbers table, and reassemble it with the desired separator.
A less efficient implementation uses recursion in a CTE to split the characters and insert the separator between pairs of characters as it goes:
declare @Sample as VarChar(20) = 'Apple';
declare @Separator as Char = ',';
with Characters as (
select 1 as Position, Substring( @Sample, 1, 1 ) as Character
union all
select Position + 1,
case when Position & 1 = 1 then @Separator else Substring( @Sample, Position / 2 + 1, 1 ) end
from Characters
where Position < 2 * Len( @Sample ) - 1 )
select Stuff( ( select Character + '' from Characters order by Position for XML Path( '' ) ), 1, 0, '' ) as Result;
You can replace the select Stuff... line with select * from Characters; to see what's going on.
Try this
declare @var varchar(50) ='Apple'
;WITH CTE
AS
(
SELECT
SeqNo = 1,
MyStr = @var,
OpStr = CAST('' AS VARCHAR(50))
UNION ALL
SELECT
SeqNo = SeqNo+1,
MyStr = MyStR,
OpStr = CAST(ISNULL(OpStr,'')+SUBSTRING(MyStR,SeqNo,1)+',' AS VARCHAR(50))
FROM CTE
WHERE SeqNo <= LEN(@var)
)
SELECT
OpStr = LEFT(OpStr,LEN(OpStr)-1)
FROM CTE
WHERE SeqNo = LEN(@Var)+1
The problem:
I have text data imported into the db with a lot of unwanted characters. I need to keep only 4 capital letter strings within the imported text string. Example:
1447;#MIBD (This is a nice name);#2056;#LKRE (Very nice name indeed)
this could be in one column in one row of my table. What I need to extract from the string is:
MIBD and LKRE
And the result should preferably be the desired strings separated with semicolons.
It should be applied to the whole column and I cannot know how many of these 4 upper case letter strings might appear in one row.
I went through all sorts of functions, such as PATINDEX, but I really do not know how to approach it. Thanks for any help!
Try this. It assumes that the four-character code is always preceded by ;#. Because PATINDEX is case-insensitive under a case-insensitive collation, I have added an additional check to verify that all four characters are capitals.
DECLARE @MyTable Table( ID INT, MyString VARCHAR(8000))
INSERT INTO @MyTable
VALUES
(1, '1447;#MIBD (This is a nice name);#2056;#LKRE (Very nice name indeed)')
,(2, ';#DBCC (This is a nice name);#2056;#LLC (Very nice name indeed) ;#ABCD')
,(3, ';#AaaA;#OPQR;1234 (and) ;#WXYZ')
,(4, ';#abc this empty string without any code')
;WITH CTE AS
(
SELECT ID
,SUBSTRING(MyString, PATINDEX('%;#[A-Z][A-Z][A-Z][A-Z]%',MyString)+2, 4) AS NewString
,STUFF(MyString, 1, PATINDEX('%;#[A-Z][A-Z][A-Z][A-Z]%',MyString)+6, '') AS MyString
FROM @MyTable m
WHERE PATINDEX('%;#[A-Z][A-Z][A-Z][A-Z]%',MyString) > 0
UNION ALL
SELECT ID
,SUBSTRING(MyString, PATINDEX('%;#[A-Z][A-Z][A-Z][A-Z]%',MyString)+2, 4) AS NewString
,STUFF(MyString, 1, PATINDEX('%;#[A-Z][A-Z][A-Z][A-Z]%',MyString)+6, '') AS MyString
FROM CTE c
WHERE PATINDEX('%;#[A-Z][A-Z][A-Z][A-Z]%',MyString) > 0
)
SELECT c.ID,
STUFF(( SELECT '; ' + NewString
FROM CTE c1
WHERE c1.ID = c.ID
AND ASCII(SUBSTRING(NewString, 1, 1)) BETWEEN ASCII('A') AND ASCII('Z') -- first char
AND ASCII(SUBSTRING(NewString, 2, 1)) BETWEEN ASCII('A') AND ASCII('Z') -- second char
AND ASCII(SUBSTRING(NewString, 3, 1)) BETWEEN ASCII('A') AND ASCII('Z') -- third char
AND ASCII(SUBSTRING(NewString, 4, 1)) BETWEEN ASCII('A') AND ASCII('Z') -- fourth char
FOR XML PATH(''), TYPE).value('.', 'VARCHAR(MAX)') -- use the value clause to handle XML special characters such as &, ", >, <
,1,2,'') AS CodeList -- strip the leading '; ' separator
FROM CTE c
GROUP BY ID
OPTION (MAXRECURSION 0);
I came to something like this so far:
ALTER FUNCTION CleanData
(
-- Parameters here
@Text AS VARCHAR(4000)
)
RETURNS VARCHAR(4000)
AS
BEGIN
WHILE PATINDEX('%[0-9#;()]%', @Text) > 0
BEGIN
SET @Text = STUFF(@Text, PATINDEX('%[0-9#;()]%', @Text), 1, '')
END
RETURN @Text
END
But what I get is the initials plus the characters in parentheses, because PATINDEX does not distinguish between upper and lower case here. Maybe it will be helpful for somebody else.
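One way to make the pattern case-sensitive, sketched here under the assumption that a binary collation such as Latin1_General_BIN is acceptable for the data, is to apply COLLATE to the expression being searched. Under a binary collation the range [A-Z] is compared by code point, so it should match upper-case letters only. The variable name and sample value are illustrative.

```sql
DECLARE @Text VARCHAR(4000) = ';#AaaA;#OPQR;1234 (and) ;#WXYZ';

SELECT
    -- Uses the default collation; with a case-insensitive default,
    -- [A-Z] also matches lower-case letters.
    DefaultCollationMatch = PATINDEX('%;#[A-Z][A-Z][A-Z][A-Z]%', @Text),
    -- Forces a binary comparison for this expression only.
    BinaryCollationMatch  = PATINDEX('%;#[A-Z][A-Z][A-Z][A-Z]%', @Text COLLATE Latin1_General_BIN);
```

With a case-insensitive default collation, the two columns would be expected to differ for this sample, because the first candidate (;#AaaA) contains lower-case letters.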
I want to make unique random alphanumeric sequence to be the primary key for a database table.
Each char in the sequence is either a letter (a-z) or number (0-9)
Examples for what I want :
kl7jd6fgw
zjba3s0tr
a9dkfdue3
I want to make a function that could handle that task!
You can use a uniqueidentifier. This can be generated with the NEWID() function:
SELECT NEWID()
will return something like:
BE228C22-C18A-4B4A-9AD5-1232462F7BA9
It is a very bad idea to use random strings as a primary key.
It will affect performance as well as storage size, and you will be much better off using an int or a bigint with an identity property.
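A minimal sketch of that identity-based alternative, using a hypothetical table and column names:

```sql
-- Hypothetical table; the identity column supplies the key values.
CREATE TABLE dbo.Orders
(
    OrderId   INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    OrderCode VARCHAR(20) NOT NULL
);

INSERT INTO dbo.Orders (OrderCode) VALUES ('kl7jd6fgw');

-- Identity value generated by the insert above.
SELECT SCOPE_IDENTITY() AS NewOrderId;
```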
However, generating a random string in SQL may be useful for other things, and this is why I offer this solution:
Create a table to hold permitted char values.
In my example the permitted chars are 0-9 and A-Z.
CREATE TABLE Chars (C char(1))
DECLARE @i as int = 0
WHILE @i < 10
BEGIN
INSERT INTO Chars (C) VALUES (CAST(@i as Char(1)))
SET @i = @i+1
END
SET @i = 65
WHILE @i < 91
BEGIN
INSERT INTO Chars (C) VALUES (CHAR(@i))
SET @i = @i+1
END
Then use this simple select statement to generate a random string from this table:
SELECT TOP 10 C AS [text()]
FROM Chars
ORDER BY NEWID()
FOR XML PATH('')
The advantages:
You can easily control the allowed characters.
The generation of a new string is a simple select statement and not manipulation on strings.
The disadvantages:
This select results with an ugly name (i.e XML_F52E2B61-18A1-11d1-B105-00805F49916B). This is easily solved by setting the result into a local variable.
Characters will only appear once in every string. This can easily be solved by adding union:
example:
SELECT TOP 10 C AS [text()]
FROM (
SELECT * FROM Chars
UNION ALL SELECT * FROM Chars
) InnerSelect
ORDER BY NEWID()
FOR XML PATH('')
Another option is to use the STUFF function instead of AS [text()] to eliminate those pesky XML tags:
SELECT STUFF((
SELECT TOP 100 ''+ C
FROM Chars
ORDER BY NEWID()
FOR XML PATH('')
), 1, 0, '') As RandomString; -- delete 0 characters: there is no leading separator to strip
This option doesn't have the disadvantage of the ugly column name, and can be given an alias directly. The execution plan is a little different, but it should not suffer much performance loss.
Play with it yourself in this Sql Fiddle
If there are any more advantages / disadvantages you think of please leave a comment. Thanks.
The NEWID() function generates unique values. I generate them in a loop (a recursive CTE) and pick out an alphanumeric run from each one using the CHARINDEX and LEFT functions:
;with list as
(
select 1 as id,newid() as val
union all
select id + 1,NEWID()
from list
where id + 1 < 100
)
select ID,left(val, charindex('-', val) - 2) from list
option (maxrecursion 0)
The drawback of NEWID() for this request is that it limits the character pool to 0-9 and A-F. To define your own character pool, you have to roll a custom solution.
This solution is adapted from Generating random strings with T-SQL
--Define list of characters to use in random string
DECLARE @CharPool VARCHAR(255)
SET @CharPool = '0123456789abcdefghijklmnopqrstuvwxyz'
--Store length of CharPool for use later
DECLARE @PoolLength TINYINT
SET @PoolLength = LEN(@CharPool) --36
--Define random string length
DECLARE @StringLength TINYINT
SET @StringLength = 9
--Declare target parameter for random string
DECLARE @RandomString VARCHAR(255)
SET @RandomString = ''
--Loop control variable
DECLARE @LoopCount TINYINT
SET @LoopCount = 0
--For each char in string, choose random char from char pool
WHILE(@LoopCount < @StringLength)
BEGIN
--Add 1 because SUBSTRING positions start at 1; without it, position 0 yields an empty string
SELECT @RandomString += SUBSTRING(@CharPool, CONVERT(int, RAND() * @PoolLength) + 1, 1)
SELECT @LoopCount += 1
END
SELECT @RandomString
http://sqlfiddle.com/#!6/9eecb/4354
I must reiterate, however, that I agree with the others: this is a horrible idea.