How to generate random alphanumeric unique characters with specified length - sql-server

The problem is described below:
Generate a unique alphanumeric string.
The string should be 32 characters long.
The current time may be used as a seed to help make the generated values unique.
Alphabet characters must come from this pool: abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
Sample Output: 445rpxlKYPkj1pg4q8nAy7Ab91zxZ8v1
I can do this in Java, but I would greatly appreciate help doing it in MS SQL Server (T-SQL).

First, you need to split the string into separate rows. Then, do a SELECT with ORDER BY NEWID() for the random sort. Finally, use FOR XML PATH('') to concatenate them back:
DECLARE @str VARCHAR(100) = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
;WITH E1(N) AS( -- 10 ^ 1 = 10 rows
SELECT 1 FROM(VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))t(N)
),
E2(N) AS(SELECT 1 FROM E1 a CROSS JOIN E1 b), -- 10 ^ 2 = 100 rows
E4(N) AS(SELECT 1 FROM E2 a CROSS JOIN E2 b), -- 10 ^ 4 = 10,000 rows
CteTally(N) AS(
SELECT TOP(LEN(@str)) ROW_NUMBER() OVER(ORDER BY(SELECT NULL))
FROM E4
)
SELECT (
SELECT TOP(32)
SUBSTRING(@str, N, 1)
FROM CteTally t
ORDER BY NEWID()
FOR XML PATH('')
) AS Result
The above is more of a generic random string generator. You can modify it to suit your need. If the requirement will not change, you can simply use this:
DECLARE @str VARCHAR(100) = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
;WITH E1(N) AS( -- 52 Rows
SELECT 1 FROM( VALUES
(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),
(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),
(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),
(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),
(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),
(1),(1)
)t(N)
),
CteTally(N) AS(
SELECT ROW_NUMBER() OVER(ORDER BY(SELECT NULL))
FROM E1
)
SELECT (
SELECT TOP(32)
SUBSTRING(@str, N, 1)
FROM CteTally t
ORDER BY NEWID()
FOR XML PATH('')
) AS Result
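Both queries above pick each pool character at most once, because ORDER BY NEWID() only shuffles the 52 rows. If repeated characters are acceptable, a minimal sketch along the following lines draws every position independently. The variable names and the use of sys.all_objects as a row source are assumptions of mine, not part of the answer above.

```sql
-- Sketch: sample WITH replacement, one random pool position per output character.
DECLARE @pool VARCHAR(100) = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
DECLARE @len INT = 32;

;WITH Tally(N) AS (
    -- Any row source with at least @len rows works; sys.all_objects is assumed here.
    SELECT TOP (@len) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM sys.all_objects
)
SELECT (
    -- Take the modulo before ABS so ABS never sees the minimum INT value.
    SELECT SUBSTRING(@pool, ABS(CHECKSUM(NEWID()) % LEN(@pool)) + 1, 1)
    FROM Tally
    FOR XML PATH('')
) AS Result;
```

A string built this way is random but not guaranteed unique; a unique constraint on the target column would still be needed to catch the rare collision.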

I made this generic enough to handle any pool of characters and any output length. The core idea is to take a random sequence of bytes and use a base conversion algorithm to convert a long number into a new representation then translated to a string using your desired characters as its "digits".
For your specific scenario we need about 183 bits, or log2(52) x 32, to get to your desired length. Using newid() will generate the unique bit sequence but it will only do so 128 bits at a time and a series of values is simply concatenated until there are enough. Then having a value to operate on, the main loop is essentially the same long division we learned from elementary school. The intermediate calculations are kept in place in the varbinary array and the loop continues only until enough output characters are obtained. Each iteration determines another low order digit in the new base and this can terminate early since they won't change. The algorithm can't guarantee any global uniqueness if the output doesn't consume at least all of one newid(), so make sure log2(len(pool)) x output length is at least 128.
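The bit count mentioned above can be checked with a couple of lines. This is only a sketch of the arithmetic; @poolSize is a name I introduced for illustration.

```sql
-- Sketch: how many entropy bits a 32-character string over a 52-character pool needs,
-- and how many 128-bit newid() values are required to cover them.
DECLARE @poolSize INT = 52, @outputLength INT = 32;

SELECT LOG(@poolSize) / LOG(2) * @outputLength                AS BitsNeeded,  -- about 182.4
       CEILING(LOG(@poolSize) / LOG(2) * @outputLength / 128) AS NewIdCalls;  -- 2
```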
The target base, which is ultimately the length of the character pool, can't be more than 256. I hard-coded a limit by declaring @e with a maximum length of 128 bytes. For the question, @e only needs to be 32 bytes long; it could be adjusted upward or downward as necessary, or just defined as varbinary(max). If you need something more truly random, you could find another source for the entropy bits, like crypt_gen_random(). Since uniqueness appears to be the primary concern, this answer fits that requirement. And by the way, repeating characters in the pool will naturally open the door for collisions.
This is fast and generic and it can be easily wrapped up in a function. And a more robust implementation would handle these extra checks.
declare @characterPool varchar(256) =
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
declare @outputLength int = 32;
declare @n int = 0; /* counter */
declare @numLoops int = ceiling(log(len(@characterPool)) / log(2) * @outputLength / 128);
declare @e varbinary(128) = 0x; /* entropy */
while @n < @numLoops
begin
set @e = @e + cast(newid() as binary(16)); /* append another 128 bits */
set @n += 1;
end
declare @b int; /* byte */
declare @d int; /* dividend */
declare @out varchar(128) = '';
declare @outputBase int = len(@characterPool);
declare @entropyBytes int = len(@e);
declare @m int = 0;
while @m < @outputLength
begin
set @b = 0; set @d = 0; set @n = 0;
while @n < @entropyBytes /* big-endian */
begin
/* long division: carry the previous remainder into the next byte */
set @b = (@b - @d * @outputBase) * 256 + cast(substring(@e, @n + 1, 1) as int);
set @d = @b / @outputBase;
set @e = cast(stuff(@e, @n + 1, 1, cast(@d as binary(1))) as varbinary(128));
set @n += 1;
end
/* the final remainder is the next low-order digit in the new base */
set @out = substring(@characterPool, @b - @d * @outputBase + 1, 1) + @out;
set @m += 1;
end
select @out as "UniqueString"
http://rextester.com/EYAK79470
As one simple test of the algorithm you could just assign a known value in hexadecimal format and confirm that the output (using 0123456789ABCDEF as the character pool) is the same hexadecimal value. In the same way this obviously works with base64, binary and octal.
Update: The main loop can be made faster by not having to iterate over more bytes than necessary. I don't know how crypt_gen_random() compares to newid() in terms of speed or CPU usage so this change might not even be a net positive so I'll just note it as an alternative to explore. You will want to keep the bytes from newid on the little end and attach the rest to the front.
declare @e varbinary(1024) = cast(newid() as binary(16));
/* extra bits beyond the 128 from newid(), converted to whole bytes */
declare @padBytes int = ceiling((log(len(@characterPool)) / log(2) * @outputLength - 128) / 8);
if @padBytes > 0 set @e = crypt_gen_random(@padBytes) + @e; /* big end plus little end */

Related

TSQL remap big number to smaller, but keep Identity

I'm trying to load a huge number into Field1 INT, which can hold at most 2,147,483,647. I can't change the DDL, so I tried to find an ad hoc solution: cut a single digit out of the middle of the number and then add a check for uniqueness.
These numbers are in a format like 29000001234, so I want to keep this format, with zeros in the middle, to make them easy to recognize. I don't want to introduce any new columns or tables for this task, as my freedom is limited there; this is a 3rd party schema.
Can anybody suggest a better solution for remapping these numbers so that they all stay under that limit? This is my draft:
DECLARE @fl FLOAT = 29000001234
DECLARE @I INT
SELECT @i = (SUBSTRING(CAST(CAST(@fl AS BIGINT) AS VARCHAR(18)),1,4) +
SUBSTRING(CAST(CAST(@fl AS BIGINT) AS VARCHAR(18)),7,LEN(CAST(CAST(@fl AS BIGINT) AS VARCHAR(18)))) )
select @i;
But if you really want to remove the middle digits, here's another approach:
DECLARE @fl FLOAT = 29000001234
DECLARE @I INT
DECLARE @StringFloat as varchar(80)
SET @StringFloat = CONVERT(varchar(80), CAST(@fl AS bigint))
SET @I = CAST( CONCAT(LEFT( @StringFloat, 4 ), RIGHT( @StringFloat, 5 )) as int )
SELECT @i;
I think arithmetic operations should be less expensive than string operations, so you should use them instead:
DECLARE @fl FLOAT = 29000001234
DECLARE @flBig BIGINT = @fl
DECLARE @i INT
SET @i = (@flBig / 1000000000) * 10000000 + (@flBig % 100000000)
select @i; --> 290001234
Provided example assumes the first part of the number will have a maximum of two digits (i.e. 29 in your case) and that you want to allow larger number in the left part (up to 999999).
NOTE: parentheses are redundant, as division and multiplication have the same priority and modulo operator has higher precedence over addition. I have used them just to highlight the parts of the computation.
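To make the arithmetic easier to follow, the same computation can be split into its parts. This is only an illustrative sketch for the sample value; the column aliases are mine.

```sql
-- Sketch: break the arithmetic remap into its intermediate values.
DECLARE @v BIGINT = 29000001234;

SELECT @v / 1000000000                                AS Prefix,    -- 29
       @v % 100000000                                 AS Suffix,    -- 1234
       (@v / 1000000000) * 10000000 + @v % 100000000  AS Remapped;  -- 290001234
```

Integer division drops everything after the leading digits, the modulo keeps the trailing digits, and the multiplication shifts the prefix back into place with fewer zeros in between.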
You can't do that without an arithmetic overflow or without losing your original data.
If you have a limitation in columns of your destination table or query, use multiple rows:
declare @c bigint = 29000001234;
declare @s bigint = 1000000000; -- Separator value
;with cte(partNo, partValue) as (
select 1, @c % @s
union all
select partNo + 1, (@c / power(@s, partNo)) % @s
from cte
where (@c / power(@s, partNo)) > 0
)
select partValue
from cte;
Seems like a strange situation, not sure why you need to go to all the trouble of converting a big number to a string and then randomly remove a digit. Some more information about why or what the real goal is would be helpful.
That said, maybe it would be easier to just subtract a constant amount from these values? e.g.:
DECLARE @fl FLOAT = 29000001234
DECLARE @I INT
DECLARE @OFFSET BIGINT = 29000000000
SET @I = CAST(@fl AS BIGINT)-@OFFSET
SELECT @I
Which gives you an INT of 1234 as the result using your example.
The following creation drops increasingly wide blocks of digits from the original third party value and returns the results that fit in an INT. The results could be outer joined with the existing data to find a suitable new value.
declare @ThirdPartyValue as BigInt = 29000001234;
declare @MaxInt as BigInt = 2147483647;
declare @TPV as VarChar(19) = Cast( @ThirdPartyValue as VarChar(19) );
declare @TPVLen as Int = Len( @TPV );
with
-- 0 through 9.
Digits as (
select Digit from ( values (0), (1), (2), (3), (4), (5), (6), (7), (8), (9) ) as Digits( Digit ) ),
-- 0 through @TPVLen .
Positions as (
select Ten_1.Digit * 10 + Ten_0.Digit as Number
from Digits as Ten_0 cross join Digits as Ten_1
where Ten_1.Digit * 10 + Ten_0.Digit <= @TPVLen ),
-- 1 through @TPVLen - 1 .
Widths as (
select Number
from Positions
where 0 < Number and Number < @TPVLen ),
-- Try dropping Width digits at Position from @TPV .
AlteredTPVs as (
select P.Number as Position, W.Number as Width,
Stuff( @TPV, P.Number, W.Number, '' ) as AlteredTPV
from Positions as P cross join Widths as W
where P.Number + W.Number <= @TPVLen )
-- See which results fit in an Int .
select Position, Width, AlteredTPV, Cast( AlteredTPV as BigInt ) as AlteredTPVBigInt
from AlteredTPVs
where Cast( AlteredTPV as BigInt ) <= @MaxInt -- Comment out this line to see all results.
order by Width, Position
It could be more clever about returning only distinct new values.
This general idea could be used to hunt down blocks of zeroes or other suitable patterns to arrive at a set of values to be tested against the existing data.

Avg of float inconsistency

The select returns right at 23,000 rows
The except will return between 60 to 200 rows (and not the same rows)
The except should return 0 as it is select a except select a
PK: [docSVenum1].[enumID], [docSVenum1].[valueID], [FTSindexWordOnce].[wordID]
[tf] is a float, and I understand that float is not exact
But I naively thought avg(float) would be repeatable
Avg(float) does not appear to be repeatable
What is the solution?
TF is between 0 and 1 and I only need like 5 significant digits
I just need avg(TF) to be the same number run to run
Decimal(9,8) gives me enough precision and if I cast to decimal(9,8) the except properly returns 0
I can change [TF] to decimal(9,8) but it will be bit of work and lot of regression testing as some of the test that use [tf] take over a day to run
Is changing [TF] to decimal(9,8) the best solution?
SELECT [docSVenum1].[enumID], [docSVenum1].[valueID], [FTSindexWordOnce].[wordID]
, avg([FTSindexWordOnce].[tf]) AS [avgTFraw]
FROM [docSVenum1]
JOIN [docFieldLock]
ON [docFieldLock].[sID] = [docSVenum1].[sID]
AND [docFieldLock].[fieldID] = [docSVenum1].[enumID]
AND [docFieldLock].[lockID] IN (4, 5) /* secLvl docAdm */
JOIN [FTSindexWordOnce]
ON [FTSindexWordOnce].[sID] = [docSVenum1].[sID]
GROUP BY [docSVenum1].[enumID], [docSVenum1].[valueID], [FTSindexWordOnce].[wordID]
except
SELECT [docSVenum1].[enumID], [docSVenum1].[valueID], [FTSindexWordOnce].[wordID]
, avg([FTSindexWordOnce].[tf]) AS [avgTFraw]
FROM [docSVenum1]
JOIN [docFieldLock]
ON [docFieldLock].[sID] = [docSVenum1].[sID]
AND [docFieldLock].[fieldID] = [docSVenum1].[enumID]
AND [docFieldLock].[lockID] IN (4, 5) /* secLvl docAdm */
JOIN [FTSindexWordOnce]
ON [FTSindexWordOnce].[sID] = [docSVenum1].[sID]
GROUP BY [docSVenum1].[enumID], [docSVenum1].[valueID], [FTSindexWordOnce].[wordID]
order by [docSVenum1].[enumID], [docSVenum1].[valueID], [FTSindexWordOnce].[wordID]
In this case tf is term frequency of tf-idf
tf normalization is subjective and does not require much precision
Avg(tf) needs to be consistent from select to select or the results are not consistent
In a single select with joins I need a consistent avg(tf)
Going with decimal and a low precision for tf got consistent results
This is very similar to: SELECT SUM(...) is non-deterministic when adding the column-values of datatype float.
The problem is that with an inexact datatype (FLOAT/REAL) the order of arithmetic operations on floating point values matters. Demo from connect:
DECLARE @fl FLOAT = 100000000000000000000
DECLARE @i SMALLINT = 0
WHILE (@i < 100)
BEGIN
SET @fl = @fl + CONVERT(float, 5000)
SET @i = @i + 1
END
SET @fl = @fl - 100000000000000000000
SELECT CONVERT(NVARCHAR(40), @fl, 2)
-- 0.000000000000000e+000
GO -- batch separator, so the variables can be declared again below
DECLARE @fl FLOAT = 0
DECLARE @i SMALLINT = 0
WHILE (@i < 100)
BEGIN
SET @fl = @fl + CONVERT(float, 5000)
SET @i = @i + 1
END
SET @fl = @fl + 100000000000000000000
SET @fl = @fl - 100000000000000000000
SELECT @fl
-- 507904
Possible solutions:
CAST all arguments to accurate datatype like DECIMAL/NUMERIC
alter table and change FLOAT to DECIMAL
you can try to force query optimizer to calculate the sum with the same order.
The good news is that when a stable query result matters to your
application, you can force the order to be the same by preventing
parallelism with OPTION (MAXDOP 1).
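As a rough sketch of the first and third options, the snippet below averages a float column once through an exact DECIMAL cast and once with parallelism disabled. The table variable @t and its sample values are hypothetical stand-ins for the real [tf] column.

```sql
-- Sketch: two ways to make AVG over a float column repeatable.
DECLARE @t TABLE (tf FLOAT);            -- hypothetical stand-in for FTSindexWordOnce.tf
INSERT INTO @t (tf) VALUES (0.1), (0.2), (0.3);

-- Option 1: aggregate an exact type, so the order of the additions no longer matters.
SELECT AVG(CAST(tf AS DECIMAL(9,8))) AS avgTF
FROM @t;

-- Option 3: keep float but prevent a parallel plan, which is where the summation
-- order typically changes from run to run.
SELECT AVG(tf) AS avgTF
FROM @t
OPTION (MAXDOP 1);
```

The cast avoids changing the table definition, at the cost of a conversion per row.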
It looks like the initial link is dead; an archived copy is available on the Web Archive.

Convert 32 bit binary string of 1's and 0's to Signed Decimal Number in SQL Server

I am using SQL Server 2012 Express. I have a string of 1's and 0's 32 bits in length.
01010010000100010111001101110011
How would I convert that to a Signed Decimal Number in a SQL script?
Currently I use a Web Tool online for my answer, and my current searching is not leading me to the answer I need.
You can perform the conversion with a single T-SQL statement if you use a Tally table:
;WITH Tally(i) AS (
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS i
FROM (VALUES (0), (0), (0), (0), (0), (0), (0), (0)) a(n)
CROSS JOIN (VALUES (0), (0), (0), (0)) b(n)
)
SELECT SUM(t.v) AS DecimalNumber
FROM (
SELECT POWER(CAST(SUBSTRING(x.d, i, 1) AS DECIMAL(10,0)) * 2, 32 - i)
FROM (VALUES ('01010010000100010111001101110011')) x(d)
CROSS JOIN Tally) AS t(v)
Explanation:
Tally is a table expression returning all values from 1-32.
Using these values we can extract every single digit out of the binary string using SUBSTRING.
With the use of POWER mathematical function we can convert every separate binary digit to decimal.
Using SUM we can add up all separate decimal numbers to get the expected result.
You can try like below -
DECLARE @Binary VARCHAR(100) = '01010010000100010111001101110011';
DECLARE @characters CHAR(36),
@result BIGINT,
@index SMALLINT,
@base BIGINT;
SELECT @characters = '0123456789abcdefghijklmnopqrstuvwxyz',
@result = 0,
@index = 0,
@base = 2;
WHILE @index < LEN(@Binary)
BEGIN
SELECT @result = @result + POWER(@base, @index) * (CHARINDEX(SUBSTRING(@Binary, LEN(@Binary) - @index, 1), @characters) - 1);
SET @index = @index + 1;
END
SELECT @result;
This will help you convert from any base (I used @base = 2 for binary) to base 10. It starts from the far right and moves to the left until it runs out of digits. Each step adds (base ^ index) * digit.
In addition to Giorgos Betsos's reply,
see the inline table-valued function below:
CREATE FUNCTION [dbo].[udf_BinaryToDecimal]
(
@Binary VARCHAR(31)
)
RETURNS TABLE AS RETURN
WITH Tally (n) AS
(
--32 Rows
SELECT TOP (LEN (@Binary)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) -1
FROM (VALUES (0),(0),(0),(0)) a(n)
CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0)) b(n)
)
SELECT
SUM(SUBSTRING(REVERSE(@Binary),n+1,1) * POWER(2,n)) TenBase
FROM Tally
/*How to Use*/
SELECT TenBase
FROM udf_BinaryToDecimal ('01010010000100010111')
/*Result -> 336151*/
I had to add this code to the end of Abhishek's Code Example to get a Signed number I needed.
DECLARE @MyNewExValue INT
IF @result > 2147483647
BEGIN
SET @result = @result - (2147483648 * 2)
END
SET @MyNewExValue = @result
SELECT @MyNewExValue
This solution will work with binary strings of any (bigint) length:
DECLARE @input varchar(max) =
'01010010000100010111001101110011'
;WITH N(V) AS
(
SELECT
ROW_NUMBER()over(ORDER BY (SELECT 1))
FROM
(VALUES(1),(1),(1),(1))M(a),
(VALUES(1),(1),(1),(1))L(a),
(VALUES(1),(1),(1),(1))K(a)
)
SELECT SUM(SUBSTRING(REVERSE(@input),V,1)*POWER(CAST(2 as BIGINT), V-1))
FROM N
WHERE V <= LEN(@input)
It should be noted the most upvoted solution above only works with binary strings exactly 32 bits in length, and a string of '00000000000000000000000000000000' returns 1.
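One way to combine the set-based idea with the signed adjustment in a single statement is sketched below. It assumes the input is always exactly 32 characters of '0' and '1' and is interpreted as two's complement; @bits is a name I made up for this example.

```sql
-- Sketch: 32-bit binary string to a signed INT (two's complement).
DECLARE @bits CHAR(32) = '01010010000100010111001101110011';

;WITH Tally(i) AS (               -- 8 x 4 = 32 rows, numbered 1..32
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0)) a(n)
    CROSS JOIN (VALUES (0),(0),(0),(0)) b(n)
)
SELECT CAST(
           -- digit * 2^(32 - i), so a 0 digit contributes 0 rather than POWER(0, 0) = 1
           SUM(CAST(SUBSTRING(@bits, i, 1) AS BIGINT) * POWER(CAST(2 AS BIGINT), 32 - i))
           -- a set sign bit means the unsigned value is 2^32 too large
           - CASE WHEN LEFT(@bits, 1) = '1' THEN CAST(4294967296 AS BIGINT) ELSE 0 END
       AS INT) AS SignedValue      -- 1376875379 for the sample string
FROM Tally;
```

Multiplying the digit by the power of two, instead of raising the digit to a power, is what avoids the all-zeros problem noted above.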

How to make unique random alphanumeric sequence in SQL Server

I want to make unique random alphanumeric sequence to be the primary key for a database table.
Each char in the sequence is either a letter (a-z) or number (0-9)
Examples for what I want :
kl7jd6fgw
zjba3s0tr
a9dkfdue3
I want to make a function that could handle that task!
You can use a uniqueidentifier. This can be generated with the NEWID() function:
SELECT NEWID()
will return something like:
BE228C22-C18A-4B4A-9AD5-1232462F7BA9
It is a very bad idea to use random strings as a primary key.
It will affect performance as well as storage size, and you will be much better off using an int or a bigint with an identity property.
However, generating a random string in SQL may be useful for other things, and this is why I offer this solution:
Create a table to hold permitted char values.
In my example the permitted chars are 0-9 and A-Z.
CREATE TABLE Chars (C char(1))
DECLARE @i as int = 0
WHILE @i < 10
BEGIN
INSERT INTO Chars (C) VALUES (CAST(@i as Char(1)))
SET @i = @i+1
END
SET @i = 65
WHILE @i < 91
BEGIN
INSERT INTO Chars (C) VALUES (CHAR(@i))
SET @i = @i+1
END
Then use this simple select statement to generate a random string from this table:
SELECT TOP 10 C AS [text()]
FROM Chars
ORDER BY NEWID()
FOR XML PATH('')
The advantages:
You can easily control the allowed characters.
The generation of a new string is a simple select statement and not manipulation on strings.
The disadvantages:
This select results with an ugly name (i.e XML_F52E2B61-18A1-11d1-B105-00805F49916B). This is easily solved by setting the result into a local variable.
Characters will only appear once in every string. This can easily be solved by adding union:
example:
SELECT TOP 10 C AS [text()]
FROM (
SELECT * FROM Chars
UNION ALL SELECT * FROM Chars
) InnerSelect
ORDER BY NEWID()
FOR XML PATH('')
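Another option is to concatenate each character with an empty string ('' + C) instead of using AS [text()]. The expression has no column name, so no XML element tags are generated, and the whole subquery can be wrapped and aliased. (Wrapping it in STUFF(..., 1, 1, '') is only needed when there is a leading delimiter to strip; without one it would drop the first character.)
SELECT (
SELECT TOP 10 '' + C
FROM Chars
ORDER BY NEWID()
FOR XML PATH('')
) As RandomString;
This option doesn't have the disadvantage of the ugly column name, and can have an alias directly. The execution plan is a little different, but it should not suffer much of a performance loss.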
Play with it yourself in this Sql Fiddle
If there are any more advantages / disadvantages you think of please leave a comment. Thanks.
The NEWID() function generates unique values. So I generated them in a loop (a recursive CTE) and picked out a combination of alphanumeric characters using the CHARINDEX and LEFT functions:
;with list as
(
select 1 as id,newid() as val
union all
select id + 1,NEWID()
from list
where id + 1 < 100
)
select ID,left(val, charindex('-', val) - 2) from list
option (maxrecursion 0)
The drawback of NEWID() for this request is that it limits the character pool to 0-9 and A-F. To define your own character pool, you have to roll a custom solution.
This solution adapted from Generating random strings with T-SQL
--Define list of characters to use in random string
DECLARE @CharPool VARCHAR(255)
SET @CharPool = '0123456789abcdefghijklmnopqrstuvwxyz'
--Store length of CharPool for use later
DECLARE @PoolLength TINYINT
SET @PoolLength = LEN(@CharPool) --36
--Define random string length
DECLARE @StringLength TINYINT
SET @StringLength = 9
--Declare target parameter for random string
DECLARE @RandomString VARCHAR(255)
SET @RandomString = ''
--Loop control variable
DECLARE @LoopCount TINYINT
SET @LoopCount = 0
--For each char in string, choose random char from char pool
WHILE(@LoopCount < @StringLength)
BEGIN
--RAND() * @PoolLength truncates to 0..35; add 1 because SUBSTRING positions start at 1
SELECT @RandomString += SUBSTRING(@CharPool, CONVERT(int, RAND() * @PoolLength) + 1, 1)
SELECT @LoopCount += 1
END
SELECT @RandomString
http://sqlfiddle.com/#!6/9eecb/4354
I must reiterate, however, that I agree with the others: this is a horrible idea.

get character only string from another string in sql server

I am looking for solution to get a character based string extracted from another string.
I need only first 4 "characters only" from another string.
The restriction here is that "another" string may contain spaces, special characters, numbers etc and may be less than 4 characters.
For example - I should get
"NAGP" if source string is "Nagpur District"
"ILLF" if source string is "Ill Fated"
"RAJU" if source string is "RA123 *JU23"
"MAC" if source string is "MAC"
Any help is greatly appreciated.
Thanks for sharing your time and wisdom.
You can use the answer in the question and add substring method to get your value of desired length
How to strip all non-alphabetic characters from string in SQL Server?
i.e.
Create Function [dbo].[RemoveNonAlphaCharacters](@Temp VarChar(1000))
Returns VarChar(1000)
AS
Begin
Declare @KeepValues as varchar(50)
Set @KeepValues = '%[^a-z]%'
While PatIndex(@KeepValues, @Temp) > 0
Set @Temp = Stuff(@Temp, PatIndex(@KeepValues, @Temp), 1, '')
Return @Temp
End
use it like
Select SUBSTRING(dbo.RemoveNonAlphaCharacters('abc1234def5678ghi90jkl'), 1, 4);
Here SUBSTRING is used to get string of length 4 from the returned value.
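Applied to the sample strings from the question, the call might look like the sketch below. It assumes the function above has already been created and that the database uses a case-insensitive collation, so that '[^a-z]' also keeps uppercase letters; UPPER is added only because the expected results are upper case.

```sql
-- Sketch: first four letters of each sample string, upper-cased.
SELECT s.Source,
       UPPER(SUBSTRING(dbo.RemoveNonAlphaCharacters(s.Source), 1, 4)) AS Code
FROM (VALUES ('Nagpur District'),
             ('Ill Fated'),
             ('RA123 *JU23'),
             ('MAC')) AS s(Source);
```

Under a case-sensitive or binary collation the pattern would need to be widened, for example to '%[^a-zA-Z]%'.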
^([a-zA-Z])[^a-zA-Z\n]*([a-zA-Z])?[^a-zA-Z\n]*([a-zA-Z])?[^a-zA-Z\n]*([a-zA-Z])?
You can try this. Grab the captures or groups. See the demo.
http://regex101.com/r/rQ6mK9/42
A bit late to the party here, but as a general rule I despise functions with BEGIN ... END because they almost never perform well. Since that covers all scalar functions (until Microsoft implements inline scalar expressions), whenever I see one I look for an alternative that offers similar reusability. In this case the query can be converted to an inline table-valued function:
CREATE FUNCTION dbo.RemoveNonAlphaCharactersTVF (@String NVARCHAR(1000), @Length INT)
RETURNS TABLE
AS
RETURN
( WITH E1 (N) AS
( SELECT 1
FROM (VALUES (1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) n (N)
),
E2 (N) AS (SELECT 1 FROM E1 CROSS JOIN E1 AS E2),
N (Number) AS (SELECT TOP (LEN(@String)) ROW_NUMBER() OVER(ORDER BY E1.N) FROM E2 CROSS JOIN E1)
SELECT Result = ( SELECT TOP (ISNULL(@Length, 1000)) SUBSTRING(@String, n.Number, 1)
FROM N
WHERE SUBSTRING(@String, n.Number, 1) LIKE '[a-Z]'
ORDER BY Number
FOR XML PATH('')
)
);
All this does is use a list of numbers to expand the string out into rows, one character per row, e.g. RA123 *JU23T becomes:
Letter
------
R
A
1
2
3
*
J
U
2
3
T
The rows that are not alphabetic are then removed by the where clause:
WHERE SUBSTRING(@String, n.Number, 1) LIKE '[a-Z]'
Leaving
Letter
------
R
A
J
U
T
The @Length parameter then limits the number of characters (in your case this would be 4), and the string is rebuilt using XML concatenation. I would usually use FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)') for XML concatenation to allow for special XML characters, but since I know there are none here I haven't bothered, as it is additional overhead.
Running some tests on this with a sample table of 1,000,000 rows:
CREATE TABLE dbo.T (String NVARCHAR(1000));
INSERT T (String)
SELECT TOP 1000000 t.String
FROM (VALUES ('Nagpur District'), ('Ill Fated'), ('RA123 *JU23'), ('MAC')) t (String)
CROSS JOIN sys.all_objects a
CROSS JOIN sys.all_objects B
ORDER BY a.object_id;
Then comparing the scalar and the inline udfs (called as follows):
SELECT COUNT(SUBSTRING(dbo.RemoveNonAlphaCharacters(t.String), 1, 4))
FROM T;
SELECT COUNT(tvf.Result)
FROM T
CROSS APPLY dbo.RemoveNonAlphaCharactersTVF (t.String, 4) AS tvf;
Over 15 test runs (probably not enough for an accurate figure, but enough to paint the picture) the average execution time for the scalar UDF was 11.824s, and for the inline TVF it was 1.658s, so the inline version took roughly 86% less time.
