Comparing and replacing characters in a string in SQL Server - sql-server

I have a string say 'Hel#1*oO'
Input string -- Hel#1*oO
I want to create a function that will parse through the string 'Hel#1*oO' and replace all characters other than alphanumeric with #.
Basically I want to use regex as [^A-Za-z0-9]. So that other than these characters everything will be replaced with #
The Output will be -- Hel#1#oO
We have REGEX_REPLACE() in Oracle that does the same functionality but I need to get this functionality in SQL Server.
What set of functions can be used to achieve this.
Thanks for the help!

As you may have found out, T-SQL has no Regex support and thus no support for Regex replacement. You can achieve Regex support with CLR functions if needed, however, I'm not going to cover that here as there are a wealth of resources out there for that already if you want to go down that route.
Assuming, however, you are on a fully supported version of SQL Server, you can use a Tally to break the string into individual characters, and then reaggregate the string with STRING_AGG (if you aren't on a fully supported version, you'll need to use the "old" FOR XML PATH method).
This gives you something like this:
DECLARE @String nvarchar(4000) = N'Hel#1*oO',
@Pattern nvarchar(100) = N'[^A-Za-z0-9]',
@ReplacementCharacter nvarchar(1) = '#';
WITH N AS(
SELECT N
FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
Tally AS(
SELECT TOP(LEN(@String))
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
FROM N N1, N N2, N N3, N N4)
SELECT STRING_AGG(CASE WHEN V.C LIKE @Pattern THEN @ReplacementCharacter ELSE V.C END,'') WITHIN GROUP (ORDER BY T.I)
FROM Tally T
CROSS APPLY (VALUES(SUBSTRING(@String,T.I,1)))V(C);
If you wanted to, you could convert this to an inline table-value function, and then use that against a column (or value):
CREATE OR ALTER FUNCTION dbo.PatternCharacterReplace (@String nvarchar(4000), @Pattern nvarchar(100), @ReplacementCharacter nvarchar(1))
RETURNS table
AS RETURN
WITH N AS(
SELECT N
FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
Tally AS(
SELECT TOP(LEN(@String))
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
FROM N N1, N N2, N N3, N N4) --4096 rows; For a varchar(8000) or MAX you would need more rows for such lengths
SELECT STRING_AGG(CASE WHEN V.C LIKE @Pattern THEN @ReplacementCharacter ELSE V.C END,'') WITHIN GROUP (ORDER BY T.I) AS ReplacedString
FROM Tally T
CROSS APPLY (VALUES(SUBSTRING(@String,T.I,1)))V(C);
GO
SELECT *
FROM (VALUES(N'Hel#1*oO'),('H0w 4re y0u? :)'))V(S)
CROSS APPLY dbo.PatternCharacterReplace(V.S,N'[^A-Za-z0-9]',N'#') PCR;
Note that for the function, you may need to create multiple versions for nvarchar and varchar (and possibly explicitly ones for MAX length ones too)
Again, as mentioned, if you need true Regex replacement functionality, you'll need to look into CLR or do the operation outside of SQL Server.

Related

Regular Expressions in SQL Server to validate fax numbers

I'm trying to use RegEx to validate phone numbers saved in a SQL Server 2016 table. In this table there are thousands of fax numbers which are stored in different formats, e.g. 800-123-4567, 800/123-4567, 800#123/4567 etc. I want to use RegEx to validate the phone numbers and also output them without any special characters or spaces in between, e.g. 8001234567.
Here's what I have tried, which does not seem to work for some reason. If anyone could point out what I'm doing wrong here, I would really appreciate it.
DECLARE @expres VARCHAR(50) = '%[/,-,\,#]%'
DECLARE @cmpfax as CHAR(100);
SELECT @cmpfax = cmp_fax FROM cicmpy WHERE LTRIM(RTRIM(cmp_code)) = '100373' AND cmp_fax IS NOT NULL;
SELECT REPLACE(@cmpfax, @expres, '#');
Here's my dbfiddle for the above code which I've tested.
This is a variation on the answer I linked earlier, however, as the OP is on 2016 RTM, they can't use STRING_AGG. This, therefore, uses the "old" FOR XML PATH (and STUFF) method to reaggregate the characters.
CREATE OR ALTER FUNCTION [fn].[PatternCharacterReplace_XML] (@String varchar(8000), @Pattern varchar(100), @ReplacementCharacter varchar(1))
RETURNS table
AS RETURN
WITH N AS(
SELECT N
FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
Tally AS(
SELECT TOP(LEN(@String))
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
FROM N N1, N N2, N N3, N N4)
SELECT (SELECT CASE WHEN V.C LIKE @Pattern THEN @ReplacementCharacter ELSE V.C END
FROM Tally T
CROSS APPLY (VALUES(SUBSTRING(@String,T.I,1)))V(C)
ORDER BY T.I
FOR XML PATH(''),TYPE).value('(./text())[1]','varchar(8000)') AS ReplacedString;
Then you get the values and remove the characters with something like this:
SELECT YT.Fax,
PCRX.ReplacedString AS NewFax
FROM dbo.YourTable YT
CROSS APPLY fn.PatternCharacterReplace_XML(YT.Fax, '[^0-9]', '') PCRX
WHERE YT.Fax LIKE '%[^0-9]%';
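For what it's worth, the original attempt most likely fails because REPLACE searches for its second argument as a literal string; the % and [...] wildcards only mean something to LIKE and PATINDEX. A minimal sketch of the difference, using a made-up sample value rather than the asker's data:

```sql
DECLARE @fax varchar(100) = '800#123/4567';  -- hypothetical sample value

-- REPLACE looks for the literal text '%[/,-,\,#]%', which never occurs in the input,
-- so the string comes back unchanged
SELECT REPLACE(@fax, '%[/,-,\,#]%', '#') AS PatternAsLiteral;

-- For a small, fixed set of characters, nested REPLACE calls are enough
SELECT REPLACE(REPLACE(REPLACE(@fax, '#', ''), '/', ''), '-', '') AS Cleaned;
```

Nested REPLACE only scales to a handful of known characters; for "anything that is not a digit" the tally-based function above is the more general tool.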

SQL - Add new column with outputs as values

Just wondering how I might go about adding the outputted results as a new column to an existing table.
What I'm trying to do is extract the date from a string which is in another column. I have the below code to do this:
Code
CREATE FUNCTION dbo.udf_GetNumeric
(
@strAlphaNumeric VARCHAR(256)
)
RETURNS VARCHAR(256)
AS
BEGIN
DECLARE @intAlpha INT
SET @intAlpha = PATINDEX('%[^0-9]%', @strAlphaNumeric)
BEGIN
WHILE @intAlpha > 0
BEGIN
SET @strAlphaNumeric = STUFF(@strAlphaNumeric, @intAlpha, 1, '' )
SET @intAlpha = PATINDEX('%[^0-9]%', @strAlphaNumeric )
END
END
RETURN ISNULL(@strAlphaNumeric,0)
END
GO
Now use the function as
SELECT dbo.udf_GetNumeric(column_name)
from table_name
The issue is that I want the result to be placed in a new column in an existing table. I have tried the below code but no luck.
ALTER TABLE [Data_Cube_Data].[dbo].[DB_Test]
ADD reportDated nvarchar NULL;
insert into [DB].[dbo].[DB_Test](reportDate)
SELECT
(SELECT dbo.udf_GetNumeric(FileNamewithDate) from [DB].[dbo].[DB_Test])
The syntax should be an UPDATE, not an INSERT, because you want to update existing rows, not insert new ones:
UPDATE Data_Cube_Data.dbo.DB_Test -- you don't need square bracket noise
SET reportDate = dbo.udf_GetNumeric(FileNamewithDate);
But yeah, I agree with the others, the function looks like the result of a "how can I make this object the least efficient thing in my entire database?" contest. Here's a better alternative:
-- better, set-based TVF with no while loop
CREATE FUNCTION dbo.tvf_GetNumeric
(@strAlphaNumeric varchar(256))
RETURNS TABLE
AS
RETURN
(
WITH cte(n) AS
(
SELECT TOP (256) n = ROW_NUMBER() OVER (ORDER BY @@SPID)
FROM sys.all_objects
)
SELECT output = COALESCE(STRING_AGG(
SUBSTRING(@strAlphaNumeric, n, 1), '')
WITHIN GROUP (ORDER BY n), '')
FROM cte
WHERE SUBSTRING(@strAlphaNumeric, n, 1) LIKE '%[0-9]%'
);
Then the query is:
UPDATE t
SET t.reportDate = tvf.output
FROM dbo.DB_Test AS t
CROSS APPLY dbo.tvf_GetNumeric(t.FileNamewithDate) AS tvf;
Example db<>fiddle that shows this has the same behavior as your existing function.
The function
As I mentioned in the comments, I would strongly suggest rewriting the function; it'll perform terribly. Multi-line table value functions can perform poorly, and you also have a WHILE, which will perform awfully. SQL is a set-based language, and so you should be using set-based methods.
There are a couple of alternatives though:
Inlinable Scalar Function
SQL Server 2019 can inline scalar functions, so you could inline the above. I do, however, assume that your value can only contain the characters A-z and 0-9. If it can contain other characters, such as periods (.), commas (,), quotes (") or even white space ( ), or you're not on 2019, then don't use this:
CREATE OR ALTER FUNCTION dbo.udf_GetNumeric (@strAlphaNumeric varchar(256))
RETURNS varchar(256) AS
BEGIN
RETURN TRY_CONVERT(int,REPLACE(TRANSLATE(LOWER(@strAlphaNumeric),'abcdefghijklmnopqrstuvwxyz',REPLICATE('|',26)),'|',''));
END;
GO
SELECT dbo.udf_GetNumeric('abs132hjsdf');
The LOWER is there in case you are using a case sensitive collation.
Inline Table Value Function
This is the better solution in my mind, and doesn't have the caveats of the above.
It uses a Tally to split the data into individual characters, and then only reaggregates the characters that are digits. Note that I assume you are using SQL Server 2017+ here:
DROP FUNCTION udf_GetNumeric; --Need to drop as it's a scalar function at the moment
GO
CREATE OR ALTER FUNCTION dbo.udf_GetNumeric (@strAlphaNumeric varchar(256))
RETURNS table AS
RETURN
WITH N AS (
SELECT N
FROM (VALUES(NULL),(NULL),(NULL),(NULL)) N(N)),
Tally AS(
SELECT TOP (LEN(@strAlphaNumeric))
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
FROM N N1, N N2, N N3, N N4)
SELECT STRING_AGG(CASE WHEN V.C LIKE '[0-9]' THEN V.C END,'') WITHIN GROUP (ORDER BY T.I) AS strNumeric
FROM Tally T
CROSS APPLY (VALUES(SUBSTRING(@strAlphaNumeric,T.I,1)))V(C);
GO
SELECT *
FROM dbo.udf_GetNumeric('abs132hjsdf');
Your table
You define reportDated as nvarchar; this means nvarchar(1). Your function, however, returns a varchar(256); this will rarely fit in an nvarchar(1).
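The default length is easy to see with a variable. This is only a sketch with a made-up value; note that a variable assignment truncates silently, whereas writing an over-long value into an nvarchar(1) column would normally raise a truncation error instead:

```sql
-- In a DECLARE (or a column definition), nvarchar with no length means nvarchar(1)
DECLARE @v nvarchar = N'20211109';  -- hypothetical sample value

-- Only the first character survives the assignment
SELECT @v AS StoredValue, LEN(@v) AS StoredLength;
```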
Define the column properly:
ALTER TABLE [dbo].[DB_Test] ADD reportDated varchar(256) NULL;
If you've already created the column then do the following:
ALTER TABLE [dbo].[DB_Test] ALTER COLUMN reportDated varchar(256) NULL;
I note, however, that the column is called "dated", which implies a date value, but it's a (n)varchar; that sounds like a flaw.
Updating the column
Use an UPDATE statement. Depending on the solution, this would be one of the following:
--Scalar function
UPDATE [dbo].[DB_Test]
SET reportDated = dbo.udf_GetNumeric(FileNamewithDate);
--Table Value Function
UPDATE DBT
SET reportDated = GN.strNumeric
FROM [dbo].[DB_Test] DBT
CROSS APPLY dbo.udf_GetNumeric(DBT.FileNamewithDate) GN;

How do I enable ordinals from the STRING_SPLIT function in MSSQL

I'm trying to use the STRING_SPLIT function in Microsoft SQL Server 2019. The function works, if I only put in two arguments, but since I want to extract a specific element from the string, I would like to enable ordinals.
When I add the third argument to the STRING_SPLIT function it returns
Msg 8144, Level 16, State 3, Line 5 Procedure or function STRING_SPLIT
has too many arguments specified.
I don't understand what I'm doing wrong, since hovering over the STRING_SPLIT function clearly states that the function can take a third argument as an int.
My SQL code is as follows
SELECT *
FROM STRING_SPLIT('[Control Structure].Root.NP_02.ABC01_02_03.Applications.Prototype.Control Modules.ABC060V.ABC060VXFR2','.',1)
WHERE ORDINAL = 4
You can't enable it, since it is not available in SQL Server 2019 (and is almost certainly not going to be back-ported there).
The problem is that SSMS has IntelliSense / tooltips coded without conditional logic based on version, and the code is ahead of the engine. Currently the functionality is only available in Azure SQL Database, Managed Instance, and Synapse.
From the documentation:
The enable_ordinal argument and ordinal output column are currently only supported in Azure SQL Database, Azure SQL Managed Instance, and Azure Synapse Analytics (serverless SQL pool only).
Some more background:
Trusting STRING_SPLIT() order in Azure SQL Database
What you can do instead is create your own inline table-valued UDF that provides the same type of ordinal output (and make it return the same output as STRING_SPLIT to make it easy to change later). There are many variations on this; here's one:
CREATE FUNCTION dbo.SplitStrings_Ordered
(
@List nvarchar(max),
@Delimiter nvarchar(255)
)
RETURNS TABLE
AS
RETURN (SELECT value = Item ,
ordinal = ROW_NUMBER() OVER (ORDER BY Number)
FROM (SELECT Number, Item = SUBSTRING(@List, Number,
CHARINDEX(@Delimiter, @List + @Delimiter, Number) - Number)
FROM (SELECT ROW_NUMBER() OVER (ORDER BY s1.[object_id])
FROM sys.all_objects AS s1 CROSS JOIN sys.all_objects AS s2) AS n(Number)
WHERE Number <= CONVERT(INT, LEN(@List))
AND SUBSTRING(@Delimiter + @List, Number, LEN(@Delimiter)) = @Delimiter
) AS y);
GO
Another simpler way would be to use JSON, which I forgot I even wrote recently in this tip:
CREATE FUNCTION dbo.SplitStrings_Ordered
(
@List nvarchar(max),
@Delimiter nchar(1)
)
RETURNS table WITH SCHEMABINDING
AS
RETURN
(
SELECT value, ordinal = [key]
FROM OPENJSON(N'["' + REPLACE(@List, @Delimiter, N'","') + N'"]') AS x
);
GO
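Two things may be worth checking with the JSON approach: the [key] column that OPENJSON returns for an array is the zero-based index (as nvarchar), and characters such as double quotes or backslashes in the list would produce invalid JSON. A minimal standalone sketch, assuming compatibility level 130 or higher and a delimiter that is not itself a character STRING_ESCAPE changes (", \ or /); the shortened sample string is only for illustration:

```sql
DECLARE @List nvarchar(max) = N'[Control Structure].Root.NP_02.ABC01_02_03.Applications',  -- shortened sample
        @Delimiter nchar(1) = N'.';

SELECT value,
       ordinal = [key] + 1   -- [key] is the zero-based array index, so add 1 for a 1-based ordinal
FROM OPENJSON(N'["' + REPLACE(STRING_ESCAPE(@List, 'json'), @Delimiter, N'","') + N'"]') AS x
WHERE [key] + 1 = 4;         -- fourth element of the list
```

Whether to shift the key inside the function or at the call site is a design choice; shifting it inside keeps the output aligned with the 1-based ordinal column that STRING_SPLIT exposes where it is supported.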
Also, if you're just trying to get the last ordinal in a (1-)4-part name and each part is <= 128 characters, you can use PARSENAME():
DECLARE @str nvarchar(512) = N'here is one.here is two.and three.and four';
SELECT p1 = PARSENAME(@str, 4),
p2 = PARSENAME(@str, 3),
p3 = PARSENAME(@str, 2),
p4 = PARSENAME(@str, 1);
Output:
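+-------------+-------------+-----------+----------+
| p1          | p2          | p3        | p4       |
+-------------+-------------+-----------+----------+
| here is one | here is two | and three | and four |
+-------------+-------------+-----------+----------+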
Example db<>fiddle
We can sort of cheat our way around the missing ordinal by numbering the rows in their current order instead. Keep in mind that the default order for STRING_SPLIT is non-deterministic:
STRING_SPLIT() reference
The output rows might be in any order. The order is not guaranteed to match the order of the substrings in the input string. You can override the final sort order by using an ORDER BY clause on the SELECT statement, for example, ORDER BY value or ORDER BY ordinal.
DECLARE @object as nvarchar(500) = 'test_string_split_order_string'
select
value,
ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS RowNum
from STRING_SPLIT(@object, '_')
SQL Server's XML/XQuery support makes it very easy to tokenize a string.
The XML/XQuery data model is based on ordered sequences.
It allows you to retrieve any token based on its position in a string of tokens.
SQL
DECLARE @tokens VARCHAR(256) = '[Control Structure].Root.NP_02.ABC01_02_03.Applications.Prototype.Control Modules.ABC060V.ABC060VXFR2'
, @separator CHAR(1) = '.'
, @pos INT = 4;
SELECT c.value('(/root/r[sql:variable("@pos")]/text())[1]', 'VARCHAR(100)') AS token
FROM (SELECT TRY_CAST('<root><r><![CDATA[' +
REPLACE(@tokens, @separator, ']]></r><r><![CDATA[') +
']]></r></root>' AS XML)) AS t(c);
Output
+-------------+
| token       |
+-------------+
| ABC01_02_03 |
+-------------+
Yet another way (with ';' as the delimiter):
create function dbo.split_string_ord
(
@sentence nvarchar(max)
)
returns table
as
return(
with first_word(ordinal,word,sentence) as (
Select
1 as ordinal,
substring(@sentence+';',1,charindex(';',@sentence+';',1)-1) as word,
substring(@sentence+';',charindex(';',@sentence+';',1)+1,LEN(@sentence+';')-charindex(';',@sentence+';',1)+1) as sentence
union all
Select
ordinal + 1 as ordinal,
substring(sentence,1,charindex(';',sentence,1)-1) as word,
substring(sentence,charindex(';',sentence,1)+1,LEN(sentence)-charindex(';',sentence,1)+1) as sentence
from
first_word
where
sentence != ''
)
Select
ordinal,
word
from
first_word
)
;

SQL Server 2008 split string fails due to ampersand

I have created a stored procedure to attempt to replicate the split_string function that is now in SQL Server 2016.
So far I have got this:
CREATE FUNCTION MySplit
(@delimited NVARCHAR(MAX), @delimiter NVARCHAR(100))
RETURNS @t TABLE
(
-- Id column can be commented out, not required for SQL splitting string
id INT IDENTITY(1,1), -- I use this column for numbering split parts
val NVARCHAR(MAX)
)
AS
BEGIN
DECLARE @xml XML
SET @xml = N'<root><r>' + replace(@delimited,@delimiter,'</r><r>') + '</r></root>'
INSERT INTO @t(val)
SELECT
r.value('.','varchar(max)') AS item
FROM
@xml.nodes('//root/r') AS records(r)
RETURN
END
GO
And it does work, but it will not split the text string if any part of it contains an ampersand [ & ].
I have found hundreds of examples of splitting a string, but none seem to deal with special characters.
So using this:
select *
from MySplit('Test1,Test2,Test3', ',')
works ok, but
select *
from MySplit('Test1 & Test4,Test2,Test3', ',')
does not. It fails with
XML parsing: line 1, character 17, illegal name character.
What have I done wrong?
UPDATE
Firstly, thanks to @marcs for showing me the error of my ways in writing this question.
Secondly, thanks for all of the help below, especially @PanagiotisKanavos and @MatBailie.
As this is throwaway code for migrating data from an old to a new system, I have chosen to use @MatBailie's solution: quick and very dirty, but also perfect for this task.
In the future, though, I will be going down the route of @PanagiotisKanavos's solution.
Edit your function and replace every & with &amp;
This will remove the error. It happens because & is a reserved character in XML (it starts an entity reference), so a bare & cannot be parsed.
Create FUNCTION [dbo].[split_stringss](
@delimited NVARCHAR(MAX),
@delimiter NVARCHAR(100)
) RETURNS @t TABLE (id INT IDENTITY(1,1), val NVARCHAR(MAX))
AS
BEGIN
DECLARE @xml XML
DECLARE @var NVARCHAR(MAX)
DECLARE @var1 NVARCHAR(MAX)
set @var1 = Replace(@delimited,'&','&amp;')
SET @xml = N'<t>' + REPLACE(@var1,@delimiter,'</t><t>') + '</t>'
INSERT INTO @t(val)
SELECT r.value('.','varchar(MAX)') as item
FROM @xml.nodes('/t') as records(r)
RETURN
END
First of all, SQL Server 2016 introduced a STRING_SPLIT TVF. You can write CROSS APPLY STRING_SPLIT(thatField,',') as items
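As a rough illustration of that CROSS APPLY form, here is a sketch against an inline list of values instead of a real table. The column name thatField comes from the sentence above; the id column and the sample rows are made up, and the database is assumed to be at compatibility level 130 or higher:

```sql
SELECT t.id, items.value
FROM (VALUES (1, 'Test1 & Test4,Test2,Test3'),
             (2, 'Test5')) AS t(id, thatField)   -- hypothetical sample rows
CROSS APPLY STRING_SPLIT(t.thatField, ',') AS items;
```

Because STRING_SPLIT never goes through XML, the ampersand in the first row needs no special handling.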
In previous versions you still need to create a custom splitting function. There are various techniques. The fastest solution is to use a SQLCLR function.
In some cases, the second fastest is what you used: convert the text to XML and select the nodes. A well-known problem with this splitting technique is that illegal XML characters will break it, as you found out. That's why Aaron Bertrand doesn't consider this a generic splitter.
You can replace invalid characters with their encoded values, e.g. & with &amp;, but you have to be certain that your text will never already contain such encodings.
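A minimal sketch of that encoding step, written as a standalone batch rather than a function. It assumes the delimiter is not itself one of the escaped characters, and it escapes & first so the ampersands introduced by the other replacements are not escaped a second time:

```sql
DECLARE @delimited nvarchar(max) = N'Test1 & Test4,Test2,Test3',
        @delimiter nvarchar(100) = N',';

-- Encode the characters that are special in XML text, then turn delimiters into element boundaries
DECLARE @xml xml =
    N'<r>' +
    REPLACE(
        REPLACE(REPLACE(REPLACE(@delimited, N'&', N'&amp;'), N'<', N'&lt;'), N'>', N'&gt;'),
        @delimiter, N'</r><r>') +
    N'</r>';

-- value() decodes the entities again, so the original text comes back out
SELECT r.value('.', 'nvarchar(max)') AS item
FROM @xml.nodes('/r') AS records(r);
```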
Perhaps you should investigate different techniques, like the Moden function, which can be faster in many situations :
CREATE FUNCTION dbo.SplitStrings_Moden
(
@List NVARCHAR(MAX),
@Delimiter NVARCHAR(255)
)
RETURNS TABLE
WITH SCHEMABINDING AS
RETURN
WITH E1(N) AS ( SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1),
E2(N) AS (SELECT 1 FROM E1 a, E1 b),
E4(N) AS (SELECT 1 FROM E2 a, E2 b),
E42(N) AS (SELECT 1 FROM E4 a, E2 b),
cteTally(N) AS (SELECT 0 UNION ALL SELECT TOP (DATALENGTH(ISNULL(@List,1)))
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E42),
cteStart(N1) AS (SELECT t.N+1 FROM cteTally t
WHERE (SUBSTRING(@List,t.N,1) = @Delimiter OR t.N = 0))
SELECT Item = SUBSTRING(@List, s.N1, ISNULL(NULLIF(CHARINDEX(@Delimiter,@List,s.N1),0)-s.N1,8000))
FROM cteStart s;
Personally I created and use a SQLCLR UDF.
Another option is to avoid splitting altogether and pass table-valued parameters from the client to the server. Or use a microORM like Dapper that can construct an IN (...) clause from a list of values, eg:
var products=connection.Query<Product>("select * from products where id in @ids",new {ids=myIdArray});
An ORM like EF that supports LINQ can also generate an IN clause :
var products = from product in dbContext.Products
where myIdArray.Contains(product.Id)
select product;
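On the server side, the table-valued parameter option mentioned above might look roughly like this. All object names here (dbo.IdList, dbo.GetProductsByIds, and a dbo.products table with an id column) are hypothetical and only serve the illustration:

```sql
-- Hypothetical names throughout: dbo.IdList, dbo.GetProductsByIds, dbo.products
CREATE TYPE dbo.IdList AS TABLE (id int NOT NULL PRIMARY KEY);
GO
CREATE PROCEDURE dbo.GetProductsByIds
    @ids dbo.IdList READONLY   -- table-valued parameters must be declared READONLY
AS
BEGIN
    SET NOCOUNT ON;
    SELECT p.*
    FROM dbo.products AS p
    JOIN @ids AS i ON i.id = p.id;
END;
GO
-- Calling it from T-SQL; a client application would pass a structured parameter instead
DECLARE @ids dbo.IdList;
INSERT INTO @ids (id) VALUES (1), (2), (3);
EXEC dbo.GetProductsByIds @ids = @ids;
```

No string is built or split at any point, which is the main attraction of this route.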

get character only string from another string in sql server

I am looking for solution to get a character based string extracted from another string.
I need only first 4 "characters only" from another string.
The restriction here is that "another" string may contain spaces, special characters, numbers etc and may be less than 4 characters.
For example - I should get
"NAGP" if source string is "Nagpur District"
"ILLF" if source string is "Ill Fated"
"RAJU" if source string is "RA123 *JU23"
"MAC" if source string is "MAC"
Any help is greatly appreciated.
Thanks for sharing your time and wisdom.
You can use the answer in the question and add substring method to get your value of desired length
How to strip all non-alphabetic characters from string in SQL Server?
i.e.
Create Function [dbo].[RemoveNonAlphaCharacters](@Temp VarChar(1000))
Returns VarChar(1000)
AS
Begin
Declare @KeepValues as varchar(50)
Set @KeepValues = '%[^a-z]%'
While PatIndex(@KeepValues, @Temp) > 0
Set @Temp = Stuff(@Temp, PatIndex(@KeepValues, @Temp), 1, '')
Return @Temp
End
use it like
Select SUBSTRING(dbo.RemoveNonAlphaCharacters('abc1234def5678ghi90jkl'), 1, 4);
Here SUBSTRING is used to get string of length 4 from the returned value.
^([a-zA-Z])[^a-zA-Z\n]*([a-zA-Z])?[^a-zA-Z\n]*([a-zA-Z])?[^a-zA-Z\n]*([a-zA-Z])?
You can try this. Grab the captures or groups. See demo.
http://regex101.com/r/rQ6mK9/42
A bit late to the party here, but as a general rule I despise all functions with BEGIN .. END: they almost never perform well, and this covers all scalar functions (until Microsoft implements inline scalar expressions). Whenever I see one, I look for an alternative that offers similar reusability. In this case the query can be converted to an inline table valued function:
CREATE FUNCTION dbo.RemoveNonAlphaCharactersTVF (@String NVARCHAR(1000), @Length INT)
RETURNS TABLE
AS
RETURN
( WITH E1 (N) AS
( SELECT 1
FROM (VALUES (1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) n (N)
),
E2 (N) AS (SELECT 1 FROM E1 CROSS JOIN E1 AS E2),
N (Number) AS (SELECT TOP (LEN(@String)) ROW_NUMBER() OVER(ORDER BY E1.N) FROM E2 CROSS JOIN E1)
SELECT Result = ( SELECT TOP (ISNULL(@Length, 1000)) SUBSTRING(@String, n.Number, 1)
FROM N
WHERE SUBSTRING(@String, n.Number, 1) LIKE '[a-Z]'
ORDER BY Number
FOR XML PATH('')
)
);
All this does is use a list of numbers to expand the string out into rows, e.g. RA123 *JU23T becomes:
Letter
------
R
A
1
2
3
*
J
U
2
3
T
The rows that are not alphabetic are then removed by the where clause:
WHERE SUBSTRING(@String, n.Number, 1) LIKE '[a-Z]'
Leaving
Letter
------
R
A
J
U
T
The @Length parameter then limits the characters (in your case this would be 4), then the string is rebuilt using XML concatenation. I would usually use FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)') for xml concatenation to allow for xml characters, but since I know there are none I haven't bothered as it is additional overhead.
Running some tests on this with a sample table of 1,000,000 rows:
CREATE TABLE dbo.T (String NVARCHAR(1000));
INSERT T (String)
SELECT TOP 1000000 t.String
FROM (VALUES ('Nagpur District'), ('Ill Fated'), ('RA123 *JU23'), ('MAC')) t (String)
CROSS JOIN sys.all_objects a
CROSS JOIN sys.all_objects B
ORDER BY a.object_id;
Then comparing the scalar and the inline udfs (called as follows):
SELECT COUNT(SUBSTRING(dbo.RemoveNonAlphaCharacters(t.String), 1, 4))
FROM T;
SELECT COUNT(tvf.Result)
FROM T
CROSS APPLY dbo.RemoveNonAlphaCharactersTVF (t.String, 4) AS tvf;
Over 15 test runs (probably not enough for an accurate figure, but enough to paint the picture) the average execution time for the scalar UDF was 11.824s, and for the inline TVF was 1.658s, so approximately 85% faster.
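To check the inline TVF against the examples from the question, something like the following could be run. It assumes dbo.RemoveNonAlphaCharactersTVF has been created as shown above; UPPER is added here only because the expected results in the question are upper case, and how the '[a-Z]' range behaves depends on the collation in use:

```sql
SELECT s.String,
       UPPER(tvf.Result) AS FirstFourLetters
FROM (VALUES (N'Nagpur District'), (N'Ill Fated'), (N'RA123 *JU23'), (N'MAC')) AS s (String)
CROSS APPLY dbo.RemoveNonAlphaCharactersTVF(s.String, 4) AS tvf;
```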
