Convert Unicode string containing Arabic characters to ASCII in SQL Server - sql-server

How do I convert the string '۱۳۹۴' to '1394'?
I tried changing the collation, but it does not work.
Please note that I read the data from an external device in C#.

I tried to solve the problem, and after searching on the internet I came to the conclusion that the best way to solve it is a function:
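CREATE FUNCTION [dbo].[udf_ReplaceArabicNumbers]
(@str NVARCHAR(1000))
RETURNS NVARCHAR(2000)
AS
BEGIN
DECLARE @i INT = 1
DECLARE @val NVARCHAR(1)
DECLARE @newchar NVARCHAR(1)
-- Walk the string one character at a time
WHILE @i <= LEN(@str)
BEGIN
SET @val = SUBSTRING(@str, @i, 1)
-- Map each Persian (Extended Arabic-Indic) digit to its ASCII digit.
-- ELSE keeps any other character; without it REPLACE(..., NULL) would turn the whole string into NULL.
SET @newchar = CASE @val
WHEN N'۱' THEN N'1'
WHEN N'۲' THEN N'2'
WHEN N'۳' THEN N'3'
WHEN N'۴' THEN N'4'
WHEN N'۵' THEN N'5'
WHEN N'۶' THEN N'6'
WHEN N'۷' THEN N'7'
WHEN N'۸' THEN N'8'
WHEN N'۹' THEN N'9'
WHEN N'۰' THEN N'0'
ELSE @val
END
-- REPLACE converts every occurrence of this character in one step
SET @str = REPLACE(@str, @val, @newchar)
SET @i += 1;
END
RETURN @str
END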
and call the function like this:
select [dbo].[udf_ReplaceArabicNumbers] (N'۱۳۹۴')
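Since the data is read in C# anyway, the same lookup could also be done on the client before the string reaches SQL Server. A minimal Python sketch of the lookup-table idea, assuming only the Persian digits U+06F0 to U+06F9 need mapping (the helper name `replace_persian_digits` is hypothetical):

```python
# Lookup table: Persian (Extended Arabic-Indic) digits U+06F0..U+06F9 -> ASCII '0'..'9'
PERSIAN_DIGITS = "".join(chr(0x06F0 + d) for d in range(10))
PERSIAN_TO_ASCII = str.maketrans(PERSIAN_DIGITS, "0123456789")


def replace_persian_digits(text: str) -> str:
    """Replace each Persian digit with its ASCII digit; other characters are kept as-is."""
    return text.translate(PERSIAN_TO_ASCII)


print(replace_persian_digits("\u06f1\u06f3\u06f9\u06f4"))  # 1394
```

Unlike a CASE without ELSE, characters outside the table simply pass through unchanged.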
I referred to this site: http://unicode-table.com/en/
With the help of the UNICODE function we can get the HTML character code and use it in our program:
select '&#' + cast (UNICODE(N'۱')as nvarchar(10)) + ';',
'&#' + cast (UNICODE(N'۳')as nvarchar(10)) + ';',
'&#' + cast (UNICODE(N'۹')as nvarchar(10)) + ';',
'&#' + cast (UNICODE(N'۴')as nvarchar(10)) + ';'
and the result would be &#1777;, &#1779;, &#1785; and &#1780; (one column per digit).
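To double-check which code points you are actually receiving from the device, the same references can be built outside SQL Server. A small Python sketch mirroring the SELECT above (the helper name `html_char_refs` is hypothetical):

```python
from typing import List


def html_char_refs(text: str) -> List[str]:
    """One HTML numeric character reference per character, like '&#' + UNICODE(...) + ';'."""
    return [f"&#{ord(ch)};" for ch in text]


print(html_char_refs("\u06f1\u06f3\u06f9\u06f4"))  # ['&#1777;', '&#1779;', '&#1785;', '&#1780;']
```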

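Based on the properties of the Unicode code points of these digits (U+06F0 to U+06F9 keep the digit value in their lowest 4 bits, just like the ASCII digits U+0030 to U+0039), you could use something like this: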
DECLARE @ArabicNumber NVARCHAR(4)
SET @ArabicNumber=N'۱۳۹۴'
SELECT
LEFT(CONVERT(NVARCHAR(4),CONVERT(VARBINARY(8),
CONVERT(BIGINT,CONVERT(VARBINARY(8),CONVERT(NCHAR(4),@ArabicNumber)))
& CONVERT(VARBINARY(8),REPLICATE(0x0F00,4))
^ CONVERT(VARBINARY(8),REPLICATE(0x3000,4))
)),LEN(@ArabicNumber))
This works only if the input string contains nothing but these digits, and it is limited to 4 characters (8 bytes of UTF-16) so that the value fits in a BIGINT for the bitwise operations. For longer strings, you should use a WHILE loop to process each character.
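The same mask-and-XOR trick applied one character at a time removes the 4-character limit. A Python sketch of that per-character loop, assuming the input uses the Persian digits U+06F0 to U+06F9 (the helper name `persian_digits_to_ascii` is hypothetical); unlike the BIGINT version, it passes other characters through instead of mangling them:

```python
def persian_digits_to_ascii(text: str) -> str:
    """Keep the low 4 bits of each Persian digit and XOR with 0x30 to land on '0'..'9'."""
    out = []
    for ch in text:
        cp = ord(ch)
        if 0x06F0 <= cp <= 0x06F9:
            # Same per-character effect as the 0x0F00 mask and 0x3000 XOR in the T-SQL
            out.append(chr((cp & 0x0F) ^ 0x30))
        else:
            out.append(ch)  # not one of the digits: leave it unchanged
    return "".join(out)


print(persian_digits_to_ascii("\u06f1\u06f3\u06f9\u06f4\u06f0\u06f5"))  # 139405
```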

Related

What is the encode(<columnName>, 'escape') PostgreSQL equivalent in SQL Server?

In the same vein as this question, what is the equivalent in SQL Server to the following Postgres statement?
select encode(some_field, 'escape') from only some_table
As you were already told, SQL Server is not the best tool for such problems.
The most important advice to avoid such issues is: use the appropriate data type to store your values. Storing binary data as a HEX-string runs against this best practice. But there are some workarounds:
I use the HEX-string taken from the linked question:
DECLARE @str VARCHAR(100)='0x61736461640061736461736400';
--here I use dynamically created SQL to get the HEX-string as a real binary:
DECLARE @convBin VARBINARY(MAX);
DECLARE @cmd NVARCHAR(MAX)=N'SELECT @bin=' + @str;
EXEC sp_executeSql @cmd
,N'@bin VARBINARY(MAX) OUTPUT'
,@bin=@convBin OUTPUT;
--This real binary can be converted to a VARCHAR(MAX).
--Be aware that in this case the input contains 00 bytes, as this is an array.
--It is possible to split the input at the 00s, but this is going too far...
SELECT @convBin AS HexStringAsRealBinary
,CAST(@convBin AS VARCHAR(MAX)) AS CastedToString; --You will see the first "asdad" only
--If your HEX-string is not longer than 10 bytes, there is an undocumented function:
--You'll see that the final AA is cut away, while a shorter string would be padded with zeros.
SELECT sys.fn_cdc_hexstrtobin('0x00112233445566778899AA')
SELECT CAST(sys.fn_cdc_hexstrtobin(@str) AS VARCHAR(100));
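For comparison, this is what the conversion boils down to outside SQL Server: strip the 0x prefix and read each pair of hex characters as one byte. A Python sketch (the helper name `hex_string_to_bytes` is hypothetical). Within SQL Server 2008 and later, CONVERT with binary style 1, e.g. CONVERT(VARBINARY(MAX), @str, 1), should perform the same string-to-binary step without dynamic SQL.

```python
def hex_string_to_bytes(hex_str: str) -> bytes:
    """Turn a '0x...' HEX-string into real binary data."""
    if hex_str[:2].lower() == "0x":
        hex_str = hex_str[2:]
    return bytes.fromhex(hex_str)


data = hex_string_to_bytes("0x61736461640061736461736400")
# Cutting at the first zero byte gives the same first string the CAST above shows
print(data.split(b"\x00", 1)[0].decode("ascii"))  # asdad
```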
UPDATE: An inlinable approach
The following recursive CTE will read the HEX-string byte by byte (two hex characters at a time).
Furthermore it will group the result and return two rows in this case.
This solution is very specific to the given input.
DECLARE @str VARCHAR(100)='0x61736461640061736461736400';
WITH recCTE AS
(
SELECT 1 AS position
,1 AS GroupingKey
,SUBSTRING(@str,3,2) AS HEXCode
,CHAR(SUBSTRING(sys.fn_cdc_hexstrtobin('0x' + SUBSTRING(@str,3,2)),1,1)) AS TheLetter
UNION ALL
SELECT r.position+1
,r.GroupingKey + CASE WHEN SUBSTRING(@str,2+(r.position)*2+1,2)='00' THEN 1 ELSE 0 END
,SUBSTRING(@str,2+(r.position)*2+1,2)
,CHAR(SUBSTRING(sys.fn_cdc_hexstrtobin('0x' + SUBSTRING(@str,2+(r.position)*2+1,2)),1,1)) AS TheLetter
FROM recCTE r
WHERE position<LEN(@str)/2
)
SELECT r.GroupingKey
,(
SELECT x.TheLetter AS [*]
FROM recCTE x
WHERE x.GroupingKey=r.GroupingKey
AND x.HEXCode<>'00'
AND LEN(x.HEXCode)>0
ORDER BY x.position
FOR XML PATH(''),TYPE
).value('.','varchar(max)')
FROM recCTE r
GROUP BY r.GroupingKey;
The result
1 asdad
2 asdasd
Hint: Starting with SQL Server 2017 there is STRING_AGG(), which would simplify the final SELECT.
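The grouping the CTE performs is simply "every 00 byte starts a new group". A Python sketch of the same result, useful for checking the CTE's output (the helper name `split_at_zero_bytes` is hypothetical; empty groups are dropped):

```python
from typing import List


def split_at_zero_bytes(hex_str: str) -> List[str]:
    """Decode a '0x...' HEX-string and split it at 00 bytes, like GroupingKey in the CTE."""
    if hex_str[:2].lower() == "0x":
        hex_str = hex_str[2:]
    data = bytes.fromhex(hex_str)
    return [part.decode("ascii") for part in data.split(b"\x00") if part]


print(split_at_zero_bytes("0x61736461640061736461736400"))  # ['asdad', 'asdasd']
```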
If you need this functionality, it's going to be up to you to implement it. Assuming you just need the escape variant, you can try to implement it as a T-SQL UDF. But pulling strings apart, working character by character and building up a new string just isn't a T-SQL strength. You'd be looking at a WHILE loop to count over the input's byte length, SUBSTRING to extract the individual bytes, and CHAR to directly convert the bytes that don't need to be octal encoded.[1]
If you're going to start down this route (and especially if you want to support the other formats), I'd be looking at using the CLR support in SQL Server to create the function in a .NET language (C# usually preferred) and use the richer string manipulation functionality there.
Both of the above assume that what you're really wanting is to replicate the escape format of encode. If you just want "take this binary data and give me a safe string to represent it", just use CONVERT to get the binary data hex-encoded (for example CONVERT(varchar(max), @bin, 1), which returns a string with a 0x prefix).
[1] Here's my attempt at it. I'd suggest a lot of testing and tweaking before you use it in anger:
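create function Postgresql_encode_escape (@input varbinary(max))
returns varchar(max)
as
begin
declare @i int
declare @len int
declare @out varchar(max)
declare @chr int
select @i = 1, @out = '', @len = DATALENGTH(@input)
while @i <= @len
begin
-- current byte as an integer 0-255
set @chr = SUBSTRING(@input,@i,1)
if @chr = 92
begin
-- PostgreSQL doubles backslashes
set @out = @out + '\\'
end
else if @chr > 0 and @chr < 128
begin
-- other non-zero bytes without the high bit set are output literally
set @out = @out + CHAR(@chr)
end
else
begin
-- zero bytes and bytes with the high bit set become \nnn (three octal digits)
set @out = @out + '\' +
RIGHT('000' + CONVERT(varchar(3),
(@chr / 64)*100 +
((@chr / 8)%8)*10 +
(@chr % 8))
,3)
end
set @i = @i + 1
end
return @out
end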
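When testing a UDF like this, it helps to have an independent reference for the escape rules as PostgreSQL documents them: zero bytes and bytes with the high bit set become \nnn in octal, backslashes are doubled, and every other byte is kept literally. A Python sketch of those rules (the helper name `pg_encode_escape` is hypothetical; it is not PostgreSQL's own code):

```python
def pg_encode_escape(data: bytes) -> str:
    """Emulate encode(data, 'escape') following the documented rules."""
    out = []
    for b in data:
        if b == 0x5C:                # backslash -> two backslashes
            out.append("\\\\")
        elif b == 0 or b >= 0x80:    # zero byte or high bit set -> \nnn
            out.append("\\%03o" % b)
        else:                        # everything else is kept literally
            out.append(chr(b))
    return "".join(out)


print(pg_encode_escape(b"asdad\x00\xff\\"))  # asdad\000\377\\
```

Comparing this against the T-SQL function for a handful of edge bytes (0, 31, 92, 127, 128, 255) is a quick way to do the "lot of testing" suggested above.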

T-SQL Find length of word within a string

With PATINDEX I can find the first occurrence of a pattern in a string, say a number; the string contains several matches for my pattern.
My question is: how can I find the end position of the first occurrence of that pattern in a string?
DECLARE @txt VARCHAR(255)
SET @txt = 'this is a string 30486240 and the string is still going 30485 and this is the end'
PRINT SUBSTRING(@txt,PATINDEX('%[0-9]%',@txt),8)
My problem is that I don't want to put in the 8 manually; I want to find the length of the first number.
Using SQL Server 2012
Try this, it should return the first number from your text:
DECLARE @txt VARCHAR(255)
SET @txt = 'this is a string 30486240 and the string is still going 30485 and this is the end'
DECLARE @startIndex INTEGER
SELECT @startIndex = PATINDEX('%[0-9]%',@txt)
DECLARE @remainingString NVARCHAR(MAX)
-- everything from the first digit to the end of the string
SELECT @remainingString = substring(@txt, @startIndex, LEN(@txt) - @startIndex + 1)
DECLARE @endingIndex INTEGER
-- the number ends just before the first letter that follows it
SELECT @endingIndex = PATINDEX('%[a-zA-Z]%', @remainingString) - 1
SELECT RTRIM(SUBSTRING(@txt, @startIndex, @endingIndex))
This query will work as long as you don't have letters "embedded" in your numbers, like 30486a24b0, and as long as at least one letter follows the number somewhere in the string.
Here is one solution when you don't know the length of the substring:
SELECT Left(
SubString(@Data, PatIndex('%[0-9.-]%', @Data), 8000),
PatIndex('%[^0-9.-]%', SubString(@Data, PatIndex('%[0-9.-]%', @Data), 8000) + 'X')-1)
Source: http://blogs.lessthandot.com/index.php/DataMgmt/DataDesign/extracting-numbers-with-sql-server/
I had to run through the exercise multiple times and kept thinking the blog post was wrong, before noticing the caret in the second PATINDEX.
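The two PATINDEX calls translate directly into two searches: one for the first character in the set, one for the first character not in the set (the caret). A Python sketch of the same idea (the helper name `first_number` is hypothetical); the appended 'X' plays the same sentinel role as in the T-SQL, so a number at the very end of the string is still found:

```python
import re


def first_number(text: str) -> str:
    """Return the first run of characters from [0-9.-], or '' if there is none."""
    start = re.search(r"[0-9.-]", text)        # like PatIndex('%[0-9.-]%', ...)
    if start is None:
        return ""
    rest = text[start.start():] + "X"          # sentinel: guarantees a non-matching character
    end = re.search(r"[^0-9.-]", rest).start() # like PatIndex('%[^0-9.-]%', ...) - 1
    return rest[:end]


txt = "this is a string 30486240 and the string is still going 30485 and this is the end"
print(first_number(txt))  # 30486240
```

Note that the set also contains '.' and '-', so a stray hyphen in the text before the number would be picked up as well.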

SQL server convert hex string to varbinary

I have a string column that represents hex values, for example '274', '1A7', '3D1' and so on.
Now I need to convert these values to their integer values, so that '10' will be converted to 16, for example.
The code I use:
SELECT CONVERT(int, CONVERT(varbinary, '0x' + case when replicate('0', len(myHex) / 2) + myHex = '0' then '00' else replicate('0', len(myHex) / 2) + myHex end, 1))
I'm actually padding the string with a zero or two to make its length even, and adding the '0x' prefix. However, some (random) rows fail.
Is there another way to convert the values?
Thanks.
Please give feedback so that I can improve my answer.
Here is one way to do it:
create function fn_HexToInt(@str varchar(16))
returns bigint as begin
select @str=upper(@str)
declare @i int, @len int, @char char(1), @output bigint
select @len=len(@str)
,@i=@len
,@output=case
when @len>0
then 0
end
-- walk from the last (least significant) hex digit to the first
while (@i>0)
begin
set @char=substring(@str,@i,1)
-- 'A'-'F' -> ASCII code minus 55 (10-15), '0'-'9' -> ASCII code minus 48 (0-9)
set @output=@output
+(ASCII(@char)
-(case
when @char between 'A' and 'F'
then 55
when @char between '0' and '9'
then 48
end))
*power(16.,@len-@i)
set @i=@i-1
end
return @output
end
or
SELECT CONVERT(INT, 0x00000100)
SELECT CONVERT(VARBINARY(8), 256)
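A likely source of the "random" failures in the question: replicate('0', len(myHex) / 2) only gives an even length for some inputs. A 2-character value such as '10' becomes '010', and a style-1 CONVERT to varbinary rejects an odd number of hex digits. A Python sketch (all helper names hypothetical) that compares that padding with padding to an even length, next to a digit-by-digit conversion in the spirit of the function above:

```python
def hex_to_int(hex_str: str) -> int:
    """'0'-'9' -> ASCII code minus 48, 'A'-'F' -> ASCII code minus 55, accumulated base 16."""
    value = 0
    for ch in hex_str.upper():
        if "0" <= ch <= "9":
            digit = ord(ch) - 48
        elif "A" <= ch <= "F":
            digit = ord(ch) - 55
        else:
            raise ValueError(f"not a hex digit: {ch!r}")
        value = value * 16 + digit  # Horner's rule instead of power(16, position)
    return value


def asker_padding(hex_str: str) -> str:
    """The padding from the question: len // 2 leading zeros."""
    return "0" * (len(hex_str) // 2) + hex_str


def pad_to_even(hex_str: str) -> str:
    """Add a single leading zero only when the length is odd."""
    return hex_str if len(hex_str) % 2 == 0 else "0" + hex_str


for h in ["274", "1A7", "3D1", "10"]:
    print(h, hex_to_int(h), asker_padding(h), pad_to_even(h))
```

In T-SQL, left-padding to a fixed even width, for example RIGHT('00000000' + myHex, 8) before a style-2 CONVERT (no 0x prefix), would follow the same even-length idea.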

Search Entire Database To find extended ascii codes in sql

We have issues with extended ASCII codes (128-155) getting into our database.
Is there any way to search the entire database and display any of these characters that may be in there, along with where they are located (which tables and columns)?
Hope that makes sense.
I have a script to search the entire DB, but I'm having trouble with the opening line:
DECLARE @SearchStr nvarchar(100)
SET @SearchStr != between char(32) and char(127)
Originally I had this, which works, but I need to extend the range I'm looking for:
SET @SearchStr = '|' + char(9) + '|' + char(10) + '|' + char(13)
Thanks
It's very unclear what your data looks like, but this might help you to get started:
declare @TestData table (String nvarchar(100))
insert into @TestData select N'abc'
insert into @TestData select N'def'
insert into @TestData select char(128)
insert into @TestData select char(155)
declare @SearchPattern nvarchar(max) = N'%['
declare @i int = 128
while @i <= 155
begin
set @SearchPattern += char(@i)
set @i += 1
end
set @SearchPattern += N']%'
select @SearchPattern
select String
from @TestData
where String like @SearchPattern
Of course you'll need to add some code to loop over every table and column that you want to query (see this question), and it's possible that this code will behave differently on different collations.
... where dodgyColumn is your column with questionable data ....
WHERE(patindex('%[' + char(127) + '-' + char(255) + ']%', dodgyColumn COLLATE Latin1_General_BIN2) > 0)
This works for us, to identify extended ASCII characters in our otherwise normal ASCII data (characters, numbers, punctuation, dollar and percent signs, etc.)
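"Codes 128-155" only have a meaning relative to a code page (in Windows-1252, for instance, byte 128 is '€', which is U+20AC in Unicode), so if you check data outside SQL Server it is safest to look at the raw single-byte varchar bytes. A Python sketch under that assumption (the helper name `find_extended_bytes` is hypothetical) that reports 1-based positions, like PATINDEX:

```python
from typing import List, Tuple


def find_extended_bytes(raw: bytes, low: int = 128, high: int = 155) -> List[Tuple[int, int]]:
    """Return (1-based position, byte value) for every byte in [low, high]."""
    return [(pos, b) for pos, b in enumerate(raw, start=1) if low <= b <= high]


print(find_extended_bytes(b"abc\x80d\x9b"))  # [(4, 128), (6, 155)]
```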

T-Sql function to convert a varchar - in this instance someone's name - from upper to title case?

Does anyone have in their back pocket a function that can achieve this?
Found this here :-
create function ProperCase(@Text as varchar(8000))
returns varchar(8000)
as
begin
declare @Reset bit;
declare @Ret varchar(8000);
declare @i int;
declare @c char(1);
select @Reset = 1, @i=1, @Ret = '';
while (@i <= len(@Text))
select @c= substring(@Text,@i,1),
@Ret = @Ret + case when @Reset=1 then UPPER(@c) else LOWER(@c) end,
@Reset = case when @c like '[a-zA-Z]' then 0 else 1 end,
@i = @i +1
return @Ret
end
Results from this:-
select dbo.propercase('ALL UPPERCASE'); -- All Uppercase
select dbo.propercase('MiXeD CaSe'); -- Mixed Case
select dbo.propercase('lower case'); -- Lower Case
select dbo.propercase('names with apostrophe - mr o''reilly '); -- Names With Apostrophe - Mr O'Reilly
select dbo.propercase('names with hyphen - mary two-barrels '); -- Names With Hyphen - Mary Two-Barrels
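The reset-flag idea behind ProperCase is easy to check outside SQL Server. A Python sketch of the same rule (the helper name `proper_case` is hypothetical): upper-case a character that follows a non-letter, lower-case everything else, with only ASCII letters clearing the flag, roughly like LIKE '[a-zA-Z]':

```python
def proper_case(text: str) -> str:
    out = []
    reset = True  # start of string counts as "after a non-letter"
    for ch in text:
        out.append(ch.upper() if reset else ch.lower())
        # Any character other than an ASCII letter sets the flag for the next character
        reset = not (ch.isascii() and ch.isalpha())
    return "".join(out)


print(proper_case("names with apostrophe - mr o'reilly "))  # Names With Apostrophe - Mr O'Reilly
```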
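To be honest, I'd do this outside of T-SQL, in the calling code.
e.g. if you're using .NET, it's just a case of using TextInfo.ToTitleCase (note that it leaves words that are entirely upper case unchanged, treating them as acronyms, so lower-case the string first).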
That way, you leave your formatting code outside of TSQL (standard "let the caller decide how to use/format the data" approach).
This kind of function is better done on the application side, as it will perform relatively poorly in SQL.
With SQL Server 2005 and above you could write a CLR function that does that and call it from your SQL. Here is an article on how to do this.
If you really want to do this in T-SQL and without a loop, see Tony Rogerson's article "Turning stuff into "Camel Case" without loops"
I haven't tried it... that's what client code is for :-)
No cursors, no while loops, no (inline) sub-queries
-- ===== IF YOU DON'T HAVE A NUMBERS TABLE =================
--CREATE TABLE Numbers (
-- Num INT NOT NULL PRIMARY KEY CLUSTERED WITH(FILLFACTOR = 100)
--)
--INSERT INTO Numbers
--SELECT TOP(11000)
-- ROW_NUMBER() OVER (ORDER BY (SELECT 1))
--FROM master.sys.all_columns a
-- CROSS JOIN master.sys.all_columns b
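DECLARE @text VARCHAR(8000) = 'my text to make title-case';
DECLARE @result VARCHAR(8000);
SET @result = UPPER(LEFT(@text, 1));
SELECT
@result +=
CASE
WHEN SUBSTRING(@text, Num - 1, 1) IN (' ', '-') THEN UPPER(SUBSTRING(@text, Num, 1))
-- lower-case the rest so that upper-case input also ends up in title case
ELSE LOWER(SUBSTRING(@text, Num, 1))
END
FROM Numbers
WHERE Num > 1 AND Num <= LEN(@text)
-- concatenating through a variable relies on the rows being processed in Num order
ORDER BY Num;
PRINT @result;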
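The Numbers-table query decides each character by looking only at the character before it. A Python sketch of that rule (the helper name `title_case_after_separators` is hypothetical); it pairs each character with its predecessor instead of joining to a numbers table, and it also lower-cases the remaining characters so that all-upper-case input comes out in title case:

```python
def title_case_after_separators(text: str, separators: str = " -") -> str:
    """Upper-case the first character and every character preceded by a separator."""
    if not text:
        return text
    rest = (cur.upper() if prev in separators else cur.lower()
            for prev, cur in zip(text, text[1:]))
    return text[0].upper() + "".join(rest)


print(title_case_after_separators("my text to make title-case"))  # My Text To Make Title-Case
```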
Will any given row only contain a first name or a last name that you wish to convert, or will it contain full names separated by spaces? Also, are there any other rules about which characters it should upper-case or lower-case?
If you can guarantee that it's only first and last names and you aren't dealing with any specialized capitalization such as after an apostrophe, might this do what you're looking for?
SELECT -- Initial of First Name
UPPER(LEFT(FullName, 1))
-- Rest of First Name
+ SUBSTRING(LOWER(FullName), 2, CHARINDEX(' ', FullName, 0) - 2)
-- Space between names
+ ' '
-- Initial of last name
+ UPPER(SUBSTRING(FullName, CHARINDEX(' ', FullName, 0) + 1, 1))
-- Rest of last name
+ SUBSTRING(LOWER(FullName), CHARINDEX(' ', FullName, 0) + 2, LEN(FullName) - CHARINDEX(' ', FullName, 0) + 2)
FROM Employee
