How to differentiate between 2 Arabic letters in SQL Server - sql-server

In Arabic there are 2 letters that are pronounced the same but written differently:
The letter ة
and the letter ت
I want to replace the letter ة with another letter, ه.
I used this:
Update MyTable
SET MyColumn = Replace ( MyColumn, N'ة' , N'ه' )
But this ended up replacing every ة and every ت with ه.
How can I tell SQL Server to replace only ة and not ت?

Specify a COLLATE clause with a binary collation to use the code points of the exact characters to be searched/replaced:
UPDATE dbo.MyTable
SET MyColumn = REPLACE( MyColumn COLLATE Arabic_BIN, N'ة' COLLATE Arabic_BIN, N'ه' COLLATE Arabic_BIN);
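Before running the update, it can help to confirm that the letters really are distinct code points and to preview which rows a binary comparison would touch. The following is only a sketch: it assumes MyColumn is an nvarchar column and reuses dbo.MyTable and MyColumn from the question; the column aliases are made up for illustration.

```sql
-- The three letters have distinct Unicode code points.
SELECT UNICODE(N'ة') AS teh_marbuta_code,  -- U+0629
       UNICODE(N'ت') AS teh_code,          -- U+062A
       UNICODE(N'ه') AS heh_code;          -- U+0647

-- Preview the rows the UPDATE would change, comparing by code point.
SELECT MyColumn,
       REPLACE(MyColumn COLLATE Arabic_BIN, N'ة', N'ه') AS NewValue
FROM dbo.MyTable
WHERE MyColumn COLLATE Arabic_BIN LIKE N'%ة%';
```

The explicit COLLATE on the column takes precedence over the literals, so the search and the replacement both run under the binary collation.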

Related

Problem with Replace - it doesn't work for letter Ê

I have quite a strange problem. I have a column with surnames that contain national characters.
I would like to replace the national characters, so I wrote something like this:
select replace(surname,'Ê','E')
For the surname ABCDÊ the result is still ABCDÊ,
but when I test it by replacing a value that I copied:
select replace('ABCDÊ','Ê','E')
it works correctly and I get ABCDE as the result.
It feels like something is missing here. Is it possible your column is using a case-sensitive collation? If so, you will either have to override it or replace each letter case individually (probably the better method, since it preserves the original letter case).
Adjusting for Case-Sensitive Collation
/*Collation abbreviation "CS" = Case-sensitive*/
DECLARE @table AS TABLE (surname VARCHAR(100) COLLATE SQL_Latin1_General_CP1_CS_AS)
INSERT INTO @table
VALUES ('abcdê'),('ABCDÊ')
SELECT surname
    ,YourOriginalCode = REPLACE(surname, 'Ê', 'E')
    ,ForceCaseInsensitiveCollation = REPLACE(surname COLLATE SQL_Latin1_General_CP1_CI_AS, 'Ê', 'E')
    ,ReplaceForEachLetterCase = REPLACE(REPLACE(surname,'Ê','E'),'ê','e')
    ,SQLServer2017Version = TRANSLATE(surname,'êÊ','eE') /*TRANSLATE requires SQL Server 2017 or later*/
FROM @table
Results
surname  YourOriginalCode  ForceCaseInsensitiveCollation  ReplaceForEachLetterCase  SQLServer2017Version
abcdê    abcdê             abcdE                          abcde                     abcde
ABCDÊ    ABCDE             ABCDE                          ABCDE                     ABCDE
This sort of issue usually comes down to how SQL Server stores Unicode characters.
Try running the following to see what sort of output you get. On my server both work fine, but it may well help you identify your issue.
DECLARE @table AS TABLE ( surname VARCHAR(MAX) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL )
INSERT INTO @table ( surname )
VALUES ( 'ABCDÊ' )
     , ( N'ABCDÊ' )
SELECT surname
     , REPLACE(surname, 'Ê', 'E')
FROM @table

Comparing Romanian diacritics

I am working with Romanian accented characters (diacritics) with the Romanian_100_CI_AS collation.
Trying to compare values regardless of accented characters gives me an unexpected result like this one:
IF N'tandarei' COLLATE Latin1_General_CI_AI = N'Țăndărei' COLLATE Latin1_General_CI_AI
SELECT 'Values are the same'
ELSE
SELECT 'Values are different'
returns
Values are different
"Țăndărei" is in a column with the Romanian_100_CI_AS collation.
What am I missing?

SQL Server 2016 How to use a simple Regular Expression in T-SQL?

I have a column with the name of a person in the following format: "LAST NAME, FIRST NAME"
Only Upper Cases Allowed
Space after comma optional
I would like to use a regular expression like: [A-Z]+,[ ]?[A-Z]+ but I do not know how to do this in T-SQL. In Oracle, I would use REGEXP_LIKE, is there something similar for SQL Server 2016?
I need something like the following:
UPDATE table
SET is_correct_format = 'YES'
WHERE REGEXP_LIKE(table.name,'[A-Z]+,[ ]?[A-Z]+');
First, case sensitivity depends on the collation of the DB, though with LIKE you can specify case comparisons. With that... here is some Boolean logic to take care of the cases you stated. Though, you may need to add additional clauses if you discover some bogus input.
declare @table table (Person varchar(64), is_correct_format varchar(3) default 'NO')
insert into @table (Person)
values
('LowerCase, Here'),
('CORRECTLY, FORMATTED'),
('CORRECTLY,FORMATTEDTWO'),
('ONLY FIRST UPPER, LowerLast'),
('WEGOT, FormaNUMB3RStted'),
('NoComma Formatted'),
('CORRECTLY, TWOCOMMA, A'),
(',COMMA FIRST'),
('COMMA LAST,'),
('SPACE BEFORE COMMA , GOOD'),
(' SPACE AT BEGINNING, GOOD')
update @table
set is_correct_format = 'YES'
where
    Person not like '%[^A-Z, ]%' --check for non-letter characters, excluding comma and spaces
    and len(replace(Person,' ','')) = len(replace(replace(Person,' ',''),',','')) + 1 --make sure there is only one comma
    and charindex(',',Person) <> 1 --make sure the comma isn't at the beginning
    and charindex(',',Person) <> len(Person) --make sure the comma isn't at the end
    and substring(Person,charindex(',',Person) - 1,1) <> ' ' --make sure there isn't a space before comma
    and left(Person,1) <> ' ' --check for leading spaces
    and UPPER(Person) = Person collate Latin1_General_CS_AS --case-sensitive comparison in case the default collation is CI (only upper case allowed)
select * from @table
The T-SQL equivalent could look like this. I'm not vouching for the efficiency of this solution.
declare @table as table(name varchar(20), is_correct_format varchar(5))
insert into @table(name) Values
('Smith, Jon')
,('se7en, six')
,('Billy bob')
UPDATE @table
SET is_correct_format = 'YES'
WHERE
    replace(name, ', ', ',x')
    like (replicate('[a-z]', charindex(',', name) - 1)
        + ','
        + replicate('[a-z]', len(name) - charindex(',', name)) )
select * from @table
The optional space is hard to solve, so since it's next to a legal character I'm just replacing with another legal character when it's there.
T-SQL's LIKE does not provide the kind of 'repeating pattern' that * or + give you in regex, so you have to count the characters and repeat the pattern element that many times in your search pattern.
I split the string at the comma, counted the alphas before and after, and built a search pattern to match.
Clunky, but doable.

SQL Server does not differentiate between 'ی' and 'ي' in Arabic_CI_AS collation

I'm using the ASCII function to get the ASCII codes of two characters, but I was surprised to see that there is no difference between 'ي' and 'ی'. Can anyone help me?
SELECT ASCII('ي'), ASCII('ی')
Because these characters are not ASCII characters, you have to use the UNICODE() function (with N-prefixed literals) instead of ASCII().
SELECT ASCII('ي'), ASCII('ی')
returns: 237, 237
but
SELECT UNICODE(N'ي'), UNICODE(N'ی')
returns: 1610, 1740
Try this
SELECT UNICODE(N'ي'), UNICODE(N'ی')
Another solution, in case you want to keep using ASCII, is to use a suitable collation:
Arabic_CS_AS_KS
The result will then come out as ى = 236 and ي = 237
This is a limitation of the ASCII function. According to the documentation, ASCII:
Returns the ASCII code value of the leftmost character of a character expression.
However, the characters in your question are made up of more than one byte. It appears that ASCII can only read one byte.
When you use these characters as string literals without the N prefix, they are treated as single-byte characters. The following query shows that SQL Server does not treat these characters as equal in the Arabic_CI_AS collation when they are properly marked as multi-byte:
SELECT CASE WHEN 'ي' COLLATE Arabic_CI_AS <> 'ی' COLLATE Arabic_CI_AS
THEN 1 ELSE 0 END AS are_different_ascii,
CASE WHEN N'ي' COLLATE Arabic_CI_AS <> N'ی' COLLATE Arabic_CI_AS
THEN 1 ELSE 0 END AS are_different_unicode
The following query shows the bytes that make up the characters:
SELECT CAST(N'ي' COLLATE Arabic_CI_AS as varbinary(4)),
CAST(N'ی' COLLATE Arabic_CI_AS as varbinary(4)),
CAST('ي' COLLATE Arabic_CI_AS as varbinary(4)),
CAST('ی' COLLATE Arabic_CI_AS as varbinary(4))
However, even when you mark the characters as Unicode, the ASCII function returns the same value for both because it can only read one byte:
SELECT ASCII(N'ي' COLLATE Arabic_CI_AS) , ASCII(N'ی' COLLATE Arabic_CI_AS)
EDIT: As TT. points out, these characters don't have an entry in the ASCII code table.
The story becomes interesting when we have the following scripts:
SELECT ASCII('ك'), ASCII('ک');
SELECT
CASE
WHEN 'ك' COLLATE Arabic_CI_AS <> 'ک' COLLATE Arabic_CI_AS
THEN 1 ELSE 0 END AS are_different_ascii,
CASE WHEN N'ك' COLLATE Arabic_CI_AS <> N'ک' COLLATE Arabic_CI_AS
THEN 1 ELSE 0 END AS are_different_unicode;
The letters ک and ك seem to be an exception!
Isn't that so?
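One way to investigate why one pair behaves differently from the other is to look at the bytes each literal is actually stored as, next to its Unicode code point. This is only a sketch: it assumes the database default collation uses an Arabic code page (otherwise the literals without the N prefix may already have been converted to '?' before the COLLATE clause applies), and the column aliases are made up for illustration.

```sql
SELECT CAST('ي' COLLATE Arabic_CI_AS AS varbinary(2)) AS yeh_arabic_varchar,
       CAST('ی' COLLATE Arabic_CI_AS AS varbinary(2)) AS yeh_farsi_varchar,
       CAST('ك' COLLATE Arabic_CI_AS AS varbinary(2)) AS kaf_arabic_varchar,
       CAST('ک' COLLATE Arabic_CI_AS AS varbinary(2)) AS keheh_varchar,
       -- Code points of the same characters when kept as Unicode literals
       UNICODE(N'ي') AS yeh_arabic_code,
       UNICODE(N'ی') AS yeh_farsi_code,
       UNICODE(N'ك') AS kaf_arabic_code,
       UNICODE(N'ک') AS keheh_code;
```

If the two varchar values in a pair are identical, both characters were mapped to the same single-byte code during conversion and ASCII cannot tell them apart; if they differ, the code page has a separate slot for each character.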

Special Character in SQL

I have a problem with a special character inserted in a table of SQL Server 2008 R2.
The problem is that when I try to insert a string containing the character º (e.g. 3 ELBOW 90º LONG RADIUS) into the table, it appears as 3 ELBOW 90� LONG RADIUS, and when I try to select all the rows that contain the character �, the result is empty.
I tried to do the select by ASCII code like this:
select * from itemcode where description like '%'+char(63)+'%'
and ran this to find out that the ASCII code of that symbol is 63:
select ASCII('�')
But that doesn't work.
What must I do to select all the rows that have that character, and what should I do to make SQL Server recognize the character º?
Thanks
The degree symbol
U+00B0 ° degree sign (HTML: &#176; &deg;)
is not an ASCII character and generally requires an NVARCHAR column and a N'' string literal. (except for codepages etc that support the symbol)
63 is the code of the question mark, which is the fallback for your inverse question mark in ASCII:
select UNICODE('�') => 63
select UNICODE(N'�') => 65533
where 65533 is the Unicode Replacement Character used to display characters that could not be converted or displayed.
When I run this:
print ascii('º')
I get 186 as the ascii code value, so try:
select * from YourTable Where Description like '%'+char(186)+'%'
to see all the ascii codes run this:
;WITH AllNumbers AS
(
SELECT 1 AS Number
UNION ALL
SELECT Number+1
FROM AllNumbers
WHERE Number<255
)
SELECT Number,CHAR(Number) FROM AllNumbers
OPTION (MAXRECURSION 255)
EDIT: the OP stated in a comment that they are using nvarchar columns.
Forget about ASCII, and use NCHAR (Transact-SQL) to output a degree symbol:
print '32'+NCHAR(176)+'F' --to display it
select * from YourTable
Where Description like '%'+NCHAR(176)+'%' --to select based on it
and use UNICODE (Transact-SQL) to get the value:
print UNICODE('°')
returns:
176
select top 10 * from table_name
where tbl_colmn like N'%'+ NCHAR(65533) + N'%'
The function NCHAR(65533) will return the character you're looking for.
In addition to making sure that it is an NVARCHAR, I would use something like this
select (N'�')
How to display special characters in SQL server 2008?
I know this is old, but recently faced the same problem, and found solution here
"The best way I know of to find it or get rid of it in SQL is to check for it using a binary collation. For example"
Declare @Foo Table(PK int primary key identity, MyData nvarchar(20));
Insert @Foo(MyData) Values (N'abc'), (N'ab�c'), (N'abc�')
Select * From @Foo Where MyData Like N'%�%'
-- Find rows with the character
Select * From @Foo
Where CharIndex(nchar(65533) COLLATE Latin1_General_BIN2, MyData) > 0
-- Update rows replacing character with a !
Update @Foo
set MyData = Replace(MyData, nchar(65533) COLLATE Latin1_General_BIN2, '!')
Select * From @Foo
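The same binary-collation idea can be widened from one specific character to "anything outside printable ASCII", which may help when you do not know in advance which special characters ended up in the data. This is a minimal sketch under assumptions: the @Sample table variable and its rows are invented for illustration, and the column is assumed to be nvarchar.

```sql
DECLARE @Sample TABLE (Id int IDENTITY PRIMARY KEY, MyData nvarchar(40));
INSERT INTO @Sample (MyData)
VALUES (N'3 ELBOW 90 LONG RADIUS'),
       (N'3 ELBOW 90' + NCHAR(186) + N' LONG RADIUS'),
       (N'ab' + NCHAR(65533) + N'c');

-- Under a binary collation the range [ -~] is compared by code point,
-- so [^ -~] matches any character outside the range 32 to 126.
SELECT Id, MyData
FROM @Sample
WHERE MyData COLLATE Latin1_General_BIN2 LIKE N'%[^ -~]%';
```

The query is intended to pick up the rows containing º (code 186) and the replacement character (code 65533) while skipping the plain ASCII row.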
