Accent insensitive collation for ð in SQL Server

I have a .NET application that searches a SQL Server database. I use the collation Latin1_General_CI_AI to make the search query case- and accent-insensitive, and this works 99.9% of the time. Is there a collation that lets me match ð? I'm sure there are other characters that don't work either, so it would be good to use one that covers everything.
select 1 as a where 'ð' = 'o' collate Latin1_General_CI_AI
select 1 as a where 'ó' = 'o' collate Latin1_General_CI_AI
Output: the first query returns no rows; the second returns one row with a = 1.
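One way to answer this empirically is to ask the server itself. The sketch below assumes you are allowed to run dynamic SQL; it loops over the accent-insensitive collations returned by sys.fn_helpcollations() and records those under which N'ð' compares equal to N'o'. The table variable @matches and the cursor name are illustrative, and which collations (if any) end up in the list depends on your SQL Server version, so treat the output as something to inspect rather than a guaranteed answer.

```sql
-- Sketch: list accent-insensitive collations on this server under which
-- N'ð' compares equal to N'o'. COLLATE only accepts a literal collation
-- name, so each candidate is tested through dynamic SQL.
DECLARE @matches TABLE (name sysname);
DECLARE @name sysname, @sql nvarchar(max), @eq int;

DECLARE collation_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT name
    FROM sys.fn_helpcollations()
    WHERE description LIKE '%accent-insensitive%';

OPEN collation_cursor;
FETCH NEXT FROM collation_cursor INTO @name;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Names come from sys.fn_helpcollations(), so concatenating them is safe here.
    SET @sql = N'SELECT @eq = CASE WHEN N''ð'' = N''o'' COLLATE ' + @name
             + N' THEN 1 ELSE 0 END;';
    EXEC sys.sp_executesql @sql, N'@eq int OUTPUT', @eq = @eq OUTPUT;

    IF @eq = 1
        INSERT INTO @matches (name) VALUES (@name);

    FETCH NEXT FROM collation_cursor INTO @name;
END;

CLOSE collation_cursor;
DEALLOCATE collation_cursor;

SELECT name FROM @matches ORDER BY name;
```

Running the same loop with other problem characters in place of ð is a cheap way to check whether any single collation covers all of them before committing to one.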

Related

Order by multiple columns, first alphabetic values then numbers

I'm trying to convert a DB2 query's ORDER BY condition to SQL Server.
DB2 Query
ORDER BY
CASE WHEN LEN(RTRIM(LTRIM(CorpName))) > 1 THEN CorpVal Else '999' END,
CASE WHEN SUBSTRING(FName,1,1) != '*' THEN FName Else '999' END
SQL Query
ORDER BY
CASE WHEN CorpName like '[a-z]%' THEN 0 ELSE 1 END,
CASE WHEN FName like '[a-z]%' THEN 0 ELSE 1 END
My data looks something like this:
ABC,
24KS,
ABE,
AJX,
-Extra,
ABF,
1X1
I need the output ordered like this:
ABC,
ABE,
ABF,
AJX,
24KS,
1X1,
-Extra
This does not work for me; I need some more suggestions.
Ordering is determined by collations in both SQL Server and DB2. It seems your iSeries DB2 is configured with an EBCDIC collation, so you could add an explicit COLLATE clause to the ORDER BY expressions to coerce EBCDIC ordering rules in SQL Server, since your SQL Server collation is apparently different.
Below is your original DB2 query with the clause added for SQL Server:
ORDER BY
CASE WHEN LEN(RTRIM(LTRIM(CorpName))) > 1 THEN CorpVal Else '999' END COLLATE SQL_EBCDIC037_CP1_CS_AS,
CASE WHEN SUBSTRING(FName,1,1) != '*' THEN FName Else '999' END COLLATE SQL_EBCDIC037_CP1_CS_AS
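A different approach from the EBCDIC collation above, sketched under the assumption that "letters first, then digits, then everything else" is the rule you actually want: sort on an explicit bucket first and on the value second. The table variable @t and its rows are illustrative, copied from the sample data in the question.

```sql
-- Sketch: bucket rows as letters (0), digits (1), everything else (2),
-- then sort alphabetically inside each bucket.
DECLARE @t TABLE (CorpName varchar(20));
INSERT INTO @t (CorpName)
VALUES ('ABC'), ('24KS'), ('ABE'), ('AJX'), ('-Extra'), ('ABF'), ('1X1');

SELECT CorpName
FROM @t
ORDER BY
    CASE
        WHEN CorpName LIKE '[A-Za-z]%' THEN 0  -- starts with a letter
        WHEN CorpName LIKE '[0-9]%'    THEN 1  -- starts with a digit
        ELSE 2                                 -- anything else, e.g. '-'
    END,
    CorpName;
```

With this rule 1X1 sorts before 24KS, unlike the listing in the question; if the digit rows need a different order, only the second ORDER BY key has to change.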

Replacing German Characters (Characters with Umlauts)

When I do a simple select like case 1 below, the replace function works as expected.
CASE 1
select replace ('äaü','ä','ae')
RESULT
aeaü
When I do the same on a column in a table, it replaces even 'a' with 'ae', which is unexpected.
CASE 2
select replace (column_1,'ä','ae') as actual_text
from table
RESULT
aeaeü
How can I achieve the expected results in case 2?
Thanks @juergen d!
My database default collation was set to Latin1_General_CI_AI earlier.
Now I executed the following statement, which gives me the expected results:
select replace (column_1 collate Latin1_General_CI_AS,'ä','ae') as actual_text
from table
i.e., the collation has been changed from accent-insensitive to accent-sensitive.
Alternatively, SELECT 'Citroën' COLLATE Ukrainian_CI_AI returns Citroen.
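Building on the accent-sensitive REPLACE fix above, here is a minimal sketch that transliterates all German umlauts and ß in one expression. It assumes a binary collation (Latin1_General_BIN2) is acceptable for the matching step, so each REPLACE matches exact code points only: 'a' is never treated as 'ä', 'Ä' is handled separately from 'ä', and ß cannot match 'ss'. The variable names are illustrative.

```sql
-- Sketch: transliterate German umlauts and ß using exact (binary) matching.
DECLARE @s nvarchar(100) = N'Äpfel über Straße';
DECLARE @result nvarchar(200);

SET @result =
    REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
        @s COLLATE Latin1_General_BIN2,  -- binary: match exact characters only
        N'ä', N'ae'),
        N'ö', N'oe'),
        N'ü', N'ue'),
        N'Ä', N'Ae'),
        N'Ö', N'Oe'),
        N'Ü', N'Ue'),
        N'ß', N'ss');

SELECT @result AS transliterated;  -- Aepfel ueber Strasse
```

For a table, the same expression can be applied to column_1 in place of @s.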

How do I perform an accent insensitive compare in SQL Server for 1250 codepage

There are already several questions and solutions on accent-insensitive search on Stack Overflow, but none of them work for code page 1250 (Central European and Eastern European languages).
How do I perform an accent insensitive compare (e with è, é, ê and ë) in SQL Server?
LINQ Where Ignore Accentuation and Case
Ignoring accents in SQL Server using LINQ to SQL
Modify search to make it Accent Insensitive in SQL Server
Questions about accent insensitivity in SQL Server (Latin1_General_CI_AS)
The problem is that accent-insensitive collations are bound to specific code pages, and I can't find an accent-insensitive collation for code page 1250 in the MSDN documentation.
I need to modify the collation of the column to make Entity Framework work in an accent-insensitive way.
For example, if I change the collation to SQL_LATIN1_GENERAL_CP1_CI_AI, c with acute (ć, U+0107) is selected as c without the acute because of the wrong code page.
How to solve this?
SELECT *
FROM sys.fn_helpcollations()
WHERE COLLATIONPROPERTY(name, 'CodePage') = 1250
AND description LIKE '%accent-insensitive%';
This returns 264 results to choose from.
Picking the first one:
SELECT N'è' COLLATE Albanian_CI_AI
UNION
SELECT N'é'
UNION
SELECT N'ê'
UNION
SELECT N'ë'
returns a single row, as desired, showing that all four compare equal.
OK, it seems that the linked MSDN documentation is for SQL Server 2008. I use SQL Server 2014, but I was not able to find any collation documentation for 2014.
But the solution is to list the collations available on the server for my code page:
SELECT name, COLLATIONPROPERTY(name, 'CodePage') AS CodePage
FROM fn_helpcollations()
where COLLATIONPROPERTY(name, 'CodePage') = 1250
ORDER BY name;
And I can see there is an undocumented collation, Czech_100_CI_AI, which works for me. Heureka!
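Once a collation such as Czech_100_CI_AI is chosen, it has to be on the column itself for Entity Framework queries to use it, because the generated SQL compares against the column with no COLLATE clause of its own. A minimal sketch using a table variable so it runs anywhere; on a real table the equivalent is an ALTER TABLE ... ALTER COLUMN ... COLLATE statement (the table and column names in the comment are hypothetical). It assumes é and e compare equal under Czech_100_CI_AI; a national collation may not fold every diacritic the way you expect, so test the characters you care about.

```sql
-- Hypothetical permanent change on a real table:
-- ALTER TABLE dbo.Products ALTER COLUMN Name nvarchar(50) COLLATE Czech_100_CI_AI NOT NULL;

-- Sketch: a column declared with the accent-insensitive code page 1250 collation.
DECLARE @Products TABLE (Name nvarchar(50) COLLATE Czech_100_CI_AI);
INSERT INTO @Products (Name) VALUES (N'Café'), (N'Cafe'), (N'Čaj');

-- The comparison uses the column's collation, so both spellings should match.
SELECT Name FROM @Products WHERE Name = N'cafe';
```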

Confused about default string comparison option in SQL Server

I am completely confused about the default string comparison method used in Microsoft SQL Server. Up till now I had been using UPPER() and LOWER() functions for performing any string comparison on Microsoft SQL Server.
However, I found out that by default Microsoft SQL Server is case-insensitive, and that the collation has to be changed while installing Microsoft SQL Server to make it case-sensitive. If that is the case, what is the use of the UPPER() and LOWER() functions?
If you want to compare strings case-sensitively, this might be the syntax you are looking for:
IF @STR1 COLLATE Latin1_General_CS_AS <> @STR2 COLLATE Latin1_General_CS_AS
PRINT 'NOT MATCH'
As you have discovered, upper and lower are only of use in comparisons when you have a case-sensitive collation applied, but that doesn't make them useless.
For example, Upper and Lower can be used for formatting results.
select upper(LicencePlate) from cars
You can apply collations without reinstalling, either to a column in the table design or to specific comparisons, i.e.:
if 'a' = 'A' collate latin1_general_cs_as
select '1'
else
select '2'
if 'a' = 'A' collate latin1_general_ci_as
select '3'
else
select '4'
See http://technet.microsoft.com/en-us/library/aa258272(v=sql.80).aspx
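A common pattern for a case-sensitive search on a column whose own collation is case-insensitive, sketched here with an illustrative table variable: keep a plain equality predicate (which can still use an index on the column) and add a second predicate with an explicit case-sensitive COLLATE to discard rows that differ only in case.

```sql
-- Sketch: case-sensitive filtering on a case-insensitive column.
DECLARE @Users TABLE (UserName varchar(50) COLLATE Latin1_General_CI_AS);
INSERT INTO @Users (UserName) VALUES ('Alice'), ('alice'), ('ALICE');

DECLARE @find varchar(50) = 'alice';

SELECT UserName
FROM @Users
WHERE UserName = @find                                -- CI match: all three rows
  AND UserName = @find COLLATE Latin1_General_CS_AS;  -- CS filter: only 'alice'
```

The explicit COLLATE wins over the column's implicit collation, so no collation conflict is raised.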

Microsoft SQL Server collation names

Does anybody know what the WS property of a collation does? Does it have anything to do with Asian scripts? The MSDN docs explain it as "Width Sensitive", but that doesn't make any sense for, say, Swedish or English...?
A good description of width sensitivity is summarized here: http://www.databasejournal.com/features/mssql/article.php/3302341/SQL-Server-and-Collation.htm
Width sensitivity: When a single-byte character (half-width) and the same character when represented as a double-byte character (full-width) are treated differently, then it is width sensitive.
Perhaps from an English character perspective, I would theorize that a width-sensitive collation would mean that 'abc' <> N'abc', because one string is a Unicode string (2 bytes per character), whereas the other uses one byte per character.
From a Latin character set perspective it seems like something that wouldn't make sense to set. Perhaps in other languages this is important.
I try to set these types of collation properties to insensitive in general, in order to avoid weird things like records not getting returned in search results. I usually keep accents set to insensitive, since accent sensitivity can cause a lot of user search headaches, depending on the audience of your applications.
Edit:
After creating a test database with the Latin1_General_CS_AS_WS collation, I found that N'a' = 'a' is actually true. Test queries were:
select case when 'a' = 'A' then 'yes' else 'no' end
select case when 'a' = 'a' then 'yes' else 'no' end
select case when N'a' = 'a' then 'yes' else 'no' end
So in practice I'm not sure where this type of rule comes into play.
The accepted answer demonstrates that it does not come into play for the comparison N'a' = 'a'. This is easily explained because the char will get implicitly converted to nchar in the comparison between the two anyway so both strings in the comparison are Unicode.
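The implicit conversion described above can be observed directly. This sketch relies on data type precedence, under which nvarchar outranks varchar, so a CASE expression mixing the two is typed nvarchar even when the varchar branch is taken; the variable name is illustrative.

```sql
-- Sketch: varchar meeting nvarchar is converted to nvarchar (higher precedence).
DECLARE @v sql_variant = CASE WHEN 1 = 1 THEN 'a' ELSE N'a' END;

SELECT CAST(SQL_VARIANT_PROPERTY(@v, 'BaseType') AS sysname) AS base_type;  -- nvarchar
```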
I just thought of an example of a place where width sensitivity might be expected to come into play in a Latin collation, only to discover that it appeared to make no difference there either...
DECLARE @T TABLE (
    a VARCHAR(2) COLLATE Latin1_General_100_CS_AS_WS,
    b VARCHAR(2) COLLATE Latin1_General_100_CS_AS_WS );
INSERT INTO @T
VALUES (N'Æ',
        N'AE');
SELECT LEN(a) AS [LEN(a)],
       LEN(b) AS [LEN(b)],
       a,
       b,
       CASE
         WHEN a = b THEN 'Y'
         ELSE 'N'
       END AS [a=b]
FROM @T;
LEN(a) LEN(b) a b a=b
----------- ----------- ---- ---- ----
1 2 Æ AE Y
The book "Microsoft SQL Server 2008 Internals" has this to say:
Width Sensitivity refers to East Asian languages for which there exists both half-width and full-width forms of some characters.
There is absolutely nothing stopping you from storing these characters in a column with a collation such as Latin1_General_100_CS_AS_WS, as long as the column has a Unicode data type, so I guess that the WS part would only apply in that particular situation.
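A minimal sketch of the situation the book describes, using a full-width Latin letter rather than an East Asian script: NCHAR(65313) is U+FF21 FULLWIDTH LATIN CAPITAL LETTER A. The expectation (an assumption worth checking on your own server) is that a collation without _WS treats it as equal to an ordinary A, while the _WS variant does not. Variable names are illustrative.

```sql
-- Sketch: the same comparison under a width-insensitive and a width-sensitive collation.
DECLARE @fullwidthA nchar(1) = NCHAR(65313);  -- U+FF21, full-width 'A'
DECLARE @wi varchar(10), @ws varchar(10);

SET @wi = CASE WHEN @fullwidthA = N'A' COLLATE Latin1_General_100_CI_AS
               THEN 'equal' ELSE 'different' END;  -- width-insensitive
SET @ws = CASE WHEN @fullwidthA = N'A' COLLATE Latin1_General_100_CI_AS_WS
               THEN 'equal' ELSE 'different' END;  -- width-sensitive

SELECT @wi AS width_insensitive, @ws AS width_sensitive;
```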
