Confused about default string comparison option in SQL Server - sql-server

I am completely confused about the default string comparison method used in Microsoft SQL Server. Until now I had been using the UPPER() and LOWER() functions to perform any string comparison in Microsoft SQL Server.
However, I have learned that by default Microsoft SQL Server is case insensitive, and that we need to change the collation while installing Microsoft SQL Server to make it case sensitive. If this is the case, then what is the use of the UPPER() and LOWER() functions?

If you want to compare strings case-sensitively, this is the syntax you are looking for:
IF @STR1 COLLATE Latin1_General_CS_AS <> @STR2 COLLATE Latin1_General_CS_AS
    PRINT 'NOT MATCH'

As you have discovered, UPPER and LOWER are only of use in comparisons when you have a case-sensitive collation applied, but that doesn't make them useless.
For example, UPPER and LOWER can be used for formatting results:
select upper(LicencePlate) from cars
You can apply collations without reinstalling, either to a column in the table design or to specific comparisons, e.g.:
if 'a' = 'A' collate latin1_general_cs_as
select '1'
else
select '2'
if 'a' = 'A' collate latin1_general_ci_as
select '3'
else
select '4'
See http://technet.microsoft.com/en-us/library/aa258272(v=sql.80).aspx
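To tie this back to the original question: under a case-sensitive collation, UPPER() and LOWER() are how you get a deliberately case-insensitive comparison without touching any collation settings. A minimal sketch (variable names and values are illustrative):

```sql
DECLARE @a VARCHAR(10) = 'Hello';
DECLARE @b VARCHAR(10) = 'hello';

-- Case-sensitive comparison: these two values do not match
IF @a COLLATE Latin1_General_CS_AS = @b COLLATE Latin1_General_CS_AS
    PRINT 'match';

-- Folding both sides with UPPER() makes the comparison case insensitive
-- even though the collation in effect is case sensitive
IF UPPER(@a) COLLATE Latin1_General_CS_AS = UPPER(@b) COLLATE Latin1_General_CS_AS
    PRINT 'match after UPPER()';
```

Only the second IF prints, which is why the two functions remain useful even on a case-insensitive server: they make the intent of the comparison explicit.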

Related

Accent insensitive collation for ð in sql server

I have a .NET application that searches a SQL Server database. I use the collation Latin1_General_CI_AI to make the search query case- and accent-insensitive, and this works 99.9% of the time. Is there a collation that enables me to match 'ð'? I'm sure there are other characters that don't work either, so it would be good to use one that covers everything.
select 1 as a where 'ð' = 'o' collate Latin1_General_CI_AI
select 1 as a where 'ó' = 'o' collate Latin1_General_CI_AI
The output shows that the first query returns no rows while the second returns 1, i.e. Latin1_General_CI_AI treats 'ó' as equal to 'o' but not 'ð'.

Order by multiple column first with alphabet then numbers

I am trying to convert a DB2 query's ORDER BY condition into SQL Server.
DB2 Query
ORDER BY
CASE WHEN LEN(RTRIM(LTRIM(CorpName))) > 1 THEN CorpVal Else '999' END,
CASE WHEN SUBSTRING(FName,1,1) != '*' THEN FName Else '999' END
SQL Query
ORDER BY
CASE WHEN CorpName like '[a-z]%' THEN 0 ELSE 1 END,
CASE WHEN FName like '[a-z]%' THEN 0 ELSE 1 END
I have data something like:
ABC,
24KS,
ABE,
AJX,
-Extra,
ABF,
1X1
I need the output like below
ABC,
ABE,
ABF,
AJX,
24KS,
1X1,
-Extra
This does not work for me; I need some more suggestions.
Ordering is determined by collations in both SQL Server and DB2. It seems your iSeries DB2 is configured with an EBCDIC collation, and your SQL Server collation is apparently different, so you could add an explicit COLLATE clause to the ORDER BY expressions to coerce EBCDIC ordering rules in SQL Server.
Below is your original DB2 query with the clause added for SQL Server:
ORDER BY
CASE WHEN LEN(RTRIM(LTRIM(CorpName))) > 1 THEN CorpVal Else '999' END COLLATE SQL_EBCDIC037_CP1_CS_AS,
CASE WHEN SUBSTRING(FName,1,1) != '*' THEN FName Else '999' END COLLATE SQL_EBCDIC037_CP1_CS_AS
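To see the effect of the COLLATE clause in isolation (the table and sample data below are invented for illustration): in the EBCDIC code page, letters have lower code points than digits, which is the opposite of ASCII-based collations, so forcing an EBCDIC collation in the ORDER BY should flip the letters-vs-numbers grouping, assuming the SQL_EBCDIC collations sort by EBCDIC code point:

```sql
-- Hypothetical sample data
CREATE TABLE #Names (FName VARCHAR(10));
INSERT INTO #Names VALUES ('ABC'), ('24KS'), ('1X1'), ('AJX'), ('ABE');

-- Default (ASCII-based) collation: digits sort before letters
SELECT FName FROM #Names ORDER BY FName;

-- EBCDIC collation: letters sort before digits
SELECT FName FROM #Names ORDER BY FName COLLATE SQL_EBCDIC037_CP1_CS_AS;

DROP TABLE #Names;
```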

Does Snowflake support case insensitive where clause filter similar to SQL Server

I am migrating SQL code to Snowflake, and during migration I found that by default Snowflake compares varchar fields case-sensitively (e.g. select 1 where 'Hello' = 'hello' returns nothing). To solve this problem I set the collation 'en-ci' at the account level. However, now I am not able to use crucial functions like REPLACE.
Is it possible in Snowflake to do a case-insensitive varchar comparison (without specifying the collation explicitly or using UPPER every time) and still use the REPLACE function?
I would appreciate any help.
Thanks,
You can compare the text with a case-insensitive match via ILIKE:
select 1 where 'Hello' ilike 'hello'
regexp_replace is your friend; it allows an 'i' parameter that stands for "ignore case":
https://docs.snowflake.com/en/sql-reference/functions/regexp_replace.html
For example, you can do something like this:
select regexp_replace('cats are grey, cAts are Cats','cats','dogs',1,0,'i');
I've assumed the default values for position and occurrence, but those can also be adjusted.
And you can still do comparison (also based on regexp, aka "RLIKE"):
https://docs.snowflake.com/en/sql-reference/functions/rlike.html
Snowflake supports COLLATE:
SELECT 1 WHERE 'Hello' = 'hello' COLLATE 'en-ci';
-- 1
SELECT 'Hello' = 'hello'
,'Hello' = 'hello' COLLATE 'en-ci';
Output: the bare comparison returns FALSE, the collated one TRUE.
The collation can be set up at the account/database/schema/table level with the parameter DEFAULT_DDL_COLLATION:
Sets the default collation used for the following DDL operations:
CREATE TABLE
ALTER TABLE … ADD COLUMN
Setting this parameter forces all subsequently-created columns in the affected objects (table, schema, database, or account) to have the specified collation as the default, unless the collation for the column is explicitly defined in the DDL.
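As a sketch of how that looks at the schema level (the schema and table names here are invented), columns created after the parameter is set default to the case-insensitive collation:

```sql
-- Hypothetical schema name; DEFAULT_DDL_COLLATION only affects
-- columns created after it is set
ALTER SCHEMA my_schema SET DEFAULT_DDL_COLLATION = 'en-ci';

CREATE TABLE my_schema.t (s VARCHAR);   -- s gets COLLATE 'en-ci' by default
INSERT INTO my_schema.t VALUES ('Hello');

SELECT 1 FROM my_schema.t WHERE s = 'hello';  -- matches under 'en-ci'
```

Note that this does not remove the original limitation from the question: functions such as REPLACE still do not accept collated input, so columns that need those functions have to stay uncollated or be cast first.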

SQL Server 2014 Case Sensitivity issue

I am migrating a database and its ETL from MySQL to SQL Server and have hit a case-sensitivity issue.
In MySQL the DB is set up as case sensitive, because one of the applications we load from has codes like 'Divh' and 'divh' in the same set (it's not my doing).
All is well, and the SELECT statements all over the place in the ETL, queries, reports etc. have used whatever casing the author wanted: some are all UPPER, some all lower, most mixed.
So, in other words, MySQL has case-insensitive DDL and SQL but allows case-sensitive data.
It doesn't look like SQL Server can accommodate this. If I choose a CI collation, all the tables and columns are case insensitive along with the data (presumably), and conversely, if it's CS, everything is case sensitive.
Am I reading this right?
If so, then I either have to change the collation of every text column in the DB, or edit each and every query.
Ironically, the first test was against an Azure SQL Database set up with the same collation (SQL_Latin1_General_CP1_CS_AS), and it doesn't care about the case of the table name in a SELECT.
Any ideas?
Thanks
JC
Firstly, are you aware that collation settings exist at every level in SQL Server: instance, database, table and even column level?
It sounds like you just want to enforce the case-sensitive collation for the affected columns, leaving the database and the DDL as a whole case insensitive.
Another trick I've used in the past is to cast values to a VARBINARY data type if you want to do data comparisons between different cases, but without the need to change the collation of anything.
For example:
DECLARE @Var1 VARCHAR(5)
DECLARE @Var2 VARCHAR(5)
SET @Var1 = 'Divh'
SET @Var2 = 'divh'

--Comparison 1: uses the collation in effect
IF @Var1 = @Var2
    PRINT 'Same'
ELSE
    PRINT 'Not the same'

--Comparison 2: byte-wise, so case differences are always detected
IF CAST(@Var1 AS VARBINARY) = CAST(@Var2 AS VARBINARY)
    PRINT 'Same'
ELSE
    PRINT 'Not the same'

Microsoft SQL Server collation names

Does anybody know what the WS property of a collation does? Does it have anything to do with Asian scripts? The MSDN docs explain it as "Width Sensitive", but that doesn't make any sense for, say, Swedish or English...?
A good description of width sensitivity is summarized here: http://www.databasejournal.com/features/mssql/article.php/3302341/SQL-Server-and-Collation.htm
Width sensitivity: when a single-byte character (half-width) and the same character represented as a double-byte character (full-width) are treated differently, then it is width sensitive.
Perhaps from an English-character perspective, I would theorize that a width-sensitive collation would mean that 'abc' <> N'abc', because one string is a Unicode string (two bytes per character), whereas the other uses one byte per character.
From a Latin character-set perspective it seems like something that wouldn't make sense to set. Perhaps in other languages this is important.
I try to set these types of collation properties to insensitive in general, in order to avoid weird things like records not getting returned in search results. I usually keep accents set to insensitive, since that can cause a lot of user search headaches, depending on the audience of your applications.
Edit:
Edit: after creating a test database with the Latin1_General_CS_AS_WS collation, I found that N'a' = 'a' is actually true. Test queries were:
select case when 'a' = 'A' then 'yes' else 'no' end
select case when 'a' = 'a' then 'yes' else 'no' end
select case when N'a' = 'a' then 'yes' else 'no' end
So in practice I'm not sure where this type of rule comes into play.
The accepted answer demonstrates that it does not come into play for the comparison N'a' = 'a'. This is easily explained because the char will get implicitly converted to nchar in the comparison between the two anyway so both strings in the comparison are Unicode.
I just thought of an example of a place where width sensitivity might be expected to come into play in a Latin collation, only to discover that it appeared to make no difference at all there either...
DECLARE @T TABLE (
    a VARCHAR(2) COLLATE Latin1_General_100_CS_AS_WS,
    b VARCHAR(2) COLLATE Latin1_General_100_CS_AS_WS )

INSERT INTO @T
VALUES (N'Æ',
        N'AE');

SELECT LEN(a) AS [LEN(a)],
       LEN(b) AS [LEN(b)],
       a,
       b,
       CASE
           WHEN a = b THEN 'Y'
           ELSE 'N'
       END AS [a=b]
FROM @T
LEN(a)      LEN(b)      a    b    a=b
----------- ----------- ---- ---- ----
1           2           Æ    AE   Y
The book "Microsoft SQL Server 2008 Internals" has this to say:
Width sensitivity refers to East Asian languages for which there exist both half-width and full-width forms of some characters.
There is absolutely nothing stopping you storing these characters in a collation such as Latin1_General_100_CS_AS_WS as long as the column has a Unicode data type, so I guess that the WS part would only apply in that particular situation.
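A sketch of where WS should matter, then, per that definition: comparing a half-width Latin 'A' (U+0041) with its full-width form 'Ａ' (U+FF21). The literals must be Unicode (N'...') for the distinction to exist at all:

```sql
-- Width-sensitive collation: half-width and full-width forms
-- should compare as different
SELECT CASE WHEN N'A' = N'Ａ' COLLATE Latin1_General_100_CS_AS_WS
            THEN 'equal' ELSE 'different' END;

-- Same comparison under the width-insensitive variant:
-- the two forms should compare as equal
SELECT CASE WHEN N'A' = N'Ａ' COLLATE Latin1_General_100_CS_AS
            THEN 'equal' ELSE 'different' END;
```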
