Risk and effects of changing database collation in SQL Server

Since my SQL Server 2012 instance uses a collation (Latin1_General_CI_AS) different from some of the databases in use (SQL_Latin1_General_CP1_CI_AI), I was evaluating the possible risks of changing the collation of the databases that use a collation different from the SQL Server instance's.
I have found plenty of procedures describing how to perform this step. What is not clear to me is whether there are constraints or risks in performing such an action.
Thanks for any replies.

To understand the risk, you need to understand the difference; we can't tell you the impact on your system. The difference is that the new collation would start finding matches where it didn't before.
Consider this query using your current collation. It will not return a row, because those two values are not equal under an accent-sensitive collation:
select 1
where 'e' = 'é' collate Latin1_General_CI_AS -- accent-sensitive (AS): no match
Since both of those characters are the letter 'e', just with different accents, they will be equal when you ignore the accent:
select 1
where 'e' = 'é' collate SQL_Latin1_General_CP1_CI_AI -- accent-insensitive (AI): returns a row
Again there is no way we can tell what the potential problems might be in your system because we don't know your system.
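If you want to scope the risk before changing anything, you can at least inventory where collations diverge. A minimal sketch, assuming only the built-in catalog views (the table and column names reported come from your own schema):
SELECT o.name AS table_name,
       c.name AS column_name,
       c.collation_name
FROM sys.columns AS c
JOIN sys.objects AS o ON o.object_id = c.object_id
WHERE o.type = 'U' -- user tables only
  AND c.collation_name IS NOT NULL -- only string columns carry a collation
  AND c.collation_name <> CONVERT(sysname, DATABASEPROPERTYEX(DB_NAME(), 'Collation'));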

Related

MsSQL collation advice needed: case-insensitive queries but case-sensitive data needed

I have read a lot about "choosing the right collation" and similar topics, but I still don't have a solution for the following problem:
Our situation is the following:
We switched the database for our application from Advantage Database to MsSQL Server 2014.
All databases have the same collation, "Latin1_General_CI_AS" (except for ReportServer and ReportServerTempDB). We chose this collation because "Latin1_General_CS_AS" would have a big impact on our existing queries, since table names in queries would become case-sensitive.
To retain the sort order of our data, we created our string columns with "Latin1_General_CS_AS".
The problem we actually have is that queries with joins on temp tables fail because of collation conflicts. It is clear to me that a tempdb table has "Latin1_General_CI_AS" as its collation and throws an error when it gets joined with a string column of a physical table.
We could solve our problems if it were possible to build case-insensitive queries regardless of the database collation.
The other way around, we could solve the problem by changing the collation of the databases to "Latin1_General_CS_AS" and correcting all table names in our queries to match the original table names exactly.
If the first solution I am thinking of is not possible, does someone have good advice on how to solve this collation problem?
The second solution is not really practicable. We never cared about capitalization and simply have too many database interactions for it to be worth correcting them all.
Is there maybe a third or a fourth solution for our problem?
Thanks in advance for any help.
To force the collations to match, you can use the COLLATE clause.
See: https://msdn.microsoft.com/en-us/library/ms184391.aspx
SELECT
    a.col1, b.col2
FROM
    table_a a JOIN table_b b
    ON a.col1 COLLATE DATABASE_DEFAULT = b.col2 COLLATE DATABASE_DEFAULT
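For the temp-table conflicts described in the question, a common workaround is to declare the temp table's string columns with DATABASE_DEFAULT so they pick up the current database's collation instead of tempdb's. A hedged sketch; #Results and dbo.Customers are hypothetical names:
CREATE TABLE #Results (
    CustomerName varchar(50) COLLATE DATABASE_DEFAULT NOT NULL
);

INSERT INTO #Results (CustomerName)
SELECT CustomerName FROM dbo.Customers;

-- joins back to the user database no longer raise a collation conflict
SELECT c.CustomerName
FROM dbo.Customers AS c
JOIN #Results AS r ON r.CustomerName = c.CustomerName;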

How to make WHERE clause case insensitive: From SQL Server querying Oracle linked server

We have an MS SQL Server 2005 installation that connects to an Oracle database through a linked server connection.
Lots of SELECT statements are being performed through a series of OPENQUERY() commands. The WHERE clause in the majority of these statements is against VARCHAR columns.
I've heard that if the WHERE clause is case sensitive, it can have a big impact on performance.
So my question is, how can I make sure that the non-binary string WHERE clauses are being performed in a case insensitive way for maximum performance?
It's actually the other way around:
Case sensitive...
WHERE column = :criteria
...will use index on column directly and perform well.
Case insensitivity typically requires something like this...
WHERE UPPER(column) = UPPER(:criteria)
...which does not use index on column and performs poorly (unless you are careful and create a functional index on UPPER(column)).
I'm not sure whether OPENQUERY() changes anything, but from a purely Oracle perspective both case-sensitive and case-insensitive queries can be made performant, with the insensitive ones requiring special care (a function-based index).
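On the Oracle side, that "special care" usually means a function-based index whose expression matches the predicate. A minimal sketch (the table and column names are hypothetical):
-- index the uppercased value so UPPER(last_name) predicates can use it
CREATE INDEX customers_upper_name_ix ON customers (UPPER(last_name));

-- the WHERE clause must use the same expression as the index
SELECT * FROM customers WHERE UPPER(last_name) = UPPER(:criteria);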
By default, SQL Server uses a case-insensitive collation, whereas Oracle is case-sensitive by default. For searches, we normally implement an UPPER() comparison to ensure the user has a better search experience.
I've heard that if the WHERE clause is case sensitive, it can have a big impact on performance.
From where did you hear that? It sounds like a myth to me... rather, it would be the other way around: if you used something like WHERE lower(field) = 'some str' to achieve a case-insensitive comparison, it would be bad for performance. Using a case-insensitive collation would probably be significantly faster...
Another important point to consider is whether your business rules actually allow case-insensitive comparison.
And last but not least, you should start to optimize when you actually have a performance problem, not because you heard something...
WHERE LOWER(field_name) = 'field_value'
To make a WHERE clause case insensitive, you can use LOWER or UPPER for this purpose:
select * from Table_Name
where lower(Column_Name) = lower('mY Any Value')
OR
select * from Table_Name
where UPPER(Column_Name) = UPPER('mY Any Value')
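Note that on SQL Server, wrapping the column in LOWER()/UPPER() like this prevents an index seek on Column_Name. A hedged alternative, assuming a SQL Server table, is to force a case-insensitive collation on the comparison instead (though, as noted elsewhere on this page, an explicit COLLATE that differs from the column's own collation can itself limit index use):
SELECT * FROM Table_Name
WHERE Column_Name = 'mY Any Value' COLLATE Latin1_General_CI_AS;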

SQL Server case insensitive collation

What are the benefits/drawbacks of using a case insensitive collation in SQL Server (in terms of query performance)?
I have a database that is currently using a case-insensitive collation, and I don't really like it. I would very much like to change it to case sensitive. What should I be aware of when changing the collation?
If you change the collation on the database, you also have to change it on each column individually - they maintain the collation setting that was in force when their table was created.
create database CollTest COLLATE Latin1_General_CI_AI
go
use CollTest
go
create table T1 (
ID int not null,
Val1 varchar(50) not null -- picks up the database collation at creation time
)
go
select name,collation_name from sys.columns where name='Val1'
go
alter database CollTest COLLATE Latin1_General_CS_AS
go
-- the existing column keeps its original collation:
select name,collation_name from sys.columns where name='Val1'
go
Result:
name collation_name
---- --------------
Val1 Latin1_General_CI_AI
name collation_name
---- --------------
Val1 Latin1_General_CI_AI
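To actually bring a column in line with the new database collation, each column must be altered explicitly, repeating its full type definition (in real schemas, indexes and constraints referencing the column must be dropped first). Continuing the sketch above:
ALTER TABLE T1 ALTER COLUMN Val1 varchar(50) COLLATE Latin1_General_CS_AS NOT NULL
go
select name,collation_name from sys.columns where name='Val1'
go
-- Val1 now reports Latin1_General_CS_AS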
(I added this as a separate answer because it's substantially different from my first.)
OK, I found some actual documentation. This MS KB article says that there are performance differences between different collations, but not where you might think. The difference is between SQL collations (backward-compatible, but not Unicode-aware) and Windows collations (Unicode-aware):
Generally, the degree of performance difference between the Windows and the SQL collations will not be significant. The difference only appears if a workload is CPU-bound, rather than being constrained by I/O or by network speed, and most of this CPU burden is caused by the overhead of string manipulation or comparisons performed in SQL Server.
Both SQL and Windows collations have case sensitive and case insensitive versions, so it sounds like that isn't the primary concern.
Another good story "from the trenches" appears in Dan's excellent article titled "Collation Hell":
I inherited a mixed collation environment with more collations than I can count on one hand. The different collations require workarounds to avoid "cannot resolve collation conflict" errors and those workarounds kill performance due to non-sargable expressions. Dealing with mixed collations is a real pain so I strongly recommend you standardize on a single collation and deviate only after careful forethought.
He concludes:
I personally don't think performance should even be considered in choosing the proper collation. One of the reasons I'm living in collation hell is that my predecessors chose binary collations to eke out every bit of performance for our highly transactional OLTP systems. With the sole exception of a leading wildcard table scan search, I've found no measurable performance difference with our different collations. The real key to performance is query and index tuning rather than collation. If performance is important to you, I recommend you perform a performance test with your actual application queries before you choose a collation based on performance expectations.
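A minimal harness for such a test might look like the following; the table and predicate are hypothetical, and you would run it against each candidate collation:
SET STATISTICS TIME ON;
SELECT COUNT(*) FROM dbo.Orders WHERE CustomerName = 'smith';
SET STATISTICS TIME OFF;
-- compare the reported CPU time and elapsed time across collations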
Hope this helps.
I would say the biggest drawback to changing to a case sensitive collation in a production database would be that many, if not most, of your queries would fail because they are currently designed to ignore case.
I've not tried to change the collation on an existing database, but I suspect it could be quite time-consuming as well. You will probably have to lock your users out completely while the process happens, too. Do not try this unless you have thoroughly tested it on dev.
I can't find anything to confirm whether properly constructed queries work faster on a case-sensitive vs case-insensitive database (although I suspect the difference is negligible), but a few things are clear to me:
If your business requirements don't ask for it, you are putting yourself up to a lot of extra work (this is the crux of both HLGEM and Damien_The_Unbeliever's answers).
If your business requirements don't ask for it, you are setting yourself up for a lot of possible errors.
It's way too easy to construct poorly performing queries in a case-insensitive database if a case-sensitive lookup is required:
A query like:
... WHERE UPPER(GivenName) = 'PETER'
won't use an index on GivenName. You would think something like:
... WHERE GivenName = 'PETER' COLLATE SQL_Latin1_General_CP1_CS_AS
would work better, and it does. But for maximum performance you'd have to do something like:
... WHERE GivenName = 'PETER' COLLATE SQL_Latin1_General_CP1_CS_AS
AND GivenName LIKE 'PETER'
(the added LIKE predicate is sargable under the column's case-insensitive collation, so it can seek the index to narrow the rows, while the COLLATE predicate applies the exact case-sensitive filter; see this article for the details)
If you change the database collation but not the server collation (so the two no longer match), watch out when using temporary tables. Unless a collation is specified in their CREATE statement, they will use the server's default collation rather than the database's, which may cause JOINs or other comparisons against your database's columns (assuming those were also changed to the database's collation, as alluded to by Damien_The_Unbeliever) to fail.
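A quick way to check whether you are in this situation, using only built-in metadata functions:
SELECT SERVERPROPERTY('Collation')                AS server_collation,
       DATABASEPROPERTYEX(DB_NAME(), 'Collation') AS database_collation;
-- if these differ, temp tables created without an explicit COLLATE
-- will not match your database's columns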

Stop SQL Server Evaluating Useless UPPER/LOWER In WHERE Clause?

It seems that despite the fact that SQL Server does not match on case in a WHERE clause, it still honours UPPER/LOWER in a WHERE clause, which seems to be quite expensive. Is it possible to instruct SQL Server to disregard UPPER/LOWER in a WHERE clause?
This might seem like a pointless question but it's very nice to be able to write a single query for both Oracle and SQL Server.
Thanks, Jamie
The short answer to your question is no - you can't have SQL server magically ignore function calls in the WHERE clause.
As others have said, the performance issue is caused because, on SQL Server, using a function in the WHERE clause prevents the use of an index and forces a table scan.
To get best performance, you need to maintain two queries, one for each RDBMS platform (either in your application or in database objects like stored procedures or views). Given that so many other areas of functionality differ between Oracle and SQL Server, you're likely to end up doing it anyway, for something else if not for this.
So you mean something like:
WHERE YourColumn = @YourValue collate Latin1_General_BIN
But if you want it to work without the collate keyword, you could just set the collation of the column to something which is case insensitive.
Bear in mind that an index on YourColumn will be using a particular collation, so if you specify the collation in the WHERE clause (rather than on the column itself), an index will be less useful. I liken this to the fact that when I flew to Sweden a few years ago, I couldn't find Vasteras on the map, because the letters I thought were plain 'a's actually had accents on them and were located at the end of the alphabet. The index in the back of the map wasn't so good when I was trying to use the wrong collation.
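If you go the route of changing the column itself, the change is per column and repeats the full type definition. A hedged sketch with hypothetical names (any indexes or constraints on the column must be dropped and recreated around this):
ALTER TABLE dbo.People
    ALTER COLUMN LastName varchar(100) COLLATE Latin1_General_CI_AS NOT NULL;
-- after this, WHERE LastName = @value compares case-insensitively
-- and can still seek an index on LastName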

Doing a join across two databases with different collations on SQL Server and getting an error

I know, I know, with what I wrote in the question I shouldn't be surprised. But my situation is slowly working on an inherited POS system, and my predecessor apparently wasn't aware of JOINs, so when I looked into one of the internal pages that loads for 60 seconds I saw that it's a fairly quick "rewrite these 8 queries as one query with JOINs" situation. The problem is that besides not knowing about JOINs, he also seems to have had a fetish for multiple databases, and, surprise, surprise, they use different collations. The fact of the matter is we use only "normal" Latin characters that English-speaking people would consider the entire alphabet, and this whole thing will be out of use in a few months, so a band-aid is all I need.
Long story short: I need some method of casting to a single collation so I can compare two fields from two databases.
The exact error is:
Cannot resolve the collation conflict between "SQL_Latin1_General_CP850_CI_AI" and "SQL_Latin1_General_CP1_CI_AS" in the equal to operation.
You can use the collate clause in a query (I can't find my example right now, so my syntax is probably wrong - I hope it points you in the right direction)
select some_field collate SQL_Latin1_General_CP850_CI_AI
from table_1
inner join table_2
on (table_1.field collate SQL_Latin1_General_CP850_CI_AI = table_2.field)
where whatever
A general-purpose way is to coerce the collation to DATABASE_DEFAULT. This avoids hardcoding a collation name, which could change.
It's also useful for temp tables and table variables, and where you may not know the server collation (e.g. you are a vendor installing your system on the customer's server).
select
some_field collate DATABASE_DEFAULT
from
table_1
inner join
table_2 on table_1.field collate DATABASE_DEFAULT = table_2.field
where whatever
