MS SQL collation advice needed: case-insensitive queries but case-sensitive data


I have read a lot about "choosing the right collation" and similar topics, but I still haven't found a solution to the following problem.
Our setup is as follows:
We switched the database for our application from Advantage Database to MS SQL Server 2014.
All databases have the same collation, "Latin1_General_CI_AS" (except for ReportServer and ReportServerTempDB). We chose this collation because "Latin1_General_CS_AS" would have a big impact on our existing queries: the table names in queries would become case-sensitive.
To retain the sort order of query results, we created our string columns with "Latin1_General_CS_AS".
The problem we currently have is that queries with joins on temp tables fail because of collation conflicts. It is clear to me that a string column in a tempdb table has "Latin1_General_CI_AS" as its collation and throws an error when it is joined with a string column of a physical table.
We could solve our problem if it were possible to write case-insensitive queries regardless of the database collation.
Alternatively, we could solve the problem by changing the collation of the databases to "Latin1_General_CS_AS" and correcting all table names in our queries to match the original table names.
If the first solution is not possible, does anyone have good advice on how to solve this collation problem?
The second solution is not really practicable. We never cared about capitalization, and we simply have too many database interactions for it to be worth correcting them all.
Is there maybe a third or a fourth solution to our problem?
Thanks in advance for any help.

To force the collations to match, you can use the COLLATE clause.
See: https://msdn.microsoft.com/en-us/library/ms184391.aspx
SELECT
a.col1,b.col2
FROM
table a join table b
ON a.col1 collate database_default = b.col2 collate database_default
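An alternative to adding COLLATE to every join is to give the temp table column the same collation as the permanent column when the temp table is created. The following is only a sketch under the assumptions stated in the question (string columns created as Latin1_General_CS_AS); the table and column names are hypothetical.

```sql
-- Hypothetical permanent table whose string column is case-sensitive,
-- as described in the question.
CREATE TABLE dbo.Customer (
    CustomerCode varchar(20) COLLATE Latin1_General_CS_AS NOT NULL
);

-- Declare the temp table column with the same collation, so joins
-- against dbo.Customer need no COLLATE clause.
CREATE TABLE #Selected (
    CustomerCode varchar(20) COLLATE Latin1_General_CS_AS NOT NULL
);

SELECT c.CustomerCode
FROM dbo.Customer AS c
JOIN #Selected AS s
    ON c.CustomerCode = s.CustomerCode;  -- both sides are CS_AS: no conflict

-- If the temp table definition cannot be changed, an explicit COLLATE on
-- one side of the comparison also resolves the conflict:
--   ON c.CustomerCode = s.CustomerCode COLLATE Latin1_General_CS_AS

DROP TABLE #Selected;
```

Because an explicit COLLATE takes precedence over a column's own collation, the commented-out variant works without touching either table, at the cost of repeating it in every query.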

Related

Risk and effects of changing database collation in SQL Server

My SQL Server 2012 instance uses a collation (Latin1_General_CI_AS) that differs from the one used by some of its databases (SQL_Latin1_General_CP1_CI_AI), so I am evaluating the possible risks of changing the collation of databases whose collation differs from that of the instance.
I found hundreds of procedures for performing this step. What is not clear to me is whether there are constraints or risks in performing such an action.
Thanks for any replies.

To understand the risk, you need to understand the difference. We can't tell you the impact on your system. The difference is that an accent-insensitive collation finds matches that an accent-sensitive one does not.
Consider this query using your current collation. This will not return a row because those two values are not equal.
select 1
where 'e' = 'é' collate Latin1_General_CI_AS
Now since both of those characters are the letter 'e' but with different accents they will be equal when you ignore the accent.
select 1
where 'e' = 'é' collate SQL_Latin1_General_CP1_CI_AI
Again there is no way we can tell what the potential problems might be in your system because we don't know your system.
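Before weighing the risk, it can help to see exactly where the collations differ. A minimal sketch using standard catalog views and functions; the last query has to be run inside the database being inspected.

```sql
-- Default collation of the server (instance).
SELECT SERVERPROPERTY('Collation') AS server_collation;

-- Default collation of every database on the instance.
SELECT name, collation_name
FROM sys.databases
ORDER BY name;

-- Columns of user tables in the current database whose collation differs
-- from the database default. Non-string columns have a NULL collation_name
-- and drop out of the comparison.
SELECT t.name AS table_name,
       c.name AS column_name,
       c.collation_name
FROM sys.columns AS c
JOIN sys.tables AS t
    ON t.object_id = c.object_id
WHERE c.collation_name
      <> CAST(DATABASEPROPERTYEX(DB_NAME(), 'Collation') AS sysname);
```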

SQL Server case insensitive collation

What are the benefits/drawbacks of using a case insensitive collation in SQL Server (in terms of query performance)?
I have a database that is currently using a case-insensitive collation, and I don't really like it. I would very much like to change it to case sensitive. What should I be aware of when changing the collation?

If you change the collation on the database, you also have to change it on each column individually - they maintain the collation setting that was in force when their table was created.
create database CollTest COLLATE Latin1_General_CI_AI
go
use CollTest
go
create table T1 (
ID int not null,
Val1 varchar(50) not null
)
go
select name,collation_name from sys.columns where name='Val1'
go
alter database CollTest COLLATE Latin1_General_CS_AS
go
select name,collation_name from sys.columns where name='Val1'
go
Result:
name collation_name
---- --------------
Val1 Latin1_General_CI_AI
name collation_name
---- --------------
Val1 Latin1_General_CI_AI
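The per-column change is made with ALTER TABLE ... ALTER COLUMN. A sketch continuing the CollTest example: the data type and nullability must be restated, and SQL Server rejects the change while the column is referenced by an index, a constraint, or a computed column, so those would have to be dropped and recreated around it.

```sql
use CollTest
go
-- Restate type and nullability; only the collation changes.
alter table T1
    alter column Val1 varchar(50) COLLATE Latin1_General_CS_AS not null
go
-- Val1 should now report the new collation.
select name,collation_name from sys.columns where name='Val1'
go
```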

(I added this as a separate answer because it's substantially different from my first.)
OK, I found some actual documentation. This MS KB article says that there are performance differences between different collations, but not where you might think. The difference is between SQL collations (backward compatible, but not Unicode-aware) and Windows collations (Unicode-aware):
Generally, the degree of performance difference between the Windows and the SQL collations will not be significant. The difference only appears if a workload is CPU-bound, rather than being constrained by I/O or by network speed, and most of this CPU burden is caused by the overhead of string manipulation or comparisons performed in SQL Server.
Both SQL and Windows collations have case sensitive and case insensitive versions, so it sounds like that isn't the primary concern.
Another good story "from the trenches" in Dan's excellent article titled "Collation Hell":
I inherited a mixed collation environment with more collations than I can count on one hand. The different collations require workarounds to avoid "cannot resolve collation conflict" errors and those workarounds kill performance due to non-sargable expressions. Dealing with mixed collations is a real pain so I strongly recommend you standardize on a single collation and deviate only after careful forethought.
He concludes:
I personally don't think performance should even be considered in choosing the proper collation. One of the reasons I'm living in collation hell is that my predecessors chose binary collations to eke out every bit of performance for our highly transactional OLTP systems. With the sole exception of a leading wildcard table scan search, I've found no measurable performance difference with our different collations. The real key to performance is query and index tuning rather than collation. If performance is important to you, I recommend you perform a performance test with your actual application queries before you choose a collation on based on performance expectations.
Hope this helps.

I would say the biggest drawback to changing to a case-sensitive collation in a production database is that many, if not most, of your queries would fail because they are currently written to ignore case.
I've not tried to change the collation of an existing database, but I suspect it could be quite time-consuming as well. You will probably have to lock your users out completely while the process runs, too. Do not try this unless you have tested it thoroughly on dev.

I can't find anything to confirm whether properly constructed queries work faster on a case-sensitive vs case-insensitive database (although I suspect the difference is negligible), but a few things are clear to me:
If your business requirements don't ask for it, you are setting yourself up for a lot of extra work (this is the crux of both HLGEM's and Damien_The_Unbeliever's answers).
If your business requirements don't ask for it, you are setting yourself up for a lot of possible errors.
It's way too easy to construct poorly performing queries in a case-insensitive database if a case-sensitive lookup is required:
A query like:
... WHERE UPPER(GivenName) = 'PETER'
won't use an index on GivenName. You would think something like:
... WHERE GivenName = 'PETER' COLLATE SQL_Latin1_General_CP1_CS_AS
would work better, and it does. But for maximum performance you'd have to do something like:
... WHERE GivenName = 'PETER' COLLATE SQL_Latin1_General_CP1_CS_AS
AND GivenName LIKE 'PETER'
(see this article for the details)
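The idea behind combining two predicates can be sketched as follows: one predicate compares in the column's own (case-insensitive) collation, so an index on the column remains usable, and the second applies the case-sensitive comparison to the rows that survive. This is an illustration only; dbo.Person, PersonID and the index name are hypothetical, and the plan the optimizer actually picks should be checked on real data.

```sql
-- Hypothetical table with a case-insensitive column and an index on it.
CREATE TABLE dbo.Person (
    PersonID  int IDENTITY(1,1) PRIMARY KEY,
    GivenName varchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL
);
CREATE NONCLUSTERED INDEX IX_Person_GivenName ON dbo.Person (GivenName);

DECLARE @name varchar(50) = 'PETER';

SELECT PersonID, GivenName
FROM dbo.Person
WHERE GivenName = @name   -- column's own collation: can use the index
  AND GivenName = @name COLLATE SQL_Latin1_General_CP1_CS_AS;  -- exact-case filter
```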

If you change the database collation but not the server collation (and they then don't match as a result), watch out when using temporary tables. Unless otherwise specified in their CREATE statement, they will use the server's default collation rather than that of the database which may cause JOINs or other comparisons against your DB's columns (assuming they're also changed to the DB's collation, as alluded to by Damien_The_Unbeliever) to fail.
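One way to specify the collation in the CREATE statement is COLLATE DATABASE_DEFAULT, which makes the temp table column follow the current database rather than tempdb. A minimal sketch; dbo.Person is a hypothetical table whose GivenName column uses the database default collation.

```sql
-- Without the COLLATE clause, GivenName would take tempdb's collation,
-- which normally matches the server default.
CREATE TABLE #Names (
    GivenName varchar(50) COLLATE DATABASE_DEFAULT NOT NULL
);

-- Both columns now share the current database's collation, so the join
-- does not raise a collation conflict.
SELECT p.GivenName
FROM dbo.Person AS p
JOIN #Names AS n
    ON p.GivenName = n.GivenName;

DROP TABLE #Names;
```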

Stop SQL Server Evaluating Useless UPPER/LOWER In WHERE Clause?

It seems that, even though SQL Server does not match on case in a WHERE clause, it still evaluates UPPER/LOWER there, which seems to be quite expensive. Is it possible to instruct SQL Server to disregard UPPER/LOWER in a WHERE clause?
This might seem like a pointless question but it's very nice to be able to write a single query for both Oracle and SQL Server.
Thanks, Jamie

The short answer to your question is no - you can't have SQL server magically ignore function calls in the WHERE clause.
As others have said, the performance issue arises because, on SQL Server, applying a function to a column in the WHERE clause generally prevents an index seek on that column and forces a scan.
To get best performance, you need to maintain two queries, one for each RDBMS platform (either in your application or in database objects like stored procedures or views). Given that so many other areas of functionality differ between Oracle and SQL Server, you're likely to end up doing it anyway, for something else if not for this.

So you mean something like:
WHERE YourColumn = @YourValue collate Latin1_General_BIN
But if you want it to work without the collate keyword, you could just set the collation of the column to something which is case insensitive.
Bear in mind that an index on YourColumn is built using a particular collation, so if you specify a different collation in the WHERE clause (rather than on the column itself), the index will be less useful. I liken this to the time I flew to Sweden a few years ago and couldn't find Vasteras on the map, because the letters I thought were "a" actually had accents on them and were sorted at the end of the alphabet. The index in the back of the map wasn't much good when I was trying to use the wrong collation.

Doing a join across two databases with different collations on SQL Server and getting an error

I know, I know with what I wrote in the question I shouldn't be surprised. But my situation is slowly working on an inherited POS system and my predecessor apparently wasn't aware of JOINs so when I looked into one of the internal pages that loads for 60 seconds I see that it's a fairly quick, rewrite these 8 queries as one query with JOINs situation. Problem is that besides not knowing about JOINs he also seems to have had a fetish for multiple databases and surprise, surprise they use different collations. Fact of the matter is we use all "normal" latin characters that English speaking people would consider the entire alphabet and this whole thing will be out of use in a few months so a bandaid is all I need.
Long story short is I need some kind of method to cast to a single collation so I can compare two fields from two databases.
Exact error is:
Cannot resolve the collation conflict
between
"SQL_Latin1_General_CP850_CI_AI" and
"SQL_Latin1_General_CP1_CI_AS" in the
equal to operation.

You can use the collate clause in a query (I can't find my example right now, so my syntax is probably wrong - I hope it points you in the right direction)
select some_field collate SQL_Latin1_General_CP850_CI_AI
from table_1
inner join table_2
on (table_1.field collate SQL_Latin1_General_CP850_CI_AI = table_2.field)
where whatever

A general purpose way is to coerce the collation to DATABASE_DEFAULT. This removes hardcoding the collation name which could change.
It's also useful for temp tables and table variables, and where you may not know the server collation (e.g. you are a vendor installing your system on a customer's server).
select
some_field collate DATABASE_DEFAULT
from
table_1
inner join
table_2 on table_1.field collate DATABASE_DEFAULT = table_2.field
where whatever

What fields should be indexed on a given table?

I have a table with a lot of records (more than 2 million). It's a transaction table, but I need a report with a lot of joins. What's the best practice for indexing that table? The report is taking too much time.
I'm paging the table using the stored-procedure paging method, but I need an index because when I export the report I have to run the entire query without pagination, and to get the total number of records I need to select everything.
Any help?

The SQL Server 2008 Management Studio query tool, if you turn on "Include Actual Execution Plan", will tell you what indexes a given query needs to run fast. (Assuming there's an obvious missing index that is making the query run unusually slow, that is.)
SQL Server 2008 Management Studio Query Screenshot http://img208.imageshack.us/img208/4108/image4sy8.png
We use this all the time on Stack Overflow. It's one of the best features of SQL 2008. It works against older SQL instances as well: just install the SQL 2008 tools and point them at a SQL 2005 instance. Not sure if it works on anything earlier, though.
As others have noted, you can also do this manually, but it takes a bit of trial and error. You'll want indexes on fields that are used in ORDER BY and WHERE clauses.
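The same kind of missing-index suggestion can also be read from the missing-index dynamic management views (available since SQL Server 2005, and requiring VIEW SERVER STATE permission). The query below is a rough sketch, not a tuning recipe: the ranking expression is an arbitrary choice, and the suggestions are hints to evaluate, not indexes to create blindly.

```sql
-- List the missing-index suggestions collected since the last restart,
-- most frequently wanted and most impactful first.
SELECT TOP (20)
       mid.statement          AS table_name,
       mid.equality_columns,
       mid.inequality_columns,
       mid.included_columns,
       migs.user_seeks,
       migs.avg_user_impact   -- estimated percentage improvement
FROM sys.dm_db_missing_index_details AS mid
JOIN sys.dm_db_missing_index_groups AS mig
    ON mig.index_handle = mid.index_handle
JOIN sys.dm_db_missing_index_group_stats AS migs
    ON migs.group_handle = mig.index_group_handle
ORDER BY migs.user_seeks * migs.avg_user_impact DESC;
```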

Do key fields have to be everything in the WHERE clause?
No, that would be overkill. Indexing a field really only works if a) your WHERE clause is selective enough (that is: only selects out about 1-2% of the values; an index on a "Gender" field which can be only one of two or three possible values is pointless), and b) your WHERE clause doesn't involve function calls or other magic.
In your case, TBL.Status might be a candidate - how many possible values are there? You select the '1' and '2' value - if there are hundreds of possible values, then it's a good choice.
On a side note:
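This clause here: (TBL.Login IS NULL AND TBL.Login <> 'dev' ) does not do what you probably intend. If TBL.Login IS NULL, the comparison TBL.Login <> 'dev' evaluates to UNKNOWN rather than TRUE, so as written the combined condition can never be satisfied. If you only want rows without a login, the IS NULL check alone is sufficient; if you want every row except 'dev', including rows without a login, use OR instead of AND.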
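How NULL behaves in such a predicate is easy to check on a throwaway table variable. A small sketch with made-up values ('alice' is purely illustrative):

```sql
DECLARE @t TABLE (Login varchar(20) NULL);
INSERT INTO @t (Login) VALUES (NULL), ('dev'), ('alice');

-- NULL <> 'dev' is UNKNOWN, so the AND is never TRUE: returns 0.
SELECT COUNT(*) AS and_rows
FROM @t
WHERE Login IS NULL AND Login <> 'dev';

-- Only the NULL row: returns 1.
SELECT COUNT(*) AS null_rows
FROM @t
WHERE Login IS NULL;

-- The NULL row plus 'alice': returns 2.
SELECT COUNT(*) AS or_rows
FROM @t
WHERE Login IS NULL OR Login <> 'dev';
```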
The other field you might want to consider putting an index on is the TBL.Date, since you seem to select a range of dates here - that might be a good choice.
Also, on a general note: whenever possible, DO NOT use a SELECT * FROM ... to select your fields. This causes a lot of overhead for SQL Server. SPECIFY your columns, and ONLY select those that you REALLY NEED, not just all of them for the heck of it.

Check your queries, and find which fields are used to match them. Those are usually the best candidates!

SQL Server has a 'Database Engine Tuning Advisor' that could help you. It is not available in SQL Server Express, but it is in all other editions of SQL Server.
Load your query in a query window.
On the menu, click Query -> Analyze Query in Database Engine Tuning Advisor
The tuning advisor will identify indexes that could be added to your table(s) to improve performance. In my experience, the tuning advisor doesn't always help, but most of the time it does. It's where I suggest you start.

OK, this is the query I'm running:
SELECT
TBL.*
FROM
FOREINGDATABASE..TABLENAME TBL
LEFT JOIN Status S
ON TBL.Status = S.Number
WHERE
(TBL.ID = CASE @Reference WHEN 0 THEN TBL.ID ELSE @Reference END) AND
TBL.Date >= @FechaInicial AND
TBL.Date <= @FechaFinal AND
(TBL.Channel = CASE @Canal WHEN '' THEN TBL.Channel ELSE @Canal END)AND
(TBL.DocType = CASE @TipoDocumento WHEN '' THEN TBL.DocType ELSE @TipoDocumento END)AND
(TBL.Document = CASE @NumDocumento WHEN '' THEN TBL.Document ELSE @NumDocumento END)AND
(TBL.Login = CASE @Login WHEN '' THEN TBL.Login ELSE @Login END)AND
(TBL.Login IS NULL AND TBL.Login <> 'dev' ) AND
TBL.Status IN ('1','2')
Do key fields have to be everything in the WHERE clause?

If I am not mistaken (please correct me if I am), you should create non-clustered indexes on the fields used in the conditions of the WHERE clause. That can be a useful starting point for finding index candidates.
Good Luck

If an Index Scan is performed instead of a seek, the cause might be that the fields are not in the correct order in the index.
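A sketch of why key column order matters, using a composite index. The index name, the dbo schema and the date literals are assumptions for illustration; the table and column names follow the query in the question, and the index would have to be created in the database that owns the table.

```sql
-- Hypothetical composite index with [Date] as the leading key column.
CREATE NONCLUSTERED INDEX IX_TABLENAME_Date_Status
    ON dbo.TABLENAME ([Date], [Status]);

-- Can seek: the range predicate is on the leading key column.
SELECT [Date], [Status]
FROM dbo.TABLENAME
WHERE [Date] >= '20080101' AND [Date] < '20080201';

-- Typically scans the index: Status is not the leading key column,
-- so matching rows are not stored contiguously.
SELECT [Date], [Status]
FROM dbo.TABLENAME
WHERE [Status] = '1';
```

Which order is better depends on the queries that matter most; comparing the actual execution plans for both orders is the safest way to decide.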

Put indexes on all columns that you're joining and filtering on.
Whether an index is used is also determined by the selectivity of the indexed column.
The best way would be to show us your query so we can try to improve it.
