it seems that despite the fact that SQL Server does not match on case in a WHERE clause it still honours UPPER/LOWER in a WHERE clause which seems to be quite expensive. Is it possible to instruct SQL Server to disregard UPPER/LOWER in a WHERE clause?
This might seem like a pointless question but it's very nice to be able to write a single query for both Oracle and SQL Server.
Thanks, Jamie
The short answer to your question is no - you can't have SQL server magically ignore function calls in the WHERE clause.
As others have said, the performance issue is caused because, on SQL Server, using a function in the WHERE clause prevents the use of an index and forces a table scan.
To get best performance, you need to maintain two queries, one for each RDBMS platform (either in your application or in database objects like stored procedures or views). Given that so many other areas of functionality differ between Oracle and SQL Server, you're likely to end up doing it anyway, for something else if not for this.
So you mean something like:
WHERE YourColumn = #YourValue collate Latin1_General_BIN
But if you want it to work without the collate keyword, you could just set the collation of the column to something which is case insensitive.
Bear in mind that an index on YourColumn will be using a particular collation, so if you specify the collation in the WHERE clause (rather than on the column itself), an index will be less useful. I liken this to the fact that when I flew in Sweden a few years ago, I couldn't find Vasteras on the map, because the letters I thought were a actually had accents on them and were located at the end of the alphabet. The index in the back of the map wasn't so good when I was trying to use the wrong collation.
Related
I'm using MSSQL 2012,
To handle special character in searching through LINQ, i found to change COLLATE of the column to *_CI_AI, but before changing it i would like to know what and where its impact.
This might be not so easy...
If this column takes part in indexes and constraints you will have to drop them, change the collation and recreate them.
One very painfull point with collations is the fact, that the temp-db uses - by default - the default-collation of the server-instance. We once had a project, where after such a step certain statements ran into errors. This happened, when a Stored Procedure created a #table' and used such a column in any kind of comparison (in WHEREorJOIN`-predicat).
You can type the collation in any statement manually, so it will be possible to get things working, but the effort might be huge...
Some related answers:
https://stackoverflow.com/a/39101572/5089204
https://stackoverflow.com/a/35840417/5089204
UPDATE a list if effects / impacts
sorting might change (a sorted list could appear in a different order)
comparisons will be less restrictive with _CI_AI. "Peter" eq. to "peter". Sometimes this is OK (most of the time actually), but not always (imagine a password). In cases where "Pétè" should be the same as "Pete" this helps...
Joins on string base might join differently (If ProductCode "aBx5" is not the same code as "ABx5")
Check-Constraints might be less restrictive (you force values "A","B" or "C" and suddenly you may insert "a","b" and "c"...)
You might run (this can be very annoying!) into collation errors in connection with temp tables. This can break existing code...
With simple text columns this should be not problem...
Is there any way to speed up queries like this below ? I am looking for option which would require minimal change to application code.
SELECT *
FROM my_table
WHERE some_column like '%my string%'
ORDER BY some_column
The table which causes most of the slowdown has 2,5 million records and query takes 10 seconds to execute.
Execution plan tells that 99% of the cost is index scan (NonClustered), which is understandable because of LIKE and pattern with "%" on both sides.
If there is "%" just at the end, then index seek is used and query executes in a moment.
So I am looking for something like:
to add some kind of aditional index on the table, probably not
possible ?
a way to put this table and/or index into RAM sa the seek would be
faster
anything else ?
I can use either MS SQL 2012 or 2014, both standard edition.
Bonus question
Is it possible that this very same queries would execute instanteniously on DB2 database ? App was using db2 initially but was migrated over to MS SQL.
There may not be an answer, but there is a reason. When you use a search string with a leading wildcard, such as '%string', you're forcing the optimizer to do a table scan.
You might want to revisit some of the suggestions in this thread.
Good luck!
You can try changing the column collation to some binary form:
in query:
SELECT *
FROM my_table
WHERE some_column COLLATE Latin1_General_BIN like '%my string%'
ORDER BY some_column
or change it in the table design permanently if you can.
Caveat: it's cAsE sEnSiTiVe.
Edit: you can get around case sensitivity by converting both the column and the search string to upper case for example:
SELECT *
FROM my_table
WHERE UPPER(some_column) COLLATE Latin1_General_BIN like '%MY STRING%'
ORDER BY some_column
Edit 2: backup the database before doing any perpanent collation changes, I'm not sure how exactly it compares but I think in query it should be ok.
Explation article.
I'm not sure this solution is an option for you as it stores more data in database. it also may increase the time for update/insert, but its an idea anyway. Too long to put in to comment, so don't blame me!
Add a persisted computed column for the some_column with this formula: REVERSE(some_column) to store reverse of the string
Add index on that column
In your query use some_column like 'my string%' or rev_some_column like REVERSE('%my string'). You'd better to replace REVERSE('%my string') with a variable initiated before query.
I think in this case, both likes will use index.
We have a MS SQL Server 2005 installation that connects to an Oracle database through a linked server connection.
Lots of SELECT statements are being performed through a series of OPENQUERY() commands. The WHERE clause in the majority of these statements are against VARCHAR columns.
I've heard that if the WHERE clause is case sensitive, it can have a big impact on performance.
So my question is, how can I make sure that the non-binary string WHERE clauses are being performed in a case insensitive way for maximum performance?
It's actually the other way around:
Case sensitive...
WHERE column = :criteria
...will use index on column directly and perform well.
Case insensitivity typically requires something like this...
WHERE UPPER(column) = UPPER(:criteria)
...which does not use index on column and performs poorly (unless you are careful and create a functional index on UPPER(column)).
I'm not sure whether OPENQUERY() changes anything, but from purely Oracle perspective both case-sensitive and insensitive queries can be made performant, with the insensitive ones requiring special care (functional index).
By default SQL server uses a case insensitive collation where Oracle is case sensitive by default. For searches we normally implement the Upper() comparison to ensure the user has a better search experience.
I've heard that if the WHERE clause is case sensitive, it can have a big impact on performance.
From where did you hear that? Sounds like a myth to me... rather it would be other way around, ie if you'd use something like WHERE lower(field) = 'some str' to achieve case-insentive comparision it would be bad on perfomance. Using case-insensitive collation would probably be significantly faster...
Another important point to consider is do your business rules actually allow case-insensitive comparision.
And last but not least, you should start to optimize when you indeed do have a perfomance problem, not because you heard something...
WHERE LOWER(field_name) = 'field_value'
To make WHERE clause case insensitive, you can use LOWER or UPPER for this purpose.
select * from Table_Name
where lower(Column_Name) = lower('mY Any Value')
OR
select * from Table_Name
where UPPER(Column_Name) = UPPER('mY Any Value')
What are the benefits/drawbacks of using a case insensitive collation in SQL Server (in terms of query performance)?
I have a database that is currently using a case-insensitive collation, and I don't really like it. I would very much like to change it to case sensitive. What should I be aware of when changing the collation?
If you change the collation on the database, you also have to change it on each column individually - they maintain the collation setting that was in force when their table was created.
create database CollTest COLLATE Latin1_General_CI_AI
go
use CollTest
go
create table T1 (
ID int not null,
Val1 varchar(50) not null
)
go
select name,collation_name from sys.columns where name='Val1'
go
alter database CollTest COLLATE Latin1_General_CS_AS
go
select name,collation_name from sys.columns where name='Val1'
go
Result:
name collation_name
---- --------------
Val1 Latin1_General_CI_AI
name collation_name
---- --------------
Val1 Latin1_General_CI_AI
(I added this as a separate answer because its substantially different than my first.)
Ok, found some actual documentation. This MS KB article says that there are performance differences between different collations, but not where you think. The difference is between SQL collations (backward compatible, but not unicode aware) and Windows collations (unicode aware):
Generally, the degree of performance difference between the Windows and the SQL collations will not be significant. The difference only appears if a workload is CPU-bound, rather than being constrained by I/O or by network speed, and most of this CPU burden is caused by the overhead of string manipulation or comparisons performed in SQL Server.
Both SQL and Windows collations have case sensitive and case insensitive versions, so it sounds like that isn't the primary concern.
Another good story "from the trenches" in Dan's excellent article titled "Collation Hell":
I inherited a mixed collation environment with more collations than I can count on one hand. The different collations require workarounds to avoid "cannot resolve collation conflict" errors and those workarounds kill performance due to non-sargable expressions. Dealing with mixed collations is a real pain so I strongly recommend you standardize on a single collation and deviate only after careful forethought.
He concludes:
I personally don't think performance should even be considered in choosing the proper collation. One of the reasons I'm living in collation hell is that my predecessors chose binary collations to eke out every bit of performance for our highly transactional OLTP systems. With the sole exception of a leading wildcard table scan search, I've found no measurable performance difference with our different collations. The real key to performance is query and index tuning rather than collation. If performance is important to you, I recommend you perform a performance test with your actual application queries before you choose a collation on based on performance expectations.
Hope this helps.
I would say the biggest drawback to changing to a case sensitive collation in a production database would be that many, if not most, of your queries would fail because they are currently designed to ignore case.
I've not tried to change collation on an existing datbase, but I suspect it could be quite time consuming to do as well. You probably will have to lock your users out completely while the process happens too. Do not try this unless you have thoroughly tested on dev.
I can't find anything to confirm whether properly constructed queries work faster on a case-sensitive vs case-insensitive database (although I suspect the difference is negligible), but a few things are clear to me:
If your business requirements don't ask for it, you are putting yourself up to a lot of extra work (this is the crux of both HLGEM and Damien_The_Unbeliever's answers).
If your business requirements don't ask for it, you are setting yourself up for a lot of possible errors.
Its way too easy to construct poorly performing queries in a case-insensitive database if a case sensitive lookup is required:
A query like:
... WHERE UPPER(GivenName) = 'PETER'
won't use an index on GivenName. You would think something like:
... WHERE GivenName = 'PETER' COLLATE SQL_Latin1_General_CP1_CS_AS
would work better, and it does. But for maximum performance you'd have to do something like:
... WHERE GivenName = 'PETER' COLLATE SQL_Latin1_General_CP1_CS_AS
AND GivenName LIKE 'PETER'
(see this article for the details)
If you change the database collation but not the server collation (and they then don't match as a result), watch out when using temporary tables. Unless otherwise specified in their CREATE statement, they will use the server's default collation rather than that of the database which may cause JOINs or other comparisons against your DB's columns (assuming they're also changed to the DB's collation, as alluded to by Damien_The_Unbeliever) to fail.
I've a table with a lot of registers (more than 2 million). It's a transaction table but I need a report with a lot of joins. Whats the best practice to index that table because it's consuming too much time.
I'm paging the table using the storedprocedure paging method but I need an index because when I want to export the report I need to get the entire query without pagination and to get the total records I need a select all.
Any help?
The SQL Server 2008 Management Studio query tool, if you turn on "Include Actual Execution Plan", will tell you what indexes a given query needs to run fast. (Assuming there's an obvious missing index that is making the query run unusually slow, that is.)
SQL Server 2008 Management Studio Query Screenshot http://img208.imageshack.us/img208/4108/image4sy8.png
We use this all the time on Stack Overflow.. one of the best features of SQL 2008. It works against older SQL instances as well, just install the SQL 2008 tools and point them at a SQL 2005 instance. Not sure if it works on anything earlier, though.
As others have noted, you can also do this manually, but it takes a bit of trial and error. You'll want indexes on fields that are used in ORDER BY and WHERE clauses.
key fields have to be everithing in
the where clause ???
No, that would be overkill. Indexing a field really only works if a) your WHERE clause is selective enough (that is: only selects out about 1-2% of the values; an index on a "Gender" field which can be only one of two or three possible values is pointless), and b) your WHERE clause doesn't involve function calls or other magic.
In your case, TBL.Status might be a candidate - how many possible values are there? You select the '1' and '2' value - if there are hundreds of possible values, then it's a good choice.
On a side note:
this clause here: (TBL.Login IS NULL AND TBL.Login <> 'dev' ) is pretty pointless - if the value of TBL.login IS NULL, then it's DEFINITELY not 'dev' ..... so just the "IS NULL" will be more than sufficient......
The other field you might want to consider putting an index on is the TBL.Date, since you seem to select a range of dates here - that might be a good choice.
Also, on a general note: whenever possible, DO NOT use a SELECT * FROM ...... to select your fields. This causes a lot of overhead for SQL Server. SPECIFY your columns - and ONLY select those that you REALLY NEED - not just all of them for the heck of it.....
Check your queries, and find which fields are used to match them. Those are usually the best candidates!
SQL Server has a 'Database Engine Tuning Advisor' that could help you. This does not exist for SQL Server Express, but does for all other versions of SQL Server.
Load your query in a query window.
On the menu, click Query -> Analyze Query in Database Engine
Tuning Advisor
The tuning advisor will identify indexes that could be added to your table(s) to improve performance. In my experience, the tuning advisor doesn't always help, but most of the time it does. It's where I suggest you start.
ok this is the query in doing
SELECT
TBL.*
FROM
FOREINGDATABASE..TABLENAME TBL
LEFT JOIN Status S
ON TBL.Status = S.Number
WHERE
(TBL.ID = CASE #Reference WHEN 0 THEN TBL.ID ELSE #Reference END) AND
TBL.Date >= #FechaInicial AND
TBL.Date <= #FechaFinal AND
(TBL.Channel = CASE #Canal WHEN '' THEN TBL.Channel ELSE #Canal END)AND
(TBL.DocType = CASE #TipoDocumento WHEN '' THEN TBL.DocType ELSE #TipoDocumento END)AND
(TBL.Document = CASE #NumDocumento WHEN '' THEN TBL.Document ELSE #NumDocumento END)AND
(TBL.Login = CASE #Login WHEN '' THEN TBL.Login ELSE #Login END)AND
(TBL.Login IS NULL AND TBL.Login <> 'dev' ) AND
TBL.Status IN ('1','2')
key fields have to be everithing in the where clause ???
If I am not mistaken, please correct me if I am, I think you should create non-clustered Index on the fields of the conditions of the where clause. (Maybe this can be useful as a starting point to get some candidates for the indexes).
Good Luck
if an Index Scan instead of a seek is performed, the cause might be that the fields are not in the correct order in the index.
put indexes on all columns that you're joining and filtering on.
the use of indexes is also determined by the selectivity of the indexed column.
the best way would be to show us your query so we can try to improve it.