DECIMAL Index on CHAR column - sql-server

I am dealing with some legacy code that looks like this:
Declare @PolicyId int;

Select top 1 @PolicyId = policyid from policytab;

Select col002
From SOMETAB
Where (cast(Col001 as int) = @PolicyId);
The code is actually in a loop, but the problem is the same. col001 is a CHAR(10).
Is there a way to specify an index on SOMETAB.Col001 that would speed up this code?
Are there other ways to speed up this code without modifying it?
The context of that question is that I am guessing that a simple index on col001 will not speed up the code, because the select statement is doing a cast on the column.
I am looking for a solution that does not involve changing this code because this technique was used on several tables and in several scripts for each table.
Once I determine that it is hopeless to speed up this code without changing it, I have several options. I am bringing that up so this post can stay on the topic of speeding up the code without changing the code.

Shortcut to hopeless: if you cannot change the code, (cast(Col001 as int) = @PolicyId) is not SARGable.
Sargable
SARGable functions in SQL Server - Rob Farley
SARGable expressions and performance - Daniel Hutmacher
Shortcut aside, avoid loops when possible and keep your search arguments SARGable. Indexed persisted computed columns are an option if you must keep the char column and must compare it to an integer.
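For example, a minimal sketch of that approach using the table and column names from the question (the index name is invented):

ALTER TABLE SOMETAB
    ADD Col001Int AS CAST(Col001 AS int) PERSISTED;  -- fails if any existing Col001 value does not convert cleanly

CREATE INDEX IX_SOMETAB_Col001Int ON SOMETAB (Col001Int);

Because the computed column repeats the exact expression used in the legacy query, the optimizer can often match cast(Col001 as int) to the indexed column and seek it without any change to the code.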

If you cannot change the table structure, cast your parameter to the data type you are searching on in your select statement:
Cast(@PolicyId as char(10)). This is a code change, and a good place to start looking if you decide to change code based on sqlZim's answer.
sqlZim's advice is excellent, and searching on an int will always be faster than on a char. But you may find this method an acceptable alternative to any schema changes.
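For instance, a hedged rewrite of the legacy query under that suggestion (this assumes Col001 stores the digits left-aligned, which is how Cast(@PolicyId as char(10)) renders them):

Select col002
From SOMETAB
Where Col001 = Cast(@PolicyId as char(10));  -- bare column on the left, so an index on Col001 can seek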
Is policy stored as an int in PolicyTab and char in SomeTab for certain?


Is it possible to make a substring out of a Guid in T-SQL?

I have to truncate the first couple of digits out of a Guid on a table. Is it possible to do it using only a SQL script? Or do I have to do it programmatically?
Thanks!
To answer the direct question at hand (and assuming the column's name is foo):
foo is a uniqueidentifier:
SELECT SUBSTRING(CONVERT(nvarchar(50), foo), 3, 48)
foo is simply a string:
SELECT SUBSTRING(foo, 3, LEN(foo))
3 is just an arbitrary starting offset to remove the first "few" characters. (T-SQL's SUBSTRING requires a length argument; anything at least as long as the rest of the string works.)
With that said, this sounds kind of like an XY problem. If you're running into an issue where you feel you need to truncate the first few characters, it would be important to include that information in your question, as what you've described sounds like an odd request. However, you're also entitled to have odd requests.
The previous answer is perfectly good. Another option is to use the wonderful RIGHT function. Assuming the Guid is a uniqueidentifier, its string form has 36 characters, so just use RIGHT(theguid, 34), e.g.
declare @temp as uniqueidentifier = newid();
select right(@temp, 34);

T-SQL view is "optimized" but I don't understand why

I'm struggling with some sort of auto-optimization when creating a view in T-SQL.
The rewrite happens both when using the Designer and when using CREATE VIEW.
The output is the same, but I don't understand why it's done.
Can anyone please explain to me why this is optimized / why the lower one is better:
[...]
WHERE
    today.statusId = 7
    AND yesterday.cardId IS NULL
    AND NOT (
        today.name LIKE 'TEST_%'
        AND today.department LIKE 'T_%'
    )
gets optimized into the following
[...]
WHERE (
    today.statusId = 7
    AND yesterday.cardId IS NULL
    AND NOT (today.name LIKE 'TEST_%')
)
OR (
    today.statusId = 7
    AND yesterday.cardId IS NULL
    AND NOT (today.department LIKE 'T_%')
)
Isn't the second where clause forcing the view to check statusId and cardId twice, regardless of their values? The first allows it to abort as soon as statusId is, e.g., 6.
Also, in the first one the parenthesized part can abort as soon as one value is FALSE.
This behavior also does not change when the inner parentheses contain, say, 20 values. The optimizer will create 20 blocks, checking the values of statusId and cardId over and over again...
Thanks in advance.
The visual designers do not try to optimize your code ever.
Don't use them unless you are prepared to put up with them mangling your formatting and rewriting your queries in this manner.
SQL Server does not guarantee short-circuit evaluation in any case, but certainly this type of rewrite can act as a pessimization. Especially if one of the predicates involves a subquery, the rewritten form could end up significantly more expensive.
Presumably the reason for the rewrite is just so the designer can present the predicates easily in a grid and allow editing at individual grid intersections - or so you can select a whole column and delete it.
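For what it's worth, the rewrite is just De Morgan's law plus distribution, so both forms return identical rows. A small self-contained check (the table and values are invented for illustration):

DECLARE @t TABLE (statusId int, cardId int, name varchar(20), department varchar(20));
INSERT INTO @t VALUES (7, NULL, 'TEST_1', 'T_1'), (7, NULL, 'PROD_1', 'T_1'), (6, NULL, 'TEST_1', 'T_1');

-- Original form: A AND B AND NOT (P AND Q)
SELECT * FROM @t
WHERE statusId = 7 AND cardId IS NULL
    AND NOT (name LIKE 'TEST_%' AND department LIKE 'T_%');

-- Designer form: (A AND B AND NOT P) OR (A AND B AND NOT Q)
SELECT * FROM @t
WHERE (statusId = 7 AND cardId IS NULL AND NOT (name LIKE 'TEST_%'))
    OR (statusId = 7 AND cardId IS NULL AND NOT (department LIKE 'T_%'));

Both return only the PROD_1 row; what changes is the shape of the predicate the designer displays, not its logic.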

Issue with join: Error converting data type nvarchar to float?

I've been trying to run the program below and I keep on getting the error
Error converting data type nvarchar to float
SQL:
SELECT
distinct
coalesce(a.File_NBR,b.File_NBR) as ID,
b.Division,
b.Program,
a.Full_Name,
a.SBC_RESULT
FROM
New_EEs.dbo.vw_SBC_RESULTS a
full join
New_EEs.dbo.vw_SBC_Employee_Info b on a.File_NBR = b.File_NBR
where
(a.File_NBR is not null OR b.File_NBR is not null)
and A.Full_Name is not null
order by
a.Full_Name, b.Division, b.Program
When I comment out /*and A.Full_Name is not null */ the program works.
I can't figure out what the error means and why the join works when I comment out /*and A.Full_Name is not null */
Any feedback is appreciated.
Thanks!
The error message clearly says that the issue has to do with conversion of an nvarchar to a float. There's no explicit conversion in your query, so it must be an implicit one. If the issue indeed stems from this particular query and not from somewhere else, only two places can be responsible for it:
1) the join predicate;
2) the COALESCE call.
Both places involve one and the same pair of columns, a.File_NBR and b.File_NBR. So, one of them must be an nvarchar column and the other a float one. Since the float type has higher precedence than nvarchar, the latter would implicitly be converted to the former, not the other way around. And apparently one of the string values failed to convert. That's the explanation of the immediate issue (the conversion).
I've seen your comment where you say that one of the columns is an int and the other a float. I have no problem with that, as I believe you are talking about columns in physical tables, whereas both sources in this query appear to be views, judging by their names. I believe one of the columns undergoes a transformation to nvarchar in a view, and this query ends up seeing it as such. So that should account for where the nvarchar can come from.
As for an explanation to why commenting a seemingly irrelevant condition out appears to make such a big difference, the answer must lie in the internals of the query planner's workings. While there's a documented order of logical evaluation of clauses in a Transact-SQL SELECT query, the real, physical order may differ from that. The actual plan chosen for the query determines that physical order. And the choice of a plan can be affected, in particular, by such a trivial thing as incorporation or elimination of a simple condition.
To apply that to your situation, when the offending condition is commented out, the planner chooses such a plan for the query that both the join predicate and the COALESCE expression evaluate only when all the rows capable of causing the issue in question have been filtered out by predicates in the underlying views. When the condition is put back, however, the query is assigned a different execution plan, and either COALESCE or (more likely) the join predicate ends up being applied to a row containing a string that cannot be converted to a float, which results in an exception raised.
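To make the plan-dependence concrete, here is a hypothetical repro (the data is invented; whether it errors depends on the plan the optimizer picks):

DECLARE @t TABLE (File_NBR nvarchar(20), Full_Name nvarchar(50));
INSERT INTO @t VALUES (N'123', N'Alice'), (N'not-a-number', NULL);

-- If the cast runs before the filter removes the second row, this raises
-- "Error converting data type nvarchar to float"; if the filter runs first, it succeeds.
SELECT CAST(File_NBR AS float) FROM @t WHERE Full_Name IS NOT NULL;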
Converting both a.File_NBR and b.File_NBR to char, as you did, is one way of solving the issue. You could in fact pick any of these four string types:
char
varchar
nchar
nvarchar
And since one of the columns is already a string (possibly a.File_NBR, but you are in a better position to find that out exactly), the conversion could be applied to the other one only.
Alternatively, you could look into the view producing the nvarchar column to try and see if the int to nvarchar conversion could be eliminated in the first place.
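A hedged sketch of the cast-both-sides variant of the original query (varchar(20) is an assumed length; if the underlying column really is a float, check that its string form matches, since large floats render in scientific notation):

SELECT DISTINCT
    coalesce(cast(a.File_NBR as varchar(20)), cast(b.File_NBR as varchar(20))) as ID,
    b.Division,
    b.Program,
    a.Full_Name,
    a.SBC_RESULT
FROM New_EEs.dbo.vw_SBC_RESULTS a
FULL JOIN New_EEs.dbo.vw_SBC_Employee_Info b
    on cast(a.File_NBR as varchar(20)) = cast(b.File_NBR as varchar(20))
WHERE (a.File_NBR is not null OR b.File_NBR is not null)
    and a.Full_Name is not null
ORDER BY a.Full_Name, b.Division, b.Program;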
Please see this example; maybe it will be useful.
CREATE TABLE TEST (ID FLOAT);
INSERT INTO TEST (ID) VALUES (NULL);
INSERT INTO TEST (ID) VALUES (12.3);
-- SELECT COALESCE(ID, 'TEST') FROM TEST;  -- not working, error: Error converting data type nvarchar to float
SELECT COALESCE(CAST(ID AS VARCHAR), 'TEST') FROM TEST;  -- works

T-SQL Where Clause Case Statement Optimization (optional parameters to StoredProc)

I've been battling this one for a while now. I have a stored proc that takes in 3 parameters that are used to filter. If a specific value is passed in, I want to filter on that. If -1 is passed in, give me all.
I've tried it the following two ways:
First way:
SELECT field1, field2...etc
FROM my_view
WHERE
    parm1 = CASE WHEN @PARM1 = -1 THEN parm1 ELSE @PARM1 END
    AND parm2 = CASE WHEN @PARM2 = -1 THEN parm2 ELSE @PARM2 END
    AND parm3 = CASE WHEN @PARM3 = -1 THEN parm3 ELSE @PARM3 END
Second Way:
SELECT field1, field2...etc
FROM my_view
WHERE
    (@PARM1 = -1 OR parm1 = @PARM1)
    AND (@PARM2 = -1 OR parm2 = @PARM2)
    AND (@PARM3 = -1 OR parm3 = @PARM3)
I read somewhere that the second way will short-circuit and never evaluate the second part if the first is true. My DBA said it forces a table scan. I have not verified this, but it seems to run slower in some cases.
The main table that this view selects from has somewhere around 1.5 million records, and the view proceeds to join on about 15 other tables to gather a bunch of other information.
Both of these methods are slow...taking me from instant to anywhere from 2-40 seconds, which in my situation is completely unacceptable.
Is there a better way that doesn't involve breaking it down into each separate case of specific vs -1 ?
Any help is appreciated. Thanks.
I read somewhere that the second way will short-circuit and never evaluate the second part if the first is true. My DBA said it forces a table scan.
You read wrong; it will not short-circuit. Your DBA is right; it will not play well with the query optimizer and will likely force a table scan.
The first option is about as good as it gets. Your options to improve things are dynamic SQL, or a long stored procedure with every possible combination of filter columns so that you get independent query plans. You might also try using the WITH RECOMPILE option, but I don't think it will help you.
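As a hedged sketch of the dynamic SQL route (names are from the question; each combination of predicates gets its own cached plan):

DECLARE @sql nvarchar(max);
SET @sql = N'SELECT field1, field2 FROM my_view WHERE 1 = 1';
IF @PARM1 <> -1 SET @sql = @sql + N' AND parm1 = @P1';
IF @PARM2 <> -1 SET @sql = @sql + N' AND parm2 = @P2';
IF @PARM3 <> -1 SET @sql = @sql + N' AND parm3 = @P3';
EXEC sp_executesql @sql,
    N'@P1 int, @P2 int, @P3 int',
    @P1 = @PARM1, @P2 = @PARM2, @P3 = @PARM3;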
If you are running SQL Server 2005 or above, you can use IF statements to make multiple versions of the query, each with the proper WHERE clause so that an index can be used. Each query plan will be placed in the plan cache.
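A minimal sketch of that idea for just one optional parameter (with three, you would need up to eight branches, which is why the article below often leans on dynamic SQL):

IF @PARM1 = -1
    SELECT field1, field2 FROM my_view
    WHERE parm2 = @PARM2 AND parm3 = @PARM3;
ELSE
    SELECT field1, field2 FROM my_view
    WHERE parm1 = @PARM1 AND parm2 = @PARM2 AND parm3 = @PARM3;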
Also, here is a very comprehensive article on this topic:
Dynamic Search Conditions in T-SQL by Erland Sommarskog
It covers all the issues and methods of trying to write queries with multiple optional search conditions.
Here is the table of contents:
Introduction
The Case Study: Searching Orders
The Northgale Database
Dynamic SQL
Introduction
Using sp_executesql
Using the CLR
Using EXEC()
When Caching Is Not Really What You Want
Static SQL
Introduction
x = @x OR @x IS NULL
Using IF statements
Umachandar's Bag of Tricks
Using Temp Tables
x = @x AND @x IS NOT NULL
Handling Complex Conditions
Hybrid Solutions – Using both Static and Dynamic SQL
Using Views
Using Inline Table Functions
Conclusion
Feedback and Acknowledgements
Revision History
If you pass in a null value when you want everything, then you can write your where clause as
Where colName = IsNull(@Parameter, colName)
This is basically the same as your first method... it will work as long as the column itself is not nullable... NULL values IN the column will mess it up slightly.
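A tiny hypothetical demo of that caveat:

DECLARE @Parameter int;       -- left NULL, meaning "give me everything"
DECLARE @t TABLE (colName int);
INSERT INTO @t VALUES (1);
INSERT INTO @t VALUES (NULL);
SELECT * FROM @t WHERE colName = IsNull(@Parameter, colName);
-- Returns only the row with 1: NULL = NULL is not true, so the NULL row never matches.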
The only approach to speed it up is to add an index on the column being filtered on in the Where clause. Is there one already? If not, that will result in a dramatic improvement.
No other way I can think of than doing:
WHERE
    (MyCase IS NULL OR MyCase = @MyCaseParameter)
    AND ....
The second one is simpler and more readable to other developers, if you ask me.
SQL 2008 and later make some improvements to optimization for things like (MyCase IS NULL OR MyCase = @MyCaseParameter) AND ....
If you can upgrade, and if you add an OPTION (RECOMPILE) to get decent performance for all possible parameter combinations (this is a situation where no single plan is good for all of them), you may find that this performs well.
http://blogs.msdn.com/b/bartd/archive/2009/05/03/sometimes-the-simplest-solution-isn-t-the-best-solution-the-all-in-one-search-query.aspx
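A minimal sketch of that combination, reusing the question's names:

SELECT field1, field2
FROM my_view
WHERE (@PARM1 = -1 OR parm1 = @PARM1)
    AND (@PARM2 = -1 OR parm2 = @PARM2)
    AND (@PARM3 = -1 OR parm3 = @PARM3)
OPTION (RECOMPILE);  -- recompiled each execution, so the plan can fit the actual parameter values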

What makes a SQL statement sargable?

By definition (at least from what I've seen) sargable means that a query is capable of having the query engine optimize the execution plan that the query uses. I've tried looking up the answers, but there doesn't seem to be a lot on the subject matter. So the question is, what does or doesn't make an SQL query sargable? Any documentation would be greatly appreciated.
For reference: Sargable
The most common thing that will make a query non-sargable is to include a field inside a function in the where clause:
SELECT ... FROM ...
WHERE Year(myDate) = 2008
The SQL optimizer can't use an index on myDate, even if one exists. It will literally have to evaluate this function for every row of the table. Much better to use:
WHERE myDate >= '01-01-2008' AND myDate < '01-01-2009'
Some other examples:
Bad: Select ... WHERE isNull(FullName,'Ed Jones') = 'Ed Jones'
Fixed: Select ... WHERE ((FullName = 'Ed Jones') OR (FullName IS NULL))
Bad: Select ... WHERE SUBSTRING(DealerName, 1, 4) = 'Ford'
Fixed: Select ... WHERE DealerName Like 'Ford%'
Bad: Select ... WHERE DateDiff(mm,OrderDate,GetDate()) >= 30
Fixed: Select ... WHERE OrderDate < DateAdd(mm,-30,GetDate())
Don't do this:
WHERE Field LIKE '%blah%'
That causes a table/index scan, because the LIKE value begins with a wildcard character.
Don't do this:
WHERE FUNCTION(Field) = 'BLAH'
That causes a table/index scan.
The database server will have to evaluate FUNCTION() against every row in the table and then compare it to 'BLAH'.
If possible, do it in reverse:
WHERE Field = INVERSE_FUNCTION('BLAH')
This will run INVERSE_FUNCTION() against the parameter once and will still allow use of the index.
In this answer I assume the database has sufficient covering indexes; there are enough questions about that topic already.
A lot of the time, the sargability of a query is determined by the tipping point of the related indexes. The tipping point defines the difference between seeking and scanning an index while joining one table or result set onto another. One seek is of course much faster than scanning a whole table, but when you have to seek a lot of rows, a scan can make more sense.
So, among other things, a SQL statement is more sargable when the optimizer expects the number of resulting rows from one table to be below the tipping point of a possible index on the next table.
You can find a detailed post and example here.
For an operation to be considered sargable, it is not sufficient for it to just be able to use an existing index. In the example above, adding a function call against an indexed column in the where clause would still most likely take some advantage of the defined index: it would "scan", i.e. retrieve all values from that column (index), and then eliminate the ones that do not match the filter value provided. That is still not efficient enough for tables with a high number of rows.
What really defines sargability is the query's ability to traverse the b-tree index using the binary search method, which relies on half-set elimination of the sorted items array. In SQL, it shows up in the execution plan as an "index seek".
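A small hypothetical illustration of the difference (table and index names are invented):

CREATE TABLE dbo.Orders (OrderId int IDENTITY PRIMARY KEY, OrderDate date NOT NULL);
CREATE INDEX IX_Orders_OrderDate ON dbo.Orders (OrderDate);

-- Non-sargable: the function hides OrderDate, so the plan shows an index scan.
SELECT OrderId FROM dbo.Orders WHERE Year(OrderDate) = 2008;

-- Sargable: the bare column compared to a range allows an index seek.
SELECT OrderId FROM dbo.Orders WHERE OrderDate >= '20080101' AND OrderDate < '20090101';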
