Suppose I have a T-SQL command with multiple WHERE conditions like this:
SELECT *
FROM TableName
WHERE Column1 NOT LIKE '%exclude%'
AND Column2 > 10
Would the query exclude a row as soon as the condition on Column1 was not met, or would it still go on to test the next condition on Column2?
I am asking because I want to see if it would be more efficient to swap my conditions around to first test if Column2 > 10 before I run a more time-consuming condition.
Edit: If it matters, Column1 is of type bigint and Column2 is of type ntext
SQL Server will devise a query plan based on the available indexes and statistics. SQL doesn't have "short-circuit" expression evaluation per se, because it is a declarative language rather than a procedural one, but the query plan will ultimately perform the equivalent of short-circuit evaluation.
Swapping the expressions should not affect performance.
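One way to check this on your own data is to run both orderings with timing and I/O statistics switched on, then compare the output and the actual execution plans. This is only a sketch that reuses the table and column names from the question as they were posted. If the optimizer picks the same plan for both statements, the numbers should be essentially identical.

```sql
-- Report CPU/elapsed time and logical reads for each statement.
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

-- Ordering as written in the question.
SELECT *
FROM TableName
WHERE Column1 NOT LIKE '%exclude%'
  AND Column2 > 10;

-- Same predicates, swapped.
SELECT *
FROM TableName
WHERE Column2 > 10
  AND Column1 NOT LIKE '%exclude%';

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

Run each statement more than once before comparing, so that a cold buffer cache on the first run does not skew the timings.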
As Marc said, swapping the conditions in the WHERE clause will not change performance. Instead, you could look at changing the data type from NTEXT to nvarchar(X), where X is some meaningful data length.
Related
I am looking for a potentially faster way to do this check:
NOT LIKE '%[^0-9]%'
This checks to ensure all characters are numbers (see description of T-SQL pattern)
Is there a faster way to do this in Microsoft SQL Server (T-SQL)?
The full context is as part of a CASE/WHEN statement in the select part of a very large query:
Select DATEADD(dd, CAST(CASE WHEN a.dateDuration NOT LIKE '%[^0-9]%' THEN a.Duration ELSE 1 END AS INT), a.StartDate) AS 'ourEndDate'
In the above, a is a table alias. The column a.dateDuration is a nullable varchar column. (The real names of entities have been replaced for proprietary reasons).
Indeed, variants of this are repeated in various "UNION ALL" operators, so if it could be made faster it could speed the query considerably.
The NOT LIKE operator is presumably relatively slow.
The version of the underlying database is SQL Server 2012.
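For reference, here is a small sketch of how the pattern from the question classifies a few inputs. The sample values are made up for illustration. '123' and the empty string pass the check (neither contains a non-digit character), while '12a', '-5' and NULL fall through to the ELSE branch, because NOT LIKE on NULL yields UNKNOWN rather than TRUE.

```sql
-- Sample values are hypothetical; only the pattern comes from the question.
SELECT v.s,
       CASE WHEN v.s NOT LIKE '%[^0-9]%'
            THEN 'digits only'
            ELSE 'rejected'
       END AS verdict
FROM (VALUES ('123'), ('12a'), ('-5'), (''), (NULL)) AS v(s);
```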
In this context, the performance of the LIKE / NOT LIKE operator is almost certainly not the problem. If your query is slow, first consider how many rows you are returning and whether you are doing full table scans to find the interesting rows.
Here it looks like you are only trying to format or adjust your final result. If you think SQL is too slow for that, you can do this processing on the application server instead, as it is not part of fetching data from disk.
If this is a subquery, please show the entire query.
Below is a snippet of code similar to what I am using.
DECLARE
@UserParam VARCHAR(MAX) = NULL --optional parameter (type assumed; not given in the original)
SELECT
rtrim(item) [aKey]
INTO
#aKeyTable
FROM
myDB.dbo.fnSplit(@UserParam,',')
SELECT
/* Lots of columns, not important to the question */
FROM
myDB.dbo.tableB b
JOIN myDB.dbo.tableC c ON c.cKey = b.bKEY
AND (c.columnA IN
(
SELECT
aKey
FROM
#aKeyTable
)
OR @UserParam IS NULL)
My question is this: How do I remove the subquery to improve performance?
Requirements:
@UserParam is optional
@UserParam can have multiple comma-separated parameters
@UserParam has to either match columnA in tableC OR be NULL
Using a WHERE clause isn't an option either, because it hurts performance too much as well
I am using SQL Server 2014
UPDATE: My entire query is very long and takes about 15-20 seconds on average to run, depending on parameters, but according to the execution plan this subquery accounts for 89% of the cost. I previously had it in a WHERE clause, and the performance was comparable and sometimes slower.
Thanks
Hard to know for sure without a query plan to see; that said, perhaps create an index on column aKey?
Did you consider using a table-valued parameter (TVP)? They are best for this purpose. See also Erland Sommarskog's articles for more details.
It is problematic to combine different cases into one execution plan. When @UserParam is empty, the situation is entirely different from when it is not, and you should have an execution plan for each case. You can introduce an IF and write two queries. With more parameters you would end up with dynamic SQL, because the exponential growth in combinations is not manageable otherwise.
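A minimal sketch of the IF approach, assuming #aKeyTable has already been filled from @UserParam as in the question. The column list is a stand-in (the real one was elided), and the parameter type is an assumption. Each branch is a separate statement, so each can be optimized without the catch-all OR.

```sql
-- Assumes #aKeyTable already exists and was populated from @UserParam.
DECLARE @UserParam VARCHAR(MAX) = NULL;  -- type assumed

IF @UserParam IS NULL
BEGIN
    -- No filter requested: plain join, no subquery at all.
    SELECT b.bKEY, c.columnA             -- stand-in column list
    FROM myDB.dbo.tableB AS b
    JOIN myDB.dbo.tableC AS c
      ON c.cKey = b.bKEY;
END
ELSE
BEGIN
    -- Filter requested: the key list is always applied here.
    SELECT b.bKEY, c.columnA             -- stand-in column list
    FROM myDB.dbo.tableB AS b
    JOIN myDB.dbo.tableC AS c
      ON c.cKey = b.bKEY
     AND c.columnA IN (SELECT aKey FROM #aKeyTable);
END
```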
The optimizer should estimate the number of rows in a table variable as 1, leading to index seeks. If the selectivity of the parameters is good, this should work.
Here is a simpler version of one of the SELECT statements from my procedure:
select
...
from
...
where
((@SearchTextList IS NULL) OR
(SomeColumn IN (SELECT SomeRelatedColumn From #SearchTextListTable)))
@SearchTextList is just a varchar variable that holds a comma-separated list of strings. #SearchTextListTable is a single-column temp table that holds the search text values.
This query takes 30 seconds to complete, which is a performance issue in my application.
If I get rid of the first condition (i.e. if I remove the OR condition), it takes just ONE second.
select
...
from
...
where
SomeColumn IN (SELECT SomeRelatedColumn From #SearchTextListTable)
Can somebody please explain why this much difference?
What's going on internally in SQL Server engine?
Thanks.
Since you said that the SQL is fast when the OR is not specified, I assume the table has an index on SomeColumn and the number of rows in #SearchTextListTable is small. When that is the case, SQL Server can decide to use the index to search for the rows.
If you specify the OR clause, and the query is like this:
((@SearchTextList IS NULL) OR
(SomeColumn IN (SELECT SomeRelatedColumn From #SearchTextListTable)))
SQL Server can't create a plan where the index is used, because plans are cached and the same plan must also be usable when @SearchTextList is NULL.
There are usually two ways to improve this: either use dynamic SQL or recompile the plan for each execution.
To get the plan recompiled, just add option (recompile) to the end of the query. Unless this query is executed really often, that should be an OK solution. The downside is slightly higher CPU usage, because the plans can't be reused.
The other option is to build dynamic SQL and execute it with sp_executesql. Since at that point you know whether @SearchTextList is NULL, you can simply omit the SomeColumn IN ... condition when it's not needed. Be aware of SQL injection in this case: don't just concatenate the variable values into the SQL string, but use variables in the SQL and pass them as parameters to sp_executesql.
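A rough sketch of the dynamic SQL variant. The table dbo.SomeTable, its Id column and the extra @MinId filter are hypothetical and are only there to show a value being passed as a real parameter instead of being concatenated into the string. A temp table created by the caller is visible inside the dynamic batch, so #SearchTextListTable can be referenced directly.

```sql
DECLARE @SearchTextList VARCHAR(MAX) = NULL;  -- as in the question
DECLARE @MinId INT = 1;                       -- hypothetical extra filter

-- Base statement; @MinId stays a parameter inside the string.
DECLARE @sql NVARCHAR(MAX) =
    N'SELECT t.SomeColumn FROM dbo.SomeTable AS t WHERE t.Id >= @MinId';

-- Append the list filter only when a list was actually supplied.
IF @SearchTextList IS NOT NULL
    SET @sql += N' AND t.SomeColumn IN (SELECT SomeRelatedColumn FROM #SearchTextListTable)';

-- Values are passed as parameters, never spliced into @sql.
EXEC sys.sp_executesql
     @sql,
     N'@MinId INT',
     @MinId = @MinId;
```

Each distinct statement text gets its own cached plan, so the NULL and non-NULL cases no longer have to share one.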
If this is the only optional filter in the query, you could also create two separate procedures, one for each case, and call the appropriate one from the original procedure.
I have a query:
SELECT
someFields
FROM
someTable
WHERE
cheapLookup=1
AND (CAST(someField as FLOAT)/otherField)<0.9
So, will the CAST and division be performed in the case that cheapLookup is 0? If not, how can I avoid the calculation in this case?
It depends on the query plan, which is determined by the estimated cost of each considered alternative plan that would produce correct results.
If the predicate 'cheapLookup = 1' can use an index, and it is sufficiently selective, SQL Server would likely choose to seek on that index and apply the second predicate as a residual (that is, only evaluating it on rows that are matched by the seeking operation).
On the other hand, if cheapLookup is not the leading key in an index, or if it is not very selective, SQL Server might choose to scan, applying both predicates to every row encountered.
The second predicate will not be chosen for a seeking operation, unless there happens to be an indexed computed column on the whole expression, and using that index turns out to be the cheapest way to execute the whole query. If a suitable index exists, SQL Server would seek on 'second predicate result < 0.9', and apply 'cheapLookup=1' as a residual. There is also the possibility that the indexed computed column has cheapLookup as its second key, which would result in a pure seek, with no residual.
The other thing about the second predicate is that without a computed column (whether or not indexed), SQL Server will have to guess at the selectivity of the expression. With the computed column, the server might be able to create statistics on the expression-result column, which will help the optimizer. Note that a computed column on 'CAST(someField as FLOAT)/otherField' would have to be persisted before it could be indexed or have statistics created on it, because it contains an imprecise data type.
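As a sketch of the computed-column idea described above: the column name ratio, the index name and the dbo schema are assumptions for illustration. Note that adding the persisted column evaluates the expression for every existing row, so it would likely fail if any row has otherField = 0.

```sql
-- ratio and IX_someTable_ratio are hypothetical names.
-- PERSISTED is required here because the expression yields a float.
ALTER TABLE dbo.someTable
    ADD ratio AS (CAST(someField AS FLOAT) / otherField) PERSISTED;
GO

-- Expression result as the leading key, cheapLookup as the second key.
CREATE INDEX IX_someTable_ratio
    ON dbo.someTable (ratio, cheapLookup);
GO

-- The original predicate; the optimizer may now match the expression
-- to the computed column and consider the new index and its statistics.
SELECT someField, otherField
FROM dbo.someTable
WHERE cheapLookup = 1
  AND (CAST(someField AS FLOAT) / otherField) < 0.9;
```

Whether the index is actually used still depends on the estimated cost of the whole plan, so it is worth checking the execution plan afterwards.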
In summary, it's not the complexity of the expression that counts so much as the estimated cost of the whole plan that uses each of the available access methods considered by the optimizer.
SQL is declarative: you tell the database what you want, not how you want it done. The database is entirely free to evaluate lazily or eagerly. In fact, it can evaluate thrice in reverse order for all I know :)
In rare cases, you can improve performance by reframing your query in such a way that it avoids a specific expensive operation. For example, moving the floating point math to a separate query would force lazy evaluation:
declare @t table (id int, someField float, otherField float)
-- apply only the cheap predicate first
insert @t select id, someField, otherField from someTable
where cheapLookup = 1
-- then run the floating point math on the surviving rows only
delete @t where (CAST(someField as FLOAT)/otherField) >= 0.9
select * from @t
In your example, I would expect SQL Server to choose the best way without any hints or tricks.
What you're referring to is short-circuiting, which other languages (e.g. C#) support.
I believe SQL Server can short-circuit, but whether it does depends on the scenario and on what the optimizer decides, so there is certainly no guarantee that it will. It just might.
Excellent reference on this by Remus Rusanu here: http://rusanu.com/2009/09/13/on-sql-server-boolean-operator-short-circuit/
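It depends on how SQL Server optimizes the query; you could run the Query Analyzer and look at the execution plan for your particular case.
One way to state the intent explicitly would be to say the following, although it is not a guarantee, because SQL Server is free to merge the CTE into the outer query and evaluate the predicates in any order: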
WITH QueryResult AS (
SELECT
someFields
FROM
someTable
WHERE
cheapLookup=1
)
SELECT * FROM QueryResult WHERE (CAST(someField as FLOAT)/otherField)<0.9
What's an efficient way to check for a null or a value for a column in a SQL query? Consider a SQL table named table with an indexed integer column named column. @value can be some integer or null, e.g. 16 or null.
Query 1: Not sure, but it seems one should not rely on short-circuiting in SQL. However, the query below always works correctly when @value is some integer or null.
select * from
table
where (@value is null or column = @value)
The query below is an expanded version of the query above. It works correctly too.
select * from
table
where ((@value is null)
or (@value is not null and column = @value))
Would the above two queries take advantage of the index?
Query 2: The query below compares the column with @value when it is not null; otherwise it compares the column with itself, which is always true for non-null values, and returns everything. It works correctly too. Would this query take advantage of the index?
select * from
table
where (column = isnull(@value, column))
What's the best way?
Note: If the answer varies with databases, I'm interested in MS-SQL.
Variations on this question have come up several times in the past couple of days (why do these things always happen in groups?). The short answer is that yes, SQL Server will short-circuit the logic IF it creates the query plan with known values. So, if you have that code in a script where the variables are set, then I believe it should short-circuit the logic (test to be sure). However, if it's in a stored procedure, then SQL Server will create a query plan ahead of time, and it won't know whether it can short-circuit the query, because it doesn't know the parameter values at the time it generates the plan.
Regardless of whether it is short-circuited or not, SQL Server should be able to use the index if that's the only part of your query. If the variable is NULL, though, you probably don't want SQL Server to use the index, because it will be useless.
If you're in a stored procedure, then your best bet is to use OPTION (RECOMPILE) on your query. This causes SQL Server to create a new query plan each time. That adds a little overhead, but the gains typically outweigh it by a lot. This is ONLY good for SQL Server 2008 and later, and on 2008 only with some of the later service packs; before that, a bug with RECOMPILE rendered it useless. For more information, check out Erland Sommarskog's great article on the subject. Specifically, you'll want to look under the Static SQL sections.
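A minimal sketch of what that looks like inside a stored procedure. The names dbo.GetByValue, dbo.SomeTable and SomeColumn are hypothetical stand-ins for the table and column in the question.

```sql
-- dbo.GetByValue, dbo.SomeTable and SomeColumn are hypothetical names.
CREATE PROCEDURE dbo.GetByValue
    @value INT = NULL              -- NULL means "no filter"
AS
BEGIN
    SET NOCOUNT ON;

    SELECT *
    FROM dbo.SomeTable
    WHERE (@value IS NULL OR SomeColumn = @value)
    OPTION (RECOMPILE);            -- plan is compiled for the actual @value on every call
END
GO

EXEC dbo.GetByValue @value = 16;    -- may seek on an index over SomeColumn
EXEC dbo.GetByValue @value = NULL;  -- returns every row
```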
To clarify a point, SQL doesn't really have a short circuit as we know it from C-based languages. What looks like a short circuit is really just SQL Server's ternary (three-valued) logic, in which TRUE OR NULL evaluates to TRUE:
TRUE OR NULL ===> TRUE
TRUE AND NULL ===> NULL
E.g.:
if 1=null or 1=1 print 'true' else print 'false' -- prints 'true'
if 1=null and 1=1 print 'true' else print 'false' -- prints 'false'