Not sure where to start on this one -- not sure if the problem is that I'm fooling the query optimizer, or if it's something intrinsic to the way indexes work when nulls are involved.
One coding convention I've followed is to code stored procedures like this:
create procedure SomeProc
@ID int = null
as
select
st.ID,st.Col1,st.Col2
from
SomeTable st
where
(st.ID = @ID or @ID is null) --works, but very slow (relatively)
Not very useful in that simple test case, of course, but useful in other scenarios when you want a stored proc to act on either the entire table OR rows that meet some criteria. However, that's quite slow when used on bigger tables... roughly 3-5x slower than if I replaced the where clause with:
where
st.ID = @ID --3-5x faster than first example
I'm even more puzzled by the fact that replacing the null with -1 gives me nearly the same speed as that "fixed" WHERE clause above:
create procedure SomeProc
@ID int = -1
as
select
st.ID,st.Col1,st.Col2
from
SomeTable st
where
(st.ID = @ID or @ID=-1) --much better... but why?
Clearly it's the null that's making things wacky but why, exactly? The answer is not clear to me from examining the execution plan. This is something I've noticed over the years on various databases, tables, and editions of SQL Server so I don't think it's a quirk of my current environment. I've resolved the issue by switching the default parameter value from null to -1; my question is why this works.
Notes
SomeTable.ID is indexed
It may be related to (or may, in fact, be) a parameter sniffing issue
Parameter Sniffing (or Spoofing) in SQL Server
For whatever it's worth, I've been testing almost exclusively with "exec SomeProc" after each edit/recompile of the proc, i.e., with the optional parameter omitted.
You most likely have a combination of issues:
Parameter sniffing
OR is not a good operator to use
But without seeing the plans, these are educated guesses.
Parameter sniffing
The plan is most likely compiled and cached for the default value of NULL. Try it with different defaults, say -1 or no default.
The @ID = -1 check, with a default of NULL and parameter sniffing, becomes a trivial check, so it's faster.
You could also try OPTIMIZE FOR UNKNOWN in SQL Server 2008.
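For example, a minimal sketch of that hint applied to the proc from the question (my own rewrite, not tested against your schema):
create procedure SomeProc
@ID int = null
as
select
st.ID, st.Col1, st.Col2
from
SomeTable st
where
(st.ID = @ID or @ID is null)
option (optimize for unknown) -- plan is built for an "average" @ID instead of the sniffed default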
The OR operator
Some ideas..
If the column is not nullable, in most cases the optimiser handles this condition well:
st.ID = ISNULL(@ID, st.ID)
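A rough sketch of that approach against the SomeProc/SomeTable example from the question:
create procedure SomeProc
@ID int = null
as
select
st.ID, st.Col1, st.Col2
from
SomeTable st
where
st.ID = ISNULL(@ID, st.ID) -- matches every row when @ID is null; relies on st.ID being NOT NULL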
Also, you can use an IF statement:
IF @ID IS NULL
SELECT ... FROM ...
ELSE
SELECT ... FROM ... WHERE st.ID = @ID
Or UNION ALL in a similar fashion.
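A rough sketch of the UNION ALL variant, reusing the question's table; each branch is written so that at most one of them returns rows:
select st.ID, st.Col1, st.Col2
from SomeTable st
where @ID is null -- "whole table" branch
union all
select st.ID, st.Col1, st.Col2
from SomeTable st
where st.ID = @ID -- "single key" branch; returns nothing when @ID is null
Because each branch has a simple, sargable predicate, the optimizer can usually pick a good plan for each one independently.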
Personally, I'd use parameter masking (always) and ISNULL in most cases (I'd try it first)
alter procedure SomeProc
@ID int = NULL
AS
declare @maskID int
select @maskID = @ID
...
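The elided body would then reference only the masked variable; my guess at what it might look like, combining it with the ISNULL idea above:
select
st.ID, st.Col1, st.Col2
from
SomeTable st
where
st.ID = ISNULL(@maskID, st.ID) -- the optimizer cannot sniff @maskID, so the default NULL no longer drives the plan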
Related
I have the following IF statement in my stored procedure:
IF @parameter2 IS NOT NULL
BEGIN
-- delete the existing data
DELETE FROM #tab
-- fetch data to #tab
INSERT INTO #tab EXECUTE sp_executesql @getValueSql, N'@parameter nvarchar(MAX)', @parameter2
SET @value2 = (SELECT * FROM #tab)
IF @value2 = @parameter2
RETURN 5
END
ELSE
RETURN 5
This is to check if the @parameter2 value already exists in the database. Now the trouble I have is that I have to do this for up to 10 parameters. I am wondering if it would be faster to just copy the statement and repeat the code for all the possible parameters. That would mean I would have 10 almost identical IF statements. The other option I see is inserting all the parameters into #tab and looping through them with a CURSOR or WHILE. I am concerned about the speed of looping because, as far as I know, loops are pretty slow.
As mentioned in the comments, it would be about the same.
Indeed cursors are slow. But the reasons why they are slow are not avoided if you type a multitude of IFs in your code. You are essentially using a cursor in an "unrolled" form; the problems stay the same: all execution plans of your procedure will be re-created, and the optimizer will not have the chance to use a better, set-based plan.
So, your options are:
Either use a cursor or a multitude of IFs. For the sake of readability and ease, I would recommend the cursor. The performance will be the same.
If possible, re-write the code of @getValueSql to operate not on one value for @parameterN, but rather on a parameter table with N rows, one for each parameter value you're interested in (see the sketch at the end of this answer). This is probably hard or impossible to do, but it is the only way to increase performance.
However, I should also mention that the cursor drawbacks won't be very noticeable on just 10 iterations, except maybe if you have exceptionally complex and nested queries. Remember, "Premature optimization is the root of all evil". The most probable thing is that you don't need to worry about this.
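If a set-based rewrite is possible, a heavily hedged sketch (the names @params and dbo.SomeLookupTable are made up; the real query inside @getValueSql would have to be folded in) could load all the parameter values into a table variable and test them in one statement:
declare @params table (ParamValue nvarchar(max));

insert into @params (ParamValue)
values (@parameter1), (@parameter2), (@parameter3); -- ...and so on up to @parameter10

-- one set-based probe instead of ten IF blocks or a cursor
if exists (
select 1
from @params p
join dbo.SomeLookupTable t on t.SomeValue = p.ParamValue
)
return 5;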
I'm doing a search on a large table of about 10 million rows. I want to specify a start and end date and return all records in the table created between those dates.
It's a straight-forward query:
declare @StartDateTime datetime = '2016-06-21',
@EndDateTime datetime = '2016-06-22';
select *
FROM Archive.dbo.Order O WITH (NOLOCK)
where O.Created >= @StartDateTime
AND O.Created < @EndDateTime;
Created is a DATETIME column which has a non-clustered index.
This query took about 15 seconds to complete.
However, if I modify the query slightly, as follows, it takes only 1 second to return the same result:
declare @StartDateTime datetime = '2016-06-21',
@EndDateTime datetime = '2016-06-22';
select *
FROM Archive.dbo.Order O WITH (NOLOCK)
where O.Created >= '2016-06-21'
AND O.Created < @EndDateTime;
The only change is replacing the @StartDateTime search predicate with a string literal. Looking at the execution plan, when I used @StartDateTime it did an index scan, but when I used a string literal it did an index seek and was 15 times faster.
Does anyone know why using the string literal is so much faster?
I would have thought doing a comparison between a DATETIME column and a DATETIME variable would be quicker than comparing the column to a string representation of a date. I've tried dropping and recreating the index on the Created column and it made no difference. I notice I get similar results on the production system as I do on the test system so the weird behaviour doesn't seem specific to a particular database or SQL Server instance.
Every variable has a scope within which it is recognized.
In OOP languages, we usually distinguish static/constant variables from temporary variables by using keywords, or by the way a variable is passed into a function, where inside that call it can be treated as a known, fixed value, such as the following in C++:
void MyFunction(std::string& name);
// technically, `&` passes a reference to the variable's actual location
// instead of a logical copy. The concept is the same.
In SQL Server, the standard chose to implement it a bit differently. There are no constant data types, so instead we use literals, which are either
object names (which have similar precedence in the call as system keywords),
names with an object delimiter (including ', []),
or strings with the delimiter CHAR(39) (').
This is the reason the two queries behave differently: those variables are not constants to the optimizer, which means SQL Server has already chosen its execution path before it knows their values.
If you have SSMS installed, include the Actual Execution Plan (CTRL + M) and look at the Estimated Rows on the select operator. This is the key number in the plan. The greater the difference between the Estimated and Actual rows, the more likely your query can benefit from optimization. In your example, SQL Server had to guess how many rows would match, overestimated, and lost efficiency.
The solution is one and the same, but you can still encapsulate everything if you want to. We use the AdventureWorks2012 database for this example:
1) Declare the Variable in the Procedure
CREATE PROC dbo.TEST1 (@NameStyle INT, @FirstName VARCHAR(50) )
AS
BEGIN
SELECT *
FROM Person.Person
WHERE FirstName = @FirstName
AND NameStyle = @NameStyle; --namestyle is 0
END
2) Pass the variable into Dynamic SQL
CREATE PROC dbo.TEST2 (@NameStyle INT)
AS
BEGIN
DECLARE @Name NVARCHAR(50) = N'Ken';
DECLARE @String NVARCHAR(MAX)
SET @String =
N'SELECT *
FROM Person.Person
WHERE FirstName = @Other
AND NameStyle = @NameStyle';
EXEC sp_executesql @String
, N'@Other VARCHAR(50), @NameStyle INT'
, @Other = @Name
, @NameStyle = @NameStyle
END
Both plans will produce the same results. I could have used EXEC by itself, but sp_executesql can cache the entire select statement (plus, it's more SQL injection safe).
Notice how in both cases the scope allowed SQL Server to treat the variable as a known value (it entered the object with a set value), and the optimizer was then able to choose the most efficient execution plan available.
-- Remove Procs
DROP PROC dbo.TEST1
DROP PROC dbo.TEST2
A great article was highlighted in the comment section of the OP, but you can see it here: Optimizing Variables and Parameters - SQLMAG
I am trying to run an inline TVF as a raw parameterized SQL query.
When I run the following query in SSMS, it takes 2-3 seconds
select * from dbo.history('2/1/15','1/1/15','1/31/15',2,2021,default)
I was able to capture the following query through SQL profiler (parameterized, as generated by Entity framework) and run it in SSMS.
exec sp_executesql N'select * from dbo.history(@First,@DatedStart,@DatedEnd,@Number,@Year,default)',N'@First date,@DatedStart date,@DatedEnd date,@Maturity int,@Number decimal(10,5)',@First='2015-02-01',@DatedStart='2015-01-01',@DatedEnd='2015-01-31',@Year=2021,@Number=2
Running the above query in SSMS takes 1:08, which is around 30x longer than the non-parameterized version.
I have tried adding option(recompile) to the end of the parameterized query, but it did absolutely nothing as far as performance. This is clearly an indexing issue to me, but I have no idea how to resolve it.
When looking at the execution plan, it appears that the parameterized version gets mostly hung up on an Eager Spool (46%) and then a Clustered Index Scan (30%), which are not present in the execution plan without parameters.
Perhaps there is something I am missing, can someone please point me in the right direction as to how I can get this parameterized query to work properly?
EDIT: Parameterized query execution plan, non-parameterized plan
Maybe it's a parameter sniffing problem.
Try modifying your function so that the parameters are set to local variables, and use the local vars in your SQL instead of the parameters.
So your function would have this structure
CREATE FUNCTION history(
@First Date,
@DatedStart Date,
@DatedEnd Date,
@Maturity int,
@Number decimal(10,5))
RETURNS @table TABLE (
--tabledef
)
AS
BEGIN
Declare @FirstVar Date = @First
Declare @DatedStartVar Date = @DatedStart
Declare @DatedEndVar Date = @DatedEnd
Declare @MaturityVar int = @Maturity
Declare @NumberVar decimal(10,5) = @Number
--SQL Statement which uses the local 'Var' variables and not the parameters
RETURN;
END
;
I've had similar problems in the past where this has been the culprit, and mapping to local variables stops SQL Server from coming up with a dud execution plan.
I need to use the same query twice but have a slightly different where clause. I was wondering if it would be efficient to simply call the same stored proc with a bit value, and have an IF... ELSE... statement, deciding on which fields to compare.
Or should I make two stored procs and call each one based on logic in my app?
I'd like to know this more in detail though to understand properly.
How is the execution plan compiled for this? Is there one for each code block in each IF... ELSE...?
Or is it compiled as one big execution plan?
You are right to be concerned about the execution plan being cached.
Martin gives a good example showing that the plan is cached and will be optimized for a certain branch of your logic the first time it is executed.
After the first execution, that plan is reused even if you call the stored procedure (sproc) with a different parameter, causing your execution flow to choose another branch.
This is very bad and will kill performance. I've seen this happen many times and it takes a while to find the root cause.
The reason behind this is called "Parameter Sniffing" and it is well worth researching.
A common proposed solution (one that I don't advise) is to split up your sproc into a few tiny ones.
If you call a sproc inside a sproc that inner sproc will get an execution plan optimized for the parameter being passed to it.
Splitting up a sproc into a few smaller ones when there is no good reason (a good reason would be modularity) is an ugly workaround. Martin shows that it's possible for a statement to be recompiled by introducing a change to the schema.
I would use OPTION (RECOMPILE) at the end of the statement. This instructs the optimizer to do a statement recompilation taking into account the current value of all variables: not only parameters but local variables are also taken into account, which can make the difference between a good and a bad plan.
To come back to your question of constructing a query with a different where clause according to a parameter. I would use the following pattern:
WHERE
(@parameter1 is null or col1 = @parameter1 )
AND
(@parameter2 is null or col2 = @parameter2 )
...
OPTION (RECOMPILE)
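For example, a minimal sketch of a whole procedure built on this pattern (table and column names are made up):
create procedure dbo.SearchOrders
@CustomerID int = null,
@Status varchar(20) = null
as
select o.OrderID, o.CustomerID, o.Status
from dbo.Orders o
where (@CustomerID is null or o.CustomerID = @CustomerID)
and (@Status is null or o.Status = @Status)
option (recompile); -- the plan is rebuilt per execution using the actual parameter values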
The downside is that the execution plan for this statement is never cached (it doesn't affect the caching of the statements before it, though), which can have an impact if the sproc is executed many times, as the compilation time must now be taken into account. Performing a test with production-quality data will tell you whether it's a problem or not.
The upside is that you can code readable and elegant sprocs and not set the optimizer on the wrong foot.
Another option to keep in mind is that you can disable execution plan caching at the sproc level (as opposed to the statement level), which is less granular and, more importantly, will not take into account the value of local variables when optimizing.
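That sproc-level option is WITH RECOMPILE; a minimal sketch, reusing the made-up procedure from above:
create procedure dbo.SearchOrders
@CustomerID int = null,
@Status varchar(20) = null
with recompile -- the whole procedure is recompiled on every call; parameter values are used, local variable values are not
as
select o.OrderID, o.CustomerID, o.Status
from dbo.Orders o
where (@CustomerID is null or o.CustomerID = @CustomerID)
and (@Status is null or o.Status = @Status);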
More information at
http://www.sommarskog.se/dyn-search-2005.html
http://sqlinthewild.co.za/index.php/2009/03/19/catch-all-queries/
It is compiled once, using the initial values of the parameters passed into the procedure, though some statements may be subject to deferred compile, in which case they will be compiled with whatever the parameter values are at the time they are eventually compiled.
You can see this from running the below and looking at the actual execution plans.
CREATE TABLE T
(
C INT
)
INSERT INTO T
SELECT 1 AS C
UNION ALL
SELECT TOP (1000) 2
FROM master..spt_values
UNION ALL
SELECT TOP (1000) 3
FROM master..spt_values
GO
CREATE PROC P @C INT
AS
IF @C = 1
BEGIN
SELECT '1'
FROM T
WHERE C = @C
END
ELSE IF @C = 2
BEGIN
SELECT '2'
FROM T
WHERE C = @C
END
ELSE IF @C = 3
BEGIN
CREATE TABLE #T
(
X INT
)
INSERT INTO #T
VALUES (1)
SELECT '3'
FROM T,
#T
WHERE C = @C
END
GO
EXEC P 1
EXEC P 2
EXEC P 3
DROP PROC P
DROP TABLE T
Running the @C = 2 case shows an estimated number of rows coming from T of 1, not 1,000, because that statement was compiled according to the initial parameter value of 1. Running the @C = 3 case gives an accurate estimated count of 1,000, as the reference to the (yet to be created) temporary table means that statement was subject to a deferred compile.
Using SQL 2005 / 2008
I have to use a forward cursor, but I don't want to suffer poor performance. Is there a faster way I can loop without using cursors?
Here is the example using cursor:
DECLARE @VisitorID int
DECLARE @FirstName varchar(30), @LastName varchar(30)
-- declare cursor called ActiveVisitorCursor
DECLARE ActiveVisitorCursor Cursor FOR
SELECT VisitorID, FirstName, LastName
FROM Visitors
WHERE Active = 1
-- Open the cursor
OPEN ActiveVisitorCursor
-- Fetch the first row of the cursor and assign its values into variables
FETCH NEXT FROM ActiveVisitorCursor INTO @VisitorID, @FirstName, @LastName
-- perform action whilst a row was found
WHILE @@FETCH_STATUS = 0
BEGIN
Exec MyCallingStoredProc @VisitorID, @FirstName, @LastName
-- get next row of cursor
FETCH NEXT FROM ActiveVisitorCursor INTO @VisitorID, @FirstName, @LastName
END
-- Close the cursor to release locks
CLOSE ActiveVisitorCursor
-- Free memory used by cursor
DEALLOCATE ActiveVisitorCursor
Now here is an example of how we can get the same result without using a cursor:
/* Here is alternative approach */
-- Create a temporary table, note the IDENTITY
-- column that will be used to loop through
-- the rows of this table
CREATE TABLE #ActiveVisitors (
RowID int IDENTITY(1, 1),
VisitorID int,
FirstName varchar(30),
LastName varchar(30)
)
DECLARE @NumberRecords int, @RowCounter int
DECLARE @VisitorID int, @FirstName varchar(30), @LastName varchar(30)
-- Insert the resultset we want to loop through
-- into the temporary table
INSERT INTO #ActiveVisitors (VisitorID, FirstName, LastName)
SELECT VisitorID, FirstName, LastName
FROM Visitors
WHERE Active = 1
-- Get the number of records in the temporary table
SET @NumberRecords = @@ROWCOUNT
-- You can also use: SET @NumberRecords = (SELECT COUNT(*) FROM #ActiveVisitors)
SET @RowCounter = 1
-- loop through all records in the temporary table
-- using the WHILE loop construct
WHILE @RowCounter <= @NumberRecords
BEGIN
SELECT @VisitorID = VisitorID, @FirstName = FirstName, @LastName = LastName
FROM #ActiveVisitors
WHERE RowID = @RowCounter
EXEC MyCallingStoredProc @VisitorID, @FirstName, @LastName
SET @RowCounter = @RowCounter + 1
END
-- drop the temporary table
DROP TABLE #ActiveVisitors
"NEVER use Cursors" is a wonderful example of how damaging simple rules can be. Yes, they are easy to communicate, but when we remove the reason for the rule so that we can have an "easy to follow" rule, then most people will just blindly follow the rule without thinking about it, even if following the rule has a negative impact.
Cursors, at least in SQL Server / T-SQL, are greatly misunderstood. It is not accurate to say "Cursors affect performance of SQL". They certainly have a tendency to, but a lot of that has to do with how people use them. When used properly, Cursors are faster, more efficient, and less error-prone than WHILE loops (yes, this is true and has been proven over and over again, regardless of who argues "cursors are evil").
First option is to try to find a set-based approach to the problem.
If logically there is no set-based approach (e.g. needing to call EXEC per each row), and the query for the Cursor is hitting real (non-Temp) Tables, then use the STATIC keyword which will put the results of the SELECT statement into an internal Temporary Table, and hence will not lock the base-tables of the query as you iterate through the results. By default, Cursors are "sensitive" to changes in the underlying Tables of the query and will verify that those records still exist as you call FETCH NEXT (hence a large part of why Cursors are often viewed as being slow). Using STATIC will not help if you need to be sensitive of records that might disappear while processing the result set, but that is a moot point if you are considering converting to a WHILE loop against a Temp Table (since that will also not know of changes to underlying data).
If the query for the cursor is only selecting from temporary tables and/or table variables, then you don't need to prevent locking as you don't have concurrency issues in those cases, in which case you should use FAST_FORWARD instead of STATIC.
I think it also helps to specify the three options of LOCAL READ_ONLY FORWARD_ONLY, unless you specifically need a cursor that is not one or more of those. But I have not tested them to see if they improve performance.
Assuming that the operation is not eligible for being made set-based, then the following options are a good starting point for most operations:
DECLARE [Thing1] CURSOR LOCAL READ_ONLY FORWARD_ONLY STATIC
FOR SELECT columns
FROM Schema.ReadTable(s);
DECLARE [Thing2] CURSOR LOCAL READ_ONLY FORWARD_ONLY FAST_FORWARD
FOR SELECT columns
FROM #TempTable(s) and/or @TableVariables;
You can do a WHILE loop, however you should seek to achieve a more set based operation as anything in SQL that is iterative is subject to performance issues.
http://msdn.microsoft.com/en-us/library/ms178642.aspx
Common Table Expressions would be a good alternative, as @Neil suggested. Here's an example from AdventureWorks:
WITH cte_PO AS
(
SELECT [LineTotal]
,[ModifiedDate]
FROM [AdventureWorks].[Purchasing].[PurchaseOrderDetail]
),
minmax AS
(
SELECT MIN([LineTotal]) as DayMin
,MAX([LineTotal]) as DayMax
,[ModifiedDate]
FROM cte_PO
GROUP BY [ModifiedDate]
)
SELECT * FROM minmax ORDER BY ModifiedDate
Here's the top few lines of what it returns:
DayMin DayMax ModifiedDate
135.36 8847.30 2001-05-24 00:00:00.000
129.8115 25334.925 2001-06-07 00:00:00.000
Recursive Queries using Common Table Expressions.
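For instance, a minimal recursive CTE that replaces a simple counting loop (a toy example, not tied to any table above):
with Numbers as
(
select 1 as n
union all
select n + 1
from Numbers
where n < 100
)
select n
from Numbers
option (maxrecursion 100); -- the default limit is 100 levels; raise it explicitly for longer sequences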
I have to use a forward cursor, but I don't want to suffer poor performance. Is there a faster way I can loop without using cursors?
This depends on what you do with the cursor.
Almost everything can be rewritten using set-based operations in which case the loops are performed inside the query plan and since they involve no context switch are much faster.
However, there are some things SQL Server is just not good at, like computing cumulative values or joining on date ranges.
These kinds of queries can be made faster using a CURSOR:
Flattening timespans: SQL Server
But again, this is quite a rare exception, and normally a set-based approach performs better.
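As a rough illustration of the cumulative-value case, here is a running-total cursor over a hypothetical dbo.Payments table (names are made up):
declare @PaymentDate datetime, @Amount money, @RunningTotal money;
declare @Results table (PaymentDate datetime, Amount money, RunningTotal money);
set @RunningTotal = 0;

declare PaymentCursor cursor local fast_forward for
select PaymentDate, Amount
from dbo.Payments
order by PaymentDate;

open PaymentCursor;
fetch next from PaymentCursor into @PaymentDate, @Amount;

while @@FETCH_STATUS = 0
begin
-- each row touches the running variable once, instead of re-summing all prior rows
set @RunningTotal = @RunningTotal + @Amount;
insert into @Results (PaymentDate, Amount, RunningTotal)
values (@PaymentDate, @Amount, @RunningTotal);

fetch next from PaymentCursor into @PaymentDate, @Amount;
end;

close PaymentCursor;
deallocate PaymentCursor;

select PaymentDate, Amount, RunningTotal from @Results order by PaymentDate;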
If you posted your query, we could probably optimize it and get rid of a CURSOR.
Depending on what you want it for, you may be able to use a tally table.
Jeff Moden has an excellent article on tally tables Here
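A hedged sketch of the general idea (building a small numbers/tally table once and then using it to replace a loop; names are made up):
-- build a 10,000-row tally table once
select top (10000)
identity(int, 1, 1) as N
into dbo.Tally
from master.sys.all_columns ac1
cross join master.sys.all_columns ac2;

-- example use: one row per day between two dates, with no loop or cursor
declare @Start datetime, @End datetime;
set @Start = '2016-01-01';
set @End = '2016-01-31';

select dateadd(day, t.N - 1, @Start) as TheDate
from dbo.Tally t
where t.N <= datediff(day, @Start, @End) + 1;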
Don't use a cursor, instead look for a set-based solution. If you can't find a set-based solution... still don't use a cursor! Post details of what you are trying to achieve, someone will be able to find a set-based solution for you.
There may be some scenarios where one can use tally tables. They can be a good alternative to loops and cursors, but remember they cannot be applied in every case. A well-explained case can be found here