This question has been boiling in my head for quite a long time: which of the following two stored procedures would perform better?
Proc 1
CREATE PROCEDURE GetEmployeeDetails @EmployeeId uniqueidentifier,
@IncludeDepartmentInfo bit
AS
BEGIN
SELECT * FROM Employees
WHERE Employees.EmployeeId = @EmployeeId
IF (@IncludeDepartmentInfo = 1)
BEGIN
SELECT Departments.* FROM Departments, Employees
WHERE Departments.DepartmentId = Employees.DepartmentId
AND Employees.EmployeeId = @EmployeeId
END
END
Proc 2
CREATE PROCEDURE GetEmployeeDetails @EmployeeId uniqueidentifier,
@IncludeDepartmentInfo bit
AS
BEGIN
SELECT * FROM Employees
WHERE Employees.EmployeeId = @EmployeeId
SELECT Departments.* FROM Departments, Employees
WHERE Departments.DepartmentId = Employees.DepartmentId
AND Employees.EmployeeId = @EmployeeId
AND @IncludeDepartmentInfo = 1
END
The only difference between the two is the use of the IF statement.
If proc 1/proc 2 are called with alternating values of @IncludeDepartmentInfo, then from my understanding proc 2 would perform better, because it will retain the same query plan irrespective of the value of @IncludeDepartmentInfo, whereas proc 1 will change its query plan on each call.
Answers are really appreciated.
PS: This is just a scenario; please don't focus on the specific queries but on the essence of the example. I am particularly interested in what the query optimizer does (in both cases, IF versus WHERE, and the difference between them). There are many other aspects that I know could affect performance, which I want to keep out of this question.
SELECT Departments.* FROM Departments, Employees
WHERE Departments.DepartmentId = Employees.DepartmentId
AND Employees.EmployeeId = @EmployeeId
AND @IncludeDepartmentInfo = 1
When SQL Server compiles a query like this, it must compile it for any value of @IncludeDepartmentInfo. The resulting plan can well be one that scans the tables, performs the join, and only then checks the variable, resulting in unnecessary I/O. The optimizer may be smart and move the check for the variable ahead of the actual I/O operations in the execution plan, but this is never guaranteed. This is why I always recommend using explicit IFs in T-SQL for queries that need to perform very differently based on a variable value (the typical example being OR conditions).
gbn's observation is also an important one: from an API design point of view, it is better to have a consistent return type (i.e. always return the same number and shape of result sets).
From a consistency perspective, number 2 will always return 2 datasets. Overloading aside, you wouldn't write a client code method that maybe returns a result and maybe doesn't.
If you reuse this code, other calling clients will have to know about this flag too.
If the code does 2 different things, then why not 2 different stored procs?
Finally, it's far better practice to use modern JOIN syntax and separate joining from filtering. In this case, personally I'd use EXISTS too.
SELECT
D.*
FROM
Departments D
JOIN
Employees E ON D.DepartmentId = E.DepartmentId
WHERE
E.EmployeeId = @EmployeeId
AND
@IncludeDepartmentInfo = 1
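The answer above mentions preferring EXISTS but does not show it. Below is a minimal sketch of that idea combined with the explicit IF from the first answer. So that it runs on its own, it assumes temp tables #Departments and #Employees with made-up rows as hypothetical stand-ins for the real schema, and local variables in place of the procedure parameters; it illustrates the query shape, not a measured performance win.

```sql
-- Hypothetical stand-ins for the real Departments/Employees tables
CREATE TABLE #Departments (DepartmentId int PRIMARY KEY, Name varchar(50));
CREATE TABLE #Employees (EmployeeId uniqueidentifier PRIMARY KEY, DepartmentId int);

-- Local variables standing in for the procedure parameters
DECLARE @EmployeeId uniqueidentifier = NEWID();
DECLARE @IncludeDepartmentInfo bit = 1;

INSERT INTO #Departments (DepartmentId, Name) VALUES (1, 'Sales'), (2, 'IT');
INSERT INTO #Employees (EmployeeId, DepartmentId) VALUES (@EmployeeId, 2), (NEWID(), 1);

-- The flag is tested once, outside the query, so the SELECT only runs when needed
IF @IncludeDepartmentInfo = 1
BEGIN
    -- EXISTS only asks "is there a matching employee?"; no Employees columns are returned
    SELECT D.DepartmentId, D.Name
    FROM #Departments AS D
    WHERE EXISTS (SELECT 1
                  FROM #Employees AS E
                  WHERE E.DepartmentId = D.DepartmentId
                    AND E.EmployeeId = @EmployeeId);
END;
```

Compared with the JOIN version, EXISTS makes it explicit that Employees is only used as a filter.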
When you use the 'if' statement, you may run only one query instead of two. I would think that one query would almost always be faster than two. Your point about query plans may be valid if the first query were complex and took a long time to run, and the second was trivial. However, the first query looks like it retrieves a single row based on a primary key - probably pretty fast every time. So, I would keep the 'if' - but I would test to verify.
The performance difference would be too small for anyone to notice.
Premature optimization is the root of all evil. Stop worrying about performance and start implementing features that make your customers smile.
Related
At first I thought I should go see my doctor and have an MRI of my brain. But the results I am seeing were confirmed by a second programmer. Running the exact same SQL gives me different results at different times! It makes no sense. Here is what I am running (table names changed to protect the innocent)...
declare @customerid int;
declare @locationid int;
declare @reportdate datetime;
set @customerid=2063;
set @locationid=101;
set @reportdate=getdate();
select i.InvoiceDate as [Date], ... from Invoice i (nolock) ...
UNION ALL select ... from Remit e (nolock) ...
UNION ALL select ... from [Rec] r (nolock) join TrTypes t (nolock) on ...
The two select/union all's are identical, but sometimes the second does not return records from the Remit table when the first shows them. And... sometimes the first doesn't pull them either. I put these queries in SQL Server Management Studio and keep hitting F5... Sometimes I get proper results, sometimes I don't.
Further, when I remove the UNION ALLs and just run the queries, nothing is returned from Remit.
Could this be a locking issue?
This query is from a C# program that produces a detail report and it never returns the Remit records. There is another program that creates summary records from the Remit table and it always returns the records from a similar query... That is why I had to look at the detail program/report.
Thoughts?
NOLOCK means you don't care about consistency, concurrency and locking/blocking. It allows you to read the same row 0 times, 1 time, or 2 times. It is not a magic "go faster" button for queries: you should only use it if you fully understand the potential consequences on a busy system (such as getting different results at different times!).
The solution is to either:
stop blindly throwing NOLOCK everywhere (accept your blocking to get accurate and consistent data, or use read committed snapshot isolation).
accept that NOLOCK sacrifices accuracy for less blocking, and sometimes it just won't give the right results. If you ask for this, you don't get to complain about it.
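For the read committed snapshot route, here is a sketch of the switch itself. It assumes SQL Server 2012 or later for the `ALTER DATABASE CURRENT` form (on 2008 you would write the database name instead of CURRENT). It is a database-wide change, so treat it as an illustration to plan around rather than a drop-in script.

```sql
-- Sketch only: turn on read committed snapshot isolation (RCSI) for the current database.
-- WITH ROLLBACK IMMEDIATE rolls back open transactions in other sessions so the
-- option can take effect; schedule this instead of running it on a busy production box.
ALTER DATABASE CURRENT SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;

-- Confirm the setting (1 = on)
SELECT name, is_read_committed_snapshot_on
FROM sys.databases
WHERE database_id = DB_ID();

-- Afterwards the report can drop the hints, e.g. "from Remit e (nolock)" becomes "from Remit e";
-- readers then see the last committed row versions instead of being blocked by writers.
```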
Everywhere I look, I see that to loop through results you have to use a cursor, and in the same post someone says cursors are bad and not to use them (which has always been my philosophy). But now I am stuck: I need to loop through a result set!
Here's the situation. I need to come up with a list of ProductIDs that have 2 different statuses set to a specific value. I start the stored procedure, run the query that finds my products that meet the criteria.
So, now I have a list of ProductIDs that I need to run through my validation process:
16050
16052
41817
48255
Now, for each of those products (there may be 1, there may be 1000, I don't know), I need to check a whole list of conditions:
Is a specific field = 'SIMPLE'? if so, perform a bunch of other queries and make sure everything is good
If it is not 'SIMPLE' then run a whole other set of queries and make sure that information is all good.
Is another field = 'YES'? if so, perform a bunch of other queries, if it is not, then do other queries.
Is a cursor what I need to use? Is there some other way to do what I need that I just am not seeing?
Thanks,
Leslie
I ended up using a WHILE loop that passes each ProductID into a series of checks!
declare @productKey varchar(20)
-- loop until every row has been processed (also safe if ##Magento starts out empty)
while exists (select 1 from ##Magento where ProductKey is not null)
begin
select top (1) @productKey = ProductKey from ##Magento where ProductKey is not null
print @productKey;
-- ...run the per-product checks for @productKey here...
-- deleting by key also removes any duplicate rows for the same product
delete from ##Magento where ProductKey = @productKey
end
go
It's hard to say without knowing the specifics of your process, but one approach is to create a function that performs your logic and call that.
e.g.:
delete from yourtable
where productid in (select ProductID from FilteredProducts)
and dbo.ShouldBeDeletedFunction(ProductID) = 1
In general, cursors are bad, but there are always exceptions. Try to avoid them by thinking in terms of sets, rather than the attributes of an individual record.
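Thinking in sets for the checks in the question could look roughly like this: instead of looping over products, each branch of the validation becomes one statement that handles every matching product at once. The tables #Candidates and #Failures, the columns ProductType and Flag (playing the roles of the 'SIMPLE' and 'YES' fields), and the validation rules themselves are all hypothetical placeholders for the real checks.

```sql
-- Hypothetical stand-in for the product list found at the start of the procedure
CREATE TABLE #Candidates (ProductID int PRIMARY KEY, ProductType varchar(20), Flag varchar(3));
INSERT INTO #Candidates (ProductID, ProductType, Flag)
VALUES (16050, 'SIMPLE', 'YES'), (16052, 'CONFIGURABLE', 'NO'),
       (41817, 'SIMPLE', 'NO'),  (48255, 'BUNDLE', 'YES');

-- One row per product that fails a check
CREATE TABLE #Failures (ProductID int, CheckName varchar(50));

-- SIMPLE products: the whole group is checked by one statement, not one loop pass each.
-- The Flag rule is a placeholder for the real "bunch of other queries".
INSERT INTO #Failures (ProductID, CheckName)
SELECT c.ProductID, 'simple-check'
FROM #Candidates AS c
WHERE c.ProductType = 'SIMPLE'
  AND c.Flag <> 'YES';

-- Everything that is not SIMPLE gets the other set of checks (placeholder rule again)
INSERT INTO #Failures (ProductID, CheckName)
SELECT c.ProductID, 'non-simple-check'
FROM #Candidates AS c
WHERE c.ProductType <> 'SIMPLE'
  AND c.Flag = 'NO';

SELECT ProductID, CheckName FROM #Failures ORDER BY ProductID;
```

The number of statements stays fixed no matter whether there is 1 product or 1000, which is the main point of the set-based approach.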
I'm trying to figure out if this is relatively well-performing T-SQL (this is SQL Server 2008). I need to create a stored procedure that updates a table. The proc accepts as many parameters as there are columns in the table, and with the exception of the PK column, they all default to NULL. The body of the procedure looks like this:
CREATE PROCEDURE proc_repo_update
@object_id bigint
,@object_name varchar(50) = NULL
,@object_type char(2) = NULL
,@object_weight int = NULL
,@owner_id int = NULL
-- ...etc
AS
BEGIN
update
object_repo
set
object_name = ISNULL(@object_name, object_name)
,object_type = ISNULL(@object_type, object_type)
,object_weight = ISNULL(@object_weight, object_weight)
,owner_id = ISNULL(@owner_id, owner_id)
-- ...etc
where
object_id = @object_id
return @@ROWCOUNT
END
So basically:
Update a column only if its corresponding parameter was provided, and leave the rest alone.
This works well enough, but as the ISNULL call will return the value of the column if the received parameter was null, will SQL Server optimize this somehow? This might be a performance bottleneck on the application where the table might be updated heavily (insertion will be uncommon so the performance there is not a problem). So I'm trying to figure out what's the best way to do this. Is there a way to condition the column expressions with something like CASE WHEN or something? The table will be indexed up the wazoo as well for read performance. Is this the best approach? My alternative at this point is to create the UPDATE expression in code (e.g. inline SQL) and execute it against the server. This would solve my doubts about performance, but I'd rather leave this in a stored proc if possible.
Take a look at Hugo Kornelis' blog post at http://sqlblog.com/blogs/hugo_kornelis/archive/2007/09/30/what-if-null-if-null-is-null-null-null-is-null.aspx. Scroll down a bit to the discussion on COALESCE vs. ISNULL. If portability is a future consideration, look at COALESCE.
However, from a performance perspective, take a look at Adam's performance-centric blog post at http://sqlblog.com/blogs/adam_machanic/archive/2006/07/12/performance-isnull-vs-coalesce.aspx. ISNULL is the faster of the two.
Your choice...
BTW, I have a bunch of SPs that are just like your example and have no performance issues using ISNULL. (Being a bit lazy, I like to type 6 vs. 8 chars, and being a little prone to finger-dyslexia, I find ISNULL much easier to type :-) )
ISNULL is the fastest way; the only way you'll improve on it is to pass in either NULL or the actual value and do the ISNULL in the application.
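A small runnable sketch of the pattern under discussion, assuming a temp table #object_repo with only three of the columns and local variables in place of the procedure parameters. It uses ISNULL for one column and COALESCE for the other to show that both keep the current value when the argument is NULL.

```sql
-- Temp-table stand-in for object_repo (only a few of its columns)
CREATE TABLE #object_repo (object_id bigint PRIMARY KEY, object_name varchar(50), object_weight int);
INSERT INTO #object_repo (object_id, object_name, object_weight) VALUES (1, 'widget', 10);

-- "Parameters": only the weight is supplied, the name is left NULL
DECLARE @object_id bigint = 1, @object_name varchar(50) = NULL, @object_weight int = 25;

UPDATE #object_repo
SET object_name   = ISNULL(@object_name, object_name),        -- NULL argument: keeps 'widget'
    object_weight = COALESCE(@object_weight, object_weight)   -- same idea with ANSI COALESCE
WHERE object_id = @object_id;

SELECT object_id, object_name, object_weight FROM #object_repo;
```

One consequence of this design is that the procedure cannot set a column to NULL, because a NULL argument already means "leave unchanged".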
I want to execute a SELECT statement within a CTE based on a condition, something like below:
;with CTE_AorB as
(
if(condition)
select * from table_A
else
select * from table_B
),
CTE_C as
(
select * from CTE_AorB -- processing is removed
)
But I get an error on this. Is it possible to have IF/ELSE within CTEs? If not, is there a workaround or a better approach?
Thanks.
try:
;with CTE_AorB as
(
select * from table_A WHERE (condition true)
union all
select * from table_B WHERE NOT (condition true)
),
CTE_C as
(
select * from CTE_AorB -- processing is removed
)
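A self-contained sketch of the UNION ALL approach, assuming a bit variable @useA as a hypothetical stand-in for the condition and temp tables in place of table_A and table_B. For a given non-NULL value of @useA, only one branch can return rows. If the real condition can evaluate to NULL, both `condition` and `NOT (condition)` are unknown and neither branch returns anything, so that case needs handling.

```sql
-- Hypothetical tables standing in for table_A and table_B
CREATE TABLE #table_A (id int, origin char(1));
CREATE TABLE #table_B (id int, origin char(1));
INSERT INTO #table_A (id, origin) VALUES (1, 'A'), (2, 'A');
INSERT INTO #table_B (id, origin) VALUES (3, 'B');

-- Hypothetical stand-in for (condition); kept non-NULL so the branches are mutually exclusive
DECLARE @useA bit = 0;

WITH CTE_AorB AS
(
    SELECT id, origin FROM #table_A WHERE @useA = 1
    UNION ALL
    SELECT id, origin FROM #table_B WHERE @useA = 0  -- i.e. NOT (condition)
),
CTE_C AS
(
    SELECT id, origin FROM CTE_AorB  -- processing would go here
)
SELECT id, origin
INTO #CteResult
FROM CTE_C;
```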
The key with a dynamic search condition is to make sure an index is used. Here is a very comprehensive article on how to handle this topic:
Dynamic Search Conditions in T-SQL by Erland Sommarskog
It covers all the issues and methods of trying to write queries with multiple optional search conditions. The main thing you need to be concerned with is not the duplication of code, but the use of an index. If your query fails to use an index, it will perform poorly. There are several techniques that can be used, which may or may not allow an index to be used.
Here is the table of contents:
Introduction
The Case Study: Searching Orders
The Northgale Database
Dynamic SQL
Introduction
Using sp_executesql
Using the CLR
Using EXEC()
When Caching Is Not Really What You Want
Static SQL
Introduction
x = @x OR @x IS NULL
Using IF statements
Umachandar's Bag of Tricks
Using Temp Tables
x = @x AND @x IS NOT NULL
Handling Complex Conditions
Hybrid Solutions – Using both Static and Dynamic SQL
Using Views
Using Inline Table Functions
Conclusion
Feedback and Acknowledgements
Revision History
If you are on the proper version of SQL Server 2008, there is an additional technique that can be used; see: Dynamic Search Conditions in T-SQL Version for SQL 2008 (SP1 CU5 and later)
If you are on that proper release of SQL Server 2008, you can just add OPTION (RECOMPILE) to the query, and the local variables' values at run time are used for the optimizations.
Consider this: OPTION (RECOMPILE) will take this code (where no index can be used with this mess of ORs):
WHERE
(@Search1 IS NULL OR Column1 = @Search1)
AND (@Search2 IS NULL OR Column2 = @Search2)
AND (@Search3 IS NULL OR Column3 = @Search3)
and optimize it at run time to be (provided that only @Search2 was passed in with a value):
WHERE
Column2 = @Search2
and an index can be used (if you have one defined on Column2)
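A runnable sketch of the catch-all query with OPTION (RECOMPILE), assuming a hypothetical #Orders temp table with an index on Column2. Whether the optimizer actually seeks on that index depends on the build (see the SP1 CU5 note above) and on data volume; with three rows it may well scan anyway, so this only demonstrates the query shape and that the NULL filters drop out of the result.

```sql
-- Hypothetical table with an index on the column we expect to filter on
CREATE TABLE #Orders (OrderId int PRIMARY KEY, Column1 int, Column2 int, Column3 int);
CREATE INDEX IX_Orders_Column2 ON #Orders (Column2);
INSERT INTO #Orders (OrderId, Column1, Column2, Column3)
VALUES (1, 10, 100, 1000), (2, 20, 200, 2000), (3, 30, 200, 3000);

-- Only @Search2 is supplied; NULL means "no filter on this column"
DECLARE @Search1 int = NULL, @Search2 int = 200, @Search3 int = NULL;

SELECT OrderId
INTO #hits
FROM #Orders
WHERE (@Search1 IS NULL OR Column1 = @Search1)
  AND (@Search2 IS NULL OR Column2 = @Search2)
  AND (@Search3 IS NULL OR Column3 = @Search3)
OPTION (RECOMPILE);  -- compiled for these values, so the IS NULL branches can be folded away
```

Comparing the actual execution plan with and without the hint is the way to see the difference on a real table.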
Never, ever try to put conditions like IF inside a single query statement. Even if you do manage to pull it off, this is the one sure-shot way to kill performance. Remember, a single statement means a single plan, and the plan has to be generated in a way that satisfies both cases, when the condition is true and when it is false, at once. This usually results in the worst possible plan, since the condition usually creates mutually exclusive access paths for the plan, and the union of the two always results in an end-to-end table scan.
Your best approach, for this and many many other reasons, is to pull the IF outside of the statement:
if(condition true)
select * from table_A
else
select * from table_B
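One catch with moving the IF outside is that the later CTE_C processing no longer has a single source to read from. A sketch of one way around that, in the spirit of the "Using Temp Tables" entry in the table of contents above: each branch fills a temp table, and the processing reads from it. The names #SourceA, #SourceB, #AorB and @pickA are hypothetical.

```sql
-- Hypothetical tables standing in for table_A and table_B
CREATE TABLE #SourceA (id int, label varchar(10));
CREATE TABLE #SourceB (id int, label varchar(10));
INSERT INTO #SourceA (id, label) VALUES (1, 'A1'), (2, 'A2');
INSERT INTO #SourceB (id, label) VALUES (3, 'B1');

DECLARE @pickA bit = 1;  -- hypothetical stand-in for (condition)

-- Holding table for whichever branch runs
CREATE TABLE #AorB (id int, label varchar(10));

-- Each branch is its own statement with its own plan
IF @pickA = 1
BEGIN
    INSERT INTO #AorB (id, label) SELECT id, label FROM #SourceA;
END
ELSE
BEGIN
    INSERT INTO #AorB (id, label) SELECT id, label FROM #SourceB;
END;

-- The former CTE_C processing now reads from the temp table
WITH CTE_C AS
(
    SELECT id, label FROM #AorB
)
SELECT COUNT(*) AS row_count FROM CTE_C;
```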
I think the IF ELSE stuff might have poor caching if your branch condition flips. Maybe someone more knowledgeable can comment.
Another way would be to UNION ALL with the WHERE clauses as suggested by others. The UNION ALL would replace the IF ELSE
If you are using a parameter, then you only need one statement.
@ID (some parameter)
;with CTE as
(
select * from table_A WHERE id = @ID
union all
select * from table_B WHERE (id = @ID and condition)
)
select * from CTE
For my SQL queries, I usually do the following for SELECT statements:
SELECT ...
FROM table t
WHERE 1=1
AND t.[column1] = @param1
AND t.[column2] = @param2
This will make it easy if I need to add / remove / comment any WHERE clauses, since I don't have to care about the first line.
Is there any performance hit when using this pattern?
Additional Info:
Example for sheepsimulator and all others who didn't get the usage.
Suppose the above query, I need to change #param1 to be not included into the query:
With 1=1:
...
WHERE 1=1 <-- no change
--AND t.[column1] = @param1 <-- changed
AND t.[column2] = @param2 <-- no change
...
Without 1=1:
...
WHERE <-- no change
--t.[column1] = @param1 <-- changed
{AND removed} t.[column2] = @param2 <-- changed
...
No, SQL Server is smart enough to omit this condition from the execution plan since it's always TRUE.
Same is true for Oracle, MySQL and PostgreSQL.
It is likely that if you use the profiler and look, you will end up seeing that the optimizer will end up ignoring that more often than not, so in the grand scheme of things, there probably won't be much in the way of performance gain or losses.
This has no performance impact, but the SQL text looks like it has been mangled by a SQL injection attack. The '1=1' trick appears in many SQL-injection-based attacks. You just run the risk that some customer of yours someday deploys a 'black box' that monitors SQL traffic and you'll find your app flagged as 'hacked'. Source code analyzers may also flag this. It's a long shot, of course, but something worth weighing.
One potentially mildly negative impact of this is that the AND 1=1 will stop SQL Server's simple parameterisation facility from kicking in.
Demo script
DBCC FREEPROCCACHE; /*<-- Don't run on production box!*/
CREATE TABLE [E7ED0174-9820-4B29-BCDF-C999CA319131]
(
X INT,
Y INT,
PRIMARY KEY (X,Y)
);
GO
SELECT *
FROM [E7ED0174-9820-4B29-BCDF-C999CA319131]
WHERE X = 1
AND Y = 2;
GO
SELECT *
FROM [E7ED0174-9820-4B29-BCDF-C999CA319131]
WHERE X = 2
AND Y = 3;
GO
SELECT *
FROM [E7ED0174-9820-4B29-BCDF-C999CA319131]
WHERE 1 = 1
AND X = 1
AND Y = 2
GO
SELECT *
FROM [E7ED0174-9820-4B29-BCDF-C999CA319131]
WHERE 1 = 1
AND X = 2
AND Y = 3
SELECT usecounts,
execution_count,
size_in_bytes,
cacheobjtype,
objtype,
text,
creation_time,
last_execution_time
FROM sys.dm_exec_cached_plans a
INNER JOIN sys.dm_exec_query_stats b
ON a.plan_handle = b.plan_handle
CROSS apply sys.dm_exec_sql_text(b.sql_handle) AS sql_text
WHERE text LIKE '%\[E7ED0174-9820-4B29-BCDF-C999CA319131\]%' ESCAPE '\'
AND text NOT LIKE '%this_query%'
ORDER BY last_execution_time DESC
GO
DROP TABLE [E7ED0174-9820-4B29-BCDF-C999CA319131]
Shows that both the queries without the 1=1 were satisfied by a single parameterised version of the cached plan whereas the queries with the 1=1 compiled and stored a separate plan for the different constant values.
Ideally you shouldn't be relying on this anyway though and should be explicitly parameterising queries to ensure that the desired elements are parameterised and the parameters have the correct datatypes.
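The explicit parameterisation recommended here also fits the code-generated case that motivates WHERE 1=1. A minimal sketch, assuming a hypothetical #Items table: optional filters are appended as AND clauses after WHERE 1 = 1, while the values travel as typed parameters through sp_executesql rather than being concatenated into the text, so each filter combination should get one reusable parameterised plan.

```sql
-- Hypothetical table to search
CREATE TABLE #Items (Id int PRIMARY KEY, column1 int, column2 int);
INSERT INTO #Items (Id, column1, column2) VALUES (1, 5, 7), (2, 5, 8), (3, 6, 7);

DECLARE @param1 int = NULL, @param2 int = 7;  -- NULL means "no filter on this column"
DECLARE @sql nvarchar(max) = N'SELECT Id FROM #Items AS t WHERE 1 = 1';

-- Each optional filter is appended as "AND ..." without caring whether it is the first one
IF @param1 IS NOT NULL SET @sql += N' AND t.column1 = @param1';
IF @param2 IS NOT NULL SET @sql += N' AND t.column2 = @param2';

CREATE TABLE #ItemHits (Id int);

-- Values are passed as typed parameters, never spliced into the SQL text
INSERT INTO #ItemHits (Id)
EXEC sys.sp_executesql @sql,
     N'@param1 int, @param2 int',
     @param1 = @param1, @param2 = @param2;
```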
There is no difference, as they are evaluated as constants and optimized out. I use both 1=1 and 0=1 in both hand-written and code-generated AND and OR lists, and it has no effect.
Since the condition is always true, SQL Server will ignore it. You can check by running two queries, one with the condition and one without, and comparing the two actual execution plans.
An alternative to achieve your ease of commenting requirement is to restructure your query:
SELECT ...
FROM table t
WHERE
t.[column1] = #param1 AND
t.[column2] = #param2 AND
t.[column3] = #param3
You can then add/remove/comment out any line except the last one in the WHERE conditions and it will still be valid SQL (commenting out the last line leaves a dangling AND).
No performance hit. Even if your WHERE clause is loaded with a large number of comparisons, this is tiny.
Best-case scenario is that it's a bit-for-bit comparison. Worst case is that the digits are evaluated as integers.
For queries of any reasonable complexity there will be no difference. You can look at some execution plans and also compare real execution costs, and see for yourself.