if else within CTE? - sql-server

I want to execute select statement within CTE based on a codition. something like below
;with CTE_AorB
(
if(condition)
select * from table_A
else
select * from table_B
),
CTE_C as
(
select * from CTE_AorB // processing is removed
)
But i get error on this. Is it possible to have if else within CTEs? If not is there a work around Or a better approach.
Thanks.

try:
;with CTE_AorB
(
select * from table_A WHERE (condition true)
union all
select * from table_B WHERE NOT (condition true)
),
CTE_C as
(
select * from CTE_AorB // processing is removed
)
the key with a dynamic search condition is to make sure an index is used, Here is a very comprehensive article on how to handle this topic:
Dynamic Search Conditions in T-SQL by Erland Sommarskog
it covers all the issues and methods of trying to write queries with multiple optional search conditions. This main thing you need to be concerned with is not the duplication of code, but the use of an index. If your query fails to use an index, it will preform poorly. There are several techniques that can be used, which may or may not allow an index to be used.
here is the table of contents:
Introduction
The Case Study: Searching Orders
The Northgale Database
Dynamic SQL
Introduction
Using sp_executesql
Using the CLR
Using EXEC()
When Caching Is Not Really What You Want
Static SQL
Introduction
x = #x OR #x IS NULL
Using IF statements
Umachandar's Bag of Tricks
Using Temp Tables
x = #x AND #x IS NOT NULL
Handling Complex Conditions
Hybrid Solutions – Using both Static and Dynamic SQL
Using Views
Using Inline Table Functions
Conclusion
Feedback and Acknowledgements
Revision History
if you are on the proper version of SQL Server 2008, there is an additional technique that can be used, see: Dynamic Search Conditions in T-SQL Version for SQL 2008 (SP1 CU5 and later)
If you are on that proper release of SQL Server 2008, you can just add OPTION (RECOMPILE) to the query and the local variable's value at run time is used for the optimizations.
Consider this, OPTION (RECOMPILE) will take this code (where no index can be used with this mess of ORs):
WHERE
(#search1 IS NULL or Column1=#Search1)
AND (#search2 IS NULL or Column2=#Search2)
AND (#search3 IS NULL or Column3=#Search3)
and optimize it at run time to be (provided that only #Search2 was passed in with a value):
WHERE
Column2=#Search2
and an index can be used (if you have one defined on Column2)

Never ever try to put conditions like IF inside a single query statements. Even if you do manage to pull it off, this is the one sure-shot way to kill performance. Remember, a single statement means a single plan, and the plan will have to be generated in a way to satisfy both cases, when condition is true and when condition is false, at once. This usually result in the worse possible plan, since the 'condition' usually creates mutually exclusive access path for the plan and the union of the two results in always end-to-end table scan.
Your best approach, for this and many many other reasons, is to pull the IF outside of the statement:
if(condition true)
select * from table_A
else
select * from table_B

I think the IF ELSE stuff might have poor caching if your branch condition flips. Maybe someone more knowledgeable can comment.
Another way would be to UNION ALL with the WHERE clauses as suggested by others. The UNION ALL would replace the IF ELSE

If you are using a parameter, then you only need one statement.
#ID (Some parameter)
;with CTE
(
select * from table_A WHERE id = #ID
union all
select * from table_B WHERE (id = #ID and condition)
)

Related

Optimizing Execution Plans for Parameterized T-SQL Queries Containing Window Functions

EDIT: I've updated the example code and provided complete table and view implementations for reference, but the essential question remains unchanged.
I have a fairly complex view in a database that I am attempting to query. When I attempt to retrieve a set of rows from the view by hard-coding the WHERE clause to specific foreign key values, the view executes very quickly with an optimal execution plan (indexes are used properly, etc.)
SELECT *
FROM dbo.ViewOnBaseTable
WHERE ForeignKeyCol = 20
However, when I attempt to add parameters to the query, all of a sudden my execution plan falls apart. When I run the query below, I'm getting index scans instead of seeks all over the place and the query performance is very poor.
DECLARE #ForeignKeyCol int = 20
SELECT *
FROM dbo.ViewOnBaseTable
WHERE ForeignKeyCol = #ForeignKeyCol
I'm using SQL Server 2008 R2. What gives here? What is it about using parameters that is causing a sub-optimal plan? Any help would be greatly appreciated.
For reference, here are the object definitions for which I'm getting the error.
CREATE TABLE [dbo].[BaseTable]
(
[PrimaryKeyCol] [uniqueidentifier] PRIMARY KEY,
[ForeignKeyCol] [int] NULL,
[DataCol] [binary](1000) NOT NULL
)
CREATE NONCLUSTERED INDEX [IX_BaseTable_ForeignKeyCol] ON [dbo].[BaseTable]
(
[ForeignKeyCol] ASC
)
CREATE VIEW [dbo].[ViewOnBaseTable]
AS
SELECT
PrimaryKeyCol,
ForeignKeyCol,
DENSE_RANK() OVER (PARTITION BY ForeignKeyCol ORDER BY PrimaryKeyCol) AS ForeignKeyRank,
DataCol
FROM
dbo.BaseTable
I am certain that the window function is the problem, but I am filtering my query by a single value that the window function is partitioning by, so I would expect the optimizer to filter first and then run the window function. It does this in the hard-coded example but not the parameterized example. Below are the two query plans. The top plan is good and the bottom plan is bad.
When using OPTION (RECOMPILE) be sure to look at the post-execution ('actual') plan rather than the pre-execution ('estimated') one. Some optimizations are only applied when execution occurs:
DECLARE #ForeignKeyCol int = 20;
SELECT ForeignKeyCol, ForeignKeyRank
FROM dbo.ViewOnBaseTable
WHERE ForeignKeyCol = #ForeignKeyCol
OPTION (RECOMPILE);
Pre-execution plan:
Post-execution plan:
Tested on SQL Server 2012 build 11.0.3339 and SQL Server 2008 R2 build 10.50.4270
Background & limitations
When windowing functions were added in SQL Server 2005, the optimizer had no way to push selections past these new sequence projections. To address some common scenarios where this caused performance problems, SQL Server 2008 added a new simplification rule, SelOnSeqPrj, which allows suitable selections to be pushed where the value is a constant. This constant may be a literal in the query text, or the sniffed value of a parameter obtained via OPTION (RECOMPILE). There is no particular problem with NULLs though the query may need to have ANSI_NULLS OFF to see this. As far as I know, applying the simplification to constant values only is an implementation limitation; there is no particular reason it could not be extended to work with variables. My recollection is that the SelOnSeqPrj rule addresssed the most commonly seen performance problems.
Parameterization
The SelOnSeqPrj rule is not applied when a query is successfully auto-parameterized. There is no reliable way to determine if a query was auto-parameterized in SSMS, it only indicates that auto-param was attempted. To be clear, the presence of place-holders like [#0] only shows that auto-parameterization was attempted. A reliable way to tell if a prepared plan was cached for reuse is to inspect the plan cache, where the 'parameterized plan handle' provides the link between ad-hoc and prepared plans.
For example, the following query appears to be auto-parameterized in SSMS:
SELECT *
FROM dbo.ViewOnBaseTable
WHERE ForeignKeyCol = 20;
But the plan cache shows otherwise:
WITH XMLNAMESPACES
(
DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan'
)
SELECT
parameterized_plan_handle =
deqp.query_plan.value('(//StmtSimple)[1]/#ParameterizedPlanHandle', 'nvarchar(64)'),
parameterized_text =
deqp.query_plan.value('(//StmtSimple)[1]/#ParameterizedText', 'nvarchar(max)'),
decp.cacheobjtype,
decp.objtype,
decp.plan_handle
FROM sys.dm_exec_cached_plans AS decp
CROSS APPLY sys.dm_exec_sql_text(decp.plan_handle) AS dest
CROSS APPLY sys.dm_exec_query_plan(decp.plan_handle) AS deqp
WHERE
dest.[text] LIKE N'%ViewOnBaseTable%'
AND dest.[text] NOT LIKE N'%dm_exec_cached_plans%';
If the database option for forced parameterization is enabled, we get a parameterized result, where the optimization is not applied:
ALTER DATABASE Sandpit SET PARAMETERIZATION FORCED;
DBCC FREEPROCCACHE;
SELECT *
FROM dbo.ViewOnBaseTable
WHERE ForeignKeyCol = 20;
The plan cache query now shows a parameterized cached plan, linked by the parameterized plan handle:
Workaround
Where possible, my preference is to rewrite the view as an in-line table-valued function, where the intended position of the selection can be made more explicit (if necessary):
CREATE FUNCTION dbo.ParameterizedViewOnBaseTable
(#ForeignKeyCol integer)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
SELECT
bt.PrimaryKeyCol,
bt.ForeignKeyCol,
ForeignKeyRank = DENSE_RANK() OVER (
PARTITION BY bt.ForeignKeyCol
ORDER BY bt.PrimaryKeyCol),
bt.DataCol
FROM dbo.BaseTable AS bt
WHERE
bt.ForeignKeyCol = #ForeignKeyCol;
The query becomes:
DECLARE #ForeignKeyCol integer = 20;
SELECT pvobt.*
FROM dbo.ParameterizedViewOnBaseTable(#ForeignKeyCol) AS pvobt;
With the execution plan:
You could always go the CROSS APPLY way.
ALTER VIEW [dbo].[ViewOnBaseTable]
AS
SELECT
PrimaryKeyCol,
ForeignKeyCol,
ForeignKeyRank,
DataCol
FROM (
SELECT DISTINCT
ForeignKeyCol
FROM dbo.BaseTable
) AS Src
CROSS APPLY (
SELECT
PrimaryKeyCol,
DENSE_RANK() OVER (ORDER BY PrimaryKeyCol) AS ForeignKeyRank,
DataCol
FROM dbo.BaseTable AS B
WHERE B.ForeignKeyCol = Src.ForeignKeyCol
) AS X
I think in this particular case it may be because the data types between your parameters and your table do not match exactly so SQL Server has to do an implicit conversion which is not a sargable operation.
Check your table data types and make your parameters the same type. Or do the cast yourself outside the query.

T-SQL 1=1 Performance Hit

For my SQL queries, I usually do the following for SELECT statements:
SELECT ...
FROM table t
WHERE 1=1
AND t.[column1] = #param1
AND t.[column2] = #param2
This will make it easy if I need to add / remove / comment any WHERE clauses, since I don't have to care about the first line.
Is there any performance hit when using this pattern?
Additional Info:
Example for sheepsimulator and all other who didn't get the usage.
Suppose the above query, I need to change #param1 to be not included into the query:
With 1=1:
...
WHERE 1=1 <-- no change
--AND t.[column1] = #param1 <-- changed
AND t.[column2] = #param2 <-- no change
...
Without 1=1:
...
WHERE <-- no change
--t.[column1] = #param1 <-- changed
{AND removed} t.[column2] = #param2 <-- changed
...
No, SQL Server is smart enough to omit this condition from the execution plan since it's always TRUE.
Same is true for Oracle, MySQL and PostgreSQL.
It is likely that if you use the profiler and look, you will end up seeing that the optimizer will end up ignoring that more often than not, so in the grand scheme of things, there probably won't be much in the way of performance gain or losses.
This has no performance impact, but there the SQL text looks like it has been mangled by a SQL injection attack. The '1=1' trick appears in many sql injection based attacks. You just run the risk that some customer of yours someday deploys a 'black box' that monitors SQL traffic and you'll find your app flagged as 'hacked'. Also source code analyzers may flag this. Its a long long shot, of course, but something worth putting into the balance.
One potentially mildly negative impact of this is that the AND 1=1 will stop SQL Server's simple parameterisation facility from kicking in.
Demo script
DBCC FREEPROCCACHE; /*<-- Don't run on production box!*/
CREATE TABLE [E7ED0174-9820-4B29-BCDF-C999CA319131]
(
X INT,
Y INT,
PRIMARY KEY (X,Y)
);
GO
SELECT *
FROM [E7ED0174-9820-4B29-BCDF-C999CA319131]
WHERE X = 1
AND Y = 2;
GO
SELECT *
FROM [E7ED0174-9820-4B29-BCDF-C999CA319131]
WHERE X = 2
AND Y = 3;
GO
SELECT *
FROM [E7ED0174-9820-4B29-BCDF-C999CA319131]
WHERE 1 = 1
AND X = 1
AND Y = 2
GO
SELECT *
FROM [E7ED0174-9820-4B29-BCDF-C999CA319131]
WHERE 1 = 1
AND X = 2
AND Y = 3
SELECT usecounts,
execution_count,
size_in_bytes,
cacheobjtype,
objtype,
text,
creation_time,
last_execution_time,
execution_count
FROM sys.dm_exec_cached_plans a
INNER JOIN sys.dm_exec_query_stats b
ON a.plan_handle = b.plan_handle
CROSS apply sys.dm_exec_sql_text(b.sql_handle) AS sql_text
WHERE text LIKE '%\[E7ED0174-9820-4B29-BCDF-C999CA319131\]%' ESCAPE '\'
AND text NOT LIKE '%this_query%'
ORDER BY last_execution_time DESC
GO
DROP TABLE [E7ED0174-9820-4B29-BCDF-C999CA319131]
Shows that both the queries without the 1=1 were satisfied by a single parameterised version of the cached plan whereas the queries with the 1=1 compiled and stored a separate plan for the different constant values.
Ideally you shouldn't be relying on this anyway though and should be explicitly parameterising queries to ensure that the desired elements are parameterised and the parameters have the correct datatypes.
There is no difference, as they evaluated constants and are optimized out. I use both 1=1 and 0=1 in both hand- and code-generated AND and OR lists and it has no effect.
Since the condition is always true, SQL Server will ignore it. You can check by running two queries, one with the condition and one without, and comparing the two actual execution plans.
An alternative to achieve your ease of commenting requirement is to restructure your query:
SELECT ...
FROM table t
WHERE
t.[column1] = #param1 AND
t.[column2] = #param2 AND
t.[column3] = #param3
You can then add/remove/comment out lines in the where conditions and it will still be valid SQL.
No performance hit. Even if your WHERE clause is loaded with a large number of comparisons, this is tiny.
Best case scenario is that it's a bit-for-bit comparison. Worse case is that the digits are evaluated as integers.
For queries of any reasonable complexity there will be no difference. You can look at some execution plans and also compare real execution costs, and see for yourself.

SQL Server Stored Procedure - 'IF statement' vs 'Where criteria'

The question from quite a long time boiling in my head, that out of the following two stored procedures which one would perform better.
Proc 1
CREATE PROCEDURE GetEmployeeDetails #EmployeeId uniqueidentifier,
#IncludeDepartmentInfo bit
AS
BEGIN
SELECT * FROM Employees
WHERE Employees.EmployeeId = #EmployeeId
IF (#IncludeDepartmentInfo = 1)
BEGIN
SELECT Departments.* FROM Departments, Employees
WHERE Departments.DepartmentId = Employees.DepartmentId
AND Employees.EmployeeId = #EmployeeId
END
END
Proc 2
CREATE PROCEDURE GetEmployeeDetails #EmployeeId uniqueidentifier,
#IncludeDepartmentInfo bit
AS
BEGIN
SELECT * FROM Employees
WHERE Employees.EmployeeId = #EmployeeId
SELECT Departments.* FROM Departments, Employees
WHERE Departments.DepartmentId = Employees.DepartmentId
AND Employees.EmployeeId = #EmployeeId
AND #IncludeDepartmentInfo = 1
END
the only difference between the two is use of 'if statment'.
if proc 1/proc 2 are called with alternating values of #IncludeDepartmentInfo then from my understanding proc 2 would perform better, because it will retain the same query plan irrespective of the value of #IncludeDepartmentInfo, whereas proc1 will change query plan in each call
answers are really appericated
PS: this is just a scenario, please don't go to the explicit query results but the essence of example. I am really particular about the query optimizer result (in both cases of 'if and where' and their difference), there are many aspects which I know could affect the performance which I want to avoid in this question.
SELECT Departments.* FROM Departments, Employees
WHERE Departments.DepartmentId = Employees.DepartmentId
AND Employees.EmployeeId = #EmployeeId
AND #IncludeDepartmentInfo = 1
When SQL compiles a query like this it must be compiled for any value of #IncludeDepartmentInfo. The resulted plan can well be one that scans the tables and performs the join and after that checks the variable, resulting in unnecessary I/O. The optimizer may be smart and move the check for the variable ahead of the actual I/O operations in the execution plan, but this is never guaranteed. This is why I always recommend to use explicit IFs in the T-SQL for queries that need to perform very differently based on a variable value (the typical example being OR conditions).
gbn's observation is also an important one: from an API design point of view is better to have a consistent return type (ie. always return the same shaped and number of result sets).
From a consistency perspective, number 2 will always return 2 datasets. Overloading aside, you wouldn't have a client code method that may be returns a result, maybe not.
If you reuse this code, the other calling client will have to know this flag too.
If the code does 2 different things, then why not 2 different stored procs?
Finally, it's far better practice to use modern JOIN syntax and separate joining from filtering. In this case, personally I'd use EXISTS too.
SELECT
D.*
FROM
Departments D
JOIN
Employees E ON D.DepartmentId = E.DepartmentId
WHERE
E.EmployeeId = #EmployeeId
AND
#IncludeDepartmentInfo = 1
When you use the 'if' statement, you may run only one query instead of two. I would think that one query would almost always be faster than two. Your point about query plans may be valid if the first query were complex and took a long time to run, and the second was trivial. However, the first query looks like it retrieves a single row based on a primary key - probably pretty fast every time. So, I would keep the 'if' - but I would test to verify.
The performance difference would be too small for anyone to notice.
Premature optimization is the root of all evil. Stop worrying about performance and start implementing features that make your customers smile.

Stored Procedure for Updating a Column in Sql Server

I have a requirement to update a column with multiple values. The query looks like below.
Update table1 set column1 = (
select value from table2 where table1.column0 = table2.coulmn
)
Is there any generalised stored procedure for a requirement like the above?
short of creating a statement as a string and using the "execute" statement, I don't know of one. Generally "execute" is frowned on as it's a potential injection attack point.
Why would you want to update one table with information that is easily available in another? Seems like you are just guaranteeing that you are going to have to run this query every single time you perform an update, insert or delete against the camsnav table. Otherwise how are you going to keep them in sync?
Also, if you cannot guarantee that the sub-query will return exactly one row, it is probably safer to use the SQL Server-specific and proprietary update format:
UPDATE f SET nav = n.nav
FROM camsfolio AS f
INNER JOIN camsnav AS n
ON f.schcode = n.schcode;
SQL Server doesn't use "generalised stored procedures" for this kind of thing. It's up to you to build your own SP, composed using an appropriate parameterized UPDATE statement.

What is a good way to paginate out of SQL 2000 using Start and Length parameters?

I have been given the task of refactoring an existing stored procedure so that the results are paginated. The SQL server is SQL 2000 so I can't use the ROW_NUMBER method of pagination. The Stored proc is already fairly complex, building chunks of a large sql statement together before doing an sp_executesql and has various sorting options available.
The first result out of google seems like a good method but I think the example is wrong in that the 2nd sort needs to be reversed and the case when the start is less than the pagelength breaks down. The 2nd example on that page also seems like a good method but the SP is taking a pageNumber rather than the start record. And the whole temp table thing seems like it would be a performance drain.
I am making progress going down this path but it seems slow and confusing and I am having to do quite a bit of REPLACE methods on the Sort order to get it to come out right.
Are there any other easier techniques I am missing?
There are two SQL Server 2000 compliant answers in this StackOverflow question - skip the accepted one, which is 2005-only:
No, I'm afraid not - SQL Server 2000 doesn't have any of the 2005 niceties like Common Table Expression (CTE) and such..... the method described in the Google link seems to be one way to go.
Marc
Also take a look here
http://databases.aspfaq.com/database/how-do-i-page-through-a-recordset.html
scroll down to Stored Procedure Methods
Depending on your application architecture (and your amount of data, it's structure, DB server load etc.) you could use the DB access layer for paging.
For example, with ADO you can define a page size on the record set (DataSet in ADO.NET) object and do the paging on the client. Classic ADO even lets you use a server side cursor, though I don't know if that scales well (I think this was removed altogether in ADO.NET).
MSDN documentation: Paging Through a Query Result (ADO.NET)
After playing with this for a while there seems to be only one way of really doing this (using Start and Length parameters) and that's with the temp table.
My final solution was to not use the #start parameter and instead use a #page parameter and then use the
SET #sql = #sql + N'
SELECT * FROM
(
SELECT TOP ' + Cast( #length as varchar) + N' * FROM
(
SELECT TOP ' + Cast( #page*#length as varchar) + N'
field1,
field2
From Table1
order by field1 ASC
) as Result
Order by Field1 DESC
) as Result
Order by Field 1 ASC'
The original query was much more complex than what is shown here and the order by was ordered on at least 3 fields and determined by a long CASE clause, requiring me to use a series of REPLACE functions to get the fields in the right order.
We've been using variations on this query for a number of years. This example gives items 50,000 to 50,300.
select top 300
Items.*
from Items
where
Items.CustomerId = 1234 AND
Items.Active = 1 AND
Items.Id not in
(
select top 50000 Items.Id
from Items
where
Items.CustomerId = 1234 AND
Items.Active = 1
order by Items.id
)
order by Items.Id

Resources