Where clause with varbinary doesn't work - sql-server

I have a table MyTable with a field MyField, which is varbinary(max). I have the following query:
SELECT COUNT(*) FROM MyTable WHERE MyField IS NOT NULL
The SELECT clause can have anything, it does not matter. Because of the varbinary MyField in the Where clause, the execution never ends.
I tried even this:
SELECT Size
FROM
(
SELECT ISNULL(DATALENGTH(MyField), 0) AS Size FROM MyTable
) AS A
WHERE A.Size > 0
The inner query works fine, the whole query without the Where clause works fine, but with that Where clause it is stuck. Could anybody please explain this to me?
Thanks.

Don't think or assume that it couldn't possibly be blocking, just because a different query returns immediately. With a different where clause, and a different plan, and possibly different locking escalation, you could certainly have cases where one query is blocked and another isn't, even against the same table.
The query is obviously being blocked, if your definition of "stuck" is what I think it is. Now you just need to determine by who.
In one query window, do this:
SELECT ##SPID;
Make note of that number. Now in that same query window, run your query that gets "stuck" (in other words, don't select a spid in one window and expect it to have anything to do with your query that is already running in another window).
Then, in a different query window, do this:
SELECT blocking_session_id, status, command, wait_type, last_wait_type
FROM sys.dm_exec_requests
WHERE session_id = <spid from above>;
Here is a visualization that I suspect might help (note that my "stuck" query is different from yours):
If you get a non-zero number in the first column, then in that different query window, do this:
DBCC INPUTBUFFER(<that blocking session id>);
If you aren't blocked, I'd be curious to know what the other columns show.
As an aside, changing the WHERE clause to use slightly different predicates to identify the same rows isn't going to magically eliminate blocking. Also, there is no real benefit to doing this:
SELECT Size
FROM
(
SELECT ISNULL(DATALENGTH(MyField), 0) AS Size FROM MyTable
) AS A
WHERE A.Size > 0
When you can just do this:
SELECT ISNULL(DATALENGTH(MyField), 0) AS Size
FROM dbo.MyTable -- always use schema prefix
WHERE DATALENGTH(MyField) > 0; -- always use semi-colon
SQL Server is smart enough to not calculate the DATALENGTH twice, if that is your concern.

Just for fun and giggles I decided to toss this one out to you. I'm taking some things for granted here but you can look at tasks that had to wait. Look for wait types that begin with LCK_ as these will be your blocked tasks. Please note that on a busy server the ring buffer may have rolled over after a while. Also note that this is intended to supplement #AaronBertran's excellent answer and in no way meant to supplant it. His is more comprehensive and is the correct way to identify the issue while it's happening.
SELECT
td.r.value('#name','sysname') event_name,
td.r.value('#timestamp','datetime2(0)') event_timestamp,
td.r.value('(data[#name="wait_type"]/text)[1]','sysname') wait_type,
td.r.value('(data[#name="duration"]/value)[1]','bigint') wait_duration,
td.r.value('(action[#name="sql_text"]/value)[1]','nvarchar(max)') sql_text,
td.r.query('.') event_data
FROM (
SELECT
CAST(target_data AS XML) target_data
FROM sys.dm_xe_sessions s
JOIN sys.dm_xe_session_targets t
ON s.address = t.event_session_address
WHERE s.name = N'system_health'
and target_name = 'ring_buffer'
) base
CROSS APPLY target_data.nodes('/RingBufferTarget/*') td(r)
where td.r.value('#name','sysname') = 'wait_info';

Related

WITH (NOLOCK) Giving varying results

At first I thought I should go see my Dr. and have an MRI of my brain. But, the results I am seeing were confirmed by a second programmer. Running the same exact SQL is giving me different results at different times! It makes no sense. Here is what I am running (table names changed to protect the innocent)...
declare #customerid int;
declare #locationid int;
declare #reportdate datetime;
set #customerid=2063;
set #locationid=101;
set #reportdate=getdate();
select i.InvoiceDate as [Date], ... from Invoice i (nolock) ...
UNION ALL select ... from Remit e (nolock) ...
UNION ALL select ... from [Rec] r (nolock) join TrTypes t (nolock) on ...
The two select/union all's are identical but sometimes the second does not return records from the Remit table when the first shows them. And... Sometimes the first doesn't pull them either. I put these queries in SQL Server Management Studio and keep hitting F5... Sometime I get proper results, sometimes I don't.
Further, when I remove the UNION ALLs and just run the queries, nothing is returned from Remit.
Could this be a locking issue?
This query is from a C# program that produces a detail report and it never returns the Remit records. There is another program that creates summary records from the Remit table and it always returns the records from a similar query... That is why I had to look at the detail program/report.
Thoughts?
NOLOCK means you don't care about consistency, concurrency and locking/blocking. It allows you to read the same row 0 times, 1 time, or 2 times. It is not a magic "go faster" button for queries: you should only use it if you fully understand the potential consequences on a busy system (such as getting different results at different times!).
The solution is to either:
stop blindly throwing NOLOCK everywhere (accept your blocking to get accurate and consistent data, or use read committed snapshot isolation).
accept that NOLOCK sacrifices accuracy for less blocking, and sometimes it just won't give the right results. If you ask for this, you don't get to complain about it.

Optimizing Execution Plans for Parameterized T-SQL Queries Containing Window Functions

EDIT: I've updated the example code and provided complete table and view implementations for reference, but the essential question remains unchanged.
I have a fairly complex view in a database that I am attempting to query. When I attempt to retrieve a set of rows from the view by hard-coding the WHERE clause to specific foreign key values, the view executes very quickly with an optimal execution plan (indexes are used properly, etc.)
SELECT *
FROM dbo.ViewOnBaseTable
WHERE ForeignKeyCol = 20
However, when I attempt to add parameters to the query, all of a sudden my execution plan falls apart. When I run the query below, I'm getting index scans instead of seeks all over the place and the query performance is very poor.
DECLARE #ForeignKeyCol int = 20
SELECT *
FROM dbo.ViewOnBaseTable
WHERE ForeignKeyCol = #ForeignKeyCol
I'm using SQL Server 2008 R2. What gives here? What is it about using parameters that is causing a sub-optimal plan? Any help would be greatly appreciated.
For reference, here are the object definitions for which I'm getting the error.
CREATE TABLE [dbo].[BaseTable]
(
[PrimaryKeyCol] [uniqueidentifier] PRIMARY KEY,
[ForeignKeyCol] [int] NULL,
[DataCol] [binary](1000) NOT NULL
)
CREATE NONCLUSTERED INDEX [IX_BaseTable_ForeignKeyCol] ON [dbo].[BaseTable]
(
[ForeignKeyCol] ASC
)
CREATE VIEW [dbo].[ViewOnBaseTable]
AS
SELECT
PrimaryKeyCol,
ForeignKeyCol,
DENSE_RANK() OVER (PARTITION BY ForeignKeyCol ORDER BY PrimaryKeyCol) AS ForeignKeyRank,
DataCol
FROM
dbo.BaseTable
I am certain that the window function is the problem, but I am filtering my query by a single value that the window function is partitioning by, so I would expect the optimizer to filter first and then run the window function. It does this in the hard-coded example but not the parameterized example. Below are the two query plans. The top plan is good and the bottom plan is bad.
When using OPTION (RECOMPILE) be sure to look at the post-execution ('actual') plan rather than the pre-execution ('estimated') one. Some optimizations are only applied when execution occurs:
DECLARE #ForeignKeyCol int = 20;
SELECT ForeignKeyCol, ForeignKeyRank
FROM dbo.ViewOnBaseTable
WHERE ForeignKeyCol = #ForeignKeyCol
OPTION (RECOMPILE);
Pre-execution plan:
Post-execution plan:
Tested on SQL Server 2012 build 11.0.3339 and SQL Server 2008 R2 build 10.50.4270
Background & limitations
When windowing functions were added in SQL Server 2005, the optimizer had no way to push selections past these new sequence projections. To address some common scenarios where this caused performance problems, SQL Server 2008 added a new simplification rule, SelOnSeqPrj, which allows suitable selections to be pushed where the value is a constant. This constant may be a literal in the query text, or the sniffed value of a parameter obtained via OPTION (RECOMPILE). There is no particular problem with NULLs though the query may need to have ANSI_NULLS OFF to see this. As far as I know, applying the simplification to constant values only is an implementation limitation; there is no particular reason it could not be extended to work with variables. My recollection is that the SelOnSeqPrj rule addresssed the most commonly seen performance problems.
Parameterization
The SelOnSeqPrj rule is not applied when a query is successfully auto-parameterized. There is no reliable way to determine if a query was auto-parameterized in SSMS, it only indicates that auto-param was attempted. To be clear, the presence of place-holders like [#0] only shows that auto-parameterization was attempted. A reliable way to tell if a prepared plan was cached for reuse is to inspect the plan cache, where the 'parameterized plan handle' provides the link between ad-hoc and prepared plans.
For example, the following query appears to be auto-parameterized in SSMS:
SELECT *
FROM dbo.ViewOnBaseTable
WHERE ForeignKeyCol = 20;
But the plan cache shows otherwise:
WITH XMLNAMESPACES
(
DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan'
)
SELECT
parameterized_plan_handle =
deqp.query_plan.value('(//StmtSimple)[1]/#ParameterizedPlanHandle', 'nvarchar(64)'),
parameterized_text =
deqp.query_plan.value('(//StmtSimple)[1]/#ParameterizedText', 'nvarchar(max)'),
decp.cacheobjtype,
decp.objtype,
decp.plan_handle
FROM sys.dm_exec_cached_plans AS decp
CROSS APPLY sys.dm_exec_sql_text(decp.plan_handle) AS dest
CROSS APPLY sys.dm_exec_query_plan(decp.plan_handle) AS deqp
WHERE
dest.[text] LIKE N'%ViewOnBaseTable%'
AND dest.[text] NOT LIKE N'%dm_exec_cached_plans%';
If the database option for forced parameterization is enabled, we get a parameterized result, where the optimization is not applied:
ALTER DATABASE Sandpit SET PARAMETERIZATION FORCED;
DBCC FREEPROCCACHE;
SELECT *
FROM dbo.ViewOnBaseTable
WHERE ForeignKeyCol = 20;
The plan cache query now shows a parameterized cached plan, linked by the parameterized plan handle:
Workaround
Where possible, my preference is to rewrite the view as an in-line table-valued function, where the intended position of the selection can be made more explicit (if necessary):
CREATE FUNCTION dbo.ParameterizedViewOnBaseTable
(#ForeignKeyCol integer)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
SELECT
bt.PrimaryKeyCol,
bt.ForeignKeyCol,
ForeignKeyRank = DENSE_RANK() OVER (
PARTITION BY bt.ForeignKeyCol
ORDER BY bt.PrimaryKeyCol),
bt.DataCol
FROM dbo.BaseTable AS bt
WHERE
bt.ForeignKeyCol = #ForeignKeyCol;
The query becomes:
DECLARE #ForeignKeyCol integer = 20;
SELECT pvobt.*
FROM dbo.ParameterizedViewOnBaseTable(#ForeignKeyCol) AS pvobt;
With the execution plan:
You could always go the CROSS APPLY way.
ALTER VIEW [dbo].[ViewOnBaseTable]
AS
SELECT
PrimaryKeyCol,
ForeignKeyCol,
ForeignKeyRank,
DataCol
FROM (
SELECT DISTINCT
ForeignKeyCol
FROM dbo.BaseTable
) AS Src
CROSS APPLY (
SELECT
PrimaryKeyCol,
DENSE_RANK() OVER (ORDER BY PrimaryKeyCol) AS ForeignKeyRank,
DataCol
FROM dbo.BaseTable AS B
WHERE B.ForeignKeyCol = Src.ForeignKeyCol
) AS X
I think in this particular case it may be because the data types between your parameters and your table do not match exactly so SQL Server has to do an implicit conversion which is not a sargable operation.
Check your table data types and make your parameters the same type. Or do the cast yourself outside the query.

if else within CTE?

I want to execute select statement within CTE based on a codition. something like below
;with CTE_AorB
(
if(condition)
select * from table_A
else
select * from table_B
),
CTE_C as
(
select * from CTE_AorB // processing is removed
)
But i get error on this. Is it possible to have if else within CTEs? If not is there a work around Or a better approach.
Thanks.
try:
;with CTE_AorB
(
select * from table_A WHERE (condition true)
union all
select * from table_B WHERE NOT (condition true)
),
CTE_C as
(
select * from CTE_AorB // processing is removed
)
the key with a dynamic search condition is to make sure an index is used, Here is a very comprehensive article on how to handle this topic:
Dynamic Search Conditions in T-SQL by Erland Sommarskog
it covers all the issues and methods of trying to write queries with multiple optional search conditions. This main thing you need to be concerned with is not the duplication of code, but the use of an index. If your query fails to use an index, it will preform poorly. There are several techniques that can be used, which may or may not allow an index to be used.
here is the table of contents:
Introduction
The Case Study: Searching Orders
The Northgale Database
Dynamic SQL
Introduction
Using sp_executesql
Using the CLR
Using EXEC()
When Caching Is Not Really What You Want
Static SQL
Introduction
x = #x OR #x IS NULL
Using IF statements
Umachandar's Bag of Tricks
Using Temp Tables
x = #x AND #x IS NOT NULL
Handling Complex Conditions
Hybrid Solutions – Using both Static and Dynamic SQL
Using Views
Using Inline Table Functions
Conclusion
Feedback and Acknowledgements
Revision History
if you are on the proper version of SQL Server 2008, there is an additional technique that can be used, see: Dynamic Search Conditions in T-SQL Version for SQL 2008 (SP1 CU5 and later)
If you are on that proper release of SQL Server 2008, you can just add OPTION (RECOMPILE) to the query and the local variable's value at run time is used for the optimizations.
Consider this, OPTION (RECOMPILE) will take this code (where no index can be used with this mess of ORs):
WHERE
(#search1 IS NULL or Column1=#Search1)
AND (#search2 IS NULL or Column2=#Search2)
AND (#search3 IS NULL or Column3=#Search3)
and optimize it at run time to be (provided that only #Search2 was passed in with a value):
WHERE
Column2=#Search2
and an index can be used (if you have one defined on Column2)
Never ever try to put conditions like IF inside a single query statements. Even if you do manage to pull it off, this is the one sure-shot way to kill performance. Remember, a single statement means a single plan, and the plan will have to be generated in a way to satisfy both cases, when condition is true and when condition is false, at once. This usually result in the worse possible plan, since the 'condition' usually creates mutually exclusive access path for the plan and the union of the two results in always end-to-end table scan.
Your best approach, for this and many many other reasons, is to pull the IF outside of the statement:
if(condition true)
select * from table_A
else
select * from table_B
I think the IF ELSE stuff might have poor caching if your branch condition flips. Maybe someone more knowledgeable can comment.
Another way would be to UNION ALL with the WHERE clauses as suggested by others. The UNION ALL would replace the IF ELSE
If you are using a parameter, then you only need one statement.
#ID (Some parameter)
;with CTE
(
select * from table_A WHERE id = #ID
union all
select * from table_B WHERE (id = #ID and condition)
)

T-SQL 1=1 Performance Hit

For my SQL queries, I usually do the following for SELECT statements:
SELECT ...
FROM table t
WHERE 1=1
AND t.[column1] = #param1
AND t.[column2] = #param2
This will make it easy if I need to add / remove / comment any WHERE clauses, since I don't have to care about the first line.
Is there any performance hit when using this pattern?
Additional Info:
Example for sheepsimulator and all other who didn't get the usage.
Suppose the above query, I need to change #param1 to be not included into the query:
With 1=1:
...
WHERE 1=1 <-- no change
--AND t.[column1] = #param1 <-- changed
AND t.[column2] = #param2 <-- no change
...
Without 1=1:
...
WHERE <-- no change
--t.[column1] = #param1 <-- changed
{AND removed} t.[column2] = #param2 <-- changed
...
No, SQL Server is smart enough to omit this condition from the execution plan since it's always TRUE.
Same is true for Oracle, MySQL and PostgreSQL.
It is likely that if you use the profiler and look, you will end up seeing that the optimizer will end up ignoring that more often than not, so in the grand scheme of things, there probably won't be much in the way of performance gain or losses.
This has no performance impact, but there the SQL text looks like it has been mangled by a SQL injection attack. The '1=1' trick appears in many sql injection based attacks. You just run the risk that some customer of yours someday deploys a 'black box' that monitors SQL traffic and you'll find your app flagged as 'hacked'. Also source code analyzers may flag this. Its a long long shot, of course, but something worth putting into the balance.
One potentially mildly negative impact of this is that the AND 1=1 will stop SQL Server's simple parameterisation facility from kicking in.
Demo script
DBCC FREEPROCCACHE; /*<-- Don't run on production box!*/
CREATE TABLE [E7ED0174-9820-4B29-BCDF-C999CA319131]
(
X INT,
Y INT,
PRIMARY KEY (X,Y)
);
GO
SELECT *
FROM [E7ED0174-9820-4B29-BCDF-C999CA319131]
WHERE X = 1
AND Y = 2;
GO
SELECT *
FROM [E7ED0174-9820-4B29-BCDF-C999CA319131]
WHERE X = 2
AND Y = 3;
GO
SELECT *
FROM [E7ED0174-9820-4B29-BCDF-C999CA319131]
WHERE 1 = 1
AND X = 1
AND Y = 2
GO
SELECT *
FROM [E7ED0174-9820-4B29-BCDF-C999CA319131]
WHERE 1 = 1
AND X = 2
AND Y = 3
SELECT usecounts,
execution_count,
size_in_bytes,
cacheobjtype,
objtype,
text,
creation_time,
last_execution_time,
execution_count
FROM sys.dm_exec_cached_plans a
INNER JOIN sys.dm_exec_query_stats b
ON a.plan_handle = b.plan_handle
CROSS apply sys.dm_exec_sql_text(b.sql_handle) AS sql_text
WHERE text LIKE '%\[E7ED0174-9820-4B29-BCDF-C999CA319131\]%' ESCAPE '\'
AND text NOT LIKE '%this_query%'
ORDER BY last_execution_time DESC
GO
DROP TABLE [E7ED0174-9820-4B29-BCDF-C999CA319131]
Shows that both the queries without the 1=1 were satisfied by a single parameterised version of the cached plan whereas the queries with the 1=1 compiled and stored a separate plan for the different constant values.
Ideally you shouldn't be relying on this anyway though and should be explicitly parameterising queries to ensure that the desired elements are parameterised and the parameters have the correct datatypes.
There is no difference, as they evaluated constants and are optimized out. I use both 1=1 and 0=1 in both hand- and code-generated AND and OR lists and it has no effect.
Since the condition is always true, SQL Server will ignore it. You can check by running two queries, one with the condition and one without, and comparing the two actual execution plans.
An alternative to achieve your ease of commenting requirement is to restructure your query:
SELECT ...
FROM table t
WHERE
t.[column1] = #param1 AND
t.[column2] = #param2 AND
t.[column3] = #param3
You can then add/remove/comment out lines in the where conditions and it will still be valid SQL.
No performance hit. Even if your WHERE clause is loaded with a large number of comparisons, this is tiny.
Best case scenario is that it's a bit-for-bit comparison. Worse case is that the digits are evaluated as integers.
For queries of any reasonable complexity there will be no difference. You can look at some execution plans and also compare real execution costs, and see for yourself.

SQL Server Stored Procedure - 'IF statement' vs 'Where criteria'

The question from quite a long time boiling in my head, that out of the following two stored procedures which one would perform better.
Proc 1
CREATE PROCEDURE GetEmployeeDetails #EmployeeId uniqueidentifier,
#IncludeDepartmentInfo bit
AS
BEGIN
SELECT * FROM Employees
WHERE Employees.EmployeeId = #EmployeeId
IF (#IncludeDepartmentInfo = 1)
BEGIN
SELECT Departments.* FROM Departments, Employees
WHERE Departments.DepartmentId = Employees.DepartmentId
AND Employees.EmployeeId = #EmployeeId
END
END
Proc 2
CREATE PROCEDURE GetEmployeeDetails #EmployeeId uniqueidentifier,
#IncludeDepartmentInfo bit
AS
BEGIN
SELECT * FROM Employees
WHERE Employees.EmployeeId = #EmployeeId
SELECT Departments.* FROM Departments, Employees
WHERE Departments.DepartmentId = Employees.DepartmentId
AND Employees.EmployeeId = #EmployeeId
AND #IncludeDepartmentInfo = 1
END
the only difference between the two is use of 'if statment'.
if proc 1/proc 2 are called with alternating values of #IncludeDepartmentInfo then from my understanding proc 2 would perform better, because it will retain the same query plan irrespective of the value of #IncludeDepartmentInfo, whereas proc1 will change query plan in each call
answers are really appericated
PS: this is just a scenario, please don't go to the explicit query results but the essence of example. I am really particular about the query optimizer result (in both cases of 'if and where' and their difference), there are many aspects which I know could affect the performance which I want to avoid in this question.
SELECT Departments.* FROM Departments, Employees
WHERE Departments.DepartmentId = Employees.DepartmentId
AND Employees.EmployeeId = #EmployeeId
AND #IncludeDepartmentInfo = 1
When SQL compiles a query like this it must be compiled for any value of #IncludeDepartmentInfo. The resulted plan can well be one that scans the tables and performs the join and after that checks the variable, resulting in unnecessary I/O. The optimizer may be smart and move the check for the variable ahead of the actual I/O operations in the execution plan, but this is never guaranteed. This is why I always recommend to use explicit IFs in the T-SQL for queries that need to perform very differently based on a variable value (the typical example being OR conditions).
gbn's observation is also an important one: from an API design point of view is better to have a consistent return type (ie. always return the same shaped and number of result sets).
From a consistency perspective, number 2 will always return 2 datasets. Overloading aside, you wouldn't have a client code method that may be returns a result, maybe not.
If you reuse this code, the other calling client will have to know this flag too.
If the code does 2 different things, then why not 2 different stored procs?
Finally, it's far better practice to use modern JOIN syntax and separate joining from filtering. In this case, personally I'd use EXISTS too.
SELECT
D.*
FROM
Departments D
JOIN
Employees E ON D.DepartmentId = E.DepartmentId
WHERE
E.EmployeeId = #EmployeeId
AND
#IncludeDepartmentInfo = 1
When you use the 'if' statement, you may run only one query instead of two. I would think that one query would almost always be faster than two. Your point about query plans may be valid if the first query were complex and took a long time to run, and the second was trivial. However, the first query looks like it retrieves a single row based on a primary key - probably pretty fast every time. So, I would keep the 'if' - but I would test to verify.
The performance difference would be too small for anyone to notice.
Premature optimization is the root of all evil. Stop worrying about performance and start implementing features that make your customers smile.

Resources