Stored Procedure / CHANGETABLE function slow with parameters, fast with literal - sql-server

I'm using Change Tracking in SQL 2008 R2 and seeing very odd behaviour when trying to determine the rows affected in a batch: a stored proc takes ~30 seconds to run when using a parameter value, but when I put the literal values into the call to the CHANGETABLE function it returns in <1s.
A call to the following takes ~30s:
DECLARE @sync_last_received_anchor BigInt;
DECLARE @sync_new_received_anchor BigInt;
SET @sync_last_received_anchor = 272361;
SET @sync_new_received_anchor = 273361;
SELECT [Customer].[CustomerID]
FROM dbo.tblCustomer AS [Customer] WITH (NOLOCK)
INNER JOIN CHANGETABLE(CHANGES [REDACTED].[dbo].[tblCustomer], @sync_last_received_anchor) AS [theChangeTable]
ON [theChangeTable].[CustomerID] = [Customer].[CustomerID]
WHERE ([theChangeTable].[SYS_CHANGE_OPERATION] = 'U'
AND [theChangeTable].[SYS_CHANGE_VERSION] <= @sync_new_received_anchor
)
However changing the CHANGETABLE line, as below, reduces it to ~1s.
INNER JOIN CHANGETABLE(CHANGES [REDACTED].[dbo].[tblCustomer], 272361) AS [theChangeTable]
As we're running SP1, I presume the issue addressed by the SQL 2008 CU4 patch for slow CHANGETABLE performance has already been fixed (http://support.microsoft.com/kb/2276330).
I'm at a loss, though, as to why changing a parameter to a literal value would make so much difference.

It is likely that the stored procedure is suffering from parameter sniffing - i.e. SQL Server thinks the values you supplied are a good match for a plan it has already cached, when in fact they aren't a good match at all.
There are multiple articles on how to approach this issue; one you can test and try in a DEV environment would be this:
http://blogs.msdn.com/b/axperf/archive/2010/05/07/important-sql-server-change-parameter-sniffing-and-plan-caching.aspx
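If you just want to confirm that sniffing is the cause, one quick (hedged) test is to add a RECOMPILE hint to the statement so a fresh plan is compiled for the actual anchor values on each run - a sketch against the question's query, not a production recommendation:

DECLARE @sync_last_received_anchor bigint = 272361;
DECLARE @sync_new_received_anchor bigint = 273361;
SELECT [Customer].[CustomerID]
FROM dbo.tblCustomer AS [Customer]
INNER JOIN CHANGETABLE(CHANGES dbo.tblCustomer, @sync_last_received_anchor) AS [theChangeTable]
    ON [theChangeTable].[CustomerID] = [Customer].[CustomerID]
WHERE [theChangeTable].[SYS_CHANGE_OPERATION] = 'U'
  AND [theChangeTable].[SYS_CHANGE_VERSION] <= @sync_new_received_anchor
OPTION (RECOMPILE); -- recompile so the optimizer can use the actual variable values

If the recompiled version is fast, a stale or mismatched cached plan is almost certainly the problem.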

Related

Why does executing a scalar function result in a "create function" operation?

I needed to trim a number of strings in a select statement, so instead of repeating multiple ltrim(rtrim(' the string ')) calls, I created a simple trim function as follows:
create function dbo.trim(@String varchar(max))
returns varchar(max)
as
begin
return rtrim(ltrim(@String))
end
go
However, during the execution of my select (which invokes the function multiple times) I run the following select in a different window as sa:
SELECT sqltext.TEXT,
req.session_id,
req.status,
req.command,
req.cpu_time,
req.total_elapsed_time
FROM sys.dm_exec_requests req
CROSS APPLY sys.dm_exec_sql_text(sql_handle) AS sqltext
and observe the following
TEXT session_id status command cpu_time total_elapsed_time
------------------------------------------------------------------
** 99 running SELECT 12045 12388
where the first column (replaced by ** above for readability) contains the following:
create function dbo.trim(@String varchar(max))
returns varchar(max)
as
begin
return rtrim(ltrim(@String))
end
Am I correctly interpreting this to mean that, upon calling the trim function, SQL Server actually runs a CREATE?
As you can see from the table above, the execution time is rather long, and in some cases I've found the create operation to outright hang, blocking the completion of the select and further write operations on the database, such that I've had to kill the operation.
Can anyone shed some light on this?
Thanks!
Pab
Am I correctly interpreting this to mean that, upon calling the trim
function, SQL Server actually runs a CREATE?
No, SQL Server just shows you the full text of the SQL Module - which includes the CREATE. You can use statement_start_offset and statement_end_offset to see the statement actually being executed.
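For example, a common pattern (a generic sketch, not tied to the question's database) is to carve the currently executing statement out of the module text using those offsets:

SELECT req.session_id,
       req.status,
       req.command,
       -- extract only the statement currently executing from the full module text
       SUBSTRING(sqltext.text,
                 (req.statement_start_offset / 2) + 1,
                 ((CASE req.statement_end_offset
                       WHEN -1 THEN DATALENGTH(sqltext.text) -- -1 means "to the end of the batch"
                       ELSE req.statement_end_offset
                   END - req.statement_start_offset) / 2) + 1) AS current_statement
FROM sys.dm_exec_requests AS req
CROSS APPLY sys.dm_exec_sql_text(req.sql_handle) AS sqltext;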
Side note: you're better off not using scalar UDFs for something trivial like a TRIM function. These have well-known performance problems. See Converting A Scalar User Defined Function To An Inline Table Valued Function for an alternative approach, if this is worth encapsulating in a function at all.
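As a rough illustration of that alternative (a sketch - dbo.trim_itvf and the table/column names are invented), the scalar function could be rewritten as an inline table-valued function and invoked with CROSS APPLY:

create function dbo.trim_itvf(@String varchar(max))
returns table
as
return
    -- single-statement inline TVF: the optimizer can expand this into the calling query
    select rtrim(ltrim(@String)) as TrimmedString;
go
-- usage: CROSS APPLY instead of a scalar call per row
select t.TrimmedString
from dbo.SomeTable as s              -- hypothetical table
cross apply dbo.trim_itvf(s.SomeColumn) as t;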

SSIS Performance drop when adding parameters

I'm using an OLE DB Source in SSIS to pull data rows from a SQL Server 2012 database:
SELECT item_prod.wo_id, item_prod.oper_id, item_prod.reas_cd, item_prod.lot_no, item_prod.item_id, item_prod.user_id, item_prod.seq_no, item_prod.spare1, item_prod.shift_id, item_prod.ent_id, item_prod.good_prod, item_cons.lot_no as raw_lot_no, item_cons.item_id as rm_item_id, item_cons.qty_cons
FROM item_prod
LEFT OUTER JOIN item_cons on item_cons.wo_id=item_prod.wo_id AND item_cons.oper_id=item_prod.oper_id AND item_cons.seq_no=item_prod.seq_no AND item_prod.lot_no=item_cons.fg_lot_no
This works great, and is able to pull around 1 million rows per minute currently. A left outer join is used instead of a lookup due to much better performance when using no cache, and both tables may contain upwards of 40 million rows.
We need the query to only pull rows that haven't been pulled in a previous run. The last run row_id gets stored in a variable and put at the end of the above query:
WHERE item_prod.row_id > ?
On the first run, the parameter will be -1 (to parse everything). Performance drops between 5-10x by adding the where clause (1 million rows per 5-10 minutes). What is causing such a significant performance drop, and is there a way to optimize it?
It turns out, SSIS creates a stored procedure when executing a query with parameters. This was discovered by looking at the execution in SQL Server Profiler.
As a result, there was a performance hit, which I believe is related to parameter sniffing.
I changed the source to use a SQL Query from Variable and built my query using an expression instead, and this fixed the performance.
Edit: The following are the commands seen in SQL Server Profiler when executing the question's code with the where parameter:
exec [sys].sp_describe_undeclared_parameters N'SELECT item_prod.wo_id, item_prod.oper_id, item_prod.reas_cd, item_prod.lot_no, item_prod.item_id, item_prod.user_id, item_prod.seq_no, item_prod.spare1, item_prod.shift_id, item_prod.ent_id, item_prod.good_prod, item_cons.lot_no as raw_lot_no, item_cons.item_id as rm_item_id, item_cons.qty_cons
FROM item_prod
LEFT OUTER JOIN item_cons on item_cons.wo_id=item_prod.wo_id AND item_cons.oper_id=item_prod.oper_id AND item_cons.seq_no=item_prod.seq_no AND item_prod.lot_no=item_cons.fg_lot_no
WHERE item_prod.row_id > @P1'
declare @p1 int
set @p1=1
exec sp_prepare @p1 output,N'@P1 int',N'SELECT item_prod.wo_id, item_prod.oper_id, item_prod.reas_cd, item_prod.lot_no, item_prod.item_id, item_prod.user_id, item_prod.seq_no, item_prod.spare1, item_prod.shift_id, item_prod.ent_id, item_prod.good_prod, item_cons.lot_no as raw_lot_no, item_cons.item_id as rm_item_id, item_cons.qty_cons
FROM item_prod
LEFT OUTER JOIN item_cons on item_cons.wo_id=item_prod.wo_id AND item_cons.oper_id=item_prod.oper_id AND item_cons.seq_no=item_prod.seq_no AND item_prod.lot_no=item_cons.fg_lot_no
WHERE item_prod.row_id > @P1',1
select @p1
exec [sys].sp_describe_first_result_set N'SELECT item_prod.wo_id, item_prod.oper_id, item_prod.reas_cd, item_prod.lot_no, item_prod.item_id, item_prod.user_id, item_prod.seq_no, item_prod.spare1, item_prod.shift_id, item_prod.ent_id, item_prod.good_prod, item_cons.lot_no as raw_lot_no, item_cons.item_id as rm_item_id, item_cons.qty_cons
FROM item_prod
LEFT OUTER JOIN item_cons on item_cons.wo_id=item_prod.wo_id AND item_cons.oper_id=item_prod.oper_id AND item_cons.seq_no=item_prod.seq_no AND item_prod.lot_no=item_cons.fg_lot_no
WHERE item_prod.row_id > @P1',N'@P1 int',1
Since I'm not entirely sure what the above generated code does, there may be other related commands that I missed. Originally, I assumed SSIS variables would be inserted into the query, but the introduction of the @P1 parameter led me to look at stored procedure implications instead.

performance issue on stored procedure/udf vs plain query from sql mgmnt console in sql server

I have a query that uses nested CTEs which is in a user defined function. I have to use nested CTEs because I want to re-use some calculations/case statements from the previous selects. The query looks similar to what is below.
;with cte1 as
(
select a, b from Table1
),
cte2 as
(
select case when a = 1 then 1 else 0 end as c, b from cte1
)
select * from cte2
I have this in a UDF that is called from multiple stored procs. There are a large number of calculations being done inside this query. I'm noticing a performance difference when the query is run outside of the function. For around 12,000 records, it runs in under 11 seconds when the query is run from SQL Server Management Studio, applying all the parameters. When the same parameters are supplied to the UDF, it takes around 55 seconds. I tried putting the query inside a stored proc instead of a UDF, but it still takes the same 55 seconds. It looks like when the query is run from the management console it uses parallelism, but not when it runs inside the function or stored proc.
This is not a major problem at this point, but I would like to achieve the same 11-second performance if I can. Has anyone run into a similar scenario before?
Display your "settings" from within the stored procedure and from plain SSMS. I have seen this same thing: faster in SSMS and slower in the procedure. You can sometimes resolve this because SSMS runs with different settings than the procedure; get them the same and you might see the same performance in the procedure. Here is some example code to display the settings:
SELECT SESSIONPROPERTY ('ANSI_NULLS') --Specifies whether the SQL-92 compliant behavior of equals (=) and not equal to (<>) against null values is applied.
--1 = ON
--0 = OFF
SELECT SESSIONPROPERTY ('ANSI_PADDING') --Controls the way the column stores values shorter than the defined size of the column, and the way the column stores values that have trailing blanks in character and binary data.
--1 = ON
--0 = OFF
SELECT SESSIONPROPERTY ('ANSI_WARNINGS') --Specifies whether the SQL-92 standard behavior of raising error messages or warnings for certain conditions, including divide-by-zero and arithmetic overflow, is applied.
--1 = ON
--0 = OFF
SELECT SESSIONPROPERTY ('ARITHABORT') -- Determines whether a query is ended when an overflow or a divide-by-zero error occurs during query execution.
--1 = ON
--0 = OFF
SELECT SESSIONPROPERTY ('CONCAT_NULL_YIELDS_NULL') --Controls whether concatenation results are treated as null or empty string values.
--1 = ON
--0 = OFF
SELECT SESSIONPROPERTY ('NUMERIC_ROUNDABORT') --Specifies whether error messages and warnings are generated when rounding in an expression causes a loss of precision.
--1 = ON
--0 = OFF
SELECT SESSIONPROPERTY ('QUOTED_IDENTIFIER') --Specifies whether SQL-92 rules about how to use quotation marks to delimit identifiers and literal strings are to be followed.
--1 = ON
--0 = OFF
You can just add them to your result set:
SELECT
col1, col2
,SESSIONPROPERTY ('ARITHABORT') AS ARITHABORT
,SESSIONPROPERTY ('ANSI_WARNINGS') AS ANSI_WARNINGS
,SESSIONPROPERTY ('...
FROM ...
If you try them one at a time, try ARITHABORT first.
see: Resolving an ADO timeout issue in VB6
and Why would SET ARITHABORT ON dramatically speed up a query?
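If ARITHABORT does turn out to be the difference, a quick hedged test is to reproduce the client's setting in SSMS and compare timings (dbo.usp_YourProc and its parameter are placeholders, not from the question):

-- reproduce the "slow" client behaviour by matching the client's setting (most drivers default to OFF)
SET ARITHABORT OFF;
EXEC dbo.usp_YourProc @SomeParam = 1;   -- hypothetical procedure and parameter
GO
-- then compare against the SSMS default (ARITHABORT ON)
SET ARITHABORT ON;
EXEC dbo.usp_YourProc @SomeParam = 1;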

SQL Server Query: Fast with Literal but Slow with Variable

I have a view that returns 2 ints from a table using a CTE. If I query the view like this it runs in less than a second
SELECT * FROM view1 WHERE ID = 1
However if I query the view like this it takes 4 seconds.
DECLARE @id INT = 1
SELECT * FROM View1 WHERE ID = @id
I've checked the 2 query plans: the first query is performing a clustered index seek on the main table, returning 1 record, then applying the rest of the view query to that result set, whereas the second query is performing an index scan which returns about 3000 records rather than just the one I'm interested in, and then filters the result set later.
Is there anything obvious that I'm missing to try to get the second query to use an index seek rather than an index scan? I'm using SQL 2008 but anything I do needs to also run on SQL 2005. At first I thought it was some sort of parameter sniffing problem, but I get the same results even if I clear the cache.
It is probably because, in the parameter case, the optimizer cannot know that the value is not null, so it needs to create a plan that returns correct results even when it is. If you have SQL Server 2008 SP1 you can try adding OPTION (RECOMPILE) to the query.
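A minimal sketch of that suggestion against the query from the question:

DECLARE @id INT = 1
SELECT * FROM View1 WHERE ID = @id
OPTION (RECOMPILE)  -- re-optimize using the actual value of @id at execution time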
You could add an OPTIMIZE FOR hint to your query, e.g.
DECLARE @id INT = 1
SELECT * FROM View1 WHERE ID = @id OPTION (OPTIMIZE FOR (@id = 1))
In my case, the DB table column type was defined as varchar while the parameterized query's parameter was defined as nvarchar; this introduced a CONVERT_IMPLICIT in the actual execution plan to match the data types before comparing, and that was the culprit for the slow performance - 2 sec vs 11 sec. Just correcting the parameter type made the parameterized query as fast as the non-parameterized version.
One possible way to do that is to CAST the parameters, as such:
SELECT ...
FROM ...
WHERE name = CAST(:name AS varchar)
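In plain T-SQL the same idea looks roughly like this (a sketch; the table and column are invented):

-- assume dbo.Customers.name is varchar(50)
DECLARE @name varchar(50) = 'Smith'   -- declare the parameter with the column's exact type
SELECT * FROM dbo.Customers WHERE name = @name   -- no CONVERT_IMPLICIT, so an index seek on name is possible

-- an nvarchar parameter would instead force an implicit conversion of the varchar column:
-- DECLARE @name nvarchar(50) = N'Smith'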
Hope this may help someone with a similar issue.
I ran into this problem myself with a view that ran < 10ms with a direct assignment (WHERE UtilAcctId = 12345), but took over 100 times as long with a variable assignment (WHERE UtilAcctId = @UtilAcctId).
The execution plan for the latter was no different than if I had run the view on the entire table.
My solution didn't require tons of indexes, optimizer hints, or a long statistics update.
Instead I converted the view into a user-defined table function (i.e. a table-valued function) where the parameter was the value needed in the WHERE clause. In fact this WHERE clause was nested 3 queries deep, and it still worked, and it was back to the < 10ms speed.
Eventually I changed the parameter to be a TYPE that is a table of UtilAcctIds (int). Then I can limit the WHERE clause to a list from the table.
WHERE UtilAcctId = [parameter-List].UtilAcctId.
This works even better. I think the table-valued functions are pre-compiled.
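A hedged sketch of that last variant (the type, function and table names are all invented): define a table type for the IDs and join to it inside an inline table-valued function:

-- table type holding the account ids to filter on
CREATE TYPE dbo.UtilAcctIdList AS TABLE (UtilAcctId int NOT NULL PRIMARY KEY);
GO
CREATE FUNCTION dbo.fn_GetUtilAccts (@Ids dbo.UtilAcctIdList READONLY)
RETURNS TABLE
AS
RETURN
    SELECT a.*
    FROM dbo.UtilAccts AS a                  -- hypothetical base table
    JOIN @Ids AS i ON i.UtilAcctId = a.UtilAcctId;
GO
-- usage
DECLARE @Ids dbo.UtilAcctIdList;
INSERT INTO @Ids (UtilAcctId) VALUES (12345);
SELECT * FROM dbo.fn_GetUtilAccts(@Ids);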
When SQL starts to optimize the query plan for the query with the variable it will match the available index against the column. In this case there was an index so SQL figured it would just scan the index looking for the value. When SQL made the plan for the query with the column and a literal value it could look at the statistics and the value to decide if it should scan the index or if a seek would be correct.
Using the optimize hint and a value tells SQL that “this is the value which will be used most of the time so optimize for this value” and a plan is stored as if this literal value was used. Using the optimize hint and the sub-hint of UNKNOWN tells SQL you do not know what the value will be, so SQL looks at the statistics for the column and decides what, seek or scan, will be best and makes the plan accordingly.
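For example (a sketch using the question's query; note that OPTIMIZE FOR UNKNOWN requires SQL Server 2008 or later):

DECLARE @id INT = 1
SELECT * FROM View1 WHERE ID = @id
OPTION (OPTIMIZE FOR (@id UNKNOWN))  -- plan built from average density statistics rather than a sniffed value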
I know this is long since answered, but I came across this same issue and have a fairly simple solution that doesn't require hints, statistics-updates, additional indexes, forcing plans etc.
Based on the comment above that "the optimizer cannot know that the value is not null", I decided to move the values from a variable into a table:
Original Code:
declare @StartTime datetime2(0) = '10/23/2020 00:00:00'
declare @EndTime datetime2(0) = '10/23/2020 01:00:00'
SELECT * FROM ...
WHERE
C.CreateDtTm >= @StartTime
AND C.CreateDtTm < @EndTime
New Code:
declare @StartTime datetime2(0) = '10/23/2020 00:00:00'
declare @EndTime datetime2(0) = '10/23/2020 01:00:00'
CREATE TABLE #Times (StartTime datetime2(0) NOT NULL, EndTime datetime2(0) NOT NULL)
INSERT INTO #Times(StartTime, EndTime) VALUES(@StartTime, @EndTime)
SELECT * FROM ...
WHERE
C.CreateDtTm >= (SELECT MAX(StartTime) FROM #Times)
AND C.CreateDtTm < (SELECT MAX(EndTime) FROM #Times)
This performed instantly as opposed to several minutes for the original code (obviously your results may vary).
I assume if I changed my data type in my main table to be NOT NULL, it would work as well, but I was not able to test this at this time due to system constraints.
Came across this same issue myself and it turned out to be a missing index involving a (left) join on the result of a subquery.
select *
from foo A
left outer join (
    select x, count(*) as cnt
    from bar
    group by x
) B on A.x = B.x
Added an index named bar_x for bar.x
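For reference, the index would be something like this (a sketch; the exact definition depends on the schema):

-- supports the GROUP BY and the join on bar.x
CREATE NONCLUSTERED INDEX bar_x ON bar (x)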
DECLARE @id INT = 1
SELECT * FROM View1 WHERE ID = @id
Do this
DECLARE @sql varchar(max)
SET @sql = 'SELECT * FROM View1 WHERE ID =' + CAST(@id as varchar)
EXEC (@sql)
Solves your problem

SQL Server Stored Procedure - 'IF statement' vs 'Where criteria'

A question that has been boiling in my head for quite a long time: which of the following two stored procedures would perform better?
Proc 1
CREATE PROCEDURE GetEmployeeDetails @EmployeeId uniqueidentifier,
@IncludeDepartmentInfo bit
AS
BEGIN
SELECT * FROM Employees
WHERE Employees.EmployeeId = @EmployeeId
IF (@IncludeDepartmentInfo = 1)
BEGIN
SELECT Departments.* FROM Departments, Employees
WHERE Departments.DepartmentId = Employees.DepartmentId
AND Employees.EmployeeId = @EmployeeId
END
END
Proc 2
CREATE PROCEDURE GetEmployeeDetails @EmployeeId uniqueidentifier,
@IncludeDepartmentInfo bit
AS
BEGIN
SELECT * FROM Employees
WHERE Employees.EmployeeId = @EmployeeId
SELECT Departments.* FROM Departments, Employees
WHERE Departments.DepartmentId = Employees.DepartmentId
AND Employees.EmployeeId = @EmployeeId
AND @IncludeDepartmentInfo = 1
END
The only difference between the two is the use of the 'if' statement.
If proc 1/proc 2 are called with alternating values of @IncludeDepartmentInfo, then from my understanding proc 2 would perform better, because it will retain the same query plan irrespective of the value of @IncludeDepartmentInfo, whereas proc 1 will change its query plan on each call.
Answers are really appreciated.
PS: this is just a scenario, please don't go to the explicit query results but the essence of example. I am really particular about the query optimizer result (in both cases of 'if and where' and their difference), there are many aspects which I know could affect the performance which I want to avoid in this question.
SELECT Departments.* FROM Departments, Employees
WHERE Departments.DepartmentId = Employees.DepartmentId
AND Employees.EmployeeId = @EmployeeId
AND @IncludeDepartmentInfo = 1
When SQL compiles a query like this it must be compiled for any value of @IncludeDepartmentInfo. The resulting plan may well be one that scans the tables and performs the join and only after that checks the variable, resulting in unnecessary I/O. The optimizer may be smart and move the check for the variable ahead of the actual I/O operations in the execution plan, but this is never guaranteed. This is why I always recommend using explicit IFs in the T-SQL for queries that need to perform very differently based on a variable value (the typical example being OR conditions).
gbn's observation is also an important one: from an API design point of view it is better to have a consistent return type (i.e. always return the same shape and number of result sets).
From a consistency perspective, number 2 will always return 2 datasets. Overloading aside, you wouldn't have a client code method that maybe returns a result, maybe not.
If you reuse this code, the other calling client will have to know this flag too.
If the code does 2 different things, then why not 2 different stored procs?
Finally, it's far better practice to use modern JOIN syntax and separate joining from filtering. In this case, personally I'd use EXISTS too.
SELECT
D.*
FROM
Departments D
JOIN
Employees E ON D.DepartmentId = E.DepartmentId
WHERE
E.EmployeeId = @EmployeeId
AND
@IncludeDepartmentInfo = 1
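The EXISTS form mentioned above would look roughly like this (a sketch of the same filter):

SELECT
    D.*
FROM
    Departments D
WHERE
    @IncludeDepartmentInfo = 1
    AND EXISTS (SELECT 1
                FROM Employees E
                WHERE E.DepartmentId = D.DepartmentId
                AND E.EmployeeId = @EmployeeId)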
When you use the 'if' statement, you may run only one query instead of two. I would think that one query would almost always be faster than two. Your point about query plans may be valid if the first query were complex and took a long time to run, and the second was trivial. However, the first query looks like it retrieves a single row based on a primary key - probably pretty fast every time. So, I would keep the 'if' - but I would test to verify.
The performance difference would be too small for anyone to notice.
Premature optimization is the root of all evil. Stop worrying about performance and start implementing features that make your customers smile.

Resources