Currently I am working on a report system for our data archive.
The aim is to select data for every 1st of a month, every full hour, and so on.
So I have a bunch of parameters to select the data down to a single hour.
To achieve that, I used CASE statements to adjust the SELECT like this:
SELECT
    MIN(cd.Timestamp) as Mintime,
    --Hours
    CASE
        WHEN @SelHour IS NOT NULL
        THEN DATEPART(HOUR, cd.Timestamp)
    END as Hour,
    ... -- more CASEs up to DATEPART(YEAR, cd.Timestamp)
FROM dbo.CustomerData cd
... -- filter data and other stuff
This statement works well for me so far, but I am a bit worried about the performance of the stored procedure, because I don't know how the server will behave with this "changing" statement. Depending on the given parameters, the result can vary from 20 rows up to 250,000 rows and more. As far as I know, SQL Server saves the query plan and reuses it for future executions.
If it saves the plan for the 20-row result, the performance for the 250,000-row result is probably pretty poor.
Now I am wondering what the better approach is: using this stored procedure, or building the statement inside my C# backend and passing the "adjusted" statement to the SQL Server?
Thanks and greetings
For a 20-row result set this will work fine anywhere, but returning 250k records to C# code suggests the design should be revisited: loading 250k records into memory and looping over them consumes significant memory, and concurrent requests from different sessions/users multiply that load.
Anyway, to address the problem of SQL Server reusing the same query plan, you can recompile query plans selectively or on every execution. These are the options available for recompiling execution plans:
OPTION (RECOMPILE) query hint: recompiles just this statement on every execution.
SELECT
    MIN(cd.Timestamp) as Mintime,
    --Hours
    CASE
        WHEN @SelHour IS NOT NULL
        THEN DATEPART(HOUR, cd.Timestamp)
    END as Hour,
    ... -- more CASEs up to DATEPART(YEAR, cd.Timestamp)
FROM dbo.CustomerData cd
... -- filter data and other stuff
OPTION (RECOMPILE)
WITH RECOMPILE procedure option: this will recompile the execution plan every time the procedure is executed.
CREATE PROCEDURE dbo.uspStoredPrcName
    @ParamName varchar(30) = 'abc'
WITH RECOMPILE
AS
...
NOTE: this requires CREATE PROCEDURE permission in the database and ALTER permission on the schema in which the procedure is being created.
WITH RECOMPILE in EXECUTE: recompiles the plan just for that single execution.
EXECUTE uspStoredPrcName WITH RECOMPILE;
GO
sp_recompile system stored procedure: marks the procedure so it is recompiled the next time it runs.
NOTE: Requires ALTER permission on the specified procedure.
EXEC sp_recompile N'dbo.uspStoredPrcName';
GO
For more details on recompilation, refer to the Microsoft Docs:
https://learn.microsoft.com/en-us/sql/relational-databases/stored-procedures/recompile-a-stored-procedure?view=sql-server-ver15
Related
I have a Snowflake stored procedure which has been running for 8 hours. Upon checking the Query Profiler to see which query is running long, I just see a single entry for the CALL stored procedure statement.
However, through the logs I know that an insert statement is running, which looks like:
insert into Snowflake_table
select *
from External_table(over S3 bucket)
I want to check and find out why reading from the external table is taking so much time, but the insert query is not showing up in the Query Profiler. I have tried querying information_schema.Query_history, but it does not show any query running apart from the CALL stored procedure statement:
SELECT *
FROM Table(information_schema.Query_history_by_warehouse('ANALYTICS_WH_PROD'))
WHERE execution_status = 'RUNNING'
ORDER BY start_time desc;
Please suggest how to find the bottleneck here.
The docs state that queries on INFORMATION_SCHEMA views do not guarantee consistency with respect to concurrent DDL: https://docs.snowflake.com/en/sql-reference/info-schema.html
This means it is possible that your insert statement is running but is not shown in the result of your query. It could be included, but that is not guaranteed.
You could now change the filter to execution_status IN ('SUCCESS', 'FAILED') and check again after the procedure has finished.
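For example, the follow-up check could look roughly like this (a sketch based on your query; the warehouse name is kept, and the exact EXECUTION_STATUS literals may need adjusting to what your account actually returns, e.g. FAILED_WITH_ERROR):
SELECT query_id, query_text, execution_status, total_elapsed_time
FROM TABLE(information_schema.QUERY_HISTORY_BY_WAREHOUSE('ANALYTICS_WH_PROD'))
WHERE execution_status IN ('SUCCESS', 'FAILED_WITH_ERROR')  -- completed runs only; adjust the status literals as needed
ORDER BY start_time DESC;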
I am stress testing a system that uses a temporary table in dynamic SQL. The table is created early in the transaction and is filled by several dynamic SQL statements in several stored procedures that are executed as part of the batch, using statements of the form:
INSERT #MyTable (...)
SELECT ...
where the SELECT statement is reasonably complicated, in that it may contain UNION ALL and UNPIVOT statements and refer to several UDFs. All strings are executed using sp_executesql, and parameter sniffing is enabled.
I have noticed that under load I see a lot of RESOURCE_SEMAPHORE_QUERY_COMPILE waits, where the query text being recompiled is present and identical in several waits at the same time, and this continues throughout the stress test, which lasts about 5 minutes. Memory consumption on the server usually sits around 60% utilization and there is no limit on how much SQL Server can consume. The limiting factor appears to be CPU, which is constantly above 95% during the test.
I have profiled the server during the test to observe the SQL:StmtRecompile event, which reports the reason for the recompile as:
5 - Temp table changed
but the temp table is the same every time and no DDL statements are performed against the table once it has been created, apart from when it is dropped at the end of the batch.
So far, I have tried:
Enabling the "optimize for ad hoc workloads" option
OPTION(KEEPFIXED PLAN)
Changing the dynamic statement to just the SELECT and then using INSERT ... EXEC so the temp table is not in the executed string
All of these have made no difference and the waits persist.
Why does SQL Server think it needs to recompile these identical queries each time they are executed, and how can I get it to keep and reuse the cached plans it is creating?
Note: I cannot change the temp table to an In-Memory table because sometimes the stored procedures using this may have to query another database on the same instance.
This is using SQL Server 2016 SP1 CU7.
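One way to watch these compile-gate waits while the test runs is the standard wait-stats DMV (a sketch; the counters are cumulative since the last restart or clear, so sample before and after the run and compare the deltas):
-- Sample before and after the 5-minute test and diff the numbers.
SELECT wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = 'RESOURCE_SEMAPHORE_QUERY_COMPILE';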
It appears that removing the insertion into the temp table from the dynamic SQL strings improves performance significantly. For example, changing this:
EXEC sp_executesql
N'INSERT #tempTable (...) SELECT ... FROM ...'
where the SELECT statement is non-trivial, to this:
INSERT #tempTable (...)
EXEC sp_executesql
N'SELECT ... FROM ...'
greatly reduced the number of blocks created during compilation. Unfortunately, recompilation of the queries is not avoided; it's just that the queries being recompiled are now much simpler and therefore less CPU intensive.
I have also found it more performant to create an In-Memory Table Type with the same columns as the temp table, perform the complex insertions into a table variable of that type and perform a single insert from the table variable into the temp table at the end.
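A minimal sketch of that shape (the type name, column names, and inner SELECT are placeholders, and a database with a MEMORY_OPTIMIZED_DATA filegroup is assumed):
-- Hypothetical names throughout; requires a MEMORY_OPTIMIZED_DATA filegroup.
CREATE TYPE dbo.StagingType AS TABLE
(
    Id      INT           NOT NULL,
    Payload NVARCHAR(128) NULL,
    INDEX ix_staging NONCLUSTERED (Id)
)
WITH (MEMORY_OPTIMIZED = ON);
GO

CREATE TABLE #Results (Id INT NOT NULL, Payload NVARCHAR(128) NULL);

DECLARE @staging dbo.StagingType;

-- The complex dynamic statements fill the table variable ...
INSERT @staging (Id, Payload)
EXEC sp_executesql N'SELECT object_id, name FROM sys.objects';

-- ... and a single, simple insert moves the rows into the temp table at the end.
INSERT #Results (Id, Payload)
SELECT Id, Payload
FROM @staging;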
Pardon me if this is a duplicate. The closest I could find was Random timeout running a stored proc - drop recreate fixes but I'm not certain the answers there about recompiling the stored procedure apply.
I have an Azure SQL database, latest version, that has a lot of traffic from an Azure web app front end. I have a nightly remote job that runs a batch to rebuild indexes on the Azure SQL database as that seems to help greatly with controlling database size and performance.
Normally, the rebuilding of indexes takes about 20 minutes. Last night it timed out after 2 hours. The error handler in that batch did not log any errors.
Soon after the index rebuild started, one particular stored procedure started timing out for every client calling it. Other stored procedures using the same tables had no issues. When I discovered the problem, I could alleviate all the timeouts and suspended processes by altering the stored procedure to return immediately. When I altered the stored procedure again to behave normally, the issues reappeared immediately. My understanding is that altering the stored procedure forced it to recompile, but that didn't fix it.
Ultimately, I completely dropped and recreated the procedure with the original code and the issue was resolved.
This procedure and the schema it uses have been completely stable for many months. The procedure itself is quite simple:
CREATE Procedure [dbo].[uspActivityGet] (@databaseid uniqueidentifier) AS
begin
    SET NOCOUNT ON;
    --There may be writing activities to the table asynchronously, do not use nolock on tblActivity - the ActivityBlob might be null in a dirty read.
    select top 100 a.Id, h.HandsetNumber, a.ActivityBlob, a.ActivityReceived
    from dbo.tblDatabases d with(nolock)
    join dbo.tblHandsets h with(nolock) on d.DatabaseId = h.DatabaseId
    join dbo.tblActivity a on h.Id = a.HandsetId
    where d.DatabaseId = @databaseid and a.ActivitySent is null
    order by a.ActivityReceived
end
While the procedure would hang and time out with something like this:
exec dbo.uspActivityGet 'AF3EA01B-DB22-4A39-9E1C-D096D2DF1215'
Running the identical select in a query window would return promptly and successfully:
declare @databaseid uniqueidentifier;
set @databaseid = 'AF3EA01B-DB22-4A39-9E1C-D096D2DF1215';

select top 100 a.Id, h.HandsetNumber, a.ActivityBlob, a.ActivityReceived
from dbo.tblDatabases d with(nolock)
join dbo.tblHandsets h with(nolock) on d.DatabaseId = h.DatabaseId
join dbo.tblActivity a on h.Id = a.HandsetId
where d.DatabaseId = @databaseid and a.ActivitySent is null
order by a.ActivityReceived
Any ideas how I can prevent this from happening in the future? Thank you.
Edit - Adding execution plan screenshot
Edit - Adding the query used to view running processes. There were many, approximately 150 at a guess, in the suspended state, and they were all for the same stored procedure - uspActivityGet. Also, Data IO percentage was maxed out the whole time, when it normally runs at 20 - 40% in peak demand times. I don't recall what the wait type was. Here is the query used to view that.
select *
from sys.dm_exec_requests r with(nolock)
cross apply sys.dm_exec_sql_text(r.sql_handle)
order by r.total_elapsed_time desc
Edit - It happened again tonight. Here is the execution plan of the same procedure during the issue. After dropping and creating the procedure again, the execution plan returned to normal and the issue was resolved.
During the issue, sp_executesql with the identical query took about 5 minutes to execute and I believe that is representative of what was happening. There were about 50 instances of uspActivityGet suspended with wait type SLEEP_TASK or IO_QUEUE_LIMIT.
Perhaps the next question is why is index rebuilding or other nightly maintenance doing this to the execution plan?
The clues are in the query and the troublesome execution plan. See Poor Performance with Parallelism and Top
The normal execution plan seems quite efficient and shouldn't need to be recompiled as long as the relevant schema doesn't change. I also want to avoid parallelism in this query. I added the following two options to the query for assurance on both points, and all is happy again.
OPTION (KEEPFIXED PLAN, MAXDOP 1)
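In context, the hints simply go on the end of the SELECT inside the procedure, along these lines:
select top 100 a.Id, h.HandsetNumber, a.ActivityBlob, a.ActivityReceived
from dbo.tblDatabases d with(nolock)
join dbo.tblHandsets h with(nolock) on d.DatabaseId = h.DatabaseId
join dbo.tblActivity a on h.Id = a.HandsetId
where d.DatabaseId = @databaseid and a.ActivitySent is null
order by a.ActivityReceived
OPTION (KEEPFIXED PLAN, MAXDOP 1)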
I need to use the same query twice but with a slightly different WHERE clause. I was wondering if it would be efficient to simply call the same stored proc with a bit value and have an IF ... ELSE ... statement deciding which fields to compare.
Or should I make two stored procs and call each one based on logic in my app?
I'd like to understand this in more detail, though:
How is the execution plan compiled for this? Is there one for each code block in each IF... ELSE...?
Or is it compiled as one big execution plan?
You are right to be concerned about the execution plan being cached.
Martin gives a good example showing that the plan is cached and will be optimized for a certain branch of your logic the first time it is executed.
After the first execution that plan is reused, even if you call the stored procedure (sproc) with a different parameter that causes your execution flow to take another branch.
This is very bad and will kill performance. I've seen this happen many times and it takes a while to find the root cause.
The reason behind this is called "Parameter Sniffing" and it is well worth researching.
A commonly proposed solution (one that I don't advise) is to split up your sproc into a few tiny ones.
If you call a sproc inside a sproc that inner sproc will get an execution plan optimized for the parameter being passed to it.
Splitting up a sproc into a few smaller ones when there is no good reason (a good reason would be modularity) is an ugly workaround. Martin shows that it's possible for a statement to be recompiled by introducing a change to the schema.
I would use OPTION (RECOMPILE) at the end of the statement. This instructs the optimizer to recompile the statement taking into account the current value of all variables: not only parameters but also local variables are taken into account, which can make the difference between a good and a bad plan.
To come back to your question of constructing a query with a different WHERE clause according to a parameter, I would use the following pattern:
WHERE
    (@parameter1 is null or col1 = @parameter1)
    AND
    (@parameter2 is null or col2 = @parameter2)
    ...
OPTION (RECOMPILE)
The downside is that the execution plan for this statement is never cached (it doesn't prevent caching of plans up to the point of the statement, though), which can have an impact if the sproc is executed many times, since the compilation time now has to be taken into account. Performing a test with production-quality data will tell you whether that is a problem or not.
The upside is that you can code readable and elegant sprocs and not set the optimizer on the wrong foot.
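A minimal sketch of a sproc in that style (the table, columns, and parameters are invented for illustration):
CREATE PROCEDURE dbo.uspSearchOrders
    @CustomerId INT     = NULL,
    @Status     CHAR(1) = NULL
AS
BEGIN
    SET NOCOUNT ON;

    SELECT o.OrderId, o.CustomerId, o.Status, o.OrderDate
    FROM dbo.Orders o
    WHERE (@CustomerId IS NULL OR o.CustomerId = @CustomerId)
      AND (@Status     IS NULL OR o.Status     = @Status)
    OPTION (RECOMPILE);   -- recompiled each execution with the actual parameter values
END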
Another option to keep in mind is that you can disable execution plan caching at the sproc level (as opposed to the statement level), which is less granular and, more importantly, will not take into account the value of local variables when optimizing.
More information at
http://www.sommarskog.se/dyn-search-2005.html
http://sqlinthewild.co.za/index.php/2009/03/19/catch-all-queries/
It is compiled once, using the initial values of the parameters passed into the procedure, though some statements may be subject to a deferred compile, in which case they are compiled with whatever the parameter values are when they are eventually compiled.
You can see this by running the script below and looking at the actual execution plans.
CREATE TABLE T
(
C INT
)
INSERT INTO T
SELECT 1 AS C
UNION ALL
SELECT TOP (1000) 2
FROM master..spt_values
UNION ALL
SELECT TOP (1000) 3
FROM master..spt_values
GO
CREATE PROC P @C INT
AS
    IF @C = 1
    BEGIN
        SELECT '1'
        FROM T
        WHERE C = @C
    END
    ELSE IF @C = 2
    BEGIN
        SELECT '2'
        FROM T
        WHERE C = @C
    END
    ELSE IF @C = 3
    BEGIN
        CREATE TABLE #T
        (
            X INT
        )
        INSERT INTO #T
        VALUES (1)
        SELECT '3'
        FROM T,
             #T
        WHERE C = @C
    END
GO
EXEC P 1
EXEC P 2
EXEC P 3
DROP PROC P
DROP TABLE T
Running the @C = 2 case shows an estimated number of rows coming from T of 1, not 1000, because that statement was compiled according to the initial parameter value of 1. Running the @C = 3 case gives an accurate estimated count of 1000, because the reference to the (yet to be created) temporary table means that statement was subject to a deferred compile.
What is the best way to accurately measure the performance (time to complete) of a stored procedure?
I’m about to start an attempt to optimize a monster stored procedure, and in order to correctly determine if my tweaks have any effect, I need something to compare the before and after.
My ideas so far:
Looking at query execution time in SQL Server Management Studio: not very accurate, but very convenient.
Adding timers to the stored procedure and printing the elapsed time: adding debug code like that stinks.
Using SQL Server Profiler, adding filters to target just my stored procedure. This is my best option so far.
Any other options?
There's lots of detailed performance information in the DMV sys.dm_exec_query_stats.
DECLARE @procname VARCHAR(255)
SET @procname = 'your proc name'

SELECT qs.*
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
WHERE st.objectid = OBJECT_ID(@procname)
This will give you cumulative performance data and execution counts per cached statement.
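For before/after comparisons, a trimmed-down version that keeps just the cumulative counters can be handy (a sketch; it reuses the same @procname variable as above):
SELECT qs.execution_count,
       qs.total_worker_time,     -- cumulative CPU, in microseconds
       qs.total_elapsed_time,    -- cumulative duration, in microseconds
       qs.total_logical_reads
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
WHERE st.objectid = OBJECT_ID(@procname)
ORDER BY qs.total_worker_time DESC;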
You can use DBCC FREEPROCCACHE to reset the counters (don't run this in a production system, since will purge all the cached query plans).
You can get the query plans for each statement by extending this query:
SELECT SUBSTRING(st.text, (qs.statement_start_offset/2)+1,
       ((CASE qs.statement_end_offset WHEN -1 THEN DATALENGTH(st.text) ELSE qs.statement_end_offset END - qs.statement_start_offset)/2)+1) AS [sub_statement]
     , *
     , CONVERT(XML, tqp.query_plan)
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp
CROSS APPLY sys.dm_exec_text_query_plan(qs.plan_handle, qs.statement_start_offset, qs.statement_end_offset) tqp
WHERE st.objectid = OBJECT_ID(@procname)
ORDER BY qs.statement_start_offset, qs.execution_count
This will give you pointers about which parts of the SP are performing badly, and - if you include the execution plans - why.
Profiler is the most reliable method. You can also use SET STATISTICS IO ON and SET STATISTICS TIME ON but these don't include the full impact of scalar UDFs.
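A quick sketch of the STATISTICS approach (the procedure name is a placeholder):
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

EXEC dbo.uspMonsterProc;   -- hypothetical procedure under test

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;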
You can also turn on the "include client statistics" option in SSMS to get an overview of the performance of the last 10 runs.
One possible improvement on your timers/debug option is to store the results in a table. In this way you can slice-and-dice the resulting timing data with SQL queries rather than just visually parsing your debug output.
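For example (a sketch; the table and procedure names are invented):
-- Log each timed run to a table so the results can be queried later.
CREATE TABLE dbo.ProcTimings
(
    RunId     INT IDENTITY(1,1) PRIMARY KEY,
    Label     SYSNAME   NOT NULL,
    StartTime DATETIME2 NOT NULL,
    EndTime   DATETIME2 NOT NULL,
    ElapsedMs AS DATEDIFF(MILLISECOND, StartTime, EndTime)
);
GO

DECLARE @start DATETIME2 = SYSDATETIME();
EXEC dbo.uspMonsterProc;   -- hypothetical procedure under test
INSERT dbo.ProcTimings (Label, StartTime, EndTime)
VALUES (N'baseline run', @start, SYSDATETIME());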
You want to ensure that you are performing fair tests, i.e. comparing like with like. Consider running your tests with a cold cache in order to force your stored procedure execution to be served from the IO subsystem each time you perform your tests.
Take a look at the DBCC commands DBCC FREEPROCCACHE and DBCC FREESYSTEMCACHE.
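On a dedicated test server that could look like the sketch below; DBCC DROPCLEANBUFFERS (not mentioned above, but the usual way to empty the data cache) is included, and none of this should ever be run against production:
CHECKPOINT;               -- flush dirty pages so clean buffers can actually be dropped
DBCC DROPCLEANBUFFERS;    -- cold data cache: the next run reads from the IO subsystem
DBCC FREEPROCCACHE;       -- cold plan cache: the next run recompiles

EXEC dbo.uspMonsterProc;  -- hypothetical procedure under test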