I'm using SQL Server and I want to benefit from reusing query plan. I found this document, but it remains unclear for me whether the plan for my query is being reused or not.
declare #su dbo.IntCollection -- TABLE (Value int not null)
insert into #su values (1),(2),(3) --... about 500 values
update mt
set mt.MyField = getutcdate()
from MyTable mt
join #su vsu on mt.Id = vsu.Value -- Clustered PK, int
Technically the text of batch differs from run to run, as different values are being inserted in #su. But the text of update query remains the same. If I were using .NET I would basically pass a table variable to SQL command, but I'm using Python and it looks like there no way to pass table parameter from my program.
Question 1: does the plan for update query get reused? Or does optimizer look that text of batch is different and does not analyze single queries in batch? In other words, is it the same as
update MyTable
set MyField = getutcdate()
where Id in (1, 2, 3 ...)
Question 2: I can force SQL to remain the same between calls by introducing a stored procedure with table parameter, but will I benefit from it?
Question 3: how to identify for a given query whether its plan was reused or computed again?
Question 4: should I worry about all above in my specific case? After all it is just an update of table on bunch of IDs...
Just answers to your questions..
Question 1: does the plan for update query get reused? Or does optimizer look that text of batch is different and does not analyze single queries in batch? In other words, is it the same as
Your both update statements are treated as new queries,since SQL tries to calculate hash of the query and any simple change will not match with old hash
Question 2: I can force SQL to remain the same between calls by introducing a stored procedure with table parameter, but will I benefit from it?
this sounds like a good approach to me..rather than a bunch of IN's
Question 3: how to identify for a given query whether its plan was reused or computed again?
select usecounts from sys.dm_exec_cached_plans ec
cross apply
sys.dm_exec_sql_text(ec.plan_handle) txt
where txt.text like '%your query text%'
Question 4: should I worry about all above in my specific case? After all it is just an update of table on bunch of IDs...
it seems to me,you are worrying much..There are many rules which enforce query plan reuse behaviour as pointed out in the white paper you referred..so most of the times,query plan will be reused..
I would start worrying about plan re usability only when i see high SQL Compilations/sec coupled with Batch Requests/sec
Taken from Answer here :https://dba.stackexchange.com/questions/19544/how-badly-do-sql-compilations-impact-the-performance-of-sql-server
SQL Compilations/sec is a good metric, but only when coupled with Batch Requests/sec. By itself, compilations per sec doesn't really tell you much.
You are seeing 170. If batch req per sec is only 200 (a little exaggerated for effect) then yes, you need to get down to the bottom of the cause (most likely an overuse of ad hoc querying and single-use plans). But if your batch req per sec is measuring about 5000 then 170 compilations per sec is not bad at all. It's a general rule of thumb that Compilations/sec should be at 10% or less than total Batch Requests/sec.
Related
From this SO answer a view should provide the same performance as using the same query directly.
Is querying over a view slower than executing SQL directly?
I have a view where this is not true.
This query targeting a view
SELECT
*
FROM
[Front].[vw_Details] k
WHERE
k.Id = 970435
Takes 10 seconds to complete. Copying the query from the view and adding WHERE k.Id = 970435 to it completes in less than 1 second. The view is nothing special, 4 LEFT JOINs, and a few CASE directives to clean up data.
How can I figure out what the problem is, or what do I need to complete this question with in order for this to be answerable?
Update 1:
SQL Server Version: 12.0.4436.0
Query plan for view: https://pastebin.com/RY40Ab0k
Query plan for select: https://pastebin.com/gwahhgpu
Your query plan is no longer visible, but if you look in the plan, you will most likely see a triangle complaining about the cardinality estimation and/or implicite declaration. What it means is that you are joining tables in a way where your keys are hard to guess for the SQL engine.
It is instant when you run from a query directly, probably because it doesn't need to guess the size of your key is
For example:
k.Id = 970435
SQLSERVER already knows that it is looking for 970435 a 6 digit number.
It can eliminate all the key that doesn't start by 9 and doesn't have 6 digits. :)
However, in a view, it has to build the plan in a way to account for unknown. Because it doesn't know what kind of key it may hold.
See the microsoft for various example and scenario that may help you.
https://learn.microsoft.com/en-us/sql/relational-databases/query-processing-architecture-guide?view=sql-server-ver15
If you are always looking for an int, one work around is to force the type with a cast or convert clause. It's may cause performance penalty depending on your data, but it is a trick in the toolbox to tell sql to not attempt the query a plan as varchar(max) or something along that line.
SELECT *
FROM [Front].[vw_Details] k
WHERE TRY_CONVERT(INT,k.Id) = 970435
use a stored procedure to return results. stored procedures use indexes, whereas, views often don't
or
use a table function and query the table function
I've been tasked with improving the performance (and this is my first real-world performance tuning taks) of a reporting stored procedure which is called by an SSRS front-end and the stored procedure currently takes about 30 seconds to run on the largest amount of data (based on filters set from the report frontend).
This stored procedure has a breakdown of 19 queries executing in it, most of which are transforming the data from an initial (legacy) format from inside the base tables into a meaningful dataset to be displayed to the business side.
I've created a query based on a few DMV's in order to find out which are the most resource-consuming queries from the stored procedure (small snippet below) and I have found one query which takes about 10 seconds, in average, to complete.
select
object_name(st.objectid) [Procedure Name]
, dense_rank() over (partition by st.objectid order by qs.last_elapsed_time desc) [rank-execution time]
, dense_rank() over (partition by st.objectid order by qs.last_logical_reads desc) [rank-logical reads]
, dense_rank() over (partition by st.objectid order by qs.last_worker_time desc) [rank-worker (CPU) time]
, dense_rank() over (partition by st.objectid order by qs.last_logical_writes desc) [rank-logical write]
...
from sys.dm_exec_query_stats as qs
cross apply sys.dm_exec_sql_text (qs.sql_handle) as st
cross apply sys.dm_exec_text_query_plan (qs.plan_handle, qs.statement_start_offset, qs.statement_end_offset) as qp
where st.objectid in ( object_id('SuperDooperReportingProcedure') )
, [rank-execution time]
, [rank-logical reads]
, [rank-worker (CPU) time]
, [rank-logical write] desc
Now, this query is a bit strange in the sense that the execution plan shows that shows that the bulk of the work (~80%) is done when inserting the data into the local temporary table and not when interrogating the other tables from which the source data is taken and then manipulated. (screenshot below is from SQL Sentry Plan Explorer)
Also, in terms of row estimates, the execution plan has way off estimates for this, in the sense that there are only 4218 rows inserted into the local temporary table as opposed to the ~248k rows that the execution plan thinks its moving into the local temporary table. So, becasue of this, I'm thinking "statistics", but still do those even matter if ~80% of the work is the actual insert into the table?
One of my first recommendations was to re-write the entire process and the stored procedure so as to not include the moving and transforming of the data into the reporting stored procedure and to do the data transformation nightly into some persisted tables (real-time data is not required, only relevant data until end of previous day). But the business side does not want to invest time and resources into redesigning this and instead "suggests" I do performance tuning in the sense of finding where and what indexes I can add to speed this up.
I don't believe that adding indexes to base tables will improve the performance of the report since most of the time needed for running the query is saving the data into a temporary table (which from my knowledge it will hit tempdb, which means that they will be written to disk -> increased time due to I/O latency).
But, even so, as I've mentioned this is my first performance tuning task and I've tried to read as much as possible related to this in the last couple of days and these are my conclusions so far, but I'd like to ask for advice from a broader audience and hopefully get a few more insights and understanding on what I can do to improve this procedure.
As a few clear questions I'd appreciate if could be answered are:
Is there anything incorrect in what I have said above (in my understanding of the db or my assumptions) ?
Is it true that adding an index to a temporary table will actually increase the time of execution, since the table (and its associated index(es) is/are being rebuilt on each execution)?
Could there anything else be done in this scenario without having to re-write the procedure / queries and only be done via indexes or other tuning methods? (I've read a few article headlines that you could also "tune tempdb", but I didn't get into the details of those, yet).
Any help is very much appreciated and if you need more details I'll be happy to post.
Update (2 Aug 2016):
The query in question is (partially) below. What is missing are a few more aggregate columns and their corresponding lines in the GROUP BY section:
select
b.ProgramName
,b.Region
,case when b.AM IS null and b.ProgramName IS not null
then 'Unassigned'
else b.AM
end as AM
,rtrim(ltrim(b.Store)) Store
,trd.Store_ID
,b.appliesToPeriod
,isnull(trd.countLeadActual,0) as Actual
,isnull(sum(case when b.budgetType = 0 and b.budgetMonth between #start_date and #end_date then b.budgetValue else 0 end),0) as Budget
,isnull(sum(case when b.budgetType = 0 and b.budgetMonth between #start_date and #end_date and (trd.considerMe = -1 or b.StoreID < 0) then b.budgetValue else 0 end),0) as CleanBudget
...
into #SalvesVsBudgets
from #StoresBudgets b
left join #temp_report_data trd on trd.store_ID = b.StoreID and trd.newSourceID = b.ProgramID
where (b.StoreDivision is not null or (b.StoreDivision is null and b.ProgramName = 'NewProgram'))
group by
b.ProgramName
,b.Region
,case when b.AM IS null and b.ProgramName IS not null
then 'Unassigned'
else b.AM
end
,rtrim(ltrim(b.Store))
,trd.Store_ID
,b.appliesToPeriod
,isnull(trd.countLeadActual,0)
I'm not sure if this is actually helpful, but since #kcung requested it, I added the information.
Also, to answer some his questions:
the temporary tables have no indexes on them
RAM size: 32 GB
Update (3 Aug 2016):
I have tried #kcung's suggestions to move the CASE statements from the aggregate-generating query and unfortunately, overall, the procedure time has not improved, noticeably, as it still fluctuates in the range of ±0.25 to ±1.0 second (yes, both lower and higher time than the original version of the stored procedure - but I'm guessing this is due to variable workload on my machine).
The execution plan for the same query, but modified to remove the CASE conditions, leaving only the SUM aggregates, is now:
Adding indexes to the temporary table will definitely improve the read call but slows down the write calls to the temporary table.
Here, as you mentioned, there are 19 queries executing in the procedure, so analyzing only one query with execution plan would not be more helpful.
Adding more, if possible, execute this query only & check how much time it takes (rows affected).
Other approach you may try, not sure if possible in your case, try using table variable instead of temporary table. This is because, using table variable over the temporary table has additional advantages such as, procedure is pre-compiled, no transactional logs are maintained. & more, you don't need to write drop table.
Any chance I can see the query ? and the indexes on both tables ?
How big is your ram ? how big is the row in each table(roughly) ?
Can you update statistics for both table and resend the query planner ?
To answer your question :
You're mostly right, except in the part of adding indexes. Adding indexes will help the query to do lookup. It will also give chance to the query planner to consider nested loop join plan instead of the hash join plan. Unfortunately, I can't answer more until my question being answered.
You shouldn't need to add index to the temp table. Adding index to this temp(or any insert destination table) table will increase write time, because the insert will need to update that index. Just imagine an index as copy of your table with less information and it sits on top of your table and it needs to be in sync with your table. Every write (insert, update, delete) needs to update this index.
Looking at both tables total rows, this query should run way faster than 10s, unless you have a lemon PC, then it's a different story.
EDIT:
Just want to point out for point 2, I didn't realise you're source table is temp table as well. Temporary table is destroyed after each session of a connection ended. Adding index to temporary table means that you will add extra time to create this index everytime you create this temporary table.
EDIT:
Sorry, I'm using phone now. I'm just gonna be short.
So essentially 2 things :
add primary key on temp table creation time so you do it in one go. Don't bother with adding nonclustered index or any covering index you will end up spending more time creating those.
see your query, all of the case when statement, instead of doing it in this query, why don't you add them as another column in the table. Essentially you want to avoid calculation on the fly when doing group by. You can leave the sum() in the query as it's an aggregate query, but try and reduce run time calculation as much as possible.
Sample :
case when b.AM IS null and b.ProgramName IS not null
then 'Unassigned'
else b.AM
end as AM
You can create a column named AM when creating table b.
Also those rtrim and ltrim. Please remove those and stick it in table creation time. :)
One suggestion is to increase the execution time of stored procedure.
cmd.CommandTimeout = 200 // in seconds.
You can also generate a report link and email it to user when the report was generated.
Other than that use CTE never use temp tables as they are more expensive.
Does cfquery becomes a prepared statement as long as there's 1 cfqueryparam? Or are there other conditions?
What happen when the ORDER BY clause or FROM clause is dynamic? Would every unique combination becomes a prepared statement?
And what happen when we're doing cfloop with INSERT, with every value cfqueryparam'ed, and invoke the cfquery with different number of iterations?
Any potential problems with too many prepared statements?
How does DB handle prepared statement? Will they be converted into something similar to store procedure?
Under what circumstances should we Not use prepared statement?
Thank you!
I can answer some parts of your question:
a query will become a preparedStatement as long as there is one <queryparam. I have in the past added a
where 1 = <cfqueryparam value="1" to queries which didn't have any dynamic parameters, in order to get them run as preparedStatements
Most DBs handle preparedStarements similarly to Stored Procedures, just held temporarily, rather than long-term, however the details are likely to be DB-specific.
Assuming you are using the drivers supplied with ColdFusion, if you turn on the 'Log Activity' checkbox in the advanced panel of the DataSource setup, then you'll get very detailed information about how CF is interacting with he DB and when it is creating a new preparedStatement and when it is re-using them. I'd recommend trying this out for yourself, as so many factors are involved (DB setup, Driver, CF version etc). If you do use the DB logging, re-start CF before running your test code, so you can see it creating the prepared statements, otherwise you'll just see it re-using statements by ID, without seeing what those statements are.
In addition, if you are asking about execution plans then there is more involved than just the number PreparedStatement's generated. It is a huge topic and very database dependent. I do not have a DBA's grasp on it, but I can answer a few of the questions about MS SQL.
What happen when the ORDER BY clause or FROM clause is dynamic? Would
every unique combination becomes a prepared statement?
The base sql is different. So you will end up with separate execution plans for each unique ORDER BY clause.
And what happen when we're doing cfloop with INSERT, with every value
cfqueryparam'ed, and invoke the cfquery with different number of
iterations?
MS SQL should reuse the same plan for all iterations because only the parameters change.
The sys.dm_exec_cached_plans view is very useful for seeing what plans are cached and how often they are reused.
SELECT p.usecounts, p.cacheobjtype, p.objtype, t.text
FROM sys.dm_exec_cached_plans p
CROSS APPLY sys.dm_exec_sql_text( p.plan_handle) t
ORDER BY p.usecounts DESC
To clear the cache first, use DBCC FLUSHPROCINDB. Obviously do not use it on a production server.
DECLARE #ID int
SET #ID = DB_ID(N'YourTestDatabaseName')
DBCC FLUSHPROCINDB( #ID )
This question already has an answer here:
Closed 11 years ago.
Possible Duplicate:
Why does the Execution Plan include a user-defined function call for a computed column that is persisted?
In SQL Server 2008 I'm running the SQL profiler on a long running query and can see that a persisted computed column is being repeatedly recalculated. I've noticed this before and anecdotally I'd say that this seems to occur on more complex queries and/or tables with at least a few thousand rows.
This recalculation is definitely the cause of the long execution as it speeds up dramatically if I comment out that one column from the returned results (The field is computed by running an XPath against an Xml field).
EDIT: Offending SQL has the following structure:
DECLARE #OrderBy nvarchar(50);
SELECT
A.[Id],
CASE
WHEN #OrderBy = 'Col1' THEN A.[ComputedCol1]
WHEN #OrderBy = 'Col2' THEN C.[ComputedCol2]
ELSE C.[ComputedCol3]
END AS [Order]
FROM
[Stuff] AS A
INNER JOIN
[StuffCode] AS SC
ON
A.[Code] = SC.[Code]
All columns are nvarchar(50) except for ComputedCol3 which is nvarchar(250).
The query optimizer always tries to pick the cheapest plan, but it may not make the right choice. By persisting a column you are putting it in the main table (in the clustered index or the heap) but in order to pull out these values, normal data access paths are still required.
This means that the engine may choose other indexes instead of the main table to satisfy the query, and it could choose to recalculate the computed column if it thinks doing so combined with its chosen I/O access pattern will cost less. In general, a fair amount of CPU is cheaper than a little I/O, but no internal analysis of the cost of the expression is done, so if your column calls an expensive UDF it may make the wrong decision.
Putting an index on the column could make a difference. Note that you don't have to make the column persisted to put an index on it. If after making an index, the engine is still making mistakes, check to see if you have proper statistics being collected and frequently updated on all the indexes on the table.
It would help us help you if you posted the structure of your table (just the important columns) and the definitions of any indexes, along with some ideas of what the execution plan looks like when things go badly.
One thing to consider is that it may actually be better to recompute the column in some cases, so make sure that it's really correct to force the engine to go get it before doing so.
I have a query that has been running every day for a little over 2 years now and has typically taken less than 30 seconds to complete. All of a sudden, yesterday, the query started taking 3+ hours to complete and was using 100% CPU the entire time.
The SQL is:
SELECT
#id,
alpha.A, alpha.B, alpha.C,
beta.X, beta.Y, beta.Z,
alpha.P, alpha.Q
FROM
[DifferentDatabase].dbo.fnGetStuff(#id) beta
INNER JOIN vwSomeData alpha ON beta.id = alpha.id
alpha.id is a BIGINT type and beta.id is an INT type. dbo.fnGetStuff() is a simple SELECT statement with 2 INNER JOINs on tables in the same DB, using a WHERE id = #id. The function returns approximately 11000 results.
The view vwSomeData is a simple SELECT statement with two INNER JOINs that returns about 590000 results.
Both the view and the function will complete in less than 10 seconds when executed by themselves. Selecting the results of the function into a temporary table first and then joining on that makes the query finish in < 10 seconds.
How do I troubleshoot what's going on? I don't see any locks in the activity manager.
Look at the query plan. My guess is that there is a table scan or more in the execution plan. This will cause huge amounts of I/O for the few record you get in the result.
You could use the SQL Server Profiler tool to monitor what queries are running on SQL Server. It doesn't show the locks, but it can for instance also give you hints on how to improve your query by suggesting indexes.
If you've got a reasonably recent version of SQL Server Management Studio, it has a Database Tuning Adviser as well, under Tools. It takes a trace from the Profiler and makes some, sometimes highly useful, suggestions. Makes sure there's not too many queries - it takes a long time to build advice.
I'm not an expert on it, but have had some luck with it in the past.
Do you need to use a function? Can you re-write the entire thing into a stored procedure in which you pass in the #ID as a parameter.
Even if your table has indexes because you pass the #ID as a variable to the WHERE clause potentially greatly increasing the amount of time for the query to run.
The reason the indexes may not be used is because the Query Analyzer does not know the value of the variables when it selects an access method to perform the query. Because this is a batch file, only one pass is made of the Transact-SQL code, preventing the Query Optimizer from knowing what it needs to know in order to select an access method that uses the indexes.
You might want to consider an INDEX query hint if you cannot re-write the SQL.
it might also be possible, since this just started happening, that the INDEXes have become fragmented and might need to be rebuilt.
I've had similar problems with joining functions that return large datasets. I had to do what you've already suggested. Put the results in a temp table and join on that.
Look at the estimated plan, this will probably shed some light. Typically when query cost gets orders of magnitude more expensive it is because a loop or merge join is being used where a hash join is more appropriate. If you see a loop or merge join in the estimated plan, look at the number of rows it expects to process - is it far smaller than the number of rows you know will actually be in play? You can also specify a hint to use a hash join and see if it performs much better. If so, try updating statistics and see if it goes back to a hash join without a hint.
SELECT
#id,
alpha.A, alpha.B, alpha.C,
beta.X, beta.Y, beta.Z,
alpha.P, alpha.Q
FROM
[DifferentDatabase].dbo.fnGetStuff(#id) beta
INNER HASH JOIN vwSomeData alpha ON beta.id = alpha.id
-- having no idea what type of schema is in place and just trying to throw out ideas:
Like others have said... use Profiler and find the source of pain... but I'm thinking it is the function on the other database. Since that function might be a source of pain, have you thought about a little denormalization or anything on [DifferentDatabase]. I think you'll find a bit more scalability in joining to a more flattened table with indexes than a costly function.
Run this command:
SET SHOWPLAN_ALL ON
Then run your query. It will display the execution plan, look for a "SCAN" on an index or a table. That is most likely what is happening to your query now. If that is the case, try to figure out why it is not using indexes now (refresh statistics, etc)