"Query plans are stored in the plan cache for both views and ordinary SQL" (from here)
ok.
Once and for all: how does it help me?
Even if I have the query plan in the cache for a query, once I run the query it WILL scan the whole table(s), compute the aggregates, and so on.
And if I run it tomorrow, it will scan the whole table(s) and compute the aggregates again.
There can't be a situation like this: "Ah! I have the data in the cache, so I'll take it from there" (because the table may have changed).
So where is the true benefit?
I seem to be missing something.
thank you.
Suppose we have a query such as
SELECT *
FROM
A
INNER JOIN B ON -- omitted
INNER JOIN C ON -- omitted
-- omitted
INNER JOIN Q ON -- omitted
with however many tables that is. Obviously, the order in which these joins are performed will affect performance. Deciding the best order, given the table statistics, also takes time.
By caching the query plan, we can pay the cost of deciding the best order just once - every subsequent time the query is run, we already know to first take K, join it to E, then to H, and so on.
Of course, this means that a significant change in the data statistics invalidates our plan, but caching anything always involves a trade-off.
A resource you may find useful for learning more about the hows and whys of query planning is SQL Coach - start with THE Analogy.
The answer is that the query plan is cached to prevent the cost of compiling the query plan every time. The second time you run the query (or another that can use the same plan) it doesn't have to pay to run the compilation process all over again, it just pulls the plan from the cache.
Put simply, an execution plan is an explanation of how the query will be done, not the actual data involved in the query (so an execution plan can be applied over and over again as you re-run a query).
An analogy would be to say that an execution plan is similar to a recipe - it is the method of getting the data/making the meal, not the data/meal itself.
The improvement is that it takes time for the DB engine to work out the execution plan for a query, so if it's cached you don't pay that overhead the next time you run the same query.
When you submit a query to SQL Server, it goes through several steps before the result is returned. The main ones are parsing, the algebrizer and the query optimizer.
The query optimizer is responsible for building an execution plan or selecting one from the cache. As I understand it, building a plan is expensive, so it is better to reuse one when you can.
The main point is that the execution plan doesn't contain the data itself, only a way of retrieving it from the DB. So once the plan is "defined", it gets passed to the storage engine and used to retrieve the data.
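One way to see the saving these answers describe is to have SQL Server report its compile time. This is only a minimal sketch, assuming SQL Server Management Studio (GO is its batch separator, not T-SQL) and a login that can read the system catalog views; the query is just a stand-in for a more complex one of your own.

```sql
-- Turn on timing output (shown in the Messages tab).
SET STATISTICS TIME ON;
GO

-- Stand-in query (assumption: replace with the query you care about).
SELECT COUNT(*) AS column_count
FROM sys.objects AS o
JOIN sys.columns AS c
    ON c.object_id = o.object_id;
GO

SET STATISTICS TIME OFF;
GO
```

Run the script twice. Each run prints a "SQL Server parse and compile time" line and a "SQL Server Execution Times" line. On the second run the compile figure is typically at or near zero because the cached plan is reused, while the execution time stays in the same range, since the data is still read on every run.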
I have a query that takes about 1.5 minutes to run if it runs as an insert into an empty temp table, but selecting the data doesn't complete even after 20 minutes. Why would a select take so much longer than an insert? Will SQL Server draw different plans based on that?
The estimated execution plans are the same for both; the actual execution plans differ from the estimated ones.
Update: It seems that the INSERT version of the query is using
parallelism, while the select version is not. My question still
remains as to why that is?
Once a single query plan reaches a certain level of complexity, I often have a better experience sequencing a collection of smaller queries together than one really big nasty query. This enables the query optimizer to evaluate a smaller request.
Locking duration in a highly transactional database can also be a significant contributing factor.
It sounds like you're on to something with the parallelism. Several operators in SQL Server prevent parallelism.
Usual blather... query takes too long to run... blah blah. Long question. blah.
Obviously, I am looking at different ways of rewriting the query; but that is not what this post is about.
To resolve a "spill to tempdb" warning in a query, I have already:
- rebuilt all of the indexes in the database
- updated all of the statistics on the tables and indexes
This fixed the "spill to tempdb" warning and improved the query performance.
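For readers who want to repeat those two maintenance steps, they might look roughly like this for a single table. It is a sketch only: dbo.Job is an assumed schema-qualified name for the Job table, and a real maintenance job would usually be more selective.

```sql
-- Rebuild every index on one table (dbo.Job is an assumed name).
ALTER INDEX ALL ON dbo.Job REBUILD;

-- Refresh all statistics on that table by reading every row.
UPDATE STATISTICS dbo.Job WITH FULLSCAN;

-- Or refresh the statistics that need it across the current database.
EXEC sys.sp_updatestats;
```

Rebuilding an index also refreshes the statistics that belong to that index, but not the column statistics on the table, which is why the explicit UPDATE STATISTICS step is listed separately.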
Since rebuilding indexes and statistics resulted in a huge performance gain for one query (without having to rewrite it), this got me thinking about how to improve the performance of other queries without rewriting them.
I have a nice big query that joins about 20 tables, does lots of fancy stuff I am not posting here, but takes about 6900ms to run.
Looking at the actual execution plan, I see 4 steps that have a total cost of 79%; so "a-hah" that is where the performance problem is. 3 steps are "clustered index seek" on PK_Job and the 4th step is an "Index lazy spool".
[image: execution plan, slow query]
So, I break out those elements into a standalone query to investigate further... I get the "same" 4 steps in the execution plan, with a cost of 97%, only the query time is blazing fast 34ms. ... WTF? where did the performance problem disappear to?
[image: execution plan, fast query]
I expected the additional tables to increase the query time, but I did not expect the execution time for querying this one Job table to go from 30ms to 4500ms.
-- this takes 34ms
select *
from equip e
left join job jf on (jf.jobid = e.jobidf)
left join job jd on (jd.jobid = e.jobidd)
left join job jr on (jr.jobid = e.jobidd)
-- this takes 6900ms
select *
from equip e
left join job jf on (jf.jobid = e.jobidf)
left join job jd on (jd.jobid = e.jobidd)
left join job jr on (jr.jobid = e.jobidd)
-- add another 20 tables in here..
Question 1: what should I look at in the two execution plans to identify why the execution time (of the clustered index seek) on this table goes from 30ms to 4500ms?
So, thinking this might have something to do with the statistics, I reviewed the index statistics on PK_Job (JobID, an Int column). The histogram ranges look useless: all the "current" records are lumped together in one range (row 21 in the image). This is the standard problem with a PK that increments: new data is always in the last range, so 99.999% of the JobID values that are referenced are in the one histogram range. I tried adding a filtered statistic, but that had no impact on the actual execution plan.
[image: output from DBCC SHOW_STATISTICS for PK_Job]
Question 2: are the above PK_Job statistics a contributing factor to the complicated query being slow? That is, would "fixing" the statistics help with the complicated query? if so, what could that fix look like?
Again: I know, rewrite the query. Post more of the code (all 1500 lines of it that no one will find of any use). blah, blah.
What I would like are tips on what to look at in order to answer Q1 and Q2.
Thanks in advance!
Question 3: why would a simple IIF add 100ms to a query? the "compute scalar" nodes all show a cost of 0%, but the IIF doubles the execution time of the query.
adding this to select doubles execution time from 90ms to 180ms; Case statements are just as bad too.
IIF(X.Okay = 1, '', 'N') As OkayDesc
Next observation: Actual execution plan shows query cost relative to batch of 98%; but STATISTICS TIME shows cpu time of 141 ms; however batch cpu time is 3640 ms.
Question 4: why doesn't the query cost % (relative to batch) match up with statement cpu time?
The SQL engine is pretty smart at optimizing badly written queries in most cases. But when a query is too complex, it sometimes cannot apply these optimizations and may even perform badly.
So, you are asking:
I break out those elements into a standalone query to investigate
further... I get the "same" 4 steps in the execution plan, with a cost
of 97%, only the query time is blazing fast 34ms? where did
the performance problem disappear to?
The answer is pretty simple. Breaking up the query and materializing the data in a #table helps the engine understand what amount of data it is working with and build a better plan.
Brent Ozar wrote about this yesterday giving an example how bad a big query can be.
If you want more details about how to optimize your query by rewriting it, you need to provide more details. In my practice, though, simplifying the query and materializing the data in #temp tables (as we can use parallel operations with them) gives good results in most cases.
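As a rough illustration of that approach, using the equip and job tables from the question: the first statement materializes the small core of the query, and the second joins everything else to the result. Treat it as a sketch under assumptions. The #equip_jobs name and the column aliases are made up, and a real version would list whichever columns the rest of the query needs.

```sql
-- Step 1: materialize the core joins into a temp table.
-- Explicit column list: SELECT INTO needs unique column names.
SELECT e.jobidf,
       e.jobidd,
       jf.jobid AS f_jobid,   -- hypothetical alias
       jd.jobid AS d_jobid    -- hypothetical alias
INTO #equip_jobs              -- hypothetical temp table name
FROM equip AS e
LEFT JOIN job AS jf ON jf.jobid = e.jobidf
LEFT JOIN job AS jd ON jd.jobid = e.jobidd;

-- Step 2: the optimizer now knows the real row count of #equip_jobs.
SELECT ej.*
FROM #equip_jobs AS ej;
-- LEFT JOIN the remaining tables to ej here.

DROP TABLE #equip_jobs;
```

The point of the split is that the second statement is compiled against a table whose size and statistics are known, rather than against an estimate that has passed through twenty joins.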
I have two derived tables, system_type_a and system_type_b, which both use the table system along with type_a and type_b respectively. type_a has ~15k records and type_b has ~5k records. I am doing a left join on both tables with system. However, system_type_b takes much longer to execute than system_type_a. I tried viewing the execution plan for both queries, as suggested in a Stack Exchange response to a similar question, but I am not able to make much sense of it.
The selectivity of joins might force different execution plans.
Maybe the statistics are old or not updated.
Also you can check fragmentation of the bad performing table. Maybe it requires more physical reads than the other one.
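Both of those checks can be done from the system views. A minimal sketch, assuming the slow table is dbo.type_b (the dbo schema is a guess) and that you are connected to the database that holds it:

```sql
-- When were the statistics on the table last updated?
SELECT s.name AS stats_name,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID(N'dbo.type_b');

-- How fragmented are its indexes?
SELECT i.name AS index_name,
       ps.avg_fragmentation_in_percent,
       ps.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.type_b'), NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
    ON i.object_id = ps.object_id
   AND i.index_id = ps.index_id;
```

Running the same two queries for dbo.type_a gives a baseline to compare against.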
When I run a query which returns millions of rows from within Sql Management tools it looks like the query executes instantly. There is virtually no execution time as far as I can tell. What makes the query take time to complete is returning all the rows.
This got me thinking I've done a good job! But not so fast... As I look at the query in the profiler tool, it states that the query used 7600 CPU. Duration was 15000.
I'm not sure I know how to interpret these stats.
On one hand, the query seems to run fast but the profiler report makes me think otherwise. How come the query is executed instantly in Mgmt Tools? There obviously should be some kind of delayed execution as far as I can tell: at least 7600 ms. I have to wait longer than both the cpu and the duration stats when I run the query in mgmt tools for it to complete the query.
it looks like the query executes instantly
It might be that the query plan allows the server to start returning rows quickly.
For example, if you do SELECT * FROM a_large_table you will see some rows immediately, but retrieval of the whole resultset will take some time. What is the actual execution time reported by Mgmt Studio (shown in the status bar after the query is complete)?
If you want to test the query performance without retrieving data to the client, you can do SELECT INTO #temp_table. This would require some additional I/O, but would still give you a rather good estimate of the execution time.
UPD.
You could also run something like SELECT COUNT(*) FROM (<your query here>) AS q or SELECT SUM(<some field>) FROM (<your query here>) AS q (SQL Server requires an alias on the derived table). With some luck, it will make the server execute the query and aggregate the result, basically doing the same work plus a little extra. But it is very easy to skew the results this way: the query optimizer is smart, and you need to be very careful to be sure you are measuring what you want to measure (because measuring a query with a different execution plan makes no sense at all).
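Put together, the two measuring tricks above might look like this. It is a sketch, with a_large_table standing in for your own query and #timing_test as a made-up temp table name:

```sql
SET STATISTICS TIME ON;

-- Variant 1: run the query but keep the rows on the server.
SELECT *
INTO #timing_test            -- hypothetical temp table name
FROM a_large_table;

-- Variant 2: aggregate the result instead of returning it.
-- The derived table needs an alias (q) in SQL Server.
SELECT COUNT(*) AS row_count
FROM (SELECT * FROM a_large_table) AS q;

SET STATISTICS TIME OFF;
DROP TABLE #timing_test;
```

Compare the plans before trusting the numbers: with COUNT(*) the optimizer is free to skip columns or pick a narrower index, so variant 2 may not be doing the work the original query does.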
I suggest you think again about what you want to measure and why. In any real-life scenario you are not interested in "pure" query duration, because you never want to discard the query result (the result is why you need this query in the first place, right?). So you either need to return the result to the client, or store it somewhere, or join it with another table and so on, and usually you want to measure query execution including the time used for processing its result.
One final note. If you hope you can somehow force the server to execute this query in 1 second because you think the server does nothing for the other 13 seconds, you are wrong. As they say, SELECT ain't broken.
What might help is query optimization - and for a single query a profiler won't help you much with it. Analyze the query plan, tune your table structure, try to rewrite the query, post another question on SO if in trouble.
I've been hearing a lot lately that I ought to take a look at the execution plan of my SQL to make a judgment on how well it will perform. However, I'm not really sure where to begin with this feature or what exactly it means.
I'm looking for either a good explanation of what the execution plan does, what its limitations are, and how I can utilize it or direction to a resource that does.
It describes actual algorithms which the server uses to retrieve your data.
An SQL query like this:
SELECT *
FROM mytable1
JOIN mytable2
ON …
GROUP BY
…
ORDER BY
…
, describes what should be done but not how it should be done.
The execution plan shows how: which indexes are used, which join methods are chosen (nested loops or hash join or merge join), how the results are grouped (using sorting or hashing), how they are ordered etc.
Unfortunately, even modern SQL engines cannot automatically find the optimal plans for more or less complex queries. It still takes an SQL developer to reformulate the queries so that they are performant (even though they do what the original query does).
A classical example would be these two queries:
SELECT (
SELECT COUNT(*)
FROM mytable mi
WHERE mi.id <= mo.id
)
FROM mytable mo
ORDER BY
id
and
SELECT RANK() OVER (ORDER BY id)
FROM mytable
, which do the same and in theory should be executed using the same algorithms.
However, no actual engine will optimize the former query to use the same algorithm, i.e. store a counter in a variable and increment it.
It will do what it's told to do: count the rows over and over and over again.
To optimize the queries you need to actually see what's happening behind the scenes, and that's what the execution plans show you.
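To see the difference on your own server, the two queries can be run side by side with I/O statistics switched on. This is a sketch for SQL Server specifically. The temp table #mytable, the 5000-row size and the sys.all_objects trick for generating rows are my own choices for the demo, not part of the original example.

```sql
-- Build a small test table with ids 1..5000.
CREATE TABLE #mytable (id INT NOT NULL PRIMARY KEY);

INSERT INTO #mytable (id)
SELECT TOP (5000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_objects AS a
CROSS JOIN sys.all_objects AS b;

SET STATISTICS IO ON;   -- logical reads appear in the Messages tab

-- Correlated subquery: counts rows again for every outer row.
SELECT (SELECT COUNT(*)
        FROM #mytable AS mi
        WHERE mi.id <= mo.id) AS rnk
FROM #mytable AS mo
ORDER BY mo.id;

-- Window function: a single pass over the table.
SELECT RANK() OVER (ORDER BY id) AS rnk
FROM #mytable
ORDER BY id;

SET STATISTICS IO OFF;
DROP TABLE #mytable;
```

Both statements return the same numbers here because the ids are unique; the logical reads reported for the first should be far higher than for the second, and the actual execution plans show why.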
You may want to read this article in my blog:
Double-thinking in SQL
Here and here are some articles; check them out. Execution plans let you identify the areas that are time-consuming and therefore allow you to improve your query.
An execution plan shows exactly how SQL Server processes a query.
It is produced as part of the query optimisation process that SQL Server does. It is not something that you directly create.
It will show which indexes it has decided are best to use, and basically is a plan for how SQL Server processes a query.
The query optimiser will take a query, analyse it and potentially come up with a number of different execution plans. It's a cost-based optimisation process, and it will choose the one that it estimates to be the best.
Once an execution plan has been generated, it will go into the plan cache so that subsequent calls for that same query can reuse the same plan, saving the work of coming up with a plan again.
Execution plans automatically get dropped from the cache, depending on their value (low-value plans get removed before high-value plans do, in order to provide maximum performance gain).
Execution plans help you spot performance issues, such as where indexes are missing.
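The plan cache mentioned above can be inspected directly. A minimal sketch, assuming your login has the VIEW SERVER STATE permission that these dynamic management views require:

```sql
-- The 20 most reused plans currently in the plan cache.
SELECT TOP (20)
       cp.usecounts,          -- how many times the plan has been reused
       cp.objtype,            -- Adhoc, Prepared, Proc, View, ...
       cp.size_in_bytes,
       st.text AS sql_text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
ORDER BY cp.usecounts DESC;
```

A usecounts value above 1 is the reuse described above: the query was compiled once and that plan was picked up again on later executions.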
A way to ease into this is simply to use Ctrl+L (Query | Display Estimated Execution Plan) for some of your queries in SQL Server Management Studio.
This shows a graphical view of the execution plan, which at first is easier to "decode" than the text version.
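The same estimated plan can also be requested from a script. A small sketch, assuming Management Studio or sqlcmd (GO is their batch separator) and using a catalog query purely as a stand-in:

```sql
-- Ask for the estimated plan instead of running the statements.
SET SHOWPLAN_XML ON;
GO

-- Stand-in query (assumption: replace with your own).
SELECT o.name, c.name AS column_name
FROM sys.objects AS o
JOIN sys.columns AS c
    ON c.object_id = o.object_id;
GO

SET SHOWPLAN_XML OFF;
GO
```

While SHOWPLAN_XML is on, the query is not executed; the result is one XML document per statement, which Management Studio opens as the familiar graphical plan when clicked.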
Query plans in a tiny nutshell:
Essentially, the query plan shows the way SQL Server intends to resolve a query.
There are indeed many options, even with simple queries.
For example, when dealing with a JOIN, one needs to decide whether to loop through the [filtered] rows of "table A" and look up the rows of "table B", or to loop through "table B" first instead (this is a simplified example, as there are many other tricks which can be used in dealing with JOINs). Typically, SQL Server will estimate the number of [filtered] rows which will be produced by either table and pick the one with the smallest count for the outer loop (as this will reduce the number of lookups in the other table).
Another example is deciding which indexes to use (or not to use).
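To get a feel for these choices, you can run one join several times and force a different physical join each time, then compare the plans. This is a sketch for experimenting only: the #a and #b temp tables and their rows are invented, and query hints like these are rarely a good idea in production code.

```sql
CREATE TABLE #a (id INT NOT NULL PRIMARY KEY);
CREATE TABLE #b (id INT NOT NULL PRIMARY KEY, a_id INT NOT NULL);

INSERT INTO #a (id) VALUES (1), (2), (3);
INSERT INTO #b (id, a_id) VALUES (10, 1), (11, 1), (12, 3);

SET STATISTICS XML ON;   -- return the actual plan with each result

-- 1. Let the optimizer choose.
SELECT a.id, b.id AS b_id
FROM #a AS a
JOIN #b AS b ON b.a_id = a.id;

-- 2. Force nested loops.
SELECT a.id, b.id AS b_id
FROM #a AS a
JOIN #b AS b ON b.a_id = a.id
OPTION (LOOP JOIN);

-- 3. Force a hash join.
SELECT a.id, b.id AS b_id
FROM #a AS a
JOIN #b AS b ON b.a_id = a.id
OPTION (HASH JOIN);

SET STATISTICS XML OFF;
DROP TABLE #a, #b;
```

All three statements return the same rows; what changes is the join operator in each plan and which table ends up on the outer side.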
There are many online resources as well as books which describe the query plans in more detail, the difficulty is that SQL performance optimization is a very broad and complex problem, and many such resources tend to go into too much detail for the novice; One first needs to understand the fundamental principles and structures which underlie SQL Server (the way indexes work, the way the data is stored, the difference between clustered indexes and heaps...) before diving into many of the [important] details of query optimization. It is a bit like baseball: first you need to know the rules before understanding all the subtle [and important] concepts related to the game strategy.
See this related SO Question for additional pointers.
Here's a great resource to help you understand them
http://downloads.red-gate.com/ebooks/HighPerformanceSQL_ebook.zip
This is from red-gate which is a company that makes great SQL server tools, it's free and it's well worth the time to download and read.
It is a very serious body of knowledge, and I highly recommend dedicated training courses on it. In my case, after a week of courses I improved query performance by about 1000 times (nostalgia).
The Execution Plan shows you how the database is fetching, sorting and filtering the data required for your query.
For example:
SELECT
*
FROM
TableA
INNER JOIN
TableB
ON
TableA.Id = TableB.TableAId
WHERE
TableB.TypeId = 2
ORDER BY
TableB.Date ASC
This would result in an execution plan showing the database getting records from TableA and TableB, matching them to satisfy the JOIN, filtering to satisfy the WHERE and sorting to satisfy the ORDER BY.
From this, you can work out what is slowing down the query, whether it would be beneficial to review your indexes or if you can speed things up in another way.
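As an example of the kind of index review that plan might lead to: if it showed a full scan of TableB followed by a sort, one candidate would be an index that matches the filter and the sort order. This is a sketch, not a recommendation. The index name is made up, and whether it helps depends on the data and on how many columns the query really needs.

```sql
-- Candidate index: filter column first, then the sort column.
CREATE INDEX IX_TableB_TypeId_Date   -- hypothetical index name
    ON TableB (TypeId, [Date])
    INCLUDE (TableAId);              -- join column, avoids a lookup for it
```

After creating it, look at the plan again: the scan may become a seek on TypeId = 2 and the sort operator may disappear. Because the query uses SELECT *, the engine still has to fetch the remaining columns, so the optimizer may decide the index is not worth using.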