So I'm (still) going through some slow legacy SQL views used to calculate averages and standard deviations on a (sometimes) large set of data. What I end up with are views joining views joining views, etc.
So I thought I would review the execution plan for my query. And it immediately suggested a missing index, which I then implemented. But it's still unbearably slow (so slow it times out the VB6 app querying it for data ;) )
So upon studying the execution plan further, I see that what costs the most (about 8% each in my case) are "Parallelism" operations, mostly "Distribute Streams" and "Repartition Streams". What are these?
Distribute Streams and Repartition Streams are operations that occur when the SQL Server optimizer chooses to use parallel query processing. If you suspect that this is causing an issue with your query, you can force SQL Server to use only one CPU with the MAXDOP query hint, as illustrated below.
select *
from sys.tables
option (maxdop 1)
Related
I have a query that takes about 1.5 minutes to run if it runs as an insert into an empty temp table, but selecting the data doesn't complete even after 20 minutes. Why would a select take so much longer than an insert? Will SQL Server draw different plans based on that?
The estimated execution plans are the same for both; the actual execution plan is different from the estimated one.
Update: It seems that the INSERT version of the query is using parallelism, while the SELECT version is not. My question still remains: why is that?
Once a single query plan reaches a certain level of complexity, I often have a better experience sequencing a collection of smaller queries than running one really big, nasty query. This enables the query optimizer to evaluate a smaller request.
Locking duration in a highly transactional database can also be a significant contributing factor.
It sounds like you're on to something with the parallelism. There are several operators within SQL that will prevent parallelism.
We have a query in our system that has been a problem in the amount of logical reads it is using. The query is run often enough (a few times a day), but it is reporting in nature (i.e. gathering data; it is not transactional).
After having a couple of people look at it we are mulling over a few different options.
Using OPTION (FORCE ORDER) and a few MERGE JOIN hints to get the optimizer to process the data more efficiently (at least on the data that has been tested); see the sketch after this list.
Using temp tables to break up the query so the optimizer isn't dealing with one very large query, which allows it to process the data more efficiently.
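For illustration, here is a minimal sketch of the first option (the table and column names are hypothetical, not from the query in question):

SELECT o.OrderId, c.CustomerName
FROM dbo.Orders AS o
INNER MERGE JOIN dbo.Customers AS c
    ON c.CustomerId = o.CustomerId
OPTION (FORCE ORDER);

The INNER MERGE JOIN hint forces a merge join between the two tables, and OPTION (FORCE ORDER) tells the optimizer to join the tables in the order they appear in the query.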
We do not really have the option of doing a major schema change or anything, tuning the query is kind of the rallying point for this issue.
The query hints option is performing a little better than the other option, but both options are acceptable in terms of performance at this point.
So the question is, which would you prefer? The query hints are viewed as slightly dangerous because we are overriding the optimizer, etc. The temp table solution needs to write out to tempdb, etc.
In the past we have been able to see large performance gains using temp tables on our larger reporting queries but that has generally been for queries that are run less frequently than this query.
If you have exhausted optimizing via indexes and removed non-SARGable SQL, then I recommend going for the temp tables option (sketched below):
temp tables provide repeatable performance, provided they do not put excessive pressure on the tempdb in terms of size increase and performance - you will need to monitor those
SQL hints may stop being effective because of other table/index changes in the future
remember to clean up temp tables when you are finished.
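As a rough sketch of what that decomposition can look like (the table names here are invented for the example), the idea is to materialize an intermediate result so the optimizer only has to plan two smaller queries:

-- Stage the expensive aggregate once
SELECT o.CustomerId, SUM(o.Total) AS TotalSpend
INTO #CustomerTotals
FROM dbo.Orders AS o
WHERE o.OrderDate >= '20100101'
GROUP BY o.CustomerId;

-- Join the small staged result to the reference table
SELECT c.CustomerName, t.TotalSpend
FROM #CustomerTotals AS t
JOIN dbo.Customers AS c
    ON c.CustomerId = t.CustomerId;

-- Clean up when finished
DROP TABLE #CustomerTotals;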
I've got a SQL 2005 DB running under a virtual environment.
To simplify things, let's say I have two SQL SELECT queries. They both do the exact same thing. But I'm trying to analyze them for performance purposes.
Generally, I'd fire up a local DB, load up some data and use timing to compare one variant to other variants.
But in this case, since the DB is large and it's a test box, the client has placed it on a host that's serving other VMs as well.
The DB is too large to pull down locally, so that's out (at least for now).
But my main issue is that when I run queries against the server, the timing is all over the place. I can run the exact same query 4 times and get timings of 7 seconds, 8 minutes, 3:45, and 15 minutes.
My first thought was use SET STATISTICS IO ON.
But that yields basically read and write stats on the tables being queried, which, depending on the variations in the queries (temp tables vs. views vs. joins, etc.), can't really be accurately compared, except in aggregate.
I then thought of SET STATISTICS TIME ON and just using the CPU time, but that seems to discount all the IO, which also doesn't make for a good baseline.
My question is: is there any other statistic or performance analysis technique that could be useful in a situation like this?
The STATISTICS IO information will still be useful. You may see significantly different numbers of reads, writes and scans that will make it obvious which query is better.
You can also view Execution Plan information for each query. You can select Query -> Display Estimated Execution Plan to see a graphical presentation of the plan SQL Server estimates it will use to run the query. You can also use Query -> Include Actual Execution Plan to show the actual plan used.
And you can use SET SHOWPLAN_TEXT, SET SHOWPLAN_ALL or SET SHOWPLAN_XML to view a textual display of the plan.
When viewing the results of the execution plan, you can look at the estimated cost value and compare the values for each query. The estimated cost is a relative value that can be used to compare the cost of each option.
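For example, you can collect comparable IO and CPU numbers for each variant in a single session like this (the COUNT query is just a stand-in for your own variants):

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Run each variant here; reads, scans and CPU time appear on the Messages tab
SELECT COUNT(*) FROM sys.tables;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;

The logical reads figure is particularly useful on a noisy shared host, since it does not depend on what else the box is doing at the time.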
Setup
Cost Threshold for Parallelism : 5
Max Degree of Parallelism : 4
Number of Processors : 8
SQL Server 2008 10.0.2.2757
I have a query with many joins, many records.
The design is a star. ( Central table with FKs to the reference tables )
The central table is partitioned on the relevant date column.
The partition scheme is split by days
The data is very well split across the partition scheme, as judged by comparing the sizes of the files in the filegroups assigned to the partition scheme
Queries involved have the predicate set over the partitioned column, such as ( cs.dte >= @min_date and cs.dte < @max_date )
The values of the date parameters are a day apart at midnight, so 2010-02-01 and 2010-02-02
The estimated query plan shows no parallelism
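For concreteness, a daily partitioning setup along the lines described might look something like this (the names and boundary values are invented for illustration):

CREATE PARTITION FUNCTION pfDaily (datetime)
    AS RANGE RIGHT FOR VALUES ('20100201', '20100202', '20100203');

CREATE PARTITION SCHEME psDaily
    AS PARTITION pfDaily ALL TO ([PRIMARY]);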
a) This question is in regards to the SQL Server 2008 Database Engine. When a query in the OLTP engine is running, I would like to see / have the sort of insight one gets when profiling an SSAS query using the Progress End event - where one sees something like "Done reading PartitionXYZ".
b) If the estimated query plan or the actual query plan shows no parallel processing, does that mean that all partitions will be / were checked / read? * What I was trying to say here was: just because I don't see parallelism in a query plan, that doesn't guarantee the query isn't hitting multiple partitions - right? Or is there a solid relationship between parallelism and the number of partitions accessed?
c) suggestions? Is there more information that I need to provide?
d) How can I tell if a query is processing in parallel without looking at the actual query plan? * I'm really only interested in this if it is helpful in pinning down which partitions are being used.
Added Nov 10
Try this:
Create queries that should hit 1, 3, and all of your partitions
Open an SSMS query window, and run SET SHOWPLAN_XML ON
Run each query one by one in that window
Each run will kick out a chunk of XML
Compare these XML results (I use a text diff tool, “CompareIt”, but any similar tool would do)
You should see that the execution plans are significantly different. In my "3" and "All" queries, there's a chunk of text tagged as "ConstantScan" that has an entry for (respectively) 3 and all partitions in the table, and that section is not present for the "1 partition" query. I use this to infer that yes indeed, SQL is doing what it says it will do, to wit: only read as much of the table as it believes it needs to in order to resolve the query.
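A minimal sketch of that procedure (the table name and date ranges are hypothetical; note that SET SHOWPLAN_XML must be the only statement in its batch, hence the GO separators):

SET SHOWPLAN_XML ON;
GO
-- These should touch 1, 3, and all partitions respectively
SELECT COUNT(*) FROM dbo.CentralFact WHERE dte >= '20100201' AND dte < '20100202';
SELECT COUNT(*) FROM dbo.CentralFact WHERE dte >= '20100201' AND dte < '20100204';
SELECT COUNT(*) FROM dbo.CentralFact;
GO
SET SHOWPLAN_XML OFF;
GO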
Got a pretty good answer here: http://www.sqlservercentral.com/Forums/Topic1064946-391-1.aspx#bm1065048
a) I am not aware of any way to determine how a query has progressed while the query is still running. Maybe something finicky with the latching and locking system views, but I doubt it. (I am, alas, not familiar enough with SSAS to draw parallels between the two.)
b) SQL will probably use parallelism when working with multiple partitions within a single table, in which case you will see parallel processing "tokens" in your query plan. However, if for whatever reason parallelism is not invoked yet multiple partitions must be read, they will be read without the use of parallelism.
d) Another thing that perhaps cannot be done. Under very controlled circumstances, you could use System Monitor (Perfmon) to track CPU usage or perhaps disk reads during the execution of the query. This won't help if the server is performing other work, or if the data is resident in memory (the buffer cache), and so may be of limited use.
c) What is it you are actually trying to figure out? Which partitions (if any) are being accessed by users over a period of time? Is SQL generating a "smart" query plan? Without details of the data, structure, and query, it's hard to come up with advice.
I've been hearing a lot lately that I ought to take a look at the execution plan of my SQL to make a judgment on how well it will perform. However, I'm not really sure where to begin with this feature or what exactly it means.
I'm looking for either a good explanation of what the execution plan does, what its limitations are, and how I can utilize it or direction to a resource that does.
It describes the actual algorithms that the server uses to retrieve your data.
An SQL query like this:
SELECT *
FROM mytable1
JOIN mytable2
ON …
GROUP BY
…
ORDER BY
…
, describes what should be done but not how it should be done.
The execution plan shows how: which indexes are used, which join methods are chosen (nested loops or hash join or merge join), how the results are grouped (using sorting or hashing), how they are ordered etc.
Unfortunately, even modern SQL engines cannot automatically find the optimal plans for more or less complex queries; it still takes an SQL developer to reformulate the queries so that they are performant (even though they do what the original query does).
A classical example would be these two queries:
SELECT (
SELECT COUNT(*)
FROM mytable mi
WHERE mi.id <= mo.id
)
FROM mytable mo
ORDER BY
id
and
SELECT RANK() OVER (ORDER BY id)
FROM mytable
, which do the same and in theory should be executed using the same algorithms.
However, no actual engine will optimize the former query to implement the same algorithm, i.e. store a counter in a variable and increment it.
It will do what it's told to do: count the rows over and over and over again.
To optimize the queries you need to actually see what's happening behind the scenes, and that's what the execution plans show you.
You may want to read this article in my blog:
Double-thinking in SQL
Here and Here are some articles; check them out. Execution plans let you identify the areas that are time-consuming and therefore allow you to improve your query.
An execution plan shows exactly how SQL Server processes a query
It is produced as part of the query optimisation process that SQL Server performs. It is not something that you directly create.
It will show what indexes it has decided are best to use, and is basically a plan for how SQL Server processes a query.
the query optimiser will take a query, analyse it and potentially come up with a number of different execution plans. It's a cost-based optimisation process, and it will choose the one that it feels is the best.
once an execution plan has been generated, it will go into the plan cache so that subsequent calls for that same query can reuse the same plan again to save having to redo the work to come up with a plan.
execution plans automatically get dropped from the cache, depending on their value (low value plans get removed before high value plans do in order to provide maximum performance gain)
execution plans help you spot performance issues such as where indexes are missing
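As a rough illustration of the plan cache point above, you can peek at cached plans through the dynamic management views (SQL Server 2005 and later):

SELECT TOP 10 cp.usecounts, cp.objtype, st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
ORDER BY cp.usecounts DESC;

This lists the ten most reused plans along with the query text they were compiled from.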
A way to ease into this, is simply by using "Ctrl L" (Query | Display Estimated Execution Plan) for some of your queries, in SQL Management Studio.
This will show a graphical view of the execution plan, which, at first, is easier to "decode" than the text version thereof.
Query plans in a tiny nutshell:
Essentially, the query plan shows the approach SQL Server intends to use to resolve a query.
There are indeed many options, even with simple queries.
For example, when dealing with a JOIN, one needs to decide whether to loop through the [filtered] rows of "table A" and look up the rows of "table B", or to loop through "table B" first instead (this is a simplified example, as there are many other tricks which can be used in dealing with JOINs). Typically, SQL will estimate the number of [filtered] rows which will be produced by either table and pick the one with the smallest count for the outer loop (as this will reduce the number of lookups in the other table).
Another example is deciding which indexes to use (or not to use).
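To see this decision-making in action, one hedged experiment (with invented table names) is to run the same join with different join hints and compare the resulting plans:

SELECT a.Id, b.Value
FROM dbo.TableA AS a
JOIN dbo.TableB AS b
    ON b.TableAId = a.Id
OPTION (LOOP JOIN);  -- swap for OPTION (HASH JOIN) or OPTION (MERGE JOIN) and compare

Comparing the cost of each hinted plan against the unhinted one shows why the optimizer picked the strategy it did.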
There are many online resources as well as books which describe query plans in more detail; the difficulty is that SQL performance optimization is a very broad and complex problem, and many such resources tend to go into too much detail for the novice. One first needs to understand the fundamental principles and structures which underlie SQL Server (the way indexes work, the way the data is stored, the difference between clustered indexes and heaps...) before diving into many of the [important] details of query optimization. It is a bit like baseball: first you need to know the rules before understanding all the subtle [and important] concepts related to the game strategy.
See this related SO Question for additional pointers.
Here's a great resource to help you understand them
http://downloads.red-gate.com/ebooks/HighPerformanceSQL_ebook.zip
This is from Red Gate, a company that makes great SQL Server tools. It's free and well worth the time to download and read.
It is a very serious area of knowledge, and I highly recommend special training courses on the subject. For my part, after spending a week on courses I boosted the performance of my queries about 1000 times (nostalgia).
The Execution Plan shows you how the database is fetching, sorting and filtering the data required for your query.
For example:
SELECT
*
FROM
TableA
INNER JOIN
TableB
ON
TableA.Id = TableB.TableAId
WHERE
TableB.TypeId = 2
ORDER BY
TableB.Date ASC
Would result in an execution plan showing the database getting records from TableA and TableB, matching them to satisfy the JOIN, filtering to satisfy the WHERE and sorting to satisfy the ORDER BY.
From this, you can work out what is slowing down the query, whether it would be beneficial to review your indexes or if you can speed things up in another way.