Ok,
I don't know if I'm going crazy or not, but didn't the estimated execution plan used to show you which indexes you would need to improve performance? I used to do this at my old job, but now it seems I have to use the tuning advisor. I don't mind the tuning advisor, but the way I did it before was so simple!
Thanks
In both SSMS 2008 and SSMS 2012 I see this working fine for both estimated and actual plans.
Here is a quick example to show that estimated and actual execution plans will both show missing indexes:
USE tempdb;
GO
CREATE TABLE dbo.smorg(blamp INT);
GO
INSERT dbo.smorg(blamp) SELECT n FROM
(
SELECT ROW_NUMBER() OVER (ORDER BY c1.object_id)
FROM sys.all_objects AS c1, sys.all_objects AS c2
) AS x(n);
GO
Now highlight this and choose estimated execution plan, or turn on actual execution plan and hit Execute:
SELECT blamp FROM dbo.smorg WHERE blamp BETWEEN 100 AND 105;
You should see a missing index recommendation. And you will see it represented here:
SELECT *
FROM sys.dm_db_missing_index_details
WHERE [object_id] = OBJECT_ID('dbo.smorg');
You can read more about the DMV here:
http://msdn.microsoft.com/en-us/library/ms345434.aspx
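To turn that DMV output into something more actionable, you can join it to the related missing-index group and stats DMVs (a sketch; the suggested column choices still need human review before you create any real index):

SELECT d.statement, d.equality_columns, d.inequality_columns, d.included_columns,
    s.user_seeks, s.avg_user_impact
FROM sys.dm_db_missing_index_details AS d
JOIN sys.dm_db_missing_index_groups AS g ON g.index_handle = d.index_handle
JOIN sys.dm_db_missing_index_group_stats AS s ON s.group_handle = g.index_group_handle
WHERE d.[object_id] = OBJECT_ID('dbo.smorg');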
Also, you should investigate SQL Sentry Plan Explorer (disclaimer: I work for SQL Sentry). This free tool shows the missing indexes for estimated plans (if they are in the XML provided by SQL Server) and doesn't have the SSMS bug where the same recommendation is repeated across multiple batches, even batches that don't mention the same tables. It also generates actual execution plans without pulling all of the results across the network to the client - it just discards them, so network and data-transfer overhead don't factor into the plan analysis.
Related
Query A
SELECT Distinct ord_no FROM Orders
ORDER BY ord_no
Query B
SELECT ord_no FROM Orders
GROUP BY ord_no
ORDER BY ord_no
In the Orders table, ord_no is a varchar column and contains duplicates. It is part of a composite key with an identity column.
May I know which query is better?
How could we check the query performance using MS SQL Server 2008 R2
(express version)?
You can see the time each query takes, in milliseconds, in SQL Profiler. From Management Studio, go to Tools --> SQL Server Profiler and start a trace on your DB. Then run your queries and you can see the duration each one took. Mind you, you'll need a considerable amount of data to see the difference.
You can use SQL Express Profiler if you are not on the full blown version of SQL.
Check the execution plans for both queries. It's very likely that they will be the same, especially with such a simple query (you'll probably see a stream aggregate operator, doing the grouping, in both cases).
If the execution plans are the same, then there is no (statistically significant) difference in performance between the two.
Having said that, use GROUP BY instead of DISTINCT whenever in doubt.
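One quick way to compare the two plans without actually running the queries (using the queries from the question; the textual plan is printed instead of the result set):

SET SHOWPLAN_TEXT ON;
GO
SELECT DISTINCT ord_no FROM Orders ORDER BY ord_no;
GO
SELECT ord_no FROM Orders GROUP BY ord_no ORDER BY ord_no;
GO
SET SHOWPLAN_TEXT OFF;
GO

If the two plan trees come out identical, the queries will perform identically.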
Consider this materialized view:
CREATE VIEW [vwPlaySequence] WITH SCHEMABINDING
AS
SELECT
p.SiteIDNumber,
dbo.ToUnsignedInt(p.SequenceNumber) AS PlayID,
p.SequenceNumber
FROM dbo.Play p
GO
CREATE UNIQUE CLUSTERED INDEX
PK_vwPlaySequence ON [vwPlaySequence]
(
[PlayID],
[SiteIDNumber],
[SequenceNumber]
)
GO
The base table has a clustered index on SequenceNumber.
The following query on the base table executes on 160 million rows in 4 seconds:
select SiteIDNumber, min(SequenceNumber), max(SequenceNumber) from Play
group by SiteIDNumber
Here is the execution plan:
And the same query on the view executes in 46 seconds:
select SiteIDNumber, min(SequenceNumber), max(SequenceNumber) from vwPlaySequence
group by SiteIDNumber
Its execution plan:
I'm not seeing what it is in these execution plans that would warrant such a drastic difference in run time. I've run both of these queries many times with the same results.
Both queries use the view. One is parallel, the other is not. You say that adding OPTION (MAXDOP 1) to both queries makes all differences disappear. This means that parallelism accounts for all of the differences.
There's no logical reason SQL Server has to pick a serial plan in one of the cases here. It is probably a bug or known limitation. I have encountered many limitations and strange behaviors with indexed view matching. In that sense I'm only mildly surprised.
Now that the difference is (kind of) explained, what to do about it?
You can:
1. Try to force parallelism with OPTION (QUERYTRACEON 8649), which sets the parallelism cost to zero. This is an undocumented trace flag that is considered safe for production by some leading experts. I also consider it to be safe.
2. Try to select from the view using WITH (NOEXPAND). This bypasses view matching and hopefully allows SQL Server to find a parallel plan.
I prefer option (2).
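For option (2), the NOEXPAND hint goes directly in the FROM clause (a sketch using the view and query from the question):

select SiteIDNumber, min(SequenceNumber), max(SequenceNumber)
from vwPlaySequence with (noexpand)
group by SiteIDNumber

With NOEXPAND, the optimizer reads the indexed view directly instead of expanding it back to the base tables, which sidesteps the view-matching limitation described above.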
I'm reading "Dissecting SQL Server Execution Plans" by Grant Fritchey and it's helping me a lot to see why certain queries are slow.
However, I am stumped with this case where a simple rewrite performs quite a lot faster.
This is my first attempt and it takes 21 secs. It uses a derived table:
-- 21 secs
SELECT *
FROM Table1 AS o JOIN (
SELECT col1
FROM Table1
GROUP BY col1
HAVING COUNT(*) > 1
) AS i ON i.col1 = o.col1
My second attempt simply moves the derived table out into a temp table, and it's 3 times faster:
-- 7 secs
SELECT col1
INTO #doubles
FROM Table1
GROUP BY col1
HAVING COUNT( * ) > 1
SELECT *
FROM Table1 AS o JOIN #doubles AS i ON i.col1= o.col1
My main interest is in why moving from a derived table to a temp table improves performance so much, not in how to make it even faster.
I would be grateful if someone could show me how I can diagnose this issue using the (graphical) execution plan.
Xml Execution plan:
https://www.sugarsync.com/pf/D6486369_1701716_16980
Edit 1
When I created statistics on the 2 columns that were specified in the GROUP BY, the optimizer started doing "the right thing" (after clearing the procedure cache - don't forget that if you are a beginner!). I simplified the query in the question, which in retrospect was not a good simplification. The attached sqlplan shows the 2 columns, but this was not obvious.
The estimates are now a lot more accurate, as is the performance, which is on par with the temp table solution. As you know, the optimizer creates stats on single columns automatically (if not disabled), but 2-column statistics have to be created by the DBA.
A (non clustered) index on these 2 columns made the query perform the same but in this case a stat is just as good and it doesn't suffer the downside of index maintenance.
I'm going forward with the 2-column stat and will see how it performs. @Grant Do you know if the stats on an index are more reliable than a column stat?
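For reference, a multi-column statistic like the one described can be created like this (a sketch; col1 and col2 stand in for the two actual GROUP BY columns, which aren't named in the simplified query):

CREATE STATISTICS st_col1_col2 ON dbo.Table1 (col1, col2);

Unlike an index on the same columns, this only stores a histogram and density information, so there is no extra structure to maintain on every write.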
Edit 2
I always follow up once a problem is solved on how a similar problem can be diagnosed faster in the future.
The problem here was that the estimated row counts were way off. The graphical execution plan shows these when you hover over an operator, but that's about it.
Some tools that can help:
SET STATISTICS PROFILE ON
I heard this one will become obsolete and be replaced by its XML variant, but I still like the output, which is in grid format.
Here the big difference between the "Rows" and "EstimateRows" columns would have shown the problem.
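A minimal usage sketch (wrap the query under test between the ON and OFF; the per-operator grid appears after the result set, and comparing its Rows and EstimateRows columns exposes bad estimates):

SET STATISTICS PROFILE ON;
GO
SELECT col1
FROM Table1
GROUP BY col1
HAVING COUNT(*) > 1;
GO
SET STATISTICS PROFILE OFF;
GO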
External Tool: SQL Sentry Plan Explorer
http://www.sqlsentry.net/
This is a nice tool, especially if you are a beginner. It highlights potential problems.
External Tool: SSMS Tools Pack
http://www.ssmstoolspack.com/
A more general-purpose tool, but again it directs the user to potential problems.
Kind Regards, Tom
Looking at the values in the first execution plan, it looks like a statistics problem. You have an estimated number of rows of 800 and an actual of 1.2 million. I think you'll find that updating the statistics will change the way the first query's plan is generated.
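A sketch of updating the statistics on the table from the question (FULLSCAN is optional; it gives the most accurate statistics at the cost of a full table scan):

UPDATE STATISTICS dbo.Table1 WITH FULLSCAN;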
SQL Server 2008 running on Windows Server Enterprise(?) Edition 2008
I have a query joining against twenty some-odd tables (mostly LEFT OUTER JOINs). The full dataset returned by an unfiltered query returns less than 1,000 rows in less than 1s. When I apply a WHERE clause to filter the query it returns less than 300 rows in less than 1s.
When I apply an ORDER BY clause to the query it returns in 90s.
I examined the results of the query and notice a number of NULL results returned in the column that is being used to sort. I modified the query to COALESCE a NULL value to a valid search value without any change to the performance of the query.
I then did a
SELECT * FROM
(
my query goes here
) qry
ORDER BY myOrderByHere
And that produced the same results.
When I SELECT ... INTO #tempTable (without the ORDER BY) and then SELECT FROM the #tempTable with the order by the query returns in less than 1s.
What is really strange at this point is that the SELECT... INTO will also take 90s even without the ORDER BY.
The execution plan says that the SORT is taking 98% of the execution time when included with the main query. If I do the INSERT INTO, the plan says the actual insert into the temp table takes 99% of the execution time.
And to take out server issues I have run the same tests on two different instances of SQL Server 2008 with nearly identical results.
Many thanks!
rjsjr
Sounds like something strange is going on with your tempdb. Inserting 1000 rows in a temporary table should be fast, whether it's an implicit spool for sorting, or an explicit select into.
Check the size of your tempdb, the health of the hard disk it's on, and its recovery model (should be simple, not full or bulk-logged).
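A quick way to check both the recovery model and the file sizes (the size column is in 8 KB pages):

SELECT name, recovery_model_desc FROM sys.databases WHERE name = 'tempdb';
SELECT name, size, physical_name FROM tempdb.sys.database_files;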
A sort operation is usually an expensive step in the query. So, it's not surprising that the addition of the sort adds time. You may be seeing similar results when you incorporate a temp table in your steps. The sort operation in your original query may use tempdb to help do the sort, and that can be the time-consuming step in each query you compare.
If you want to learn more about each query you're running, you can review query plan outputs.
Using SQL Server Management Studio.
How can I test the performance of a large select (say 600k rows) without the results window impacting my test? All things being equal it doesn't really matter, since the two queries will both be outputting to the same place. But I'd like to speed up my testing cycles and I'm thinking that the output settings of SQL Server Management Studio are getting in my way. Output to text is what I'm using currently, but I'm hoping for a better alternative.
I think this is impacting my numbers because the database is on my local box.
Edit: Had a question about doing WHERE 1=0 here (thinking that the join would happen but no output), but I tested it and it didn't work -- not a valid indicator of query performance.
You could do SET ROWCOUNT 1 before your query. I'm not sure it's exactly what you want but it will avoid having to wait for lots of data to be returned and therefore give you accurate calculation costs.
However, if you add Client Statistics to your query, one of the numbers is Wait time on server replies which will give you the server calculation time not including the time it takes to transfer the data over the network.
You can SET STATISTICS TIME ON to get a measurement of the time on server. And you can use the Query/Include Client Statistics (Shift+Alt+S) on SSMS to get detail information about the client time usage. Note that SQL queries don't run and then return the result to the client when finished, but instead they run as they return results and even suspend execution if the communication channel is full.
The only context under which a query completely ignores sending the result packets back to the client is activation. But then the time to return the output to the client should be also considered when you measure your performance. Are you sure your own client will be any faster than SSMS?
SET ROWCOUNT 1 will stop processing after the first row is returned which means unless the plan happens to have a blocking operator the results will be useless.
Taking a trivial example
SELECT * FROM TableX
The cost of this query in practice will heavily depend on the number of rows in TableX.
Using SET ROWCOUNT 1 won't show any of that. Irrespective of whether TableX has 1 row or 1 billion rows it will stop executing after the first row is returned.
I often assign the SELECT results to variables to be able to look at things like logical reads without being slowed down by SSMS displaying the results.
SET STATISTICS IO ON;

DECLARE @name nvarchar(35),
        @type nchar(3);

SELECT @name = name,
       @type = type
FROM master..spt_values;
There is a related Connect Item request Provide "Discard results at server" option in SSMS and/or TSQL
The best thing you can do is to check the Query Execution Plan (press Ctrl+L) for the actual query. That will give you the best guesstimate for performance available.
I'd think that the where clause of WHERE 1=0 is definitely happening on the SQL Server side, and not Management Studio. No results would be returned.
Is your DB engine on the same machine that you're running Management Studio on?
You could:
Output to Text or
Output to File.
Close the Query Results pane.
That'd just move the cycles spent on drawing the grid in Mgmt Studio. Perhaps Results to Text would be more performant on the whole. Hiding the pane would save Mgmt Studio the cycles of drawing the data, but the data is still being returned to Mgmt Studio, so it really isn't saving a lot of cycles.
How can you test the performance of your query if you don't output the results? Speeding up the testing is pointless if the testing doesn't tell you anything about how the query is going to perform. Do you really want to find out that this dog of a query takes ten minutes to return data after you push it to prod?
And of course it's going to take some time to return 600,000 records. It will in your user interface as well; it will probably take longer there than in your query window because the data has to go across the network.
There are a lot of more correct answers here, but I assume the real question is the one I asked myself when I stumbled upon this question:
I have a query A and a query B on the same test data. Which is faster? And I want to check quick and dirty. For me the answer is temp tables (the overhead of creating a temp table is easy to ignore here). This is to be done on a perf/testing/dev server only!
Query A:
DBCC FREEPROCCACHE    -- clear the plan cache
DBCC DROPCLEANBUFFERS -- clear the buffer cache, so both queries start cold
SELECT * INTO #temp1 FROM ...
Query B:
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS
SELECT * INTO #temp2 FROM ...