I have two ways to get the same result from SQL Server.
First way:
select * from my_view where key='key'
Second way:
select * from my_function('key')
The two queries return the same thing.
Right now the database is small, so both execute almost instantly and I can't tell which one takes less time, but I'm not sure they will stay equivalent once the database grows.
So my question is: is there any way, or any free tool, to compare the exact execution time of the two queries?
key='key' will always be as fast or faster. Never apply a function to a column name in a predicate; if a function is needed, apply it to the value in the condition instead (e.g. key = function('key')). If you wrap the column in a function, SQL Server cannot apply most optimizations to the query (e.g. indexes on that column will be ignored).
To ensure the query keeps its performance level as your table grows, you will have to put an index on the column.
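For example (the underlying table name is hypothetical, since the question only shows the view):

CREATE INDEX IX_my_table_key ON dbo.my_table ([key]);  -- lets key = 'key' use a seek instead of a scan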
You can deduce performance from the execution plan in SQL Server Management Studio (execute the query with Include Actual Execution Plan enabled, Ctrl+M). You can also enable Include Client Statistics (Shift+Alt+S) to see some performance information.
If you run multiple queries in a batch (one query window) and enable "Include Actual Execution Plan", it'll give each query a "relative cost to the batch".
So if you have two queries in the batch and one shows less than 50%, it's the cheaper of the two.
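For exact timings rather than relative plan costs, you can also run both queries with time statistics enabled; a minimal sketch using the names from the question:

SET STATISTICS TIME ON;

SELECT * FROM my_view WHERE [key] = 'key';
SELECT * FROM my_function('key');

SET STATISTICS TIME OFF;
-- The Messages tab then reports CPU time and elapsed time for each query.

Run it a few times and ignore the first execution, which pays the plan-compilation and caching costs.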
Related
I have the following trace for one query in my application:
As you can see, the query reads all the records in the table, and the result is a big impact on duration.
But when I try to execute the query directly, the result is different... What is wrong?
You executed ANOTHER query from SSMS.
The query shown in Profiler is part of a stored procedure and has 8 parameters.
What you executed is a query with constants, which has a different execution plan because all the constants are known and the estimates were made from those known values.
When your stored procedure's statement was executed, the plan was created for god-knows-what sniffed parameters, and that plan is different from what you have in SSMS.
From the thickness of the arrows in SSMS it's clear that your query does not do 7,954,449 reads.
If you want to see your actual execution plan in Profiler, you should select the corresponding event (Showplan XML Statistics Profile).
Yes, these are two different queries. Axapta uses placeholders by default; your query uses literal constants.
The forceLiterals hint in the query makes the Axapta query similar to your SSMS sample. The default Axapta hint is forcePlaceholders.
The main goal of placeholders is to optimize a massive stream of similar queries with different constants coming from multiple clients, mainly by reusing cached query plans.
See also the injection warning for forceLiterals:
https://msdn.microsoft.com/en-us/library/aa861766.aspx
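This is not Axapta X++, but roughly what the two styles come down to at the T-SQL level (table and column names are made up for illustration):

-- Literals: each distinct constant can compile and cache its own plan.
SELECT * FROM CustTable WHERE AccountNum = N'4000';
SELECT * FROM CustTable WHERE AccountNum = N'4001';

-- Placeholders: one parameterized plan is cached and reused for all values.
EXEC sp_executesql
    N'SELECT * FROM CustTable WHERE AccountNum = @acct',
    N'@acct nvarchar(20)',
    @acct = N'4000';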
Imagine I have a big table with 20 columns and a billion rows of data. Then I run a simple query like:
select [First Name], [Last Name]
from Audience;
After that I read the result set sequentially. Will SQL Server physically create all records (i.e. a billion records) on the server side in the result set before I start reading it? Is there any query plan that builds the result set dynamically while feeding it to the client?
I understand that concurrency reasons may prevent this. Can I give a hint that multiuser access is not possible? Maybe I should use cursors?
Depends on the query plan. If the query does not require any temporary internal structures then yes you get immediate response even before the full recordset has been constructed. If the query does require temporary internal storage (e.g. you are sorting it in a manner that doesn't match any index, or an index is available but a different one is used because it requires less I/O) then you will have to wait until the full recordset is constructed.
The only way to tell is to look at the query plan and examine each and every step. You will need to know how to interpret them... for example, a DISTINCT will require a temporary structure whereas a FLOW DISTINCT will not. If the query plan shows an EAGER SPOOL you will definitely have to wait, although there are a few things you can do to avoid them.
Note: You can't rely on this-- query plans can change depending not just on schema or indexes but on database statistics (e.g. selectivity), which are always changing.
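As a rough illustration, using the question's own table (whether the second query blocks depends on whether an index on [Last Name] exists):

-- Can stream: rows flow to the client as the scan produces them.
SELECT [First Name], [Last Name]
FROM Audience;

-- Likely blocks: a Sort on an unindexed column must consume all the
-- rows before the first sorted row can be emitted.
SELECT [First Name], [Last Name]
FROM Audience
ORDER BY [Last Name];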
I have a stored procedure running 10 times slower in production than in staging. I took a look at the execution plan, and the first thing I noticed was that the cost of the Table Insert (into a table variable @temp) was 100% in production and 2% in staging.
The estimated number of rows in production showed almost 200 million rows! In staging it was only about 33.
The production DB is running on SQL Server 2008 R2 while staging is on SQL Server 2012, but I don't think this difference could cause such a problem.
What could be the cause of such a huge difference?
UPDATED
Added the execution plan. As you can see, the large number of estimated rows shows up in Nested Loops (Inner Join) but all it does is a clustered index seek to another table.
UPDATED2
Link for the plan XML included
plan.xml
And SQL Sentry Plan Explorer view (with estimated counts shown)
This looks like a bug to me.
There are an estimated 90,991.1 rows going into the nested loops.
The table cardinality of the table being seeked on is 24,826.
If there are no statistics for a column and the equality operator is used, SQL Server can't know the density of the column, so it uses a fixed 10 percent estimate.
90,991.1 * 24,826 * 10% = 225,894,504.86 which is pretty close to your estimated rows of 225,894,000
But the execution plan shows that only 1 row is estimated per seek. Not the 24,826 from above.
So these figures don't add up. I would assume that it starts off from an original 10% ball park estimate and then later adjusts it to 1 because of the presence of a unique constraint without making a compensating adjustment to the other branches.
I see that the seek is calling a scalar UDF, [dbo].[TryConvertGuid]. I was able to reproduce similar behavior on SQL Server 2005, where seeking on a unique index on the inside of a nested loops join with a UDF in the predicate produced an estimated number of rows out of the join much larger than you would expect from multiplying estimated seeked rows by estimated number of executions.
But in your case the operators to the left of the problematic part of the plan are pretty simple and not sensitive to the number of rows (neither the rowcount top operator nor the insert operator will change), so I don't think this quirk is responsible for the performance issues you noticed.
Regarding the point in the comments to another answer that switching to a temp table helped the performance of the insert: this may be because it allows the read part of the plan to operate in parallel (inserting into a table variable would block that).
Run EXEC sp_updatestats; on the production database. This updates statistics on all tables. It might produce more sane execution plans if your statistics are screwed up.
Please don't run EXEC sp_updatestats; on a large system it could take hours, or days, to complete. What you may want to do instead is look at the query plan being used in production. Try to see whether it has an index that could be used but isn't. Try rebuilding that index (as a side effect, this rebuilds the statistics on the index). After rebuilding, look at the query plan and note whether it is using the index. Perhaps you need to add an index to the table. Does the table have a clustered index?
As a general rule, since 2005, SQL Server manages statistics on its own rather well. The only time you need to explicitly update statistics is if you know that the query would execute a lot faster if SQL Server used an index, but it's not doing so. You may want to run (on a nightly or weekly basis) scripts that automatically test every table and every index to see whether the index needs to be reorganized or rebuilt (depending on how fragmented it is). On a large, active OLTP system these scripts may take a long time to run, and you should consider carefully when you have a window to run them. There are quite a few versions of this script floating around, but I have used this one often:
https://msdn.microsoft.com/en-us/library/ms189858.aspx
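A minimal sketch of the idea behind such scripts, using sys.dm_db_index_physical_stats (the 5%/30% thresholds are the commonly cited rules of thumb, not hard limits):

-- List fragmented indexes in the current database and suggest an action.
SELECT OBJECT_NAME(ips.object_id)        AS table_name,
       i.name                            AS index_name,
       ips.avg_fragmentation_in_percent,
       CASE
           WHEN ips.avg_fragmentation_in_percent > 30 THEN 'REBUILD'
           WHEN ips.avg_fragmentation_in_percent > 5  THEN 'REORGANIZE'
           ELSE 'OK'
       END AS suggested_action
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
  ON i.object_id = ips.object_id AND i.index_id = ips.index_id
WHERE ips.index_id > 0          -- skip heaps
  AND ips.page_count > 100;     -- ignore trivially small indexes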
Sorry this is probably too late to help you.
Table variables are impossible for SQL Server to predict: it keeps no statistics for them, so it always estimates exactly one row coming back.
To get accurate estimates, so that a better plan can be created, you need to switch your table variable to a temp table or a CTE.
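A quick way to see the difference yourself (a sketch; run with Include Actual Execution Plan enabled, on versions before SQL Server 2019's deferred table variable compilation):

-- Table variable: the plan estimates 1 row regardless of contents.
DECLARE @ids TABLE (id int PRIMARY KEY);
INSERT INTO @ids SELECT object_id FROM sys.objects;
SELECT * FROM @ids;       -- estimated rows: 1

-- Temp table: statistics get created, so the estimate matches reality.
CREATE TABLE #ids (id int PRIMARY KEY);
INSERT INTO #ids SELECT object_id FROM sys.objects;
SELECT * FROM #ids;       -- estimated rows: the actual count
DROP TABLE #ids;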
In SQL Server, what is the best way to allow for multiple execution plans to exist for a query in a SP without having to recompile every time?
For example, I have a case where the query plan varies significantly depending on how many rows are in a temp table that the query uses. Since there was no "one size fits all" plan that was satisfactory, and since it was unacceptable to recompile every time, I ended up copy/pasting (ick) the main query in the SP multiple times within several IF statements, forcing the SQL engine to give each case its own optimal plan. It actually seemed to work beautifully performance-wise, but it feels a bit clunky. (I know I could similarly break this part out into multiple SPs to do the same thing.) Is there a better way to do this?
IF @RowCount < 1
    [paste query here]
ELSE IF @RowCount < 50
    [paste query here]
ELSE IF @RowCount < 200
    [paste query here]
ELSE
    [paste query here]
You can use OPTIMIZE FOR in certain situations, to create a plan targeted to a certain value of a parameter (but not multiple plans per se). This allows you to specify what parameter value we want SQL Server to use when creating the execution plan. This is a SQL Server 2005 onwards hint.
Optimize Parameter Driven Queries with the OPTIMIZE FOR Hint in SQL Server
There is also OPTIMIZE FOR UNKNOWN – a SQL Server 2008 onwards feature (use judiciously):
This hint directs the query optimizer to use the standard algorithms it has always used if no parameter values had been passed to the query at all. In this case the optimizer will look at all available statistical data to reach a determination of what the values of the local variables used to generate the query plan should be, instead of looking at the specific parameter values that were passed to the query by the application.
Perhaps also look into the optimize for ad hoc workloads option.
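A sketch of the OPTIMIZE FOR syntax (the procedure, table, and the value 42 are made up for illustration):

CREATE PROCEDURE dbo.GetOrders
    @CustomerId int
AS
SELECT OrderId, OrderDate
FROM dbo.Orders
WHERE CustomerId = @CustomerId
OPTION (OPTIMIZE FOR (@CustomerId = 42));   -- plan is built as if 42 were passed

-- Or, to use average density instead of any sniffed value:
-- OPTION (OPTIMIZE FOR UNKNOWN);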
SQL Server 2005+ has statement-level recompilation and is better at dealing with this kind of branching. You still have one plan, but the plan can be partially recompiled at the statement level.
But it is ugly.
I'd go with @Mitch Wheat's option personally, because you get recompilations anyway with the stored procedure using a temp table. See Temp table and stored proc compilation.
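If recompiling just that one statement is acceptable, the statement-level hint looks like this (a sketch; the table names are placeholders for your actual query):

-- Recompile only this statement on each execution, so the plan
-- reflects the temp table's current row count.
SELECT s.*
FROM #MyTempTable AS t
JOIN dbo.SomeTable AS s ON s.id = t.id
OPTION (RECOMPILE);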
Using SQL Server Management Studio.
How can I test the performance of a large select (say 600k rows) without the results window impacting my test? All things being equal it doesn't really matter, since the two queries will both be outputting to the same place. But I'd like to speed up my testing cycles and I'm thinking that the output settings of SQL Server Management Studio are getting in my way. Output to text is what I'm using currently, but I'm hoping for a better alternative.
I think this is impacting my numbers because the database is on my local box.
Edit: Had a question about doing WHERE 1=0 here (thinking that the join would happen but no output), but I tested it and it didn't work -- not a valid indicator of query performance.
You could do SET ROWCOUNT 1 before your query. I'm not sure it's exactly what you want but it will avoid having to wait for lots of data to be returned and therefore give you accurate calculation costs.
However, if you add Client Statistics to your query, one of the numbers is Wait time on server replies which will give you the server calculation time not including the time it takes to transfer the data over the network.
You can SET STATISTICS TIME ON to get a measurement of the time spent on the server. And you can use Query/Include Client Statistics (Shift+Alt+S) in SSMS to get detailed information about the client-side time. Note that SQL queries don't run to completion and then return the result to the client; they return rows as they execute, and even suspend execution if the communication channel is full.
The only context under which a query completely ignores sending the result packets back to the client is activation. But then the time to return the output to the client should be also considered when you measure your performance. Are you sure your own client will be any faster than SSMS?
SET ROWCOUNT 1 will stop processing after the first row is returned which means unless the plan happens to have a blocking operator the results will be useless.
Taking a trivial example
SELECT * FROM TableX
The cost of this query in practice will heavily depend on the number of rows in TableX.
Using SET ROWCOUNT 1 won't show any of that. Irrespective of whether TableX has 1 row or 1 billion rows it will stop executing after the first row is returned.
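Concretely (SomeUnindexedColumn is a hypothetical column): with a blocking operator in the plan, SET ROWCOUNT 1 no longer short-circuits the work:

SET ROWCOUNT 1;

-- Stops almost immediately: the scan can emit its first row at once.
SELECT * FROM TableX;

-- Still sorts the whole table: the Sort operator must consume every
-- row before it can return the single row you asked for.
SELECT * FROM TableX ORDER BY SomeUnindexedColumn;

SET ROWCOUNT 0;   -- reset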
I often assign the SELECT results to variables to be able to look at things like logical reads without being slowed down by SSMS displaying the results.
SET STATISTICS IO ON;

DECLARE @name nvarchar(35),
        @type nchar(3);

-- Assigning the columns to variables discards the client-side display
-- cost while still executing the full query.
SELECT @name = name,
       @type = type
FROM master..spt_values;
There is a related Connect Item request Provide "Discard results at server" option in SSMS and/or TSQL
The best thing you can do is to check the query execution plan (press Ctrl+L for the estimated plan) for the actual query. That will give you the best guesstimate of performance available.
I'd think that the where clause of WHERE 1=0 is definitely happening on the SQL Server side, and not Management Studio. No results would be returned.
Is your DB engine on the same machine that you're running Management Studio on?
You could :
Output to Text or
Output to File.
Close the Query Results pane.
That'd just move around the cycles spent on drawing the grid in Management Studio. Perhaps Results to Text would be more performant on the whole. Hiding the pane saves Management Studio the cycles needed to draw the data, but the data is still being returned to Management Studio, so it really isn't saving many cycles.
How can you test performance of your query if you don't output the results? Speeding up the testing is pointless if the testing doesn't tell you anything about how the query is going to perform. Do you really want to find out this dog of a query takes ten minutes to return data after you push it to prod?
And of course it's going to take some time to return 600,000 records. It will in your user interface as well; it will probably take longer there than in your query window because the data has to go across the network.
There are plenty of more rigorous answers here, but I assume the real question is the one I asked myself when I stumbled upon this question:
I have a query A and a query B on the same test data. Which is faster? I want to check quick and dirty. For me the answer is temp tables (the overhead of creating a temp table is easy to ignore here). This is to be done on a perf/testing/dev server only!
Query A:

DBCC FREEPROCCACHE      -- clear the plan cache (dev/test only!)
DBCC DROPCLEANBUFFERS   -- clear the buffer pool for a cold-cache run
SELECT * INTO #temp1 FROM ...

Query B:

DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS
SELECT * INTO #temp2 FROM ...
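To actually compare the two runs you can wrap each in a timing measurement; a minimal sketch (dbo.Audience stands in for your real FROM clause, which is elided above):

DECLARE @t0 datetime2 = SYSDATETIME();

SELECT * INTO #temp1 FROM dbo.Audience;   -- query A goes here

SELECT DATEDIFF(millisecond, @t0, SYSDATETIME()) AS elapsed_ms;
DROP TABLE #temp1;

Repeat the same wrapper for query B into #temp2 and compare the two elapsed_ms values.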