Distinct and Group By - query performance - sql-server

Query A
SELECT Distinct ord_no FROM Orders
ORDER BY ord_no
Query B
SELECT ord_no FROM Orders
GROUP BY ord_no
ORDER BY ord_no
In the Orders table, ord_no is a varchar column and contains duplicates. It is part of a composite key together with an identity column.
May I know which query is better?
How can we check query performance using MS SQL Server 2008 R2 (Express Edition)?

You can see the amount of time each query takes in milliseconds in SQL Profiler. From Management Studio, go to Tools --> SQL Server Profiler and start a trace on your DB, then run your queries and check their durations. Mind you, you'll need a considerable amount of data to see the difference.
You can use SQL Express Profiler if you are not on the full-blown version of SQL Server.

Check the execution plans for both queries. It's very likely that they will be the same, especially with such a simple query (you'll probably see a stream aggregate operator, doing the grouping, in both cases).
If the execution plans are the same, then there is no (statistically significant) difference in performance between the two.
Having said that, when in doubt use GROUP BY rather than DISTINCT.
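If Profiler isn't convenient (Express Edition lacks it), another way to compare the two forms is SET STATISTICS TIME/IO together with the actual execution plan. A minimal sketch, assuming the Orders table from the question:

```sql
-- Compare CPU time, elapsed time, and logical reads for both forms.
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

SELECT DISTINCT ord_no FROM Orders ORDER BY ord_no;

SELECT ord_no FROM Orders GROUP BY ord_no ORDER BY ord_no;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

The Messages tab then shows the timings and reads per statement; if the execution plans are identical, the numbers should be near-identical as well.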

Related

SQL Server report Performance - Top Queries by Total CPU Time - what's included

I'm trying to assess the load that insert queries place on SQL Server 2008/2016.
There are articles I found which discuss that, like:
http://use-the-index-luke.com/sql/dml/insert
which talks about execution time.
I'm not very proficient in SQL Server; e.g., I don't know how to evaluate execution plans.
I know there are handy performance reports, like "Performance - Top Queries by Total CPU Time".
I've searched and have not found definitions of those reports.
So the question is: which server tasks does this report include in its CPU time calculations for queries, i.e.
indexes recalculation?
maybe even executing of triggers?
something else?
Thank you!
These are MDW (Management Data Warehouse) reports, in particular the Query Statistics History report introduced in SQL Server 2008. If you are interested in collecting this data, enable and configure the Management Data Warehouse.
What are these reports, anyway?
By default, only the top 10 queries will be included in the Top 10 Queries by CPU report; however, you can emulate the query behind the report and tweak the desired outcome using a query similar to the one below, as discussed in this article.
SELECT TOP X
       qs.total_worker_time / (qs.execution_count * 60000000) AS [Minutes Avg CPU Time],
       qs.execution_count AS [Times Run],
       qs.min_worker_time / 60000000 AS [CPU Time in Mins],
       SUBSTRING(qt.text, qs.statement_start_offset / 2 + 1,
           (CASE WHEN qs.statement_end_offset = -1
                 THEN LEN(CONVERT(nvarchar(max), qt.text)) * 2
                 ELSE qs.statement_end_offset
            END - qs.statement_start_offset) / 2) AS [Query Text],
       DB_NAME(qt.dbid) AS [Database],
       OBJECT_NAME(qt.objectid) AS [Object Name]
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt
ORDER BY [Minutes Avg CPU Time] DESC
Index recalculations and trigger executions are not performed as part of a query. Index updates are part of maintenance activity and trigger executions are part of Insert/Update/Delete activity.
Generally speaking, there are no "server tasks" included in the calculations for the Top Queries report. A query execution plan is based on that query and the data statistics available at the start of the query compilation. The plan generated is independent of maintenance or IUD activity taking place on the server.
It is possible that other activity make cause the actual duration to increase, but that additional time is not directly attributable to the query. The query is just forced to wait while the other activity completes.
Does that help?
Here is a modified query which shows the top CPU time consumers.
It is not an average; it is a sum.
It is also grouped by query_plan_hash, so the same query with different parameters will fall into one group.
Note 1: if a query runs frequently (about once per second), its statistics will be flushed every hour.
Note 2: the user name will only be present if the query is running at the moment.
Note 3: if you need to keep stats for a long time, you will need to store them somewhere separately. Adding a grouping by date will also help with reporting.
SELECT TOP 10
       SUM(qs.total_worker_time) / 1000000 AS [CPU Time Seconds],
       SUM(qs.execution_count) AS [Times Run],
       qs.query_plan_hash AS [Hash],
       MIN(creation_time) AS [Creation time],
       MIN(qt.text) AS [Query],
       MIN(USER_NAME(r.user_id)) AS [UserName]
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt
LEFT JOIN sys.dm_exec_requests AS r
       ON qs.query_plan_hash = r.query_plan_hash
GROUP BY qs.query_plan_hash
ORDER BY [CPU Time Seconds] DESC
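For Note 3 (keeping stats over a long period), one approach is to copy each run of a query like the one above into a permanent table stamped with the capture time. A sketch, where dbo.CpuStatsHistory is a hypothetical table you would create first:

```sql
-- Hypothetical history table; create once.
CREATE TABLE dbo.CpuStatsHistory (
    CaptureTime     datetime2     NOT NULL DEFAULT SYSDATETIME(),
    CpuTimeSeconds  bigint        NOT NULL,
    TimesRun        bigint        NOT NULL,
    QueryPlanHash   binary(8)     NOT NULL,
    QueryText       nvarchar(max) NULL
);

-- Capture a snapshot (run from a scheduled job, e.g. hourly,
-- to beat the hourly flushing mentioned in Note 1).
INSERT INTO dbo.CpuStatsHistory (CpuTimeSeconds, TimesRun, QueryPlanHash, QueryText)
SELECT SUM(qs.total_worker_time) / 1000000,
       SUM(qs.execution_count),
       qs.query_plan_hash,
       MIN(qt.text)
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt
GROUP BY qs.query_plan_hash;
```

Grouping the history table by CAST(CaptureTime AS date) then gives the per-day reporting mentioned in Note 3.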

Query really slow when changing join field

I'm using SQL Server 2008 and I noticed an enormous difference in performance when running these two almost identical queries.
Fast query (takes less than a second):
SELECT Seasons.Description, SUM(Sales.Value) AS Value
FROM Seasons, Sales
WHERE Sales.Property05 = Seasons.Season
GROUP BY Seasons.Description
Slow query (takes around 5 minutes):
SELECT Seasons.Description, SUM(Sales.Value) AS Value
FROM Seasons, Sales
WHERE Sales.Property04 = Seasons.Season
GROUP BY Seasons.Description
The only difference is that the tables SALES and SEASONS are joined on Property05 in the fast query and Property04 in the slow one.
Neither of the two property fields is part of a key or an index, so I really don't understand why the execution plans and performance are so different between the two queries.
Can somebody enlighten me?
EDIT: The query is automatically generated by a Business Intelligence program, so I have no power there. I would normally have used the JOIN ... ON syntax, although I don't know if that makes a difference.
Slow query plan: https://www.brentozar.com/pastetheplan/?id=HkcBc7gXZ
Fast query plan: https://www.brentozar.com/pastetheplan/?id=rJQ95mgXb
Note that the queries above were simplified to the essential part. The query plans are more detailed.
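Since neither property column is indexed, one thing worth trying (if you control the schema, even though the query itself is generated) is an index on the slow join column. A hedged sketch, assuming the table and column names from the question:

```sql
-- Hypothetical index on the join column used by the slow query;
-- INCLUDE covers the SUM(Sales.Value) aggregate so the index alone
-- can satisfy the query.
CREATE NONCLUSTERED INDEX IX_Sales_Property04
    ON dbo.Sales (Property04)
    INCLUDE (Value);
```

Comparing the execution plans before and after would show whether the optimizer's bad estimate on Property04 was the cause.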

Estimated execution plan show needed indexes?

OK,
I don't know if I am going crazy or not, but didn't the estimated execution plan use to show you which indexes you would need to improve performance? I used to do this at my old job, but now it seems I have to use the Tuning Advisor. I don't mind the Tuning Advisor, but the way I did it before was so simple!
Thanks
In both SSMS 2008 and SSMS 2012, this works fine for both estimated and actual plans.
Here is a quick example to show that estimated and actual execution plans will both show missing indexes:
USE tempdb;
GO
CREATE TABLE dbo.smorg(blamp INT);
GO
INSERT dbo.smorg(blamp) SELECT n FROM
(
SELECT ROW_NUMBER() OVER (ORDER BY c1.object_id)
FROM sys.all_objects AS c1, sys.all_objects AS c2
) AS x(n);
GO
Now highlight this and choose estimated execution plan, or turn on actual execution plan and hit Execute:
SELECT blamp FROM dbo.smorg WHERE blamp BETWEEN 100 AND 105;
You should see a missing index recommendation. And you will see it represented here:
SELECT *
FROM sys.dm_db_missing_index_details
WHERE [object_id] = OBJECT_ID('dbo.smorg');
You can read more about the DMV here:
http://msdn.microsoft.com/en-us/library/ms345434.aspx
Also, you should investigate SQL Sentry Plan Explorer (disclaimer: I work for SQL Sentry). This is a free tool that shows the missing indexes for estimated plans (if they are in the XML provided by SQL Server), doesn't have bugs like SSMS (where it repeats the same recommendation across multiple batches, even batches that don't mention the same tables), and generates actual execution plans without pulling all of the results across the network to the client - it just discards them (so the network and data overhead don't factor into plan analysis).

Determining which SQL Server database is spiking the CPU

We are running SQL Server 2008 with currently around 50 databases of varying size and workload. Occasionally SQL Server spikes the CPU completely for about a minute, after which it drops to normal baseline load.
My problem is that I can't determine which database or connection is causing it (I'm fairly sure it is one specific query that is missing an index - or something like that).
I have found T-SQL queries that give you a frozen image of current processes. There is also the "Recent Expensive Queries" view and of course the Profiler, but it is hard to map these to a "this is the database that is causing it" answer.
What makes it even harder is that the problem disappears before I have even fired up the profiler or activity monitor, and it only happens about once or twice a day.
Ideally I would like to use a performance counter so I could simply run it for a day or two and then take a look at what caused the spikes. However, I cannot find any relevant counter.
Any suggestions?
This will help, courtesy of Glenn Berry adapted from Robert Pearl:
WITH DB_CPU_Stats AS
(
    SELECT DatabaseID,
           DB_NAME(DatabaseID) AS [DatabaseName],
           SUM(total_worker_time) AS [CPU_Time_Ms]
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY (SELECT CONVERT(int, value) AS [DatabaseID]
                 FROM sys.dm_exec_plan_attributes(qs.plan_handle)
                 WHERE attribute = N'dbid') AS F_DB
    GROUP BY DatabaseID
)
SELECT ROW_NUMBER() OVER (ORDER BY [CPU_Time_Ms] DESC) AS [row_num],
       DatabaseName,
       [CPU_Time_Ms],
       CAST([CPU_Time_Ms] * 1.0 / SUM([CPU_Time_Ms]) OVER () * 100.0 AS decimal(5, 2)) AS [CPUPercent]
FROM DB_CPU_Stats
WHERE DatabaseID > 4          -- exclude system databases
  AND DatabaseID <> 32767     -- exclude ResourceDB
ORDER BY row_num
OPTION (RECOMPILE);
Run a profiler trace logging the database name and cpu during a spike, load the data up into a table, count and group on db.
SELECT DatabaseName, SUM(CPU)
FROM Trace
GROUP BY DatabaseName
Have a look at sys.dm_exec_query_stats. The total_worker_time column is a measure of CPU. You may be able to accomplish what you're trying to do in one look at the view. You may, however, need to come up with a process to take "snapshots" of the view and compare successive snapshots. That is, look at the data in the view, look again five minutes later, and compare the two. The differences will be the amount of resources consumed between the two snapshots. Good luck!
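That snapshot-and-compare process can be sketched roughly like this, using temp tables to hold the two snapshots:

```sql
-- First snapshot of cumulative CPU stats.
SELECT sql_handle, plan_handle, total_worker_time, execution_count
INTO #snap1
FROM sys.dm_exec_query_stats;

WAITFOR DELAY '00:05:00';  -- wait five minutes

-- Second snapshot.
SELECT sql_handle, plan_handle, total_worker_time, execution_count
INTO #snap2
FROM sys.dm_exec_query_stats;

-- CPU consumed between the two snapshots, worst offenders first.
SELECT TOP 20
       t.text AS query_text,
       s2.total_worker_time - s1.total_worker_time AS worker_time_delta,
       s2.execution_count  - s1.execution_count    AS executions_delta
FROM #snap2 AS s2
JOIN #snap1 AS s1
  ON s1.plan_handle = s2.plan_handle
 AND s1.sql_handle  = s2.sql_handle
CROSS APPLY sys.dm_exec_sql_text(s2.sql_handle) AS t
ORDER BY worker_time_delta DESC;
```

Note that plans recompiled or evicted between snapshots won't match on plan_handle, so this is an approximation rather than a complete accounting.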
Have you tried relating SQL Server Profiler to Performance Monitor?
When you correlate the data, you can see spikes in performance related to DB activity.
http://www.sqlservernation.com/home/relating-sql-server-profiler-with-performance-monitor.html

SQL Server ORDER BY Performance Aberration

SQL Server 2008 running on Windows Server Enterprise(?) Edition 2008
I have a query joining against twenty some-odd tables (mostly LEFT OUTER JOINs). The full dataset returned by an unfiltered query returns less than 1,000 rows in less than 1s. When I apply a WHERE clause to filter the query it returns less than 300 rows in less than 1s.
When I apply an ORDER BY clause to the query it returns in 90s.
I examined the results of the query and noticed a number of NULL results in the column being used to sort. I modified the query to COALESCE NULL values to a valid search value, without any change to the performance of the query.
I then did a
SELECT * FROM
(
my query goes here
) qry
ORDER BY myOrderByHere
And that produced the same results.
When I SELECT ... INTO #tempTable (without the ORDER BY) and then SELECT FROM the #tempTable with the order by the query returns in less than 1s.
What is really strange at this point is that the SELECT ... INTO also takes 90s, even without the ORDER BY.
The execution plan says that the SORT is taking 98% of the execution time when included in the main query. If I am doing the INSERT INTO, the execution plan says the actual insert into the temp table takes 99% of the execution time.
And to take out server issues I have run the same tests on two different instances of SQL Server 2008 with nearly identical results.
Many thanks!
rjsjr
Sounds like something strange is going on with your tempdb. Inserting 1,000 rows into a temporary table should be fast, whether it's an implicit spool for sorting or an explicit SELECT INTO.
Check the size of your tempdb, the health of the hard disk it's on, and its recovery model (it should be simple, not full or bulk-logged).
A sort operation is usually an expensive step in the query. So, it's not surprising that the addition of the sort adds time. You may be seeing similar results when you incorporate a temp table in your steps. The sort operation in your original query may use tempdb to help do the sort, and that can be the time-consuming step in each query you compare.
If you want to learn more about each query you're running, you can review query plan outputs.
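A minimal way to review that plan output from a query window, shown as a sketch (SET STATISTICS PROFILE returns a text-mode plan with actual row counts per operator, which is where the Sort and Table Insert cost percentages mentioned above appear):

```sql
-- Emit the actual execution plan as an extra result set
-- alongside the query's own results.
SET STATISTICS PROFILE ON;

-- my query goes here

SET STATISTICS PROFILE OFF;
```

Alternatively, the "Include Actual Execution Plan" button in SSMS gives the same information graphically.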
