Are there any performance differences between these 2 queries? - sql-server

Query 1 - UserId is the main identifier, non-clustered index
update myTable set
    CurrentHp = MaximumHp,
    SelectedAttack1RemainingPP = SelectedAttack1MaximumPP,
    SelectedAttack2RemainingPP = SelectedAttack2MaximumPP,
    SelectedAttack3RemainingPP = SelectedAttack3MaximumPP,
    SelectedAttack4RemainingPP = SelectedAttack4MaximumPP
where UserId = 1001695
Query 2
update myTable set
    CurrentHp = MaximumHp,
    SelectedAttack1RemainingPP = SelectedAttack1MaximumPP,
    SelectedAttack2RemainingPP = SelectedAttack2MaximumPP,
    SelectedAttack3RemainingPP = SelectedAttack3MaximumPP,
    SelectedAttack4RemainingPP = SelectedAttack4MaximumPP
where UserId = 1001695
and (
    SelectedAttack1RemainingPP != SelectedAttack1MaximumPP
    or SelectedAttack2RemainingPP != SelectedAttack2MaximumPP
    or SelectedAttack3RemainingPP != SelectedAttack3MaximumPP
    or SelectedAttack4RemainingPP != SelectedAttack4MaximumPP
    or CurrentHp != MaximumHp
)
When I check via SQL Server Management Studio and compare with "Include Actual Execution Plan", their costs are the same.
However, when I check via "Include Client Statistics", I see that the first query shows 1900 rows updated while the second one shows 0 rows updated.
So here is my question: when the values of column A and column B are equal, does SQL Server still perform an update?
I also think that logically both queries should behave the same, but I would like to hear your opinion.
[Screenshot: execution plans showing the same cost for both queries]
[Screenshot: client statistics for query 1]
[Screenshot: client statistics for query 2]

The two execution plans are the same because your first filter condition (UserId = 1001695) selects just one row and the table has an index on that field.
If you change the filter to a range condition such as UserId > 100, the costs in the execution plans change and they are no longer the same; likewise, if you filter on a field the table has no index on, the plan shapes change and they are no longer the same.
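As for whether SQL Server still makes an update when the values are already equal: rows that satisfy the WHERE clause count as affected even when the assigned value equals the existing one, which is why Query 1 reports 1900 rows while Query 2's extra predicate filters them out and reports 0. You can confirm this from a client with @@ROWCOUNT; a minimal sketch (column list shortened):

update myTable set
    CurrentHp = MaximumHp
where UserId = 1001695;

-- Reports the number of rows that matched the WHERE clause, whether or
-- not any column value actually changed.
select @@ROWCOUNT as RowsAffected;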

Related

SQL Server Update Statistics

I have 2 questions about SQL Server statistics; please help me. I am using SQL Server 2016.
My table TBL1 has only one column, COL1. When I used COL1 in joins with other tables, statistics were automatically created on COL1.
Next I create a non-clustered index on COL1 of TBL1, then another set of statistics are created on COL1. Now I have 2 sets of statistics on COL1.
Out of the above 2 statistics, which statistics are used by SQL Server for further queries? I am assuming that the statistics created by the non-clustered index will be used, am I right?
If I use the Update Statics TBL1 command, all the statistics for TBL1 are updated. In the MSDN documentation, I see that updating statistics causes queries to recompile, what do they mean by re-compiling of queries? The MSDN link is
https://learn.microsoft.com/en-us/sql/relational-databases/statistics/update-statistics?view=sql-server-ver15
Please explain.
If there's only 1 column in your table, there's no reason to have a non-clustered index. This creates a separate copy of that data. Just create the clustered index on that column.
Yes - Since your table only has the one column and an index was created on that column, it's almost certain that SQL Server will use that index whenever joining to that table and thus the statistics for that index will be used.
In this context, it means that the execution plan in cache will be invalidated due to stale statistics, and the next time the query executes the optimizer will build a new execution plan. In other words, SQL Server assumes there may now be a better set of steps to execute the query, and the optimizer will try to assemble that better set of steps (a new execution plan).
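If you want to see both statistics objects side by side, you can query sys.stats; a small sketch assuming the TBL1/COL1 names from the question (auto-created statistics get generated names starting with _WA_Sys_, while index statistics share the index name):

-- List every statistics object on TBL1 and the column it covers.
SELECT s.name AS stats_name,
       s.auto_created,
       s.user_created,
       c.name AS column_name
FROM sys.stats AS s
JOIN sys.stats_columns AS sc
    ON s.object_id = sc.object_id AND s.stats_id = sc.stats_id
JOIN sys.columns AS c
    ON sc.object_id = c.object_id AND sc.column_id = c.column_id
WHERE s.object_id = OBJECT_ID('dbo.TBL1');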
Recommended Reading:
SQL Server Statistics
Understanding Execution Plans
Execution Plan Caching & Reuse

SQL Server - wrong execution plan?

I have a very big table with a lot of rows and a lot of columns (I know it's bad but let's leave this aside).
Specifically, it has two relevant columns - FinishTime and JobId. The first is the finish time of the row and the second is its id (not unique, but almost unique - only a few records share the same id).
I have an index on JobId and an index on FinishTime.
We insert rows all the time, mostly ordered by the finish time. We also update statistics of each index periodically.
Now to the problem:
When I run a query with the filter jobid == <some id> AND finishtime > <now minus 1 hour>, it takes a long time, and the estimated execution plan shows that the plan goes over the FinishTime index, even though going over the JobId index should be much better. Looking at the index statistics, I see that the server "thinks" the number of jobs in the last hour is 1, because we didn't update the statistics of this index.
When I run a query with the filter jobid == <some id> AND finishtime > <now minus 100 days>, it works great, because SQL Server knows to use the correct index - the JobId index.
So basically my question is: if we don't update the index statistics all the time (which is time-consuming), why does the server assume that the number of records past the last histogram bucket is 1?
Thanks very much
You can get a histogram of what the statistics contains for an index using DBCC SHOW_STATISTICS, e.g.
DBCC SHOW_STATISTICS ( mytablename , myindexname )
For date-based records, such queries will always be prone to incorrect statistics. Running this should show that the last bucket in the histogram covers barely any of the most recent records. However, all else being equal, SQL Server should still prefer the JobId index over the FinishTime index if both are single-column indexes with no included columns, since a JobId (int) lookup is cheaper than a FinishTime (datetime) one.
Note: If your finishtime is covering for the query, this will heavily influence the query optimizer in selecting it since it eliminates a bookmark lookup operation.
To combat this, you can:
update statistics regularly
create multiple filtered indexes (a 2008+ feature) on the data, with the tail-end one covering recent rows and updated far more regularly
use index hints on sensitive queries (see the sketch below)
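As a rough sketch of what the first and third options could look like (the table and index names here are assumptions, not taken from the question):

-- Refresh the statistics on the ascending-key index more often than
-- the automatic threshold would.
UPDATE STATISTICS dbo.jobs IX_FinishTime WITH FULLSCAN;

-- Force the job id index on a sensitive query.
DECLARE @someId int = 12345;
SELECT *
FROM dbo.jobs WITH (INDEX (IX_JobId))
WHERE jobid = @someId
  AND finishtime > DATEADD(HOUR, -1, GETDATE());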

SQL Server : wrong index is used when filter value exceeds the index histogram range

We have a very large table, where every day 1-2 million rows are being added to the table.
In this query:
SELECT jobid, exitstatus
FROM jobsData
WHERE finishtime >= {ts '2012-10-04 03:19:26'} AND task = 't1_345345_454'
GROUP BY jobid, exitstatus
Indexes exist on both Task and FinishTime.
We expected that the Task index would be used, since its predicate matches far fewer rows. The problem we see is that SQL Server creates a bad query execution plan which uses the FinishTime index instead of the Task index, and the query takes a very long time.
This happens when the finish time value is outside the FinishTime index histogram.
Statistics are updated every day / several hours, but there are still many cases where the queries are for recent values.
The question: we can see clearly in the estimated execution plan that the estimated number of rows for FinishTime is 1 in this case, so the FinishTime index is selected. Why does SQL Server assume 1 row when there is no histogram data? Is there a way to tell it to use something more reasonable?
When we replace the date with a slightly earlier one, statistics exist in the histogram and the estimated number of rows is ~7000.
You can use a Plan Guide to instruct the optimizer to use a specific query plan for you. This fits well for generated queries that you cannot modify to add hints.
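A minimal sketch of such a plan guide, assuming the query arrives as a parameterized batch and that the task index is named IX_Task (both are assumptions):

EXEC sp_create_plan_guide
    @name = N'PG_JobsData_ForceTaskIndex',
    -- Must match the submitted statement text exactly for the guide to apply.
    @stmt = N'SELECT jobid, exitstatus FROM jobsData WHERE finishtime >= @finish AND task = @task GROUP BY jobid, exitstatus',
    @type = N'SQL',
    @module_or_batch = NULL,
    @params = N'@finish datetime, @task varchar(50)',
    @hints = N'OPTION (TABLE HINT (jobsData, INDEX (IX_Task)))';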

SQL Server ORDER BY Performance Aberration

SQL Server 2008 running on Windows Server Enterprise(?) Edition 2008
I have a query joining against twenty some-odd tables (mostly LEFT OUTER JOINs). The full dataset returned by an unfiltered query returns less than 1,000 rows in less than 1s. When I apply a WHERE clause to filter the query it returns less than 300 rows in less than 1s.
When I apply an ORDER BY clause to the query it returns in 90s.
I examined the results of the query and noticed a number of NULL values returned in the column that is being used to sort. I modified the query to COALESCE the NULL values to a valid value, without any change to the performance of the query.
I then did a
SELECT * FROM
(
    my query goes here
) qry
ORDER BY myOrderByHere
And that produced the same results.
When I SELECT ... INTO #tempTable (without the ORDER BY) and then SELECT FROM the #tempTable with the ORDER BY, the query returns in less than 1s.
What is really strange at this point is that the SELECT... INTO will also take 90s even without the ORDER BY.
The execution plan says that the SORT is taking 98% of the execution time when it is included in the main query. If I am doing the INSERT INTO, the execution plan says that the actual insert into the temp table takes 99% of the execution time.
And to take out server issues I have run the same tests on two different instances of SQL Server 2008 with nearly identical results.
Many thanks!
rjsjr
Sounds like something strange is going on with your tempdb. Inserting 1000 rows in a temporary table should be fast, whether it's an implicit spool for sorting, or an explicit select into.
Check the size of your tempdb, the health of the hard disk it's on, and its recovery model (it should be SIMPLE, not FULL or BULK_LOGGED).
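A quick way to check both from T-SQL (a sketch; file sizes are reported in 8 KB pages):

-- Recovery model of tempdb.
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = 'tempdb';

-- Current tempdb file sizes in MB.
SELECT name, type_desc, size * 8 / 1024 AS size_mb, physical_name
FROM tempdb.sys.database_files;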
A sort operation is usually an expensive step in a query, so it's not surprising that adding the sort adds time. You may be seeing similar results when you incorporate a temp table in your steps: the sort operation in your original query may use tempdb to do the sort, and that can be the time-consuming step in each query you compare.
If you want to learn more about each query you're running, you can review query plan outputs.

SQL Server STATISTICS

So for this one project, we have a bunch of queries that are executed on a regular basis (every minute or so). I used "Analyze Query in Database Engine" to check on them.
They are pretty simple:
select * from tablex where processed='0'
There is an index on processed, and each query should return fewer than 1,000 rows on a table with 1 million records.
The Analyzer recommended creating some STATISTICS for this... So my question is: what are those statistics? Do they really help performance? How costly are they for a table like the one above?
Please bear in mind that by no means would I call myself an experienced SQL Server user... and this is my first time using this Analyzer.
Statistics are what SQL Server uses to determine the viability of how to get data.
Let's say, for instance, that you have a table that only has a clustered index on the primary key. When you execute SELECT * FROM tablename WHERE col1=value, SQL Server only has one option, to scan every row in the table to find the matching rows.
Now we add an index on col1, so you assume that SQL Server will use the index to find the matching rows, but that's not always true. Let's say that the table has 200,000 rows and col1 only has 2 values: 1 and 0. When SQL Server uses an index to find data, the index contains pointers back to the clustered index position. Given there are only two values in the indexed column, SQL Server decides it makes more sense to just scan the table, because using the index would be more work.
Now we'll add another 800,000 rows of data to the table, but this time the values in col1 are widely varied. Now it's a useful index because SQL Server can viably use the index to limit what it needs to pull out of the table. Will SQL Server use the index?
It depends. And what it depends on are the Statistics. At some point in time, with AUTO UPDATE STATISTICS set on, the server will update the statistics for the index and know it's a very good and valid index to use. Until that point, however, it will ignore the index as being irrelevant.
That's one use of statistics. But there is another use and that isn't related to indices. SQL Server keeps basic statistics about all of the columns in a table. If there's enough different data to make it worthwhile, SQL Server will actually create a temporary index on a column and use that to filter. While this takes more time than using an existing index, it takes less time than a full table scan.
Sometimes you will get recommendations to create specific statistics on columns that would be useful for that. These aren't indices, but they do keep track of a statistical sampling of the data in the column, so SQL Server can determine whether it makes sense to create a temporary index to return the data.
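For what it's worth, creating such a statistics object by hand might look like this (a sketch; the statistics name is made up, while the table and column names come from the question):

-- Column statistics store only a histogram and density sample, not a
-- copy of the data, so they are cheap compared to an index.
CREATE STATISTICS st_tablex_processed
    ON dbo.tablex (processed)
    WITH FULLSCAN;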
HTH
In SQL Server 2005, set AUTO_CREATE_STATISTICS and AUTO_UPDATE_STATISTICS on. You won't have to worry about creating or maintaining the statistics yourself, since the database handles this very well on its own.
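For example (a sketch, substituting your own database name):

-- Let SQL Server create and refresh column statistics automatically.
ALTER DATABASE MyDatabase SET AUTO_CREATE_STATISTICS ON;
ALTER DATABASE MyDatabase SET AUTO_UPDATE_STATISTICS ON;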
