I have 2 questions about SQL Server statistics; please help me. I am using SQL Server 2016.
My table TBL1 has only one column, COL1. When I use COL1 in joins with other tables, statistics are automatically created on COL1.
Next I create a non-clustered index on COL1 of TBL1, and another set of statistics is created on COL1. Now I have 2 sets of statistics on COL1.
Out of the above 2 statistics, which does SQL Server use for subsequent queries? I am assuming that the statistics created by the non-clustered index will be used; am I right?
If I use the UPDATE STATISTICS TBL1 command, all the statistics for TBL1 are updated. In the MSDN documentation, I see that updating statistics causes queries to recompile; what do they mean by recompiling queries? The MSDN link is
https://learn.microsoft.com/en-us/sql/relational-databases/statistics/update-statistics?view=sql-server-ver15
Please explain.
If there's only 1 column in your table, there's no reason to have a non-clustered index. This creates a separate copy of that data. Just create the clustered index on that column.
Yes - since your table has only the one column and an index was created on that column, it's almost certain that SQL Server will use that index whenever joining to that table, and thus the statistics for that index will be used.
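Incidentally, you can see both statistics objects side by side in sys.stats. A minimal sketch, assuming TBL1 lives in the dbo schema; the auto-created statistic will have a _WA_Sys_ name, while the other carries the index's name:

-- List every statistics object on TBL1.
-- auto_created = 1 marks the column statistic built for your joins;
-- the one named after the non-clustered index came with the index.
SELECT s.name, s.auto_created, s.user_created
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID('dbo.TBL1');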
In this context, it means that the execution plan in cache will be invalidated because its statistics are stale, and the next time the query executes the optimizer will create a new execution plan. In other words, SQL Server assumes there may now be a better set of steps to execute the query, and the optimizer will try to assemble that better set of steps (execution plan).
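As a concrete illustration (schema name assumed), the statement below refreshes every statistics object on the table; any cached plan that references TBL1 is then invalidated and rebuilt on its next execution:

-- Refresh all statistics on TBL1; cached plans that reference the table
-- are recompiled the next time they run.
UPDATE STATISTICS dbo.TBL1;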
Recommended Reading:
SQL Server Statistics
Understanding Execution Plans
Execution Plan Caching & Reuse
Query 1 - UserId is the main identifier and has a non-clustered index
UPDATE myTable SET
    CurrentHp = MaximumHp,
    SelectedAttack1RemainingPP = SelectedAttack1MaximumPP,
    SelectedAttack2RemainingPP = SelectedAttack2MaximumPP,
    SelectedAttack3RemainingPP = SelectedAttack3MaximumPP,
    SelectedAttack4RemainingPP = SelectedAttack4MaximumPP
WHERE UserId = 1001695
Query 2
UPDATE myTable SET
    CurrentHp = MaximumHp,
    SelectedAttack1RemainingPP = SelectedAttack1MaximumPP,
    SelectedAttack2RemainingPP = SelectedAttack2MaximumPP,
    SelectedAttack3RemainingPP = SelectedAttack3MaximumPP,
    SelectedAttack4RemainingPP = SelectedAttack4MaximumPP
WHERE UserId = 1001695
  AND (
        SelectedAttack1RemainingPP != SelectedAttack1MaximumPP
     OR SelectedAttack2RemainingPP != SelectedAttack2MaximumPP
     OR SelectedAttack3RemainingPP != SelectedAttack3MaximumPP
     OR SelectedAttack4RemainingPP != SelectedAttack4MaximumPP
     OR CurrentHp != MaximumHp
      )
When I check via SQL Server Management Studio and compare them with "Include Actual Execution Plan", their costs are the same.
However, when I check via "Include Client Statistics", I see that the first query shows 1900 rows updated while the second one shows 0 rows updated.
So here is my question: when the column values are already equal, does SQL Server still perform the update?
Logically I also think both queries should behave the same, but I would like to hear your opinion.
[Screenshots: execution plans with identical cost; client statistics for Query 1 and Query 2]
The two execution plans are the same because your first filter condition (UserId = 1001695) selects just one row and the table has an index on that field.
If you change your queries to use a range condition such as (UserId > 100), the costs in the execution plans change and they are no longer the same; likewise, if you filter on another field that the table has no index on, the structure of the execution plans changes and they are no longer the same.
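One way to see the difference the second query's extra predicate makes is to check @@ROWCOUNT after each statement. A sketch using the names from the question, simplified to a single column pair:

-- Query 1 reports every matching row as updated, even rows whose values
-- were already equal; Query 2 reports only rows that actually differed.
UPDATE myTable SET CurrentHp = MaximumHp
WHERE UserId = 1001695;
SELECT @@ROWCOUNT AS rows_updated_query1;

UPDATE myTable SET CurrentHp = MaximumHp
WHERE UserId = 1001695 AND CurrentHp != MaximumHp;
SELECT @@ROWCOUNT AS rows_updated_query2;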
Imagine this scenario in SQL Server 2016: we have two tables, A and B
A is a memory optimized table
B is a normal table
We join A and B; the query runs fine and 1000 rows are returned in minimal time.
But when we want to insert this result set into another table (a memory-optimized table OR a normal table or even a temp table), the insert takes 10 to 20 seconds.
Any ideas?
UPDATE: Execution plans for the normal scenario and the memory-optimized table have been added.
When a DML statement targets a Memory-Optimized table, the query cannot run in parallel, and the server will employ a serialized plan. So, your first statement runs in a single-core mode.
In the second instance, the DML statement leverages the fact that "SELECT INTO / FROM" is parallelizable. This behavior was added in SQL Server 2014. Thus, you get a parallel plan for that. Here is some information about this:
Reference: What's New (Database Engine) - SQL Server 2014
I have run into this problem countless times with Memory-Optimized targets. One solution I have found, if the I/O requirements are high on the retrieval, is to stage the result of the SELECT statement into a temporary table or other intermediate location, then insert from there into the Memory-Optimized table.
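A sketch of that staging approach, with table and column names assumed for illustration: the parallel-friendly SELECT ... INTO materializes the join first, so only the final, cheaper insert runs serially against the memory-optimized target:

-- Stage the join result in tempdb first (SELECT INTO can go parallel)...
SELECT a.KeyCol, b.Payload
INTO #staged
FROM dbo.A AS a
JOIN dbo.B AS b ON b.KeyCol = a.KeyCol;

-- ...then perform the serial insert into the memory-optimized target
-- from the pre-computed rows.
INSERT INTO dbo.TargetTable (KeyCol, Payload)
SELECT KeyCol, Payload
FROM #staged;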
The third issue is that, by default, statements that merely read from a Memory-Optimized table, even if that table is not the target of DML, are also run in serialized fashion. There is a hotfix for this, which you can enable with a query hint.
The hint is used like this:
OPTION (USE HINT ('ENABLE_QUERY_OPTIMIZER_HOTFIXES'))
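In context, applied to a hypothetical query that only reads from the memory-optimized table, it would look something like this:

-- Table names are placeholders; the hint lets the optimizer consider a
-- parallel scan of the memory-optimized table on SQL Server 2016.
SELECT COUNT(*)
FROM dbo.A AS a            -- memory-optimized
JOIN dbo.B AS b ON b.KeyCol = a.KeyCol
OPTION (USE HINT ('ENABLE_QUERY_OPTIMIZER_HOTFIXES'));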
Reference: Update enables DML query plan to scan query memory-optimized tables in parallel in SQL Server 2016
In either case, any DML that has a memory-optimized table as a target is going to run on a single core. This is by design. If you need to exploit parallelism, you cannot do it if the Memory-Optimized table is the target of the statement. You will need to benchmark different approaches to find the one that performs best for your scenario.
I am using SQL Server 2008 and I need to optimize my queries. For that purpose I am using the Database Engine Tuning Advisor.
My question is: can I check the performance of only one SQL query at a time, or more than one using a new session?
To analyze one query at a time, right-click it in the SSMS script window and choose the option "Analyze Query in DTA". For this workload, select the option "keep all existing PDS" to avoid loads of drop recommendations for indexes not used by the query under examination.
To do more than one, first capture a trace file with a representative workload sample; then you can analyze that with the DTA.
There are simple steps to follow when writing a SQL query:
1- Name the columns in the SELECT list instead of using *
2- Avoid subqueries
3- Avoid using the IN operator
4- Use HAVING as a filter with GROUP BY
5- Do not save images in the database; save the image path in the database instead. Saving images in the DB takes a lot of space, and the images need serializing every time they are saved or retrieved.
6- Each table should have a primary key
7- Each table should have a minimum of one clustered index
8- Each table should have an appropriate number of non-clustered indexes; create them on columns based on the queries that actually run
9- The following priority order should be followed when any index is created: a) WHERE clause, b) JOIN clause, c) ORDER BY clause, d) SELECT clause
10- Do not use views; replace views with the original source tables
11- Triggers should not be used if possible; incorporate the logic of the trigger into a stored procedure
12- Remove any ad hoc queries and use stored procedures instead
13- Check that at least 30% of the hard disk is empty; it improves the performance a bit
14- If possible, move the logic of UDFs into stored procedures as well
15- Remove any unnecessary joins from the query
16- If a cursor is used in a query, see if there is any other way to avoid it (either by SELECT ... INTO or INSERT ... INTO, etc.); see the sketch after this list
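As promised in item 16, a hedged before-and-after sketch (table and column names are invented for illustration):

-- Cursor version: touches one row per iteration.
DECLARE order_cursor CURSOR FOR
    SELECT OrderId FROM dbo.Orders WHERE Status = 'new';
-- ...OPEN / FETCH / per-row UPDATE / CLOSE / DEALLOCATE...

-- Set-based replacement: one statement handles all the rows at once.
UPDATE dbo.Orders
SET Status = 'processed'
WHERE Status = 'new';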
I'm considering dropping an index from a table in a SQL Server 2005 instance. Is there a way that I can see which stored procedures might have statements that are dependent on that index?
First check whether the indexes are being used at all. You can use the sys.dm_db_index_usage_stats DMV for that; check the user_scans and user_seeks columns.
Read this: Use the sys.dm_db_index_usage_stats DMV to check if indexes are being used
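A minimal sketch of that check (the table name is a placeholder); indexes with no row in the DMV have not been touched since the last instance restart:

-- Compare reads (user_seeks/user_scans/user_lookups) against writes
-- (user_updates) for every index on the table.
SELECT i.name AS index_name,
       s.user_seeks, s.user_scans, s.user_lookups, s.user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
    ON s.object_id = i.object_id
   AND s.index_id = i.index_id
   AND s.database_id = DB_ID()
WHERE i.object_id = OBJECT_ID('dbo.MyTable');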
Nope. For one thing, index selection is dynamic - the indexes aren't selected until the query executes.
Barring "HINT", but let's not go there.
As le dorfier says, this depends on the execution plan SQL Server determines at runtime. I'd suggest setting up Perfmon to track table scans, or keeping SQL Profiler running after you drop the index, filtering for the column names you're indexing. Look for long-running queries.
So for this one project, we have a bunch of queries that are executed on a regular basis (every minute or so). I used "Analyze Query in Database Engine Tuning Advisor" to check on them.
They are pretty simple:
SELECT * FROM tablex WHERE processed = '0'
There is an index on processed, and each query should return fewer than 1000 rows on a table with 1 million records.
The Analyzer recommended creating some STATISTICS for this... So my question is: what are those statistics? Do they really help performance? How costly are they for a table like the one above?
Please bear in mind that by no means would I call myself an experienced SQL Server user... and this is my first time using this Analyzer.
Statistics are what SQL Server uses to determine which strategy for retrieving data is viable.
Let's say, for instance, that you have a table that only has a clustered index on the primary key. When you execute SELECT * FROM tablename WHERE col1=value, SQL Server only has one option, to scan every row in the table to find the matching rows.
Now we add an index on col1, so you assume that SQL Server will use the index to find the matching rows, but that's not always true. Let's say the table has 200,000 rows and col1 holds only 2 values: 1 and 0. When SQL Server uses an index to find data, the index contains pointers back to the clustered index position. Given there are only two values in the indexed column, SQL Server decides it makes more sense to just scan the table, because using the index would be more work.
Now we'll add another 800,000 rows of data to the table, but this time the values in col1 are widely varied. Now it's a useful index because SQL Server can viably use the index to limit what it needs to pull out of the table. Will SQL Server use the index?
It depends. And what it depends on are the Statistics. At some point in time, with AUTO UPDATE STATISTICS set on, the server will update the statistics for the index and know it's a very good and valid index to use. Until that point, however, it will ignore the index as being irrelevant.
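A hedged sketch of that scenario (all names invented): with only two distinct values in the column, the optimizer will typically ignore ix_col1 and scan instead:

-- 200,000 rows with col1 holding only 0 or 1: low selectivity.
CREATE TABLE dbo.demo (id INT IDENTITY PRIMARY KEY, col1 INT NOT NULL);
CREATE INDEX ix_col1 ON dbo.demo (col1);
-- After loading the rows, this will likely be a clustered index scan,
-- not a seek on ix_col1:
SELECT * FROM dbo.demo WHERE col1 = 0;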
That's one use of statistics. But there is another use and that isn't related to indices. SQL Server keeps basic statistics about all of the columns in a table. If there's enough different data to make it worthwhile, SQL Server will actually create a temporary index on a column and use that to filter. While this takes more time than using an existing index, it takes less time than a full table scan.
Sometimes you will get recommendations to create specific statistics on columns where that would be useful. These aren't indexes, but they do keep track of a statistical sampling of the data in the column so SQL Server can determine whether it makes sense to create a temporary index to return the data.
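Such a recommendation typically boils down to a statement like this sketch (the statistic name and sampling rate are placeholders):

-- A column statistic, not an index: just a sampled histogram of the
-- values in processed that the optimizer can consult.
CREATE STATISTICS st_tablex_processed
ON dbo.tablex (processed)
WITH SAMPLE 50 PERCENT;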
HTH
In SQL Server 2005, set AUTO_CREATE_STATISTICS and AUTO_UPDATE_STATISTICS on. You won't have to worry about creating or maintaining statistics yourself, since the database handles this very well on its own.
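A minimal sketch (the database name is a placeholder):

-- Let the engine create and refresh column statistics on its own.
ALTER DATABASE MyDatabase SET AUTO_CREATE_STATISTICS ON;
ALTER DATABASE MyDatabase SET AUTO_UPDATE_STATISTICS ON;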