SQL Server STATISTICS

So for this one project, we have a bunch of queries that are executed on a regular basis (every minute or so). I used the "Analyze Query in Database Engine Tuning Advisor" option to check on them.
They are pretty simple:
select * from tablex where processed='0'
There is an index on processed, and each query should return fewer than 1,000 rows from a table with 1 million records.
The Analyzer recommended creating some statistics on this. So my question is: what are those statistics? Do they really help performance? How costly are they for a table like the one above?
Please bear in mind that by no means would I call myself an experienced SQL Server user... and this is my first time using this Analyzer.

Statistics are what SQL Server uses to decide how best to retrieve your data.
Let's say, for instance, that you have a table that only has a clustered index on the primary key. When you execute SELECT * FROM tablename WHERE col1=value, SQL Server only has one option, to scan every row in the table to find the matching rows.
Now we add an index on col1, so you assume that SQL Server will use the index to find the matching rows, but that's not always true. Let's say that the table has 200,000 rows and col1 only has 2 values: 1 and 0. When SQL Server uses an index to find data, the index contains pointers back to the clustered index position. Given that there are only two values in the indexed column, SQL Server decides it makes more sense to just scan the table, because using the index would be more work.
Now we'll add another 800,000 rows of data to the table, but this time the values in col1 are widely varied. Now it's a useful index because SQL Server can viably use the index to limit what it needs to pull out of the table. Will SQL Server use the index?
It depends. And what it depends on are the Statistics. At some point in time, with AUTO UPDATE STATISTICS set on, the server will update the statistics for the index and know it's a very good and valid index to use. Until that point, however, it will ignore the index as being irrelevant.
That's one use of statistics. But there is another use and that isn't related to indices. SQL Server keeps basic statistics about all of the columns in a table. If there's enough different data to make it worthwhile, SQL Server will actually create a temporary index on a column and use that to filter. While this takes more time than using an existing index, it takes less time than a full table scan.
Sometimes you will get recommendations to create specific statistics on columns that would be useful for that. These aren't indices, but they do keep track of a statistical sampling of the data in the column so SQL Server can determine whether it makes sense to create a temporary index to return data.
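As a rough sketch of what that looks like in practice (using the tablex/processed names from the question; the statistics name is made up), you can list the statistics SQL Server keeps on a table and create the one the advisor recommends:

```sql
-- List the statistics objects SQL Server keeps for the table,
-- including ones it auto-created.
SELECT s.name, s.auto_created, c.name AS column_name
FROM sys.stats AS s
JOIN sys.stats_columns AS sc
  ON s.object_id = sc.object_id AND s.stats_id = sc.stats_id
JOIN sys.columns AS c
  ON sc.object_id = c.object_id AND sc.column_id = c.column_id
WHERE s.object_id = OBJECT_ID('tablex');

-- Create the recommended statistics, then inspect the histogram
-- to see how values of processed are distributed.
CREATE STATISTICS stat_processed ON tablex (processed);
DBCC SHOW_STATISTICS ('tablex', stat_processed);
```

The histogram output is what the optimizer consults to estimate how many rows `processed='0'` will match.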
HTH

In SQL Server 2005, set AUTO_CREATE_STATISTICS and AUTO_UPDATE_STATISTICS on. You won't have to worry about creating or maintaining statistics yourself, since the database handles this very well on its own.
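If you want to set or verify those options explicitly, a sketch (the database name is a placeholder):

```sql
-- Turn on automatic statistics creation and maintenance.
ALTER DATABASE MyDatabase SET AUTO_CREATE_STATISTICS ON;
ALTER DATABASE MyDatabase SET AUTO_UPDATE_STATISTICS ON;

-- Verify the current settings.
SELECT name, is_auto_create_stats_on, is_auto_update_stats_on
FROM sys.databases
WHERE name = 'MyDatabase';
```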


SQL Server Update Statistics

I have 2 questions about SQL Server statistics, please help me. I am using SQL Server 2016.
My table TBL1 has only one column COL1. When I used COL1 in joins with other tables, statistics are automatically created on COL1.
Next I create a non-clustered index on COL1 of TBL1, then another set of statistics are created on COL1. Now I have 2 sets of statistics on COL1.
Out of the above 2 statistics, which statistics are used by SQL Server for further queries? I am assuming that the statistics created by the non-clustered index will be used, am I right?
If I use the UPDATE STATISTICS TBL1 command, all the statistics for TBL1 are updated. In the MSDN documentation, I see that updating statistics causes queries to recompile. What do they mean by recompiling of queries? The MSDN link is
https://learn.microsoft.com/en-us/sql/relational-databases/statistics/update-statistics?view=sql-server-ver15
Please explain.
If there's only 1 column in your table, there's no reason to have a non-clustered index. This creates a separate copy of that data. Just create the clustered index on that column.
Yes - Since your table only has the one column and an index was created on that column, it's almost certain that SQL Server will use that index whenever joining to that table and thus the statistics for that index will be used.
In this context, it means that the cached execution plan will be invalidated due to stale statistics, and the next time the query executes the optimizer will recreate its execution plan. In other words, SQL Server assumes there may now be a better set of steps to execute the query, and the optimizer will try to assemble a better set of steps (execution plan) to execute it.
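To make the two parts concrete, a sketch using the table from the question:

```sql
-- Update every statistics object on TBL1 (the WITH FULLSCAN option
-- samples all rows instead of a subset).
UPDATE STATISTICS TBL1 WITH FULLSCAN;

-- List both statistics objects on COL1: the auto-created one
-- (its name typically starts with _WA_Sys_) and the one that
-- backs the non-clustered index, plus when each was last updated.
SELECT name, auto_created,
       STATS_DATE(object_id, stats_id) AS last_updated
FROM sys.stats
WHERE object_id = OBJECT_ID('TBL1');
```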
Recommended Reading:
SQL Server Statistics
Understanding Execution Plans
Execution Plan Caching & Reuse

SQL Server execution plan Index seek

I was trying to improve two queries that are almost the same by indexing. I saw a Table Scan in the first query and created an index to turn that into an Index Seek. When I looked at the second query, SQL Server suggested creating an index equal to the one I had just created, with only the column order changed, yet the execution plan already showed an Index Seek on the table.
My question is:
If the execution plan already shows an index seek, should I create another index for this query, should I delete the index I created and replace it with the suggested one, or should I ignore the advice SQL Server gives?
One cannot answer without specific details. This is not a guessing game. Please post the exact table structure, table sizes, the indexes you added and the execution plans you have.
The fact that you added an index does not mean you added the best index. Nor does the fact that the execution plan uses an index seek imply the plan is optimal. A wrong index column order with only a partial predicate match would still manifest as a 'seek' on the leading column(s); it would be suboptimal, and SQL Server would continue recommending a better index (i.e. exactly the symptoms you describe).
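A sketch of the column-order point (table and column names here are hypothetical, not from the question):

```sql
-- Query: ... WHERE OrderDate >= @d AND CustomerId = @c

-- Index keyed (OrderDate, CustomerId): the plan still shows a
-- Seek on the OrderDate range, but CustomerId is only a residual
-- predicate, so many rows are read and discarded.
CREATE INDEX IX_Orders_Date_Customer ON Orders (OrderDate, CustomerId);

-- Index keyed (CustomerId, OrderDate): seeks directly to the one
-- customer, then range-scans the dates -- usually far fewer reads.
-- This is the kind of reordering the advisor keeps suggesting.
CREATE INDEX IX_Orders_Customer_Date ON Orders (CustomerId, OrderDate);
```

Both plans say "Index Seek", but only the second index matches the full predicate.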
Please read Understanding how SQL Server executes a query and How to analyse SQL Server performance.
I saw a Table Scan in the first query and created an index to make that an Index Seek
Not all seeks are good, and not all scans are bad.
Imagine you have a customers table with 10 customers, each having 1,000 orders, so there are 10,000 rows in the orders table.
To get the top 1 order for each customer, a query that scans the orders table may be bad, since doing seeks would only cost you 10 seeks.
You have to understand the data, see why the optimizer chose this plan, and learn how to make the optimizer choose the plan you need. Itzik Ben-Gan gives amazing examples in this tutorial, and there is a video on SQLBits.
Further, Craig Freedman talks about seeks and scans and goes into detail on why the optimizer may choose a scan over a seek due to random reads and data density.
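The top-1-order-per-customer pattern above can be sketched like this (table and column names are assumptions):

```sql
-- With an index on Orders (CustomerId, OrderDate DESC), this does
-- one small seek per customer (10 seeks total) instead of scanning
-- all 10,000 order rows.
SELECT c.CustomerId, o.OrderId, o.OrderDate
FROM Customers AS c
CROSS APPLY (SELECT TOP (1) OrderId, OrderDate
             FROM Orders
             WHERE CustomerId = c.CustomerId
             ORDER BY OrderDate DESC) AS o;
```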

Tens of Millions inserts into an indexed table performance/strategy (Sql Server >= 2005)

I have to get data from many tables and combine them into a single one.
The final table will have about 120 million rows.
I'm planning to insert the rows in the exact order needed by the big table indexes.
My question is, in terms of performance:
Is it better create the indexes of the new table from the start, or first make the inserts and at the end of the import create the indexes ?
Also, would it make a difference if, when building indexes at the end, the rows are already sorted in terms of indexes specifications ?
I can't test both cases and get an objective comparison, since the database is on the main server, which is used for many other databases and applications and can be heavily loaded or not at different times. I can't restore the database to my local server either, since I don't have full access to the main server yet.
I suggest you copy the data in first and then create your indexes. If you insert records into a table that has indexes, SQL Server must update the indexes for each insert; if you create the indexes after inserting all the records, SQL Server only needs to build each index once.
You can use SSIS to copy data from the source tables to the destination. SSIS uses bulk insert and has good performance. Also, if you have any triggers on the destination database, I suggest disabling them before starting your conversion.
When you create a clustered index, the rows are stored in the order of that index, so if the rows are already sorted to match the index specification, the build is cheaper.
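A sketch of the load-then-index approach (table and column names are placeholders):

```sql
-- 1. Load into a heap (no indexes yet); TABLOCK enables
--    minimally logged bulk loading under the right recovery model.
INSERT INTO dbo.BigTable WITH (TABLOCK) (Col1, Col2, Col3)
SELECT Col1, Col2, Col3
FROM dbo.SourceTable;   -- repeat per source table

-- 2. Build the clustered index once, after the load. If the data
--    was inserted already sorted on Col1, the sort step is cheap.
CREATE CLUSTERED INDEX CIX_BigTable ON dbo.BigTable (Col1)
    WITH (SORT_IN_TEMPDB = ON);

-- 3. Then build any non-clustered indexes.
CREATE NONCLUSTERED INDEX IX_BigTable_Col2 ON dbo.BigTable (Col2);
```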

Update with "not in" on huge table in SQL Server 2005

I have a table with around 115k rows. Something like this:
Table: People
Column: ID PRIMARY KEY INT IDENTITY NOT NULL
Column: SpecialCode NVARCHAR(255) NULL
Column: IsActive BIT NOT NULL
Initially, I had an index defined like so:
PK_IDX (clustered) -- clustered index on primary key
IDX_SpecialCode (non clustered, non-unique) -- index on the SpecialCode column
And I'm doing an update like so:
Update People set IsActive = 0
Where SpecialCode not in ('...enormous list of special codes....')
This enormous list is essentially 99% of the users in the table.
This update takes forever on my server. As a test I trimmed the list of special codes in the "not in" clause to something like 1% of the users in the table, and my execution plan ends up using an INDEX SCAN on the PK_IDX index instead of the IDX_SpecialCode index that I thought it'd use.
So, I thought that maybe I needed to modify the IDX_SpecialCode so that it included the column "IsActive" in it. I did so and I still see the execution plan defaulting to the PK_IDX index scan and my query still takes a very long time to run.
So, what is the more correct way to do an update of this nature? I have the list of users I want to exclude from the update, but I was trying to avoid loading all the employees' special codes from the database, filtering out those not in my list on the application side, and then running my query with an IN clause, which will be a much, much smaller list in my actual usage.
Thanks
If you have the employees you want to exclude, why not just populate an indexed table with those PK_IDs and do a:
Update People
set IsActive = 0
Where NOT EXISTS (SELECT NULL
                  FROM lookuptable l
                  WHERE l.PK = People.PK)
You are getting index scans because SQL Server is not stupid, and realizes that it makes more sense to just look at the whole table instead of checking for 100 different criteria one at a time. If your stats are up to date the optimizer knows about how much of the table is covered by your IN statement and will do a table or clustered index scan if it thinks it will be faster.
With SQL Server, indexes are ignored when you use the NOT clause. That is why you are seeing the execution plan ignore your index. (Ref: page 6, MCTS Exam 70-433 Database Development SQL 2008, which I'm reading at the moment.)
It might be worth taking a look at full-text indexes, although I don't know whether the same will happen there (I don't have access to a box with it set up to test at the moment).
hth
Is there any way you could use the IDs of the users you wish to exclude instead of their codes? Even on indexed values, comparing IDs may be faster than comparing strings.
I think the problem is your SpecialCode NVARCHAR(255). String comparisons in SQL Server are slow. Consider changing your query to work with the IDs. Also, try to avoid NVARCHAR; if you don't care about Unicode, use VARCHAR instead.
Also, check your database collation to see if it matches the instance collation, and make sure you are not having hard disk performance issues.

SQL Query Optimization

I am using SQL Server 2008 and I need to optimize my queries. For that purpose I am using the Database Engine Tuning Advisor.
My question is: can I check the performance of only one SQL query at a time, or more than one using a new session?
To analyze one query at a time, right-click it in the SSMS script window and choose the option "Analyze Query in DTA". For this workload, select the option "keep all existing PDS" to avoid loads of drop recommendations for indexes not used by the query under examination.
To do more than one, first capture a trace file with a representative workload sample; then you can analyse that with the DTA.
There are simple steps you should follow when writing a SQL query:
1. Name the columns in the SELECT query instead of using *
2. Avoid subqueries
3. Avoid using the IN operator
4. Use HAVING as a filter with GROUP BY
5. Don't save images in the database; save the image path in the database instead. Saving images in the DB takes a lot of space, and the images need serialization each time they are saved or retrieved.
6. Each table should have a primary key
7. Each table should have a minimum of one clustered index
8. Each table should have an appropriate number of non-clustered indexes; non-clustered indexes should be created on table columns based on the queries being run
9. The following priority order should be followed when any index is created: a) WHERE clause, b) JOIN clause, c) ORDER BY clause, d) SELECT clause
10. Do not use views, or replace views with the original source tables
11. Triggers should not be used if possible; incorporate the logic of the trigger into a stored procedure
12. Remove any ad-hoc queries and use stored procedures instead
13. Check that at least 30% of the HDD is empty; it improves performance a bit
14. If possible, move the logic of UDFs into stored procedures as well
15. Remove any unnecessary joins
16. If a cursor is used in a query, see if there is another way to avoid it (either SELECT … INTO or INSERT … INTO, etc.)
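For point 16, the set-based alternative usually looks like this (table and column names are hypothetical):

```sql
-- Instead of opening a cursor over Orders and inserting summary
-- rows one at a time, do the whole thing in one set-based statement:
INSERT INTO dbo.OrderSummary (CustomerId, OrderCount, Total)
SELECT CustomerId, COUNT(*), SUM(Amount)
FROM dbo.Orders
GROUP BY CustomerId;
```

One statement lets the optimizer pick a single efficient plan instead of paying per-row overhead.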
