Redundant indexes? - sql-server

I noticed a strange combination of indexes in one of the databases I was working on.
Here is the table design:
CREATE TABLE tblABC
(
id INT NOT NULL IDENTITY(1,1),
AnotherId INT NOT NULL, --not unique column
Othercolumn1 INT,
OtherColumn2 VARCHAR(10),
OtherColumn3 DATETIME,
OtherColumn4 DECIMAL(14, 4),
OtherColumn5 INT,
CONSTRAINT idxPKNCU
PRIMARY KEY NONCLUSTERED (id)
)
CREATE CLUSTERED INDEX idx1
ON tblABC(AnotherId ASC)
CREATE NONCLUSTERED INDEX idx2
ON tblABC(AnotherId ASC) INCLUDE(OtherColumn4)
CREATE NONCLUSTERED INDEX idx3
ON tblABC (AnotherId) INCLUDE (OtherColumn2, OtherColumn4)
Please note that column id is identity and defined as primary key.
A clustered index is defined on column - AnotherId, this column is not unique.
There are two additional nonclustered indexes defined on AnotherId, with additional include columns.
My opinion is that both of the nonclustered indexes on AnotherId (idx2 and idx3) are redundant, because the main copy of the table (the clustered index) has the same data.
When I checked the index usage, I was expecting to see no usage on idx2 and idx3, but idx3 had highest index seeks.
I have included screenshots of the index design and usage.
My question is: aren't these nonclustered indexes, idx2 and idx3, redundant? The optimizer can get the same data from the clustered index, idx1. Maybe it would have done so if no NC index had been defined.
Am I missing something?
Regards,
Nayak

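It is a bit odd to have two very similar non-clustered indexes, though they may both be getting used equally. I also find it positively weird that the clustered index was made on a non-unique field. As for why idx3 gets seeks at all: a non-clustered index is narrower than the clustered index, because its leaf level holds only the key and the included columns rather than the whole row. A query that needs only AnotherId, OtherColumn2 and OtherColumn4 reads fewer pages through idx3 than through idx1, so the optimizer can prefer it. idx2 is the truly redundant one: it has the same key as idx3, and its included column is a subset of idx3's.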
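One way to check whether idx3 really saves I/O compared with idx1 is to force each index with a table hint and compare the logical reads. A minimal sketch, assuming the tblABC table above lives in the dbo schema; the value 42 is a hypothetical AnotherId, and the read counts you see depend entirely on your data:

```sql
-- Report logical reads per statement (shown in the Messages tab).
SET STATISTICS IO ON;

-- Forced through the clustered index: each leaf page carries every column of the row.
SELECT AnotherId, OtherColumn2, OtherColumn4
FROM dbo.tblABC WITH (INDEX(idx1))
WHERE AnotherId = 42;

-- Forced through idx3: leaf pages carry only the key, the included columns
-- and the row locator, so more rows fit on each page.
SELECT AnotherId, OtherColumn2, OtherColumn4
FROM dbo.tblABC WITH (INDEX(idx3))
WHERE AnotherId = 42;

SET STATISTICS IO OFF;
```

The hints are only there for the comparison; in normal queries, let the optimizer choose.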
Check out the following link for information and a free tool to ascertain index usage. I use this all the time to see which indexes are being used etc.
https://www.brentozar.com/blitzindex/
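If you would rather check with a plain query, SQL Server exposes the underlying counters through the sys.dm_db_index_usage_stats DMV. A minimal sketch, assuming tblABC is in the dbo schema; the counters are reset when the instance restarts, so read them after a representative period of workload:

```sql
-- Read/write counters for every index on tblABC in the current database.
-- The DMV only has rows for indexes touched since the last restart,
-- so the LEFT JOIN keeps never-used indexes visible (with NULL counters).
SELECT  i.name         AS index_name,
        i.type_desc,
        s.user_seeks,
        s.user_scans,
        s.user_lookups,
        s.user_updates  -- writes that had to maintain this index
FROM    sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
       ON  s.object_id   = i.object_id
       AND s.index_id    = i.index_id
       AND s.database_id = DB_ID()
WHERE   i.object_id = OBJECT_ID('dbo.tblABC')
ORDER BY i.index_id;
```

An index with high user_updates but no seeks, scans or lookups is one you are only paying to maintain.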
For the non-clustered indexes, you can consolidate them and remove the unused ones: if you are only writing to an index, it is a royal waste of resources.
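For this table specifically, a consolidation could be as small as the sketch below. It assumes the usage numbers and your query plans confirm that nothing depends on idx2 by name (for example, through an index hint); try it on a non-production copy first:

```sql
-- idx3 has the same key as idx2 (AnotherId), and its INCLUDE list
-- (OtherColumn2, OtherColumn4) is a superset of idx2's (OtherColumn4),
-- so every query idx2 can cover, idx3 can cover as well.
DROP INDEX idx2 ON dbo.tblABC;
```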
For the clustered index, you may consider redoing it based on your findings with the blitz index tool.

Related

What is the difference between the two methods of creating clustered primary keys?

I have a table on which I want to create a clustered primary key.
CREATE TABLE dbo.SampleTable
(
C1 INT NOT NULL,
C2 INT NOT NULL )
The first way is to create the primary key itself as the clustered index:
ALTER TABLE dbo.SampleTable ADD CONSTRAINT IDX_SampleTable PRIMARY KEY CLUSTERED (C1, C2)
The second way is to add a NONCLUSTERED primary key and then CREATE a CLUSTERED INDEX on the same columns:
ALTER TABLE dbo.SampleTable ADD CONSTRAINT IDX_SampleTable PRIMARY KEY NONCLUSTERED (C1, C2)
CREATE CLUSTERED INDEX IDX_SampleTable2 ON dbo.SampleTable (C1, C2) -- cannot reuse the name IDX_SampleTable, which the PK's index already has
Is there a difference in performance from the above two methods?
Is either of them not recommended?
Yes, there is a difference. By specifying CLUSTERED, you instruct the database to store the data in a certain way: the table's rows are kept in the leaf level of that index, ordered by the index key. Note that this is a logical order; the pages are not guaranteed to sit in consecutive blocks on the hard drive.
By creating a clustered primary key as in your first statement, every row in the table is guaranteed to have a unique (C1, C2) combination, and the data is always stored in (C1, C2) key order.
In the second example, you do NOT enforce this CLUSTERED behaviour through the primary key, but through a separate index. Though the effects are the same now, you might later drop that index, and the table would then become a heap: the primary key would still enforce uniqueness, but the data would no longer be stored in a CLUSTERED fashion. (Disabling a clustered index is not a softer alternative: it makes the table's data inaccessible until the index is rebuilt or dropped.)
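The difference shows up as soon as you try to drop the clustering. A sketch, assuming dbo.SampleTable was set up with the second approach:

```sql
-- Second approach: the clustered index is a standalone object, so it can be dropped.
DROP INDEX IDX_SampleTable2 ON dbo.SampleTable;

-- The table is now a heap (a row with index_id 0, type_desc HEAP, name NULL),
-- while IDX_SampleTable still enforces uniqueness as a NONCLUSTERED PK.
SELECT name, index_id, type_desc, is_primary_key
FROM sys.indexes
WHERE object_id = OBJECT_ID('dbo.SampleTable');

-- First approach: DROP INDEX on an index that backs a PRIMARY KEY is rejected,
-- so the clustering can only go away together with the constraint:
-- ALTER TABLE dbo.SampleTable DROP CONSTRAINT IDX_SampleTable;
```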
Bottom line: In practice these two statements are the same now, but might make a difference in the future because the CLUSTERED property is not integrated in the PK, but in a separate index.
Creating a nonclustered primary key and then creating a clustered index on the columns within the primary key is not a good idea. Effectively you'll create two indexes on the same columns (C1 and C2 in this case), both of which have to be maintained on every write, yet it's very unlikely the nonclustered index will ever be used. This is because the clustered index is very likely going to be the optimizer's first choice, as it holds the rows themselves in key order. Also, whenever a query needs columns that the nonclustered index does not contain, the engine still has to go to the clustered index afterwards to fetch the rest of the row.
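To see the duplication the second approach creates, you can list the table's indexes from the sys.indexes catalog view. A sketch, assuming dbo.SampleTable was built with the second approach:

```sql
-- List every index on the table.
SELECT name, type_desc, is_unique, is_primary_key
FROM sys.indexes
WHERE object_id = OBJECT_ID('dbo.SampleTable')
ORDER BY index_id;
-- Two B-trees on (C1, C2): the CLUSTERED IDX_SampleTable2 (not declared unique)
-- and the NONCLUSTERED, unique IDX_SampleTable that backs the primary key.
-- Both are maintained by every INSERT, DELETE and key UPDATE.
```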
If you do want a clustered index on your Primary Key(s) then create the key as a Clustered Primary Key. This is not to say that your Primary Key should always be Clustered, but that is a very different subject.
This depends on your data:
https://learn.microsoft.com/en-gb/sql/relational-databases/indexes/clustered-and-nonclustered-indexes-described?view=sql-server-2017
Clustered indexes sort and store the data rows in the table or view based on their key values. These are the columns included in the index definition. There can be only one clustered index per table, because the data rows themselves can be stored in only one order.
So the clustering key determines how your data is physically organized.

Should the PK on an identity column (which is a surrogate key) be non-clustered?

For a table with a PK on an identity column, the PK will be clustered by default. Would it be better as non-clustered? The PK is a surrogate key that may never be used for querying directly, though it may be used to join to other tables.
The reasoning is that other indexes will be created to serve queries. Will a query that uses a non-clustered index, but returns columns the index does not cover, perform fewer logical reads (LIO) because there is no extra clustered index seek step?
create table T (
Id int identity(1,1) primary key, -- clustered or non-clustered?
A ....
B ....
C ....
....)
create index ix_A on T (A)
create index ix_..... -- Many indexes can be created for different queries
select A, B
from T
where A between @a and @a+5 -- This query will have less LIO if the PK is non-clustered (seek)
It's perfectly fine to set your surrogate PK to be non-clustered if there is a better candidate in the table for the clustered index.
Good candidates for a clustered index are columns on which you frequently do range searches ([ColumnName] BETWEEN This AND That), or which you frequently use in ORDER BY clauses.
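A sketch of that layout for the question's table T, with hypothetical column types (the question elides them) and a hypothetical index name cix_T_A:

```sql
-- Hypothetical column types; the question elides them.
CREATE TABLE dbo.T (
    Id INT IDENTITY(1,1) NOT NULL,
    A  INT          NOT NULL,
    B  VARCHAR(50)  NULL,
    C  DATETIME     NULL,
    -- Surrogate key: still unique and usable for joins, but not the clustering key.
    CONSTRAINT PK_T PRIMARY KEY NONCLUSTERED (Id)
);

-- A is the column used for range searches, so it gets the clustered index.
CREATE CLUSTERED INDEX cix_T_A ON dbo.T (A);

DECLARE @a INT = 100;  -- hypothetical range start

-- Typically a clustered index seek followed by a short ordered range scan:
-- B is read from the same leaf pages, so no lookup is needed, and the
-- ORDER BY can be satisfied without a separate sort.
SELECT A, B
FROM dbo.T
WHERE A BETWEEN @a AND @a + 5
ORDER BY A;
```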

What is a non-clustered index scan

I know what a table scan, a clustered index scan and an index seek are, but my Google skills have let me down in finding a precise explanation of non-clustered index scans. Why and when does a query use a non-clustered index scan?
Thank you.
As the name suggests, Non Clustered Index Scans are scans on Non Clustered Indexes. NCI scans will typically be done if all of the fields in a select can be fulfilled from a non clustered index, but the selectivity or indexing of the query is too poor to result in a seek.
NCI scans potentially have a performance benefit over a clustered index scan in that NCIs are generally narrower than the clustered index (since they generally have fewer columns), hence there are fewer pages to fetch and less I/O.
I've put a contrived scenario up on SqlFiddle (click 'View Execution Plan' at the bottom).
Given the following setup of table, clustered, and non clustered indexes:
CREATE TABLE Foo
(
FooId INT,
Name VARCHAR(50),
BigCharField CHAR(7000),
CONSTRAINT PK_FOO PRIMARY KEY CLUSTERED(FooId)
);
CREATE NONCLUSTERED INDEX IX_FOO ON Foo(Name);
The following queries demonstrate the different scans:
-- Clustered Index Scan - because we need all fields, CI is most efficient
SELECT * FROM FOO;
-- Non Clustered Index Scan - because we just need Name, but have no selectivity, the NCI
-- will suffice and is narrower.
SELECT DISTINCT Name FROM FOO;
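Two more queries against the same Foo table round out the picture: a seek on IX_FOO, and what happens once the NCI no longer covers the query. The plan shapes in the comments are what you would typically expect once the table holds a realistic number of rows; on a tiny or empty table the optimizer may choose differently:

```sql
-- Equality predicate on the NCI key: typically an Index Seek on IX_FOO.
-- FooId comes along for free, because the clustering key is stored in
-- every NCI row, so no Key Lookup is needed.
SELECT FooId, Name FROM Foo WHERE Name = 'abc';

-- Asking for BigCharField as well means fetching the rest of the row:
-- a seek on IX_FOO plus a Key Lookup into PK_FOO, or a clustered index
-- scan if the optimizer estimates too many matching rows.
SELECT FooId, Name, BigCharField FROM Foo WHERE Name = 'abc';
```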

indexes that appear to be redundant with clustered PK

I am working on a database at a client with the following table:
CREATE TABLE [Example] (
[ID] INT IDENTITY (1, 1) NOT NULL,
....
[AddressID] INT NULL,
[RepName] VARCHAR(50) NULL,
....
CONSTRAINT [PK_Example] PRIMARY KEY CLUSTERED ([ID] ASC)
)
And it has the following indexes:
CREATE NONCLUSTERED INDEX [IDX_Example_Address]
ON [example]( [ID] ASC, [AddressId] ASC);
CREATE NONCLUSTERED INDEX [IDX_Example_Rep]
ON [example]( [ID] ASC, [RepName] ASC);
To me these appear to be redundant with the clustered index. I cannot imagine any scenario where these would be beneficial. If anyone can come up with a situation where these would be useful, let me know.
Here is another example:
CREATE NONCLUSTERED INDEX [IDX_Example_IsDeleted]
ON [example]( [IsDeleted] ASC)
INCLUDE( [ID], [SomeNumber]);
Why would you need to INCLUDE [ID]? My understanding is that the clustered index key is already present in every non-clustered index, so why would they do that? I would just INCLUDE ([SomeNumber])
You are correct that the clustered index key is already included in every non-clustered index, but not in the same sense that your example non-clustered indexes suggest.
For example, if you have a non-clustered index as in your example for IDX_Example_Rep, and you run this query:
SELECT [RepName], [Id] FROM [Example] WHERE [RepName] = 'some_value';
The IDX_Example_Rep index will be used, but it will be an index scan (every row will be checked). This is because the [Id] column was specified as the first column in the index.
If the index is instead specified as follows:
CREATE NONCLUSTERED INDEX [IDX_Example_Rep]
ON [example]([RepName] ASC);
Then when you run the same sample query, the IDX_Example_Rep index is used and the operation is an index seek - the engine knows exactly where to find the records by [RepName] within the IDX_Example_Rep index and, because the only other field being returned by the SELECT is the [Id] field, which is the key of the clustered index and therefore included in the non-clustered index, no further operations are necessary.
If the SELECT list were expanded to include, say, the [AddressId] field, you'll find the engine still performs the index seek against IDX_Example_Rep to find the correct records, but then also has to do a key lookup against the clustered index to get the "other" fields (the [AddressId] in this example).
So, no - you probably don't want to repeat the [Id] column as part of the non-clustered indices in general, but when it comes to non-clustered indices you definitely want to pay attention to your SELECTed fields and know whether or not you're covering the fields you're going to need.
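A sketch of a covering index for a query that also returns [AddressID]; the index name IDX_Example_Rep_Covering is hypothetical. The same idea applies to IDX_Example_IsDeleted, where INCLUDE ([SomeNumber]) alone would be enough:

```sql
-- Key on the column you filter by; INCLUDE the extra column you return.
-- [ID] is not listed: as the clustering key it is already in every NCI row.
CREATE NONCLUSTERED INDEX [IDX_Example_Rep_Covering]
    ON [Example] ([RepName] ASC)
    INCLUDE ([AddressID]);

-- Covered by the index above: typically an Index Seek with no Key Lookup.
SELECT [RepName], [ID], [AddressID]
FROM [Example]
WHERE [RepName] = 'some_value';
```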

Primary keys without default index (sort) - SQL2005

How do I switch off the default index on primary keys?
I don't want all my tables to be indexed (sorted), but they must have a primary key.
You can define a primary key index as NONCLUSTERED to prevent the table rows from being ordered according to the primary key, but you cannot define a primary key without some associated index.
Tables are always unsorted - there is no "default" order for a table and the optimiser may or may not choose to use an index if one exists.
In SQL Server an index is effectively the only way to implement a key. You get a choice between clustered or nonclustered indexes - that is all.
The means by which SQL Server implements Primary and Unique keys is by placing an index on those columns. So you cannot have a Primary Key (or Unique constraint) without an index.
You can tell SQL Server to use a nonclustered index to implement these constraints. If there are only nonclustered indexes on a table (or no indexes at all), you have a heap. It's pretty rare that this is what you actually want.
Just because a table has a clustered index, this in no way indicates that the rows of the table will be returned in the "order" defined by such an index - the fact that the rows are usually returned in that order is an implementation quirk.
And the actual code would be:
CREATE TABLE T (
Column1 char(1) not null,
Column2 char(1) not null,
Column3 char(1) not null,
constraint PK_T PRIMARY KEY NONCLUSTERED (Column2,Column3)
)
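To confirm what that produces, and to get a dependable order anyway, a sketch against the table T above (assumed to be in the default dbo schema):

```sql
-- With only a NONCLUSTERED primary key, T is stored as a heap:
-- expect a row with index_id 0 (type_desc HEAP, name NULL) plus PK_T.
SELECT name, index_id, type_desc
FROM sys.indexes
WHERE object_id = OBJECT_ID('dbo.T');

-- No index guarantees result order; ask for it explicitly.
SELECT Column1, Column2, Column3
FROM dbo.T
ORDER BY Column2, Column3;
```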
What does "I don't want all my tables to be sorted" mean? If it means that you want the rows to appear in the order in which they were entered, there's only one way to guarantee it: have a field that stores that order (or the time, if you don't have a lot of transactions) and ORDER BY it. In that case, you will want a clustered index on that field for best performance.
You might end up with a non clustered PK (like the productId) AND a clustered unique index on your autonumber_or_timestamp field for max performance.
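A minimal sketch of that layout; the table dbo.Product and the columns ProductId, EntryId and Name are hypothetical stand-ins for the productId and autonumber_or_timestamp fields mentioned above:

```sql
-- ProductId is the business primary key (non-clustered);
-- EntryId records insertion order and carries the clustered index.
CREATE TABLE dbo.Product (
    ProductId INT               NOT NULL,
    EntryId   INT IDENTITY(1,1) NOT NULL,
    Name      VARCHAR(100)      NULL,
    CONSTRAINT PK_Product PRIMARY KEY NONCLUSTERED (ProductId)
);
CREATE UNIQUE CLUSTERED INDEX UX_Product_EntryId ON dbo.Product (EntryId);

-- Rows in the order they were entered: the ORDER BY is still required,
-- but it can be satisfied by an ordered scan of the clustered index.
SELECT ProductId, Name
FROM dbo.Product
ORDER BY EntryId;
```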
But that really depends on the reality you're trying to model, and your question contains too little information about this. DB design is NOT abstract thinking.
