Why does the QO choose a clustered index scan vs. a table scan? - sql-server

If I have a query like this:
SELECT * FROM tTable
where tTable has no indexes, a table scan happens, as expected. If I add a clustered index on some column, the QO (query optimizer) decides to use a clustered index scan for this query. Why? Why is a clustered index scan preferred over a table scan in this case?

If I add a clustered index on some column then QO decides to use clustered index scan on this query
Because when you create a clustered index on a table, the data in the table is rearranged in index order, so the table itself is the clustered index. This is also the reason why you can't have two clustered indexes on the same table.
To summarize: when you create a clustered index, there is only one structure, not two (a clustered index and a table).
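One way to see the "only one structure" point is to look at sys.indexes before and after creating the clustered index. This is a minimal sketch; dbo.HeapDemo, its columns, and the index name are hypothetical stand-ins for tTable, whose definition the question does not show.

```sql
-- Hypothetical table standing in for tTable; the columns are assumptions.
CREATE TABLE dbo.HeapDemo
(
    Id   INT         NOT NULL,
    Name VARCHAR(50) NULL
);

-- With no clustered index the table is a heap: one row with index_id = 0.
SELECT index_id, type_desc, name
FROM sys.indexes
WHERE object_id = OBJECT_ID('dbo.HeapDemo');

CREATE CLUSTERED INDEX CIX_HeapDemo_Id ON dbo.HeapDemo (Id);

-- The heap entry is replaced by the clustered index (index_id = 1);
-- there is no separate "table" entry next to it.
SELECT index_id, type_desc, name
FROM sys.indexes
WHERE object_id = OBJECT_ID('dbo.HeapDemo');
```

Since the clustered index is the only structure holding the rows, scanning "the table" and scanning the clustered index are the same operation.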

The query is "give me all rows and all columns" which means "read every row" which is a scan
There is nothing to do an index seek on, because there is no WHERE clause.
Unlike this:
SELECT * FROM tTable WHERE PrimaryClusteredKeyValue = 45
The following query, by contrast, may use a nonclustered index seek followed by a key lookup into the clustered index, or it may still scan the clustered index because you ask for all columns. It depends on how many rows 'gbn' will match:
SELECT * FROM tTable WHERE NonClusteredOtherColumnValue = 'gbn'
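To compare the three cases, you can inspect the estimated plans without running the queries. This is a sketch only: the definition of tTable is an assumption built from the column names used above, and the Filler column and index names are made up. GO is the batch separator used by SSMS and sqlcmd, and SET SHOWPLAN_TEXT has to be the only statement in its batch.

```sql
-- Hypothetical definition of tTable; only the two column names come from the answer.
CREATE TABLE dbo.tTable
(
    PrimaryClusteredKeyValue     INT         NOT NULL,
    NonClusteredOtherColumnValue VARCHAR(50) NOT NULL,
    Filler                       CHAR(200)   NOT NULL DEFAULT '',
    CONSTRAINT PK_tTable PRIMARY KEY CLUSTERED (PrimaryClusteredKeyValue)
);
CREATE NONCLUSTERED INDEX IX_tTable_Other
    ON dbo.tTable (NonClusteredOtherColumnValue);
GO
SET SHOWPLAN_TEXT ON;
GO
-- No WHERE clause: every row is needed, so nothing to seek on.
SELECT * FROM dbo.tTable;
-- Equality on the clustered key: a seek on the clustered index is possible.
SELECT * FROM dbo.tTable WHERE PrimaryClusteredKeyValue = 45;
-- Nonclustered column: seek plus key lookup, or a clustered index scan,
-- depending on how many rows the optimizer estimates will match.
SELECT * FROM dbo.tTable WHERE NonClusteredOtherColumnValue = 'gbn';
GO
SET SHOWPLAN_TEXT OFF;
GO
```

The plan chosen for the third query depends on the data and statistics, so it may differ between an empty test table and a production-sized one.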

Related

Multiple Clustered Indexes on a Single Table?

I thought we could only place one clustered index on one table, and put multiple non-clustered indexes on a table, but using the code below I can easily add more than one clustered index to my table.
CREATE CLUSTERED INDEX TBL_MULTI_LC_HIST ON dbo.TBL_MULTI_LC_HIST (ID,AsOfDate)
Is this completely wrong?
It isn't possible to create multiple clustered indexes for a single table. From the docs (emphasis mine):
Clustered indexes sort and store the data rows in the table or view based on their key values. These are the columns included in the index definition. There can be only one clustered index per table, because the data rows themselves can be stored in only one order.
For example this will fail:
CREATE TABLE Thing
(
Column1 INT NOT NULL,
Column2 INT NOT NULL
)
CREATE CLUSTERED INDEX IX1 ON dbo.Thing(Column1)
CREATE CLUSTERED INDEX IX2 ON dbo.Thing(Column2)
Error:
Cannot create more than one clustered index on table 'dbo.Thing'. Drop the existing clustered index 'IX1' before creating another.
Example: http://www.sqlfiddle.com/#!18/53a63/1
You can however have a single index with multiple columns in it which is perhaps where you are getting confused:
CREATE CLUSTERED INDEX IX3 ON dbo.Thing(Column1, Column2)
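If the goal is to change which column the table is clustered on, the error message itself points to the fix: drop the existing clustered index first. A minimal sketch that repeats the Thing table and index names from the example above:

```sql
CREATE TABLE dbo.Thing
(
    Column1 INT NOT NULL,
    Column2 INT NOT NULL
);

CREATE CLUSTERED INDEX IX1 ON dbo.Thing (Column1);

-- Dropping the clustered index turns the table back into a heap ...
DROP INDEX IX1 ON dbo.Thing;

-- ... so a clustered index on a different column can now be created.
CREATE CLUSTERED INDEX IX2 ON dbo.Thing (Column2);
```

On a large table both steps rewrite the data, so this is worth doing in a maintenance window.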
You can only have one clustered index. A "clustered" index IS the table's rows: its leaf level contains all the columns. Every other index contains only its own columns plus a pointer (the clustered key) back to the clustered row. The key of the clustered index determines the order in which the rows are stored.
If there is no clustered index, then the rows are basically stored in a heap, with no order or structure.

Should the PK on an identity column (which is surrogate key) be non-clustered?

For a table with a PK on an identity column, the PK will be clustered by default. Would it be better to make it non-clustered? The PK is a surrogate key that may never be used for querying directly, though it may be used to join to another table.
The reason is that other indexes will be created for queries. Will a query that uses a non-clustered index, where the returned columns are not covered by that index, incur less logical I/O (LIO) because there are no extra clustered index seek steps?
create table T (
Id int identity(1,1) primary key, -- clustered or non-clustered?
A ....
B ....
C ....
....)
create index ix_A on T (A)
create index ix_..... -- Many indexes can be created for different queries
select A, B
from T
where A between @a and @a+5 -- This query will have less LIO if the PK is non-clustered (seek)
It's perfectly fine to set your surrogate PK to be non-clustered if there is a better candidate in the table for the clustered index.
Good candidates for a clustered index are columns that you frequently use in range searches ([ColumnName] BETWEEN This AND That) or in ORDER BY clauses.
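As a rough sketch of that advice, the surrogate key can be declared NONCLUSTERED and the clustered index placed on the range-searched column. The column types, constraint name, and index name below are assumptions, since the question elides them.

```sql
-- Hypothetical version of T; the types of A, B and C are assumptions.
CREATE TABLE dbo.T
(
    Id INT IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_T PRIMARY KEY NONCLUSTERED,  -- surrogate key, not clustered
    A  INT NOT NULL,
    B  INT NOT NULL,
    C  INT NOT NULL
);

-- Cluster on the column used for range searches instead.
CREATE CLUSTERED INDEX CIX_T_A ON dbo.T (A);

DECLARE @a INT = 10;

-- The range predicate is on the clustering key, and B lives in the same
-- rows, so no separate lookup step is needed to fetch it.
SELECT A, B
FROM dbo.T
WHERE A BETWEEN @a AND @a + 5;
```

Whether this beats a clustered PK depends on the workload. Every nonclustered index carries the clustering key, so a wide or non-unique clustering key makes the other indexes larger.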

What is a non-clustered index scan

I know what a table scan, a clustered index scan and an index seek are, but my Google skills failed me in finding a precise explanation of non-clustered index scans. Why and when does a query use a non-clustered index scan?
Thank you.
As the name suggests, non-clustered index (NCI) scans are scans of non-clustered indexes. An NCI scan will typically be chosen if all of the fields in a select can be fulfilled from a non-clustered index, but the selectivity or indexing of the query is too poor to result in a seek.
NCI scans potentially have a performance benefit over a clustered index scan in that NCIs are generally narrower than clustered indexes (since they generally have fewer columns), hence fewer pages to fetch and less I/O.
I've put a contrived scenario up on SqlFiddle; click 'View Execution Plan' at the bottom.
Given the following setup of table, clustered, and non clustered indexes:
CREATE TABLE Foo
(
FooId INT,
Name VARCHAR(50),
BigCharField CHAR(7000),
CONSTRAINT PK_FOO PRIMARY KEY CLUSTERED(FooId)
);
CREATE NONCLUSTERED INDEX IX_FOO ON Foo(Name);
The following queries demonstrate the different scans:
-- Clustered Index Scan - because we need all fields, CI is most efficient
SELECT * FROM FOO;
-- Non Clustered Index Scan - because we just need Name, but have no selectivity, the NCI
-- will suffice and is narrower.
SELECT DISTINCT(Name) FROM FOO;
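To check the "narrower index, fewer pages" claim on your own data, one option is to compare page counts per index. This sketch assumes the Foo table and IX_FOO index defined above already exist; the sample rows are made up, and reading sys.dm_db_partition_stats requires VIEW DATABASE STATE permission.

```sql
-- A few made-up rows so both indexes have something to store.
INSERT INTO Foo (FooId, Name, BigCharField)
VALUES (1, 'alpha', 'x'),
       (2, 'beta',  'y'),
       (3, 'alpha', 'z');

-- Pages used by each index on Foo: the clustered index carries
-- BigCharField, while IX_FOO holds only Name plus the clustering key.
SELECT i.name AS index_name,
       i.type_desc,
       ps.used_page_count,
       ps.row_count
FROM sys.dm_db_partition_stats AS ps
JOIN sys.indexes AS i
    ON i.object_id = ps.object_id
   AND i.index_id  = ps.index_id
WHERE ps.object_id = OBJECT_ID('Foo');
```

SET STATISTICS IO ON before running the two SELECT statements above gives a similar comparison in terms of logical reads.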

What is the difference between composite non clustered index and covering index

SQL Server 2005 includes a "covering index" feature, which allows us to include one or more non-key columns in an existing non-clustered index.
For example, I have the following columns:
EmployeeID, DepartmentID, DesignationID, BranchID
Here are two scenarios:
1. EmployeeID is a primary key with a clustered index, and the remaining columns (DepartmentID, DesignationID, BranchID) are taken together as a non-clustered index (composite index).
2. EmployeeID is a primary key with a clustered index, and DepartmentID is a non-clustered index with DesignationID and BranchID as "included columns" for the non-clustered index.
What is the difference between the above two? If both are same what's new to introduce "Covering Index" concept?
The difference is that if there are two rows with the same DepartmentID in the first index they will be sorted based on their values of DesignationID and BranchID. In the second case they will not be sorted relative to each other and could appear in any order in the index.
In terms of what this means to your application:
A query which can use an index on (DepartmentID, DesignationID) can be more efficient with the first index than the second.
Building the first index may take slightly longer because of the extra sorting required.
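A covering index is a nonclustered index that contains every column a query needs; the INCLUDE clause is one way to add such columns without making them part of the index key.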
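The two scenarios from the question can be written out as follows. This is a sketch: the table name Employee, the column types, and the index names are assumptions, since the question lists only the column names.

```sql
-- Hypothetical table built from the columns listed in the question.
CREATE TABLE dbo.Employee
(
    EmployeeID    INT NOT NULL CONSTRAINT PK_Employee PRIMARY KEY CLUSTERED,
    DepartmentID  INT NOT NULL,
    DesignationID INT NOT NULL,
    BranchID      INT NOT NULL
);

-- First scenario: composite key. Entries are sorted by DepartmentID,
-- then DesignationID, then BranchID.
CREATE NONCLUSTERED INDEX IX_Employee_Composite
    ON dbo.Employee (DepartmentID, DesignationID, BranchID);

-- Second scenario: only DepartmentID is a key column. The included columns
-- are stored at the leaf level but do not take part in the sort order.
CREATE NONCLUSTERED INDEX IX_Employee_Include
    ON dbo.Employee (DepartmentID)
    INCLUDE (DesignationID, BranchID);

-- Both indexes cover this query. With the composite index both predicates
-- can be used to seek; with the INCLUDE index only DepartmentID can, and
-- DesignationID is checked as a residual filter on the rows found.
SELECT DepartmentID, DesignationID, BranchID
FROM dbo.Employee
WHERE DepartmentID = 10 AND DesignationID = 3;
```

In practice you would keep only one of the two indexes; they are created side by side here purely for comparison.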

SQL Server Indexes

What is the need for a non-clustered index when the table already has a clustered index?
For optimal performance you generally want an index that supports each combination of columns used in your queries. For instance, suppose you have a select like this:
SELECT *
FROM MyTable
WHERE Col_1 = @SomeValue AND
Col_2 = @SomeOtherValue
Then you should create a clustered index on Col_1 and Col_2.
On the other hand, if you have an additional query that filters on only one of the columns, like:
SELECT *
FROM MyTable
WHERE Col_1 = @SomeValue
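Then you need an index whose leading column is Col_1.
The index on (Col_1, Col_2) already qualifies, because Col_1 is its first key column. A second index would be needed for a query that filters on Col_2 alone, and that is how you end up with more than one index.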
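The index on (Col_1, Col_2) and the two queries can be sketched like this. The column types, the extra Col_3 column, and the index name are assumptions, since the answer does not show a table definition.

```sql
-- Hypothetical table; only Col_1 and Col_2 come from the answer.
CREATE TABLE dbo.MyTable
(
    Col_1 INT         NOT NULL,
    Col_2 INT         NOT NULL,
    Col_3 VARCHAR(50) NULL
);

CREATE CLUSTERED INDEX CIX_MyTable_Col1_Col2
    ON dbo.MyTable (Col_1, Col_2);

DECLARE @SomeValue INT = 1, @SomeOtherValue INT = 2;

-- Both key columns are constrained.
SELECT *
FROM dbo.MyTable
WHERE Col_1 = @SomeValue AND
      Col_2 = @SomeOtherValue;

-- Only the leading key column is constrained.
SELECT *
FROM dbo.MyTable
WHERE Col_1 = @SomeValue;
```

Since Col_1 is the leading key column, it is worth checking the execution plan of the second query before adding a separate index for it.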
The "need" is to do faster lookups of columns not included in the clustered index.
Don't get clustered indexes confused with indexes across multiple columns. That isn't the same thing.
Here's an article that does a good job of explaining clustered vs. non-clustered indexes.
In SQL Server you can only have one clustered index per table, and by default it is created on the primary key. A clustered index is "attached" to the table (its leaf level is the table's data), so a query that uses it doesn't need to go back to the table to get any other data elements that might be in the select list. A non-clustered index is not attached; it contains a reference back to the table row with all the rest of the data.
