SQL Server: expected clustered index scan, but got non-clustered index scan

I have this table:
CREATE TABLE Ta
(
coda int NOT NULL PRIMARY KEY,
a2 int UNIQUE
);
and a SQL select statement:
select *
from Ta
I have a clustered index (created by the primary key) and a non-clustered index (created by the unique constraint).
Executing the select I get the following execution plan:
But I'm not sure why.
The data should be on the leaf level, therefore it should scan the leaf level, hence it should do a clustered scan.
EDIT: the table has 10000 rows, coda has values from 9999 to 0 and a2 has values from 0 to 9999.

The non-clustered index is a covering index for the query. That is, the index contains all of the columns needed to satisfy the query: a2 is the index key, and coda is stored in every leaf row because it is the clustered index key.
The execution plan is showing that SQL Server is using the non-clustered index.
For the given query, it seems like a reasonable execution plan.
If there were some predicate (a WHERE clause condition on a column) or an ORDER BY clause, then we would expect that to influence which index is used.
But in this case, which retrieves two columns (a2 and coda) for every row in the table with the rows returned in an unspecified order, a full scan of either index is a suitable plan.
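One way to check this reasoning is to compare the page reads of both scans. The sketch below assumes the table Ta from the question. The INDEX(1) table hint asks SQL Server to use the clustered index; it is used here only for comparison and is not a recommendation.

```sql
-- Report logical page reads for each statement (shown in the Messages tab).
SET STATISTICS IO ON;

-- Plan chosen by the optimizer (a scan of the non-clustered unique index in the question).
SELECT * FROM Ta;

-- Same query, forced onto the clustered index (index id 1) for comparison only.
SELECT * FROM Ta WITH (INDEX(1));

SET STATISTICS IO OFF;
```

With only two int columns the two indexes should be close in size, so the read counts are likely to be similar. That is an expectation, not a measured result.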

Related

How do I force a better execution plan when the database is forcing a join?

I am optimizing a query on SQL Server 2005. I have a simple query against mytable that has about 2 million rows:
SELECT id, num
FROM mytable
WHERE t_id = 587
The id field is the Primary Key (clustered index) and there exists a non-clustered index on the t_id field.
The query plan for the above query is including both a Clustered Index Seek and an Index Seek, then it's executing a Nested Loop (Inner Join) to combine the results. The STATISTICS IO is showing 3325 page reads.
If I change the query to just the following, the server is only executing 6 Page Reads and only a single Index Seek with no join:
SELECT id
FROM mytable
WHERE t_id = 587
I have tried adding an index on the num column, and an index on both num and t_id. Neither index was selected by the server.
I'm looking to reduce the number of page reads but still retrieve the id and num columns.
The following index should be optimal:
CREATE INDEX idx ON MyTable (t_id)
INCLUDE (num)
I cannot remember if INCLUDEd columns were valid syntax in 2005, you may have to use:
CREATE INDEX idx ON MyTable (t_id, num)
The [id] column will be included in the index as it is the clustered key.
The optimal index would be on (t_id, num, id).
Your query is probably on the slow side because multiple rows are being selected. I wonder if rephrasing the query like this would improve performance:
SELECT t.id, t.num
FROM mytable t
WHERE EXISTS (SELECT 1
FROM mytable t2
WHERE t2.t_id = 587 AND t2.id = t.id
);
Let's clarify the problem and then discuss solutions to improve it:
You have a table (let's call it tblTest1, containing 2M records) with a clustered index on id and a non-clustered index on t_id, and you run a query that filters on the non-clustered index column and returns the id and num columns.
SQL Server uses the non-clustered index to filter the data (t_id = 587), but after filtering it still needs the values of the id and num columns. The leaf level of a non-clustered index stores the clustered index key, so id is already available there, but num is not. SQL Server therefore uses the Index Seek (NonClustered) to find the rows with t_id = 587 and then a Key Lookup into the clustered index to fetch num. This is why you see the Key Lookup operator in the execution plan.
In the execution plan, whenever an Index Seek (NonClustered) is paired with a Key Lookup, SQL Server needs a Nested Loops join to combine them: for each row returned by the Index Seek, it looks up num in the clustered index. At this stage SQL Server is working with two separate sets: the rows found in the non-clustered index tree and the data in the clustered index tree.
With that in mind, the problem is clear. What happens if we let SQL Server skip the Key Lookup? The query takes a shorter path: no Key Lookup and therefore no Nested Loops join.
To achieve this, we INCLUDE the num column in the non-clustered index, so that its leaf level holds both the id value and the num value. When SQL Server then uses the non-clustered index to find the rows and return id and num, it no longer needs a Key Lookup.
So all we need to do is INCLUDE num in the non-clustered index. Thanks to @MJH's answer:
CREATE NONCLUSTERED INDEX idx ON tblTest1 (t_id)
INCLUDE (num)
Luckily, SQL Server 2005 provided a new feature for NonClustered indexes, the ability to include additional, non-key columns in the leaf level of the NonClustered indexes!
Read more:
https://www.red-gate.com/simple-talk/sql/learn-sql-server/using-covering-indexes-to-improve-query-performance/
https://learn.microsoft.com/en-us/sql/relational-databases/indexes/create-indexes-with-included-columns?view=sql-server-2017
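If the non-clustered index on t_id already exists, it can be redefined in place instead of adding a second index. The sketch below assumes the existing index is named ix_t_id, which is a hypothetical name because the question does not give one, and it reuses the tblTest1 table name from this answer.

```sql
-- ix_t_id is a hypothetical name; replace it with the real name of the index on t_id.
-- DROP_EXISTING rebuilds the index with the new definition in a single statement.
CREATE NONCLUSTERED INDEX ix_t_id
    ON tblTest1 (t_id)
    INCLUDE (num)
    WITH (DROP_EXISTING = ON);

-- Re-run the query and compare the logical reads with the earlier plan.
SET STATISTICS IO ON;

SELECT id, num
FROM tblTest1
WHERE t_id = 587;

SET STATISTICS IO OFF;
```

If the index covers the query, the plan should show a single Index Seek with no Key Lookup and no join.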
But what happens if we write the query like this?
SELECT id, num
FROM tblTest1 AS t1
WHERE
EXISTS (SELECT 1
FROM tblTest1 t2
WHERE t2.t_id = 587 AND t2.id = t1.id
)
This is a great approach, but let's see the execution plan:
Clearly, SQL Server needs to do an Index Seek (NonClustered) to find t_id = 587 and then obtain the data from the clustered index using a Clustered Index Seek. In this case we will not get any notable performance improvement.
Note: When you use indexes, you need a plan to maintain them. As indexes become fragmented, their benefit to query performance decreases, and you may run into performance problems after a while. Plan to reorganize or rebuild them when they become fragmented.
Read more: https://learn.microsoft.com/en-us/sql/relational-databases/indexes/reorganize-and-rebuild-indexes?view=sql-server-2017
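As a rough sketch of that maintenance, the query below reads fragmentation from sys.dm_db_index_physical_stats for the tblTest1 table and then shows the two maintenance statements. The index name idx is the one created earlier in this answer. Which statement to run at which fragmentation level is a judgment call that depends on the workload; the linked documentation discusses it.

```sql
-- List fragmentation for every index on tblTest1 in the current database.
SELECT i.name AS index_name,
       ps.avg_fragmentation_in_percent,
       ps.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'tblTest1'), NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
    ON i.object_id = ps.object_id
   AND i.index_id = ps.index_id;

-- Lighter option: defragment the leaf level in place.
ALTER INDEX idx ON tblTest1 REORGANIZE;

-- Heavier option: drop and recreate the index structure.
ALTER INDEX idx ON tblTest1 REBUILD;
```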

Unused column in MS SQL execution plan

I have a query and checked the execution plan in SQL Management Studio. Some non-clustered index scan steps return the PK column of the table instead of the indexed and joined column. Example:
select a.c10, b.c20
from a inner join b on a.c11 = b.c21
where a.c12 = 23
index on table a:
create unique nonclustered index ix_a_1 on a (a.c12 asc) include ( a.c13, a.c14)
the query plan shows:
index seek, nonclustered, ix_a_1 , output list: a.primary_key_col
The column a.primary_key_col is not used in the query. Why is this the only column included in the output list?
The PK column is needed to look into the clustered index (assumed PK) to get columns c10 and c11. This is known as a "key lookup"
You can remove this by making or changing the nonclustered index so it is "covering"
Try this
create nonclustered index ix_a_gbn on a (c12, c11) include (c10, c13, c14)
Some background reading from Simple Talk via Google
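To see which columns an index was declared with, and so whether it can cover a given query, the catalog views can be queried. This is a sketch that assumes the table is named a as in the question; it uses only the sys.indexes, sys.index_columns and sys.columns views.

```sql
-- Show key columns and included columns for every index on table a.
SELECT i.name AS index_name,
       c.name AS column_name,
       ic.key_ordinal,          -- position in the index key (0 for included columns)
       ic.is_included_column    -- 1 when the column was added with INCLUDE
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
    ON ic.object_id = i.object_id
   AND ic.index_id = i.index_id
JOIN sys.columns AS c
    ON c.object_id = ic.object_id
   AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID(N'a')
ORDER BY i.name, ic.is_included_column, ic.key_ordinal;
```

For the query in the question, a covering index needs c12 and c11 as keys or included columns, plus c10.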

Why does SQL Server use an Index Scan instead of a Seek + RID lookup?

I have a table with approx. 135M rows:
CREATE TABLE [LargeTable]
(
[ID] UNIQUEIDENTIFIER NOT NULL,
[ChildID] UNIQUEIDENTIFIER NOT NULL,
[ChildType] INT NOT NULL
)
It has a non-clustered index with no included columns:
CREATE NONCLUSTERED INDEX [LargeTable_ChildID_IX]
ON [LargeTable]
(
[ChildID] ASC
)
(It is clustered on ID).
I wish to join this against a temporary table which contains a few thousand rows:
CREATE TABLE #temp
(
ChildID UNIQUEIDENTIFIER PRIMARY KEY,
ChildType INT
)
...add #temp data...
SELECT lt.ChildID, lt.ChildType
FROM #temp t
INNER JOIN [LargeTable] lt
ON lt.[ChildID] = t.[ChildID]
However the query plan includes an index scan on the large table:
If I change the index to include extra columns:
CREATE NONCLUSTERED INDEX [LargeTable_ChildID_IX] ON [LargeTable]
(
[ChildID] ASC
)
INCLUDE ([ChildType])
Then the query plan changes to something more sensible:
So my question is: Why can't SQL Server still use an index seek in the first scenario, but with a RID lookup to get from the non-clustered index to the table data? Surely that would be more efficient than an index scan on such a large table?
The first query plan actually makes a lot of sense. Remember that SQL Server never reads records, it reads pages. In your table, a page contains many records, since those records are so small.
With the original index, if the second query plan were used, then after reading index pages to find all the matching rows, pages in the clustered index would still need to be read to get the ChildType column. In the worst case, that is an entire page for each record it needs to read. As there are many records per page, that might boil down to reading a large percentage of the pages in the clustered index.
SQL Server guessed, based on statistics, that simply scanning the pages in the clustered index would require fewer page reads in total, because it then avoids reading the pages in the non-clustered index.
What matters here is the number of rows in the temp table compared to the number of pages in the large table. Assuming a random distribution of ChildID in the large table, as soon as the number of rows in the temp table approaches or exceeds the number of pages in the large table, SQL Server will have to read virtually every page in the large table anyway.
Because the column ChildType isn't covered by the index, SQL Server has to go back to the clustered index to get the values for ChildType. (On a table with a clustered index this is a key lookup; a RID lookup applies only to heaps.)
When you INCLUDE this column in the nonclustered index it will be added to the leaf-level of the index where it is available for querying.
Colloquially this is called 'the index tipping point': the point at which the cost-based optimizer considers a scan more efficient than seek + lookup. Usually it is around 20% of the size, which in your case is based on an estimate coming from the #temp table stats. YMMV.
You already have your answer: include the required column, make the index covering.
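To see what the optimizer was avoiding, the seek + lookup plan can be requested with hints and its page reads compared with the scan. This is a sketch for experimentation only, using the original non-covering index from the question. The INDEX table hint forces the use of that index but does not by itself guarantee a seek, and OPTION (LOOP JOIN) is an assumption about the join shape that makes a seek per #temp row likely.

```sql
SET STATISTICS IO ON;

-- Optimizer's own choice (an index scan in the question).
SELECT lt.ChildID, lt.ChildType
FROM #temp t
INNER JOIN [LargeTable] lt
    ON lt.[ChildID] = t.[ChildID];

-- Hinted version: use the non-clustered index and a nested loops join,
-- so ChildType has to be fetched with a lookup per matching row.
SELECT lt.ChildID, lt.ChildType
FROM #temp t
INNER JOIN [LargeTable] lt WITH (INDEX(LargeTable_ChildID_IX))
    ON lt.[ChildID] = t.[ChildID]
OPTION (LOOP JOIN);

SET STATISTICS IO OFF;
```

If the hinted version reads fewer pages, the row estimate for #temp is worth checking; if it reads more, the scan was the cheaper plan for this data.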

Why does performance degrade when using a non-indexed field in the SELECT clause?

Consider these three queries:
select sampleno from sample
where markupdate > '1/1/2010'
select sampleno, markupdate from sample
where markupdate > '1/1/2010'
select sampleno, markuptime from sample
where markupdate > '1/1/2010'
sampleno and markupdate are indexed fields (sampleno is the primary key)
markuptime is not indexed
Queries 1 and 2 take about 1 second to run (returning 237K rows). Query 3 is still running after 3 minutes.
Why would the inclusion of a non-indexed field in the SELECT clause cause such a performance degradation?
This is a SQL 6.5 database.
A table's data (basically: all columns) is stored in a clustered index. A clustered index is a B-tree that allows a fast search on the indexed column(s). It is special (clustered) in that it contains all other columns at the leaf level. Usually, the clustered index is also the primary key. In your case, it's:
(sampleno) include (markupdate, markuptime, ...)
A non-clustered index contains the indexed column(s) and (at the leaf level) the clustered index key. When you use a non-clustered index and need any other column, the database has to look it up in the clustered index. That process is called a lookup. In your case, the non-clustered index on (markupdate) is:
(markupdate) include (sampleno)
This index contains all data for a query on markupdate, sampleno. The technical term for such an index is a covering index. But when you add markuptime to the query, the index is no longer covering. It has to look up the value for markuptime in the clustered index. And lookups are expensive.
Only your third query requires lookups. And that's why your third query is slower.
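One way to remove the lookups is to make the index cover the third query as well. The sketch below uses a composite key instead of INCLUDE, on the assumption that INCLUDE is not available on a server this old. The index name ix_sample_mdate_mtime is hypothetical, and the statement has not been tested on SQL Server 6.5.

```sql
-- Hypothetical index name; markuptime is added as a second key column
-- so the index holds every column that query 3 needs.
CREATE INDEX ix_sample_mdate_mtime
    ON sample (markupdate, markuptime);

-- Query 3 again; it should now be answerable from the index alone.
select sampleno, markuptime from sample
where markupdate > '1/1/2010'
```

The trade-off is a wider index that costs more to maintain on insert and update.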

Indexing in Sql Server

What are clustered and non-clustered indexes? How do I index a table using SQL Server 2000 Enterprise Manager?
In a clustered index on ID, the table rows are ordered by ID.
In a non-clustered index on ID, the references to table rows are ordered by ID.
We can compare a database to a CSV file:
ID,Value
-------
1,ReallyReallyLongValue1
3,ReallyReallyLongValue3
In a clustered table, when we insert a new row, we need to squeeze it between the existing rows:
ID,Value
-------
1,ReallyReallyLongValue1
2,ReallyReallyLongValue2
3,ReallyReallyLongValue3
This is slow on insert but fast on retrieve.
In a non-clustered table, we keep a separate index file which orders our rows:
Id,RowNumber
------------
1, 1
3, 2
When we insert the new row, we just append it to our main file and update the short index file:
ID,Value
-------
1,ReallyReallyLongValue1
3,ReallyReallyLongValue3
2,ReallyReallyLongValue2
Id,RowNumber
------------
1, 1
2, 3
3, 2
This is fast on insert but less efficient on retrieve.
In real databases indexes use more efficient B-trees, but the principle remains the same.
Clustered indexes are faster on SELECT, non-clustered indexes are faster on INSERT / UPDATE / DELETE
A clustered index means that the rows are physically ordered by the values in that index. A non-clustered index means that an index table is kept up to date that allows for quick seeking and sorting based upon value, but does not physically order the rows.
Only one clustered index can exist for a table, and in SQL Server a primary key is created as the clustered index by default (unless the table already has one or the key is declared NONCLUSTERED).
A clustered index defines how the actual table is stored. The rows are stored in a way to make searches on the fields in the clustered index fast. (They're not physically stored in the sort order of the index fields, but in a B-tree or something similar.)
You can have only one clustered index per table. The clustered index contains all fields in the table, for example:
indexfield1 - indexfield2 - field2 - field3 - ....
A non-clustered index is like a separate table. It contains the fields in the index, and a reference to the fields in the table. For example:
secondindexfield1 - secondindexfield2 - reference to table row
When searching a non-clustered index, SQL Server will find the value in the index, do a "bookmark lookup" to the table, and retrieve the other row fields from there. This is why non-clustered indexes perform slightly less well than clustered indexes.
To add an index in SQL Server Management Studio, expand the table node in Object Explorer. Right-click on "Indexes" and select "New Index".
Clustered Index: Only one clustered index per table is allowed. If an index is clustered, it means that the table on which the clustered index is based is physically sorted according to that index. Think of the page numbers in an encyclopedia.
Non-clustered Index: Can have many non-clustered indexes per table. Think of the keyword index at the back of the book.
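The same indexes can also be created in T-SQL rather than through the GUI. The sketch below uses a hypothetical Book table and hypothetical index names, chosen only to match the encyclopedia analogy above.

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE Book
(
    BookID int NOT NULL,
    Title  varchar(200) NOT NULL,
    Author varchar(100) NOT NULL
);

-- The single clustered index: the table itself is organized by BookID.
CREATE UNIQUE CLUSTERED INDEX ix_Book_BookID ON Book (BookID);

-- Several non-clustered indexes can coexist, each one a separate lookup
-- structure that points back to the rows by their clustered key.
CREATE NONCLUSTERED INDEX ix_Book_Author ON Book (Author);
CREATE NONCLUSTERED INDEX ix_Book_Title ON Book (Title);
```

A query such as SELECT BookID FROM Book WHERE Author = 'Some Author' can then be answered from ix_Book_Author alone, because the clustered key BookID is stored in its leaf rows.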
