I have a query and checked its execution plan in SQL Server Management Studio. Some non-clustered index steps return the PK column of the table instead of the indexed and joined columns. Example:
select a.c10, b.c20
from a inner join b on a.c11 = b.c21
where a.c12 = 23
index on table a:
create unique nonclustered index ix_a_1 on a (c12 asc) include (c13, c14)
the query plan shows:
index seek, nonclustered, ix_a_1 , output list: a.primary_key_col
The column a.primary_key_col is not used in the query. Why is this the only column included in the output list?
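The PK column is needed to look up each row in the clustered index (assumed to be the PK) to get columns c10 and c11, which ix_a_1 does not contain. Every nonclustered index row carries the clustered key, which is why it can be output by the seek. This is known as a "key lookup".
You can remove the lookup by creating or changing a nonclustered index so that it is "covering", i.e. it contains every column the query needs from the table.
Try this: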
create nonclustered index ix_a_gbn on a (c12, c11) include (c10, c13, c14)
Some background reading from Simple Talk (found via Google).
I am optimizing a query on SQL Server 2005. I have a simple query against mytable that has about 2 million rows:
SELECT id, num
FROM mytable
WHERE t_id = 587
The id field is the Primary Key (clustered index) and there exists a non-clustered index on the t_id field.
The query plan for the above query is including both a Clustered Index Seek and an Index Seek, then it's executing a Nested Loop (Inner Join) to combine the results. The STATISTICS IO is showing 3325 page reads.
If I change the query to just the following, the server is only executing 6 Page Reads and only a single Index Seek with no join:
SELECT id
FROM mytable
WHERE t_id = 587
I have tried adding an index on the num column, and an index on both num and t_id. Neither index was selected by the server.
I'm looking to reduce the number of page reads but still retrieve the id and num columns.
The following index should be optimal:
CREATE INDEX idx ON MyTable (t_id)
INCLUDE (num)
I cannot remember if INCLUDEd columns were valid syntax in 2005, you may have to use:
CREATE INDEX idx ON MyTable (t_id, num)
The [id] column will be included in the index as it is the clustered key.
The optimal index would be on (t_id, num, id).
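To check whether such an index actually helps, one could compare the page reads before and after creating it. The following is only a sketch: it reuses the mytable, t_id, num and id names from the question, and the index name idx_t_id_num is made up for illustration.

```sql
-- Report page reads for each statement in this session (see the Messages tab in SSMS).
SET STATISTICS IO ON;

-- Baseline: expected to seek the existing t_id index and then
-- look up num in the clustered index for every matching row.
SELECT id, num
FROM mytable
WHERE t_id = 587;

-- Hypothetical index name. num is stored at the leaf level only;
-- id is carried along automatically because it is the clustered key.
CREATE NONCLUSTERED INDEX idx_t_id_num
    ON mytable (t_id)
    INCLUDE (num);

-- Same query again: if the new index is chosen, the plan should be a
-- single Index Seek and the logical reads should drop considerably.
SELECT id, num
FROM mytable
WHERE t_id = 587;

SET STATISTICS IO OFF;
```

If the reads do not change, checking the actual execution plan shows whether the optimizer picked the new index at all.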
Your query is probably on the slow side because multiple rows are being selected. I wonder if rephrasing the query like this would improve performance:
SELECT t.id, t.num
FROM mytable t
WHERE EXISTS (SELECT 1
FROM mytable t2
WHERE t2.t_id = 587 AND t2.id = t.id
);
Let's clarify the problem and then discuss solutions to improve it:
You have a table (let's call it tblTest1; it contains 2M records) with a clustered index on id and a nonclustered index on t_id, and you want to run a query that filters on the nonclustered index and returns the id and num columns.
SQL Server uses the nonclustered index to filter the data (t_id = 587), but it then still needs the values of id and num. The leaf level of a nonclustered index stores the clustered index key next to the index key, so id is already available there. num is not, so SQL Server has to fetch it from the clustered index, which is why you see the Key Lookup operator in the execution plan. In other words, the Index Seek (NonClustered) finds the rows with t_id = 587 and the Key Lookup fetches num for each of them.
As the execution plan shows, whenever there is an Index Seek (NonClustered) and a Key Lookup, SQL Server needs a Nested Loops join operator to combine them: for each row returned by the Index Seek, it looks up the matching row in the clustered index. At this stage SQL Server is working with two separate sets: the rows obtained from the nonclustered index tree and the data inside the clustered index tree.
With that, the problem is clear. What happens if we give SQL Server a way to avoid the Key Lookup? It can execute the query along a shorter path: no Key Lookup and therefore no Nested Loops join.
To achieve this, we INCLUDE the num column in the nonclustered index, so that its leaf level contains both the id and the num values. When SQL Server then uses the nonclustered index to find the rows and return id and num, it does not need a Key Lookup.
So all we need to do is INCLUDE num in the nonclustered index, as in @MJH's answer:
CREATE NONCLUSTERED INDEX idx ON tblTest1 (t_id)
INCLUDE (num)
Luckily, SQL Server 2005 introduced a new feature for nonclustered indexes: the ability to include additional non-key columns in the leaf level of the index.
Read more:
https://www.red-gate.com/simple-talk/sql/learn-sql-server/using-covering-indexes-to-improve-query-performance/
https://learn.microsoft.com/en-us/sql/relational-databases/indexes/create-indexes-with-included-columns?view=sql-server-2017
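To confirm which columns are key columns and which are only included, the catalog views can be queried. This is a sketch that assumes the table lives in the dbo schema; the table name tblTest1 is the placeholder used above.

```sql
-- List every declared column of every index on the table.
-- is_included_column = 1 marks INCLUDE columns; key columns have key_ordinal > 0.
SELECT i.name                AS index_name,
       i.type_desc           AS index_type,
       c.name                AS column_name,
       ic.key_ordinal,
       ic.is_included_column
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
    ON ic.object_id = i.object_id
   AND ic.index_id  = i.index_id
JOIN sys.columns AS c
    ON c.object_id = ic.object_id
   AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID(N'dbo.tblTest1')   -- assumed schema name
ORDER BY i.index_id, ic.is_included_column, ic.key_ordinal;
```

Note that the clustered key (id) is not listed for the nonclustered index unless it was declared explicitly, even though it is stored in the leaf rows.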
But what happens if we write the query like this?
SELECT id, num
FROM tblTest1 AS t1
WHERE
EXISTS (SELECT 1
FROM tblTest1 t2
WHERE t2.t_id = 587 AND t2.id = t1.id
)
This is a reasonable idea, but let's look at the execution plan:
SQL Server still needs an Index Seek (NonClustered) to find the rows with t_id = 587 and then a Clustered Index Seek to fetch the data from the clustered index, so we do not get any notable performance improvement.
Note: When you use indexes, you also need a plan to maintain them. As indexes become fragmented, their benefit to query performance decreases, and you may run into performance problems after a while. Reorganize or rebuild them when they become fragmented.
Read more: https://learn.microsoft.com/en-us/sql/relational-databases/indexes/reorganize-and-rebuild-indexes?view=sql-server-2017
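A rough sketch of such a maintenance check, again assuming the dbo schema and the tblTest1 and idx names from above. Which fragmentation level justifies a reorganize or a rebuild depends on the workload, so no threshold is hard-coded here.

```sql
-- Inspect fragmentation for all indexes on the table.
-- 'LIMITED' is the cheapest scan mode.
SELECT i.name AS index_name,
       ps.index_type_desc,
       ps.avg_fragmentation_in_percent,
       ps.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.tblTest1'), NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
    ON i.object_id = ps.object_id
   AND i.index_id  = ps.index_id;

-- Lighter option: defragment the leaf level in place.
ALTER INDEX idx ON dbo.tblTest1 REORGANIZE;

-- Heavier option: drop and re-create the index structure.
-- ALTER INDEX idx ON dbo.tblTest1 REBUILD;
```

Very small indexes often report high fragmentation that has no practical effect, so page_count is worth reading alongside the percentage.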
I am running the following query in SSMS, and the execution plan suggests adding an index that includes columns which are not part of the WHERE clause. I was planning to add an index on only the two columns used in the WHERE clause (OID and TransactionDate).
SELECT
[OID] , -- this is not a PK. The primary key column is not part of the SQL script
[CustomerNum] ,
[Amount] ,
[TransactionDate] ,
[CreatedDate]
FROM [dbo].[Transaction]
WHERE OID = 489
AND TransactionDate > '01/01/2018 06:13:06.46';
Index suggestion
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[Transaction] ([OID],[TransactionDate])
INCLUDE ([CustomerNum],[Amount],[CreatedDate])
Updated
Do I need to include the other columns? Data is imported into that table by a back-end process using the SqlBulkCopy class in .NET, and I am wondering whether a nonclustered index covering all of these columns would hurt import performance. (The table has a PK column called TransactionID, which is not needed right now; I keep it in case it is needed in the future, although SqlBulkCopy works better with a heap. The other option is to drop the indexes before the SqlBulkCopy operation and recreate them afterwards.)
The INCLUDE keyword specifies non-key columns to be added to the leaf level of the nonclustered index.
This means that if you add this index and run the query again, SQL Server can get all the information it needs from the index, which eliminates the lookup into the table.
As a general rule of thumb: when SSMS suggests an index, try it. You can always drop it later if it doesn't help.
You don't need to add all of the table's columns to your nonclustered index; the suggested index is good for the query provided. The suggestions of the SQL Server database engine are usually quite good.
The INCLUDE columns are required to avoid the Key Lookup and get a plain nonclustered Index Seek.
All in all: with no nonclustered index, the query results in a Clustered Index Scan.
A nonclustered index on (OID, TransactionDate) with no included columns typically results in a nonclustered Index Seek plus a Key Lookup.
A nonclustered index with the included columns results in a nonclustered Index Seek only.
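For the bulk-load concern, disabling the nonclustered index before the import and rebuilding it afterwards is one alternative to dropping and recreating it. The sketch below uses a made-up index name, and whether it beats loading with the index in place depends on the batch size, so it is worth measuring.

```sql
-- Hypothetical name for the suggested index.
CREATE NONCLUSTERED INDEX IX_Transaction_OID_TransactionDate
    ON dbo.[Transaction] (OID, TransactionDate)
    INCLUDE (CustomerNum, Amount, CreatedDate);

-- Before the import: keep the index definition but stop maintaining it.
ALTER INDEX IX_Transaction_OID_TransactionDate
    ON dbo.[Transaction] DISABLE;

-- ... run the SqlBulkCopy import from the application here ...

-- After the import: rebuilding re-enables the index in one pass.
ALTER INDEX IX_Transaction_OID_TransactionDate
    ON dbo.[Transaction] REBUILD;
```

Only nonclustered indexes should be disabled this way; disabling a clustered index makes the table inaccessible until it is rebuilt.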
I have this table:
CREATE TABLE Ta
(
coda int NOT NULL PRIMARY KEY,
a2 int UNIQUE
);
and a SQL select statement:
select *
from Ta
I have a clustered index (the primary key) and a non-clustered index (created by the unique constraint).
Executing the select I get the following execution plan:
But I'm not sure why.
The data should be on the leaf level, therefore it should scan the leaf level, hence it should do a clustered scan.
EDIT: the table has 10000 rows, coda has values from 9999 to 0 and a2 has values from 0 to 9999.
The non-clustered index is a covering index for the query. That is, the index contains all of the columns needed to satisfy the query.
The execution plan is showing that SQL Server is using the non-clustered index.
For the given query, it seems like a reasonable execution plan.
If there were some predicate (a WHERE clause condition on a column) or an ORDER BY clause, then we would expect that to influence which index is used.
But in this case, with two columns (a2 and coda) retrieved for every row in the table and the rows returned in an unspecified order, a full scan of either index is a suitable plan.
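To see the difference between the two candidate plans, the clustered index can be forced with a table hint and the reads compared. This is meant for experimentation only, not as a recommendation to use hints in production.

```sql
SET STATISTICS IO ON;

-- Optimizer's choice (here: a scan of the nonclustered index on a2).
SELECT * FROM Ta;

-- INDEX(1) forces the clustered index, so this becomes a Clustered Index Scan.
SELECT * FROM Ta WITH (INDEX(1));

SET STATISTICS IO OFF;
```

On a two-column table the page counts of both scans are likely to be close, which is consistent with either index being an acceptable choice.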
I have a very simple select on SQL Server:
select * from person
where first_name = 'John' and last_name = 'Smith'
In the execution plan I have:
Nonclustered Index seek - NC_First_Last_pers
Key lookup (Clustered) on PK
And these two go into a Nested Loop Join.
My question is:
Why is there the join? I thought that this is used only for joining different tables, but I have only 1 table here.
Thanks!
In the index you have the data for the columns that are included in the index plus the clustered key. You are querying the table with * meaning that you have to go look for all column values and those are stored together with the clustered key.
The query uses the index on the names to find all rows that match and then uses the clustered key to find the data you want.
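Logically, the plan behaves like a join of the index to the table on the clustered key. The query below only illustrates that logic; person_id is an assumed name for the primary key column, and the optimizer is free to produce a different plan for it.

```sql
-- Inner query: what the Nonclustered Index Seek produces
-- (the clustered key of every matching row).
-- Outer join: what the Key Lookup does for each of those keys.
SELECT p.*
FROM (
    SELECT person_id               -- assumed PK column name
    FROM person
    WHERE first_name = 'John'
      AND last_name  = 'Smith'
) AS ix
INNER JOIN person AS p
    ON p.person_id = ix.person_id;
```

So the Nested Loops operator joins two access paths into the same table, not two different tables.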
What is Clustered and non clustered indexing? How to index a table using sql server 2000 Enterprise manager?
In a clustered index on ID, the table rows are ordered by ID.
In a non-clustered index on ID, the references to table rows are ordered by ID.
We can compare a database to a CSV file:
ID,Value
-------
1,ReallyReallyLongValue1
3,ReallyReallyLongValue3
In a clustered table, when we insert a new row, we need to squeeze it between the existing rows:
ID,Value
-------
1,ReallyReallyLongValue1
2,ReallyReallyLongValue2
3,ReallyReallyLongValue3
, which is slow on insert but fast on retrieve.
In a non-clustered table, we keep a separate index file which orders our rows:
Id,RowNumber
------------
1, 1
3, 2
When we insert the new row, we just append it to our main file and update the short index file:
ID,Value
-------
1,ReallyReallyLongValue1
3,ReallyReallyLongValue3
2,ReallyReallyLongValue2
Id,RowNumber
------------
1, 1
2, 3
3, 2
, which is fast on insert but less efficient on retrieve.
In real databases, indexes use more efficient B-trees, but the principle remains the same.
Clustered indexes are faster on SELECT, non-clustered indexes are faster on INSERT / UPDATE / DELETE
A clustered index means that the rows are physically ordered by the values in that index. A non-clustered index means that an index table is kept up to date that allows for quick seeking and sorting based upon value, but does not physically order the rows.
Only one clustered index can exist for a table, and in SQL Server a primary key becomes the clustered index by default (unless it is declared NONCLUSTERED or a clustered index already exists).
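In SQL Server the primary key and the clustered index are separate choices that merely coincide by default. A small sketch with hypothetical table, column and index names:

```sql
-- The primary key is enforced by a nonclustered index ...
CREATE TABLE dbo.Orders
(
    order_id   int      NOT NULL,
    order_date datetime NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY NONCLUSTERED (order_id)
);

-- ... so the single clustered index can go on another column,
-- here one that is expected to be used for range queries.
CREATE CLUSTERED INDEX CIX_Orders_order_date
    ON dbo.Orders (order_date);
```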
A clustered index defines how the actual table is stored. The rows are stored in a way that makes searches on the fields in the clustered index fast. (They are not necessarily stored physically in the sort order of the index fields, but in a B-tree structure.)
You can have only one clustered index per table. The clustered index contains all fields in the table, for example:
indexfield1 - indexfield2 - field2 - field3 - ....
A non-clustered index is like a separate table. It contains the fields in the index, and a reference to the fields in the table. For example:
secondindexfield1 - secondindexfield2 - reference to table row
When searching a non-clustered index, SQL Server will find the value in the index, do a "bookmark lookup" to the table, and retrieve the other row fields from there. This is why non-clustered indexes perform slightly less well than clustered indexes.
To add an index in SQL Server Management Studio, expand the table node in object view. Right click on "Indexes" and select "New Index".
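The same can be done with T-SQL instead of the dialog. The table, column and index names below are made up for illustration:

```sql
-- At most one clustered index per table: it determines how the rows are stored.
CREATE CLUSTERED INDEX CIX_Customers_customer_id
    ON dbo.Customers (customer_id);

-- Any number of nonclustered indexes can be added on top of it.
CREATE NONCLUSTERED INDEX IX_Customers_last_name
    ON dbo.Customers (last_name);
```

If the table already has a clustered primary key, only the second statement is needed.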
Clustered Index: Only one clustered index per table is allowed. If an index is clustered, it means that the table on which the clustered index is based is physically sorted according to that index. Think of the page numbers in an encyclopedia.
Non-clustered Index: Can have many non-clustered indexes per table. Think of the keyword index at the back of the book.