SQL Server Indexes - sql-server

What's the Need for going for Non-clustered index even though table has clustered index?

For optimal performance you have to create an index for every combination used in your queries. For instance if you have a select like this.
SELECT *
FROM MyTable
WHERE Col_1 = #SomeValue AND
Col_2 = #SomeOtherValue
Then you should do a clustered index with Col_1 and Col_2.
On the other hand if you have an additional query which only looks up one of the Column like:
SELECT *
FROM MyTable
WHERE Col_1 = #SomeValue
Then you should have an index with just the Col_1.
So you end up with two indexes. One with Col_1 and Col_2 and another with just Col_1.

The "need" is to do faster lookups of columns not included in the clustered index.

Don't get clustered indexes confused with indexes across multiple columns. That isn't the same thing.
Here's an article that does a good job of explaining clustered vs. non-clustered indexes.
In mssql server you can only have one clustered index per table, and it's almost always the primary key. A clustered index is "attached" to the table so it doesn't need to go back to the table to get any other data elements that might be in the "select" clause. A non-clustered index is not attached, but contains a reference back to the table row with all the rest of the data.

Related

How to I force a better execution plan when the database is forcing a join?

I am optimizing a query on SQL Server 2005. I have a simple query against mytable that has about 2 million rows:
SELECT id, num
FROM mytable
WHERE t_id = 587
The id field is the Primary Key (clustered index) and there exists a non-clustered index on the t_id field.
The query plan for the above query is including both a Clustered Index Seek and an Index Seek, then it's executing a Nested Loop (Inner Join) to combine the results. The STATISTICS IO is showing 3325 page reads.
If I change the query to just the following, the server is only executing 6 Page Reads and only a single Index Seek with no join:
SELECT id
FROM mytable
WHERE t_id = 587
I have tried adding an index on the num column, and an index on both num and tid. Neither index was selected by the server.
I'm looking to reduce the number of page reads but still retrieve the id and num columns.
The following index should be optimal:
CREATE INDEX idx ON MyTable (t_id)
INCLUDE (num)
I cannot remember if INCLUDEd columns were valid syntax in 2005, you may have to use:
CREATE INDEX idx ON MyTable (t_id, num)
The [id] column will be included in the index as it is the clustered key.
The optimal index would be on (t_id, num, id).
The reason your query is probably one that bad side is because multiple rows are being selected. I wonder if rephrasing the query like this would improve performance:
SELECT t.id, t.num
FROM mytable t
WHERE EXISTS (SELECT 1
FROM my_table t2
WHERE t2.t_id = 587 AND t2.id = t.id
);
Lets clarify the problem and then discuss on the solutions to improve it:
You have a table(lets call it tblTest1 and contains 2M records) with a Clustered Index on id and a Non Clustered Index on t_id, and you are going to query the data which filters the data using Non Clustered Index and getting the id and num columns.
So SQL server will use the Non Clustered Index to filter the data(t_id=587), but after filtering the data SQL server needs to get the values stored in id and num columns. Apparently because you have Clustered index then SQL server will use this index to obtain the data stored in id and num columns. This happens because leafs in the Non clustered index's tree contains the Clustered index's value, this is why you see the Key Lookup operator in the execution plan. In fact SQL Server uses the Index seek(NonCluster) to find the t_id=587 and then uses the Key Lookup to get the num data!(SQL Server will not use this operator to get the value stored in id column, because your have a Clustered index and the leafs in NonClustered Index contains the Clustered Index's value).
Referred to the above-mentioned screenshot, when we have Index Seek(NonClustred) and a Key Lookup, SQL Server needs a Nested Loop Join operator to get the data in num column using the Index Seek(Nonclustered) operator. In fact in this stage SQL Server has two separate sets: one is the results obtained from Nonclustered Index tree and the other is data inside Clustered Index tree.
Based on this story, the problem is clear! What will happen if we say to SQL server, not to do a Key Lookup? this will cause the SQL Server to execute the query using a shorter way(No need to Key Lookup and apparently no need to the Nested loop join! ).
To achieve this, we need to INCLUDE the num column inside the NonClustered index's tree, so in this case the leaf of this index will contains the id column's data and also the num column's data! Clearly when we say the SQL Server to find a data using NonClustred Index and return the id and num columns, it will not need to do a Key Lookup!
Finally what we need to do, is to INCLUDE the num in NonClustered Index! Thanks to #MJH's answer:
CREATE NONCLUSTERED INDEX idx ON tblTest1 (t_id)
INCLUDE (num)
Luckily, SQL Server 2005 provided a new feature for NonClustered indexes, the ability to include additional, non-key columns in the leaf level of the NonClustered indexes!
Read more:
https://www.red-gate.com/simple-talk/sql/learn-sql-server/using-covering-indexes-to-improve-query-performance/
https://learn.microsoft.com/en-us/sql/relational-databases/indexes/create-indexes-with-included-columns?view=sql-server-2017
But what will happens if we write the query like this?
SELECT id, num
FROM tblTest1 AS t1
WHERE
EXISTS (SELECT 1
FROM tblTest1 t2
WHERE t2.t_id = 587 AND t2.id = t1.id
)
This is a great approach, but lets see the execution plan:
Clearly, SQL server needs to do a Index seek(NonClustered) to find the t_id=587 and then obtain the data from Clustered Index using Clustered Index Seek. In this case we will not get any notable performance improvement.
Note: When you are using Indexes, you need to have an appropriate plan to maintain them. As the indexes get fragmented, their impact on the query performance will be decreased and you might face performance problems after a while! You need to have an appropriate plan to Reorganize and Rebuild them, when they get fragmented!
Read more: https://learn.microsoft.com/en-us/sql/relational-databases/indexes/reorganize-and-rebuild-indexes?view=sql-server-2017

execution plan suggesting to add an index on columns which are not part of where clause

I am running following query in SSMS and execution plan suggesting to add index on columns which are not part of where clause. I was planning to add index on two columns which are being used in where clause (OID and TransactionDate).
SELECT
[OID] , //this is not a PK. Primary key column is not a part of sql script
[CustomerNum] ,
[Amount] ,
[TransactionDate] ,
[CreatedDate]
FROM [dbo].[Transaction]
WHERE OID = 489
AND TransactionDate > '01/01/2018 06:13:06.46';
Index suggestion
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[Transaction] ([OID],[TransactionDate])
INCLUDE ([CustomerNum],[Amount],[CreatedDate])
Updated
Do i need to include other columns? Data is being imported to that table through a back end process using SQLBulkCopy class in .net. I am wondering if having non cluster index on all columns would reduce the performance. (In my table is Pk column called TransactionID which is not needed but i have this in the table in case its needed in the future otherwise SQLBulkCopy works better with heap. Other option is to drop and recreate indexes before and after SQLBulkCopy operation)
the INCLUDE keyword specifies the non-key columns to be added to the leaf level of the nonclustered index.
This means that if you will add this index and run the query again, SQL Server can get all the information needed from the index, thus eliminating the need to perform a lookup in the table as well.
As a general rule of thumb - when SSMS suggest an index, create it. You can always drop it later if it doesn't help.
You don't need to add all table columns in your non-clustered index, suggested index is good for the query provided. SQL Server database engine suggestions are usually really good.
INCLUDE keyword is required to avoid KEY LOOKUP and use NONCLUSTERED INDEX SEEK.
All in all: No NONCLUSTERED INDEX results in Clustered index scan
Created NONCLUSTERED INDEX with no included columns results in NONCLUSTERED INDEX scan plus key lookup.
Created NONCLUSTERED INDEX with included columns results in NONCLUSTERED INDEX SEEK.

Why QO choses clustered index-scan vs table-scan?

If I have a query like this:
SELECT * FROM tTable
where tTable does not contain any indexes a table-scan happens, as expected. If I add a clustered index on some column then QO decides to use clustered index scan on this query. Why? Why is clustered-index-scan preferred instead of table-scan in this case?
If I add a clustered index on some column then QO decides to use clustered index scan on this query
because when you create a clustered index on a table,data in table is rearranged in index order..so table it self is clustered index.This is also the reason why you can't have two clustered indexes on same table
To summarize,when you create a clustered index,there is only one structure ,not two(clustered index and table)
The query is "give me all rows and all columns" which means "read every row" which is a scan
There is nothing to do an index seek on, because there is no WHERE clause.
Unlike this:
SELECT * FROM tTable WHERE PrimaryClusteredKeyValue = 45
Then this may use a nonclustered seek followed by a clustered key lookup or it may still scan the clustered index because you ask for all columns. It depends on how many rows gbn will match
SELECT * FROM tTable WHERE NonClusteredOtherColumnValue = 'gbn'

SQL Server 2008 R2 Create index

Now I am looking for your help to create index on these.
Now this is my table structure
This the query I need index for maximum performance.
select PageId
from tblPages
where UrlChecksumCode = #UrlChecksumCode
and PageUrl = #PageUrl
Now I am very bad with indexes. I plan like that when the query is executing it will find first that UrlChecksumCode rows then look pageurl columns. If you also explain me why to make such index I really appreciate that. Thank you.
one way, since your pageURL is long(nvarchar(1000) and an index can only be 900 bytes if you don't use included columns, I have created this index with included columns
create index ix_IndexName on tblPages (UrlChecksumCode) INCLUDE(PageUrl)
or
create index ix_IndexName on tblPages (UrlChecksumCode) INCLUDE(PageUrl, PageID)
See also SQL Server covering indexes
why is URL nvarchar instead of varchar...can they be unicode? if not make it varchar and save 1/2 the size
Create Nonclustered Indexes
create nonclustered index indexname on tablename(columnname)
Drop Index
Drop Index index name on tablename
See all Indexes in a database
select * from sys.indexes
You probably want an index over (UrlChecksumCode and PageUrl), because that's what it's selecting against.
Though you might, under some data patterns, have better performance if you just index over (UrlChecksumCode) because the PageUrl would require a larger text index
If you want to create cluster then , you can used this,
CREATE CLUSTERED INDEX IX_IndexName ON tblPages (UrlChecksumCode) INCLUDE(PageUrl)
The following query retrieves the indexes created on your table.
EXECUTE sp_helpindex tablename
Brief introduction for cluster indexing:-
A clustered index defines the order in which data is physically stored in a table. Table data can be sorted in only way, therefore, there can be only one clustered index per table. In SQL Server, the primary key constraint automatically creates a clustered index on that particular column. or you can create custom cluster index removing default cluster index which created by primary key.
i think even we can use clusterd index as for fast search like
create clusteredindex someName
on tableName(columnName1,columnName2,......)
but only one clustered index we can create for a tab

sql server execution plan - nested loop join

I have a very simple select on SQL Server:
select * from person
where first_name = 'John' and last_name = 'Smith'`
In the execution plan I have:
Nonclustered Index seek - NC_First_Last_pers
Key lookup (Clustered) on PK
And this two goes into a Nested Loop Join.
My question is:
Why is there the join? I thought that this is used only for joining different tables, but I have only 1 table here.
Thanks!
In the index you have the data for the columns that are included in the index plus the clustered key. You are querying the table with * meaning that you have to go look for all column values and those are stored together with the clustered key.
The query uses the index on the names to find all rows that match and then uses the clustered key to find the data you want.

Resources