How to create index to optimize a "Key Lookup" - sql-server

I have an SP which, on first run, can take over a minute. On a second run, it takes less than a second.
To fix this, I check the execution plan, which shows this:
So I add the following index which, as far as I know, should stop it from using the Key Lookup:
CREATE NONCLUSTERED INDEX [ix_account_id_include_utility_id]
ON dbo.account (account_id)
INCLUDE (utility_id)
Creating the index took 1 min 50 seconds.
I then check the execution plan again, but it shows exactly the same plan, with the Key Lookup.
Am I doing something wrong here with the index?
I am a newbie at indexing and optimization, so any advice would be appreciated.

A key lookup means that a specific field is not available in the index, so the engine has to go to the data page to pick up that field.
In your case, account_id is being used to pick up utility_id from the data page of the Account table.
What you have to do is add utility_id as an included column on one of the two right-most indexes in the plan (the ones highlighted in the right box), i.e. the index the seek actually uses, to avoid the key lookup.
In your case, however, you created a brand-new index on the account table instead of changing the index the seek already uses, so the plan is not leveraging the newly created index.
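A sketch of what that change could look like; the index name and key column below are placeholders, because the index the seek actually uses is only visible in your execution plan:
-- Substitute the name and key column(s) of the index the Index Seek already uses
CREATE NONCLUSTERED INDEX [<existing_index_name>]
ON dbo.account (<existing_key_column>)
INCLUDE (utility_id)
WITH (DROP_EXISTING = ON); -- rebuilds that index with utility_id added, removing the Key Lookup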

Related

Query plan ignores index on large join

I have the following fairly simple query that returns about 1 million rows (I've left out columns as they are just for output), but the query plan doesn't seem to want to use the index and wants me to create one:
SELECT [SAU]
,nr.[Headend]
,[Source]
,[Destination]
,[FibreHop]
,[CableRef]
,[CableSectionRef]
,[nNGAFibres]
,[nEthFibres]
,[FromID]
,[ToID]
,[FromIDTerm]
,[ToIDTerm]
,[Reversed]
,@Now
FROM [NodeRouting] nr
join [TargetHeadends] tex ON nr.Headend=tex.Headend
The index is:
CREATE NONCLUSTERED INDEX [NodeRouting_Headend] ON [NodeRouting]
(
[Headend] ASC
)
In the other table, Headend is the PK.
The query plan is this:
If I give it a hint to use the index already created (non-unique, non-clustered) on the id field:
join [TargetHeadends] tex ON nr.id=tex.id (index=NodeRouting_Headend)
It changes to this:
The actual number of rows, by the way, matches the first estimate of 966,000; the RID lookup's estimate of 761,000 is a few hundred thousand short, and its operator cost seems a lot higher.
One thing that is striking me as a little odd, is in the first example where it chose to not use the index it says this:
Missing Index (impact 99): CREATE NONCLUSTERED INDEX <NAME> ON NodeRouting(id) include (....)
CREATE NONCLUSTERED INDEX [<Name>]
ON [NodeRouting] ([Headend])
INCLUDE ([SAU],[Source],[Destination],[FibreHop],[CableRef],[CableSectionRef],[nNGAFibres],[nEthFibres],[FromID],[ToID],[FromIDTerm],[ToIDTerm],[Reversed])
I appreciate I'm returning more columns than are in the index, but I would have thought the index would still have been used without the INCLUDE?
Indexes don't always help, and they should not need to be forced into use. For example, for small tables a scan will be used because it's less work once you account for the index overhead. Don't force the use of the index.
For a large table, an index helps when it is "selective" and the query is selective: it gets a few records quickly, but it does not get a lot of records quickly. As a rough rule of thumb, if the query needs less than about 5% of the rows, the index might be used; if not, a scan might be faster than working through a non-selective index.
If you are returning all the records, then there is no selectivity, and a scan is going to be more efficient. For the join, other methods are more efficient than a lookup when a lot of records are involved.
Using a phonebook analogy: just start at the front of the phone book and read it to the end. Don't start at the beginning of the index and look up each name one at a time until you get to the end of the index.
A covering index can help because it can be scanned in place of scanning the original table (clustered index). For example, if you have two phone books where one has address information and the other does not, then reading the one without address information will be faster if you are not interested in addresses.
FYI: don't trust the order of the columns in the index suggestions. Also, the index suggested in this case is likely meant as a covering index to avoid reading unneeded columns, not for selectivity.

SQL makes key lookup instead of seek over indexed column

I have one large and two small tables inner joined. I added appropriate indexes on the large table. Even though the query is fast most of the time, it sometimes takes over 3 seconds. When I checked the execution plan, it seems SQL Server goes with a key lookup instead of an index seek.
Here is my query:
and my execution plan:
and here are the execution details:
Am I missing something here?
A key lookup is a seek. It is looking up using the key.
The non-clustered index always includes the clustered index key (or the physical RID if the table isn't clustered, in which case you get a "RID lookup" instead).
Because the index used in the previous index seek does not contain the CreateDate column it needs to use the clustered index key to seek into the clustered index to retrieve it. This type of seek to retrieve additional columns is called a key lookup.
If you wanted to get rid of the lookup you could consider adding CreateDate as an included column to the index on NewsCategoryUrlId.
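For example (a sketch only: the table name dbo.News and the index name are assumptions, since the question only shows the column names):
-- Assumed table and index names; if the existing index on NewsCategoryUrlId has this name,
-- DROP_EXISTING = ON rebuilds it in place with the extra included column.
CREATE NONCLUSTERED INDEX IX_News_NewsCategoryUrlId
ON dbo.News (NewsCategoryUrlId)
INCLUDE (CreateDate)
WITH (DROP_EXISTING = ON);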
Though, as Hadi says in the comments, your case sounds like parameter sniffing or outdated statistics. A plan with a non-covering index seek and key lookups is often generated when the optimiser believes the parameter value to be selective, and it can be problematic if the value is not selective.
With parameter sniffing the problem can arise if a plan is compiled for a selective value and then cached and reused for a less selective value.
Outdated statistics may not reflect the true selectivity of the parameter value the plan is compiled for in the first place.
After the suggestion from Martin Smith, I re-created my index as below;
and now the execution plan is much more satisfactory to me:

Why isn't a particular index being used in a query?

I have a table named Workflow. It has 37M rows in it. There is a primary key on the ID column (int) plus an additional column. The ID column is the first column in the index.
If I execute the following query, the PK is not used (unless I use an index hint)
Select Distinct(SubID) From Workflow Where ID >= @LastSeenWorkflowID
If I execute this query instead, the PK is used
Select Distinct(SubID) From Workflow Where ID >= 786400000
I suspect the problem is with using the parameter value in the query (which I have to do). I really don't want to use an index hint. Is there a workaround for this?
Please post the execution plan(s), as well as the exact table definition, including all indexes.
When you use a variable, the optimizer does not know what selectivity the query will have; @LastSeenWorkflowID may filter out all but the very last few rows in Workflow, or it may include them all. The generated plan has to work in both situations. There is a threshold at which the range seek over the clustered index becomes more expensive than a full scan over a non-clustered index, simply because the clustered index is so much wider (it includes every column in the leaf levels) and thus has many more pages to iterate over. The plan generated, which considers an unknown value for @LastSeenWorkflowID, is likely crossing that threshold when estimating the cost of the clustered index seek, and as such it chooses the scan over the non-clustered index.
You could provide a narrow index that is aimed specifically at this query:
CREATE INDEX WorkflowSubId ON Workflow(ID, SubId);
or:
CREATE INDEX WorkflowSubId ON Workflow(ID) INCLUDE (SubId);
Such an index is too good to pass up for your query, no matter the value of @LastSeenWorkflowID.
Assuming your PK is an identity OR is always greater than 0, perhaps you could try this:
Select Distinct(SubID)
From Workflow
Where ID >= @LastSeenWorkflowID
And ID > 0
By adding the 2nd condition, it may cause the optimizer to use an index seek.
This is a classic example of a local variable producing a sub-optimal plan.
You should use OPTION (RECOMPILE) in order to compile your query with the actual parameter value of ID.
See my blog post for more information:
http://www.sqlbadpractices.com/using-local-variables-in-t-sql-queries/
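For instance, a minimal sketch using the query from the question (the value assigned to the variable is just for illustration):
DECLARE @LastSeenWorkflowID int;
SET @LastSeenWorkflowID = 786400000;

SELECT DISTINCT SubID
FROM Workflow
WHERE ID >= @LastSeenWorkflowID
OPTION (RECOMPILE); -- the plan is compiled using the actual value in @LastSeenWorkflowID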

SQL Server Index cost

I have read that one of the tradeoffs for adding table indexes in SQL Server is the increased cost of insert/update/delete queries to benefit the performance of select queries.
I can conceptually understand what happens in the case of an insert because SQL Server has to write entries into each index matching the new rows, but update and delete are a little more murky to me because I can't quite wrap my head around what the database engine has to do.
Let's take DELETE as an example and assume I have the following schema:
CREATE TABLE Foo
(
col1 int NOT NULL
,col2 int NOT NULL
,col3 int NOT NULL
,col4 int NOT NULL
,PRIMARY KEY (col1, col2)
);
CREATE NONCLUSTERED INDEX IX_1 ON Foo (col3) INCLUDE (col4);
Now, if I issue the statement
DELETE FROM Foo WHERE col1=12 AND col2 > 34
I understand what the engine must do to update the table (or clustered index if you prefer). The index is set up to make it easy to find the range of rows to be removed and do so.
However, at this point it also needs to update IX_1, and the query I gave it provides no obvious efficient way for the database engine to find the rows to update in that index. Is it forced to do a full index scan at this point? Does the engine read the rows from the clustered index first and generate a smarter internal delete against the index?
It might help me to wrap my head around this if I understood better what is going on under the hood, but I guess my real question is this. I have a database that is spending a significant amount of time in delete and I'm trying to figure out what I can do about it.
When I display the execution plan for the deletion, it just shows an entry for "Clustered Index Delete" on table Foo, which lists in its details section the other indices that need to be updated, but I don't get any indication of the relative cost of these other indices.
Are they all equal in this case? Is there some way that I can estimate the impact of removing one or more of these indices without having to actually try it?
Nonclustered indexes also store the clustered keys.
It does not have to do a full scan, since:
your query will use the clustered index to locate rows
rows contain the other index value (c3)
using the other index value (c3) and the clustered index values (c1,c2), it can locate matching entries in the other index.
(Note: I had trouble interpreting the docs, but I would imagine that IX_1 in your case could be defined as if it was also sorted on c1,c2. Since these are already stored in the index, it would make perfect sense to use them to more efficiently locate records for e.g. updates and deletes.)
All this, however has a cost. For each matching row:
it has to read the row, to find out the value for c3
it has to find the entry for (c3,c1,c2) in the nonclustered index
it has to delete the entry from there as well.
Furthermore, while the range query can be efficient on the clustered index in your case (linear access, after finding a match), maintenance of the other indexes will most likely result in random access to them for every matching row. Random access has a much higher cost than just enumerating B+ tree leaf nodes starting from a given match.
Given the above query, more time is spent on the non-clustered index maintenance; the amount depends heavily on the number of records selected by the col1 = 12 AND col2 > 34 predicate.
My guess is that the cost is conceptually the same as if you did not have a secondary index but had e.g. a separate table, holding (c3,c1,c2) as the only columns in a clustered key and you did a DELETE for each matching row using (c3,c1,c2). Obviously, index maintenance is internal to SQL Server and is faster, but conceptually, I guess the above is close.
The above would mean that maintenance costs of indexes would stay pretty close to each other, since the number of entries in each secondary index is the same (the number of records) and deletion can proceed only one-by-one on each index.
If you need the indexes, then performance-wise, depending on the number of deleted records, you might be better off scheduling the deletes, dropping the indexes that are not used during the delete beforehand, and adding them back afterwards. Depending on the number of records affected, rebuilding the indexes might be faster.
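A rough sketch of that approach for the example schema above; whether it is actually a win depends on how many rows the delete touches relative to the size of the index:
-- IX_1 is not needed to locate the rows being deleted, so drop it first
DROP INDEX IX_1 ON Foo;

DELETE FROM Foo WHERE col1 = 12 AND col2 > 34;

-- Rebuild it afterwards
CREATE NONCLUSTERED INDEX IX_1 ON Foo (col3) INCLUDE (col4);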

Index Seek with Bookmark Lookup Only Option for SQL Query?

I am working on optimizing a SQL query that goes against a very wide table in a legacy system. I am not able to narrow the table at this point for various reasons.
My query is running slowly because it does an Index Seek on an Index I've created, and then uses a Bookmark Lookup to find the additional columns it needs that do not exist in the Index. The bookmark lookup takes 42% of the query time (according to the query optimizer).
The table has 38 columns, some of which are nvarchars, so I cannot make a covering index that includes all the columns. I have tried to take advantage of index intersection by creating indexes that cover all the columns; however, those "covering" indexes are not picked up by the execution plan and are not used.
Also, since 28 of the 38 columns are pulled out via this query, I'd have 28/38 of the columns in the table stored in these covering indexes, so I'm not sure how much this would help.
Do you think a Bookmark Lookup is as good as it is going to get, or what would another option be?
(I should specify that this is SQL Server 2000)
The covering index with INCLUDE should work. Another option might be to create a clustered indexed view containing only the columns you need.
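For the indexed view route, here is a rough sketch; the table, view, and column names are placeholders, and on non-Enterprise editions of SQL Server 2000 the view has to be referenced with the NOEXPAND hint before the optimizer will use its index:
-- SCHEMABINDING and a unique clustered index are required to materialize the view
CREATE VIEW dbo.vFooNarrow
WITH SCHEMABINDING
AS
SELECT KeyCol, Col1, Col2, Col3 -- only the columns the slow query actually needs
FROM dbo.WideLegacyTable;
GO
CREATE UNIQUE CLUSTERED INDEX IX_vFooNarrow ON dbo.vFooNarrow (KeyCol);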
You could create an index with included columns as another option.
Example from BOL; this is for SQL Server 2005 and up:
CREATE NONCLUSTERED INDEX IX_Address_PostalCode
ON Person.Address (PostalCode)
INCLUDE (AddressLine1, AddressLine2, City, StateProvinceID);
To answer this part "I have tried to take advantage of index intersection by creating indexes that cover all the columns, however those "covering" indexes are not picked up by the execution plan and are not used."
An index can only be used when the query is written in a way that is sargable; in other words, if you use a function on the left side of the operator, or leave the first column of the index out of your WHERE clause, the index won't be used. If the selectivity of the index is low, the index also won't be used.
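Two hypothetical predicates to illustrate the sargability point (Orders, OrderID and OrderDate are made-up names):
-- Not sargable: the function wrapped around the column prevents an index seek
SELECT OrderID FROM dbo.Orders WHERE YEAR(OrderDate) = 2009;

-- Sargable: the column is left bare, so an index with OrderDate as its leading key can support a seek
SELECT OrderID FROM dbo.Orders WHERE OrderDate >= '20090101' AND OrderDate < '20100101';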
Check out SQL Server covering indexes for some more info
