Why isn't a particular index being used in a query? - sql-server

I have a table named Workflow. It has 37M rows in it. There is a primary key on the ID column (int) plus an additional column. The ID column is the first column in the index.
If I execute the following query, the PK is not used (unless I use an index hint):
Select Distinct(SubID) From Workflow Where ID >= @LastSeenWorkflowID
If I execute this query instead, the PK is used:
Select Distinct(SubID) From Workflow Where ID >= 786400000
I suspect the problem is with using the parameter value in the query (which I have to do). I really don't want to use an index hint. Is there a workaround for this?

Please post the execution plan(s), as well as the exact table definition, including all indexes.
When you use a variable, the optimizer does not know what selectivity the query will have: @LastSeenWorkflowID may filter out all but the very last few rows in Workflow, or it may include them all. The generated plan has to work in both situations. There is a threshold at which a range seek over the clustered index becomes more expensive than a full scan over a non-clustered index, simply because the clustered index is so much wider (it includes every column in the leaf level) and thus has so many more pages to iterate over. The plan generated for an unknown value of @LastSeenWorkflowID is likely crossing that threshold when estimating the cost of the clustered index seek, and so the optimizer chooses the scan over the non-clustered index.
You could provide a narrow index that is aimed specifically at this query:
CREATE INDEX WorkflowSubId ON Workflow(ID, SubId);
or:
CREATE INDEX WorkflowSubId ON Workflow(ID) INCLUDE (SubId);
Such an index is too good to pass up for your query, no matter the value of @LastSeenWorkflowID.

Assuming your PK is an identity OR is always greater than 0, perhaps you could try this:
Select Distinct(SubID)
From Workflow
Where ID >= @LastSeenWorkflowID
And ID > 0
Adding the second condition may cause the optimizer to use an index seek.

This is a classic example of a local variable producing a sub-optimal plan.
You should use OPTION (RECOMPILE) in order to compile your query with the actual parameter value of ID.
See my blog post for more information:
http://www.sqlbadpractices.com/using-local-variables-in-t-sql-queries/
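A minimal sketch of the OPTION (RECOMPILE) suggestion applied to the query from the question. The DECLARE line and the value assigned to it are assumptions for illustration (the value is the literal used in the question); in practice the variable would be set by the surrounding code.

```sql
-- Assumed for illustration: the variable is an int and holds the
-- literal value quoted in the question.
DECLARE @LastSeenWorkflowID int = 786400000;

SELECT DISTINCT SubID
FROM Workflow
WHERE ID >= @LastSeenWorkflowID
OPTION (RECOMPILE);  -- compile this statement using the variable's current value
```

The trade-off is that the statement is compiled on every execution, which is usually acceptable for a query that runs infrequently against a 37M-row table.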

Related

Sql Server - Index not being used

After running the following query:
SELECT [hour], count(*) as hits, avg(elapsed)
FROM myTable
WHERE [url] IS NOT NULL and floordate >= '2017-05-01'
group by [hour]
the execution plan is basically a Clustered Index Scan on the PK (int, auto-increment), which accounts for 97% of the work.
The thing is: url has an index on it (a regular index, because I'm always searching for an exact match), and floordate also has an index...
Why are they not being used? How can I speed up this query?
PS: the table has 70M rows and this query takes about 9 minutes to run.
Edit 1
If I don't use (select or filter on) a column in my index, will the index still be used? Usually I also filter for or group by clientId (approx. 300 unique values across the db) and hour (24 unique values)...
In this scenario, two things affect how SQL Server will choose an index:
How selective the index is. Higher selectivity is better. NULL/NOT NULL filters generally have very low selectivity.
Whether all of the columns the query needs are in the index, which makes it a covering index.
In your example, if the index cannot cover the query, SQL will have to look up the other column values against the base table. If your URL/Floordate combination is not selective enough, SQL may determine it is cheaper to scan the base table rather than do an expensive lookup from the non-clustered index to the base table for a large number of rows.
Without knowing anything else about your schema, I'd recommend an index with the following columns:
floordate, url, hour; include elapsed
Date range scans are generally more selective than a NULL/NOT NULL test. Moving floordate to the front may make this index more desirable for this query. If SQL Server determines the index is a good fit for the floordate and url filters, the hour column can be used for the GROUP BY. Since elapsed is included, this index can cover the query completely.
You can include ClientID after hour to see if that helps your other query as well.
As long as an index contains all of the columns to resolve the query, it is a candidate for use, even if there is no filtering needed. Generally speaking, a non-clustered index is skinnier than the base table, requiring less IO than scanning the full width base table.
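The index recommended above could be written roughly as follows. This is a sketch: the index name is made up, and it assumes url is a type and length that is allowed in an index key (which the existing index on url suggests).

```sql
-- Hypothetical index name; the column names come from the question.
CREATE NONCLUSTERED INDEX IX_myTable_floordate_url_hour
    ON myTable (floordate, [url], [hour])
    INCLUDE (elapsed);  -- stored at the leaf level only, so the AVG needs no lookup
```

With this in place the query can be answered from the index alone, without touching the clustered index.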

SQL makes key lookup instead of seek over indexed column

I have 1 large and 2 small tables inner joined. I added appropriate indexes on the large table. Although the query is fast most of the time, it sometimes takes over 3 seconds. When I checked the execution plan, it seems SQL Server uses a key lookup instead of an index seek.
Here is my query;
and my execution plan;
and here execution details;
Am I missing something here?
A key lookup is a seek. It is looking up using the key.
A non-clustered index always includes the clustered index key (or the physical RID if the table isn't clustered, in which case you get a "bookmark lookup" instead).
Because the index used in the preceding index seek does not contain the CreateDate column, SQL Server needs to use the clustered index key to seek into the clustered index to retrieve it. This type of seek to retrieve additional columns is called a key lookup.
If you wanted to get rid of the lookup you could consider adding CreateDate as an included column to the index on NewsCategoryUrlId.
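As a sketch of that suggestion: the question does not show the table or index names, so dbo.News and IX_News_NewsCategoryUrlId below are hypothetical placeholders. Only NewsCategoryUrlId and CreateDate come from the thread.

```sql
-- Hypothetical table and index names; substitute the real ones.
CREATE NONCLUSTERED INDEX IX_News_NewsCategoryUrlId
    ON dbo.News (NewsCategoryUrlId)
    INCLUDE (CreateDate);  -- CreateDate is now in the index leaf, so no key lookup is needed for it
```

If an index with that name already exists on the same table, adding WITH (DROP_EXISTING = ON) rebuilds it with the new definition in one step.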
Though, as Hadi says in the comments, your case sounds like parameter sniffing or outdated statistics. A plan with a non-covering index seek and key lookups is often generated when the optimiser believes the parameter value to be selective, and that plan becomes problematic when the value is not selective.
With parameter sniffing the problem can arise if a plan is compiled for a selective value and then cached and reused for a less selective value.
Outdated statistics may not reflect the true selectivity of the parameter value the plan is compiled for in the first place.
After the suggestion from Martin Smith, I re-created my index as below;
and now the execution plan is much more satisfactory to me;

Covering indexes when extra columns uniquely determined by clustered index

Suppose I need to update myTab from luTab as follows
update myTab
set LookupValue = (select LookupValue from luTab B
where B.idLookup = myTab.idLookup)
luTab consists of 2 columns (idLookup(unique), LookupValue)
Which is preferable: a unique clustered index on idLookup, or one on idLookup and LookupValue combined? Is a covering index going to make any difference in this situation?
(I'm mostly interested in SQL server)
Epilogue :
I followed up Krip's tests below with 27M rows in myTab and 1.5M rows in luTab.
The crucial part seems to be the uniqueness of the index.
If the index is specified as unique, the update uses a hash table.
If it is not specified as unique, then the update first aggregates luTab by idLookup (the Stream Aggregate) and then uses a nested loop. This is much slower.
When I use the extended index, SQL Server is no longer assured that LookupValue is unique per idLookup, so it is forced down the much slower Stream Aggregate / nested loop route.
Firstly:
A covering index is always non-clustered
You should always have a PK and a clustered index (they are the same by default on SQL Server)
The 2 concepts are separate
So:
Your PK (clustered) would be idLookup if this uniquely identifies a row
The covering index would be (idLookup) INCLUDE (LookupValue)
However:
idLookup is the PK (clustered), so you don't need a covering index
the clustered index (PK) is implicitly "covering" by the nature of a clustered index (simply, index is data at the lowest level)
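A sketch of the lookup table under that advice. The column types and the constraint name are assumptions, since the question only gives the column names.

```sql
-- Assumed column types and constraint name; only the column names are from the question.
CREATE TABLE luTab (
    idLookup    int         NOT NULL,
    LookupValue varchar(50) NULL,
    CONSTRAINT PK_luTab PRIMARY KEY CLUSTERED (idLookup)
);
```

Because the clustered index leaf rows are the data rows, a seek on idLookup already returns LookupValue, and the PRIMARY KEY constraint guarantees at most one row per idLookup.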
I've created your tables and loaded just a few records (50 or so lookup, and 15 in myTab).
Then I've tried various index options. The Index Seek on luTab always has a cost of 29%.
The interesting bit is that if you add in the LookupValue column to the index on luTab the execution plan shows two extra steps after the Index Seek: Stream Aggregate and Assert. While cost is 0%, that may go up with more data.
I've also tried a nonclustered index on just idLookup, with LookupValue as an 'Included Column'. That way the data pages don't need to be accessed to retrieve that column. That may be an option for you, although the execution plan doesn't show anything different (and it doesn't have the Stream Aggregate / Assert steps either).
-Krip

SQL Server: Index columns used in like?

Is it a good idea to index varchar columns only used in LIKE operations? From what I can read from the query analytics I get from the following query:
SELECT * FROM ClientUsers WHERE Email LIKE '%niels@bosmainter%'
I get an "Estimated subtree cost" of 0.38 without any index and 0.14 with an index. Is this a good metric to use for analyzing whether a query has been optimized with an index?
Given the data 'abcdefg'
WHERE Column1 LIKE '%cde%' --can't use an index
WHERE Column1 LIKE 'abc%' --can use an index
WHERE Column1 Like '%defg' --can't use an index, but see note below
Note: If you have important queries that require '%defg', you could use a persisted computed column where you REVERSE() the column and then index it. You can then query on:
WHERE Column1Reverse Like REVERSE('defg')+'%' --can use the persistent computed column's index
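A sketch of the reversed-column approach, applied to the ClientUsers.Email column from the question. The names EmailReverse and IX_ClientUsers_EmailReverse are made up for illustration, and the sketch assumes Email is short enough to be used as an index key.

```sql
-- Hypothetical column and index names.
ALTER TABLE ClientUsers
    ADD EmailReverse AS REVERSE(Email) PERSISTED;  -- REVERSE is deterministic, so the column can be persisted and indexed

CREATE NONCLUSTERED INDEX IX_ClientUsers_EmailReverse
    ON ClientUsers (EmailReverse);

-- An "ends with" search becomes a "starts with" search on the reversed column.
SELECT Email
FROM ClientUsers
WHERE EmailReverse LIKE REVERSE('defg') + '%';
```

This only helps suffix searches. A pattern with wildcards on both sides, such as the one in the question, still cannot seek on either index.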
In my experience the first %-sign will make any index useless, but one at the end will use the index.
To answer the metrics part of your question: The type of index/table scan/seek being performed is a good indicator for knowing if an index is being (properly) used. It's usually shown topmost in the query plan analyzer.
The following scan/seek types are sorted from worst (top) to best (bottom):
Table Scan
Clustered Index Scan
Index Scan
Clustered Index Seek
Index Seek
As a rule of thumb, you would normally try to get seeks over scans whenever possible. As always, there are exceptions depending on table size, queried columns, etc. I recommend doing a search on StackOverflow for "scan seek index", and you'll get a lot of good information about this subject.

What is a Bookmark Lookup in Sql Server?

I'm in the process of trying to optimize a query that looks up historical data. I'm using the query analyzer to look up the execution plan and have found that the majority of my query cost is on something called a "Bookmark Lookup". I've never seen this node in an execution plan before and don't know what it means.
Is this a good thing or a bad thing in a query?
A bookmark lookup is the process of finding the actual data in the SQL table, based on an entry found in a non-clustered index.
When you search for a value in a non-clustered index, and your query needs more fields than are part of the index leaf node (all the index fields, plus any possible INCLUDE columns), then SQL Server needs to go retrieve the actual data page(s) - that's what's called a bookmark lookup.
In some cases, that's really the only way to go. But if your query requires just one more field (not a whole bunch of 'em), it might be a good idea to INCLUDE that field in the non-clustered index. In that case, the leaf-level node of the non-clustered index would contain all fields needed to satisfy your query (a "covering" index), and thus a bookmark lookup wouldn't be necessary anymore.
Marc
It's a nested loop join that joins a non-clustered index with the table itself on a row pointer.
It happens for queries like this, if you have an index on col2:
SELECT col1
FROM mytable
WHERE col2 BETWEEN 1 AND 10
The index on col2 contains pointers to the indexed rows.
So, in order to retrieve the value of col1, the engine needs to scan the index on col2 for the key values from 1 to 10, and for each index leaf, refer to the table itself using the pointer contained in the leaf, to find out the value of col1.
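If col1 is the only extra column the query needs, the lookup in the example above can be avoided with a covering index. A sketch, where the table name mytable and the index name are placeholders and INCLUDE assumes SQL Server 2005 or later:

```sql
-- Placeholder table and index names.
CREATE NONCLUSTERED INDEX IX_mytable_col2
    ON mytable (col2)
    INCLUDE (col1);  -- col1 is read from the index leaf, so no pointer back to the table is followed
```

The range seek on col2 then returns col1 directly from the index pages.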
This article points out that Bookmark Lookup is SQL Server 2000's term, which is replaced by nested loops between the index and the table in SQL Server 2005 and above.
From MSDN regarding Bookmark Lookups:
The Bookmark Lookup operator uses a bookmark (row ID or clustering key) to look up the corresponding row in the table or clustered index. The Argument column contains the bookmark label used to look up the row in the table or clustered index. The Argument column also contains the name of the table or clustered index in which the row is looked up. If the WITH PREFETCH clause appears in the Argument column, the query processor has determined that it is optimal to use asynchronous prefetching (read-ahead) when looking up bookmarks in the table or clustered index.
