How does SQL Server look up data in a composite non-clustered index? - sql-server

Suppose I have a composite non-clustered index like the following:
CREATE NONCLUSTERED INDEX idx_Test ON dbo.Persons(IsActive, UserName)
My question is based on this answer: How important is the order of columns in indexes?
If I run this query:
Select * From Persons Where UserName='Smith'
In the query above, IsActive, which is the first column in the non-clustered index, is not present. Does that mean the SQL Server query optimizer will not use the index because IsActive is missing, or does something else happen?
Of course I can just test it and check the execution plan, and I will do that, but I'm also curious about the theory behind it. When does cardinality matter and when does it not?

SQL Server will scan the entire index; it may pick this index because it is probably the narrowest one available.
Below is a small example on an orders table I have.
The query predicate (shipperid='G') matches 199,748 rows, but SQL Server has to read all 998,123 rows to find them. This is visible in the plan when you compare the number of rows read with the actual number of rows.
I found this from Craig Freedman very useful. Assuming you have an index on (a, b), SQL Server can seek efficiently for the following predicates:
a=somevalue and b=somevalue
a=someval and b>0
a=someval and b>=0
For the following predicate, SQL Server will filter out as many rows as possible using the first predicate (this is also the reason you might have heard the advice to put the column with more unique values first) and will apply the second predicate as a residual:
a>=somevalue and b=someval
For the following case, SQL Server has to scan the entire index:
b=someval
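The cases above can be sketched in T-SQL roughly as follows. The table dbo.T, the index name IX_T_a_b and the literal values are assumptions made up for illustration; the plan you actually get also depends on statistics and table size, so check the Seek Predicates and Predicate properties in the execution plan rather than taking the comments as guaranteed.

```sql
-- Hypothetical table and index mirroring the (a, b) notation used above.
CREATE TABLE dbo.T (id int IDENTITY PRIMARY KEY, a int NOT NULL, b int NOT NULL);
CREATE NONCLUSTERED INDEX IX_T_a_b ON dbo.T (a, b);

-- Equality on the leading column plus equality on the second column:
-- both predicates can be used as seek predicates.
SELECT a, b FROM dbo.T WHERE a = 10 AND b = 5;

-- Equality on the leading column plus a range on the second column:
-- still a seek on both columns.
SELECT a, b FROM dbo.T WHERE a = 10 AND b > 0;

-- Range on the leading column: the seek uses a, and b = 5 is
-- typically applied as a residual predicate on the rows in that range.
SELECT a, b FROM dbo.T WHERE a >= 10 AND b = 5;

-- No predicate on the leading column: expect a scan of the whole index.
SELECT a, b FROM dbo.T WHERE b = 5;
```

The queries select only a and b so that the index covers them and lookups do not influence the plan choice.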
Further reading:
Craig Freedman's SQL Server Blog: Seek Predicates
Rob Farley: Probe Residual when you have a Hash Match – a hidden cost in execution plans
Kimberly L. Tripp: The Tipping Point Query Answers

Related

Sql Server - Index not being used

After running the following query:
SELECT [hour], count(*) as hits, avg(elapsed)
FROM myTable
WHERE [url] IS NOT NULL and floordate >= '2017-05-01'
group by [hour]
the execution plan is basically a Clustered Index Scan on the PK (int, auto-increment), which accounts for 97% of the work.
The thing is: URL has an index on it (a regular index, because I'm always searching for an exact match), and floordate also has an index.
Why are they not being used? How can I speed up this query?
PS: the table has 70M rows and this query takes about 9 minutes to run.
Edit 1
If I don't use (select or filter on) a column in my index, will the index still be used? Usually I also filter or group by clientId (approx. 300 unique values across the db) and hour (24 unique values).
In this scenario, two things affect how SQL Server will choose an index.
How selective the index is. Higher selectivity is better. NULL/NOT NULL filters generally have very low selectivity.
Whether all of the columns the query needs are in the index, which is known as a covering index.
In your example, if the index cannot cover the query, SQL Server will have to look up the other column values in the base table. If your URL/Floordate combination is not selective enough, SQL Server may determine that it is cheaper to scan the base table than to do an expensive lookup from the non-clustered index to the base table for a large number of rows.
Without knowing anything else about your schema, I'd recommend an index with the following columns:
floordate, url, hour; include elapsed
Date range scans are generally more selective than a NULL/NOT NULL test. Moving Floordate to the front may make this index more attractive for this query. If SQL Server decides the index is a good fit for the Floordate and URL filters, the Hour column can be used for the Group By. Since Elapsed is included, this index can cover the query completely.
You can add ClientID after Hour to see if that helps your other query as well.
As long as an index contains all of the columns needed to resolve the query, it is a candidate for use, even if no filtering is needed. Generally speaking, a non-clustered index is skinnier than the base table, so scanning it requires less IO than scanning the full-width base table.
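The recommended index could be written roughly like this. The index name and the dbo schema are assumptions, and the sketch also assumes [url] is a type narrow enough to be an index key column (a very wide or MAX-typed column cannot be a key column and would have to go in INCLUDE instead).

```sql
-- Hypothetical index name; key columns in the order suggested above,
-- with elapsed carried only at the leaf level so the index covers the query.
CREATE NONCLUSTERED INDEX IX_myTable_floordate_url_hour
    ON dbo.myTable (floordate, [url], [hour])
    INCLUDE (elapsed);
```

After creating it, rerun the query and check whether the plan switches from the clustered index scan to a seek or scan on this index.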

SQL Server 2005 cached an execution plan that could never work

We have a view that is used to lookup a record in a table by clustered index. The view also has a couple of subqueries in the select statement that lookup data in two large tables, also by clustered index.
To hugely simplify it would be something like this:
SELECT a,
(SELECT b FROM tableB where tableB.a=tableA.a) as b,
(SELECT c FROM tableC where tableC.a=tableA.a) as c
FROM tableA
Most lookups to [tableB] correctly use a non-clustered index on [tableB] and work very efficiently. However, very occasionally SQL Server, in generating an execution plan, has instead used an index on [tableB] that doesn't contain the value being passed through. So, following the example above, although an index of column [a] exists on tableB, the plan instead does a scan of a clustered index that has column [z]. Using SQL's own language the plan's "predicate is not relevant to the object". I can't see why this would ever be practical. As a result, when SQL does this, it has to scan every record in the index, because it would never exist, taking up to 30 seconds. It just seems plain wrong, always.
Has anyone seen this before, where an execution plan does something that looks like it could never be right? I am going to rewrite the query anyway, so my concern is less about the structure of the query and more about why SQL Server would ever get it that wrong.
I know sometimes SQL Server can choose a plan that worked once and it can become inefficient as the dataset changes but in this case it could never work.
Further information
[tableB] has 4 million records, and most values for [a] are null
I'm unable now to get hold of the initial query that generated the plan
These queries are run through Coldfusion but at this time I'm interested in anyone having seen this independently in SQL Server
It just seems plain wrong, always.
You might be interested in the First Rule of Programming.
So, following the example above, although an index of column [a] exists on tableB, the plan instead does a scan of a clustered index that has column [z].
A clustered index always contains every row of the table. It might be ordered by z, but its leaf level still contains all the other columns.
The reason SQL Server sometimes prefers a clustered index scan over a non-clustered index seek is this: when you do an index seek, you have to follow it up with a bookmark lookup to the clustered index to retrieve the columns that are not in the index.
When you do a clustered index scan, you by definition find all columns. That means no bookmark lookup is required.
When SQL Server expects many rows, it tries to avoid the bookmark lookups. This is a time-tested choice. Nonclustered index seeks followed by lookups are routinely beaten by clustered index scans.
You can test this for your case by forcing either plan with the with (index(IX_YourIndex)) table hint.
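A minimal sketch of that comparison, assuming column a is an int and the non-clustered index on tableB(a) is named IX_tableB_a (both are assumptions, substitute your own names and types):

```sql
-- DECLARE and SET are kept separate so this also runs on SQL Server 2005.
DECLARE @a int;
SET @a = 42;  -- made-up value

-- Force the non-clustered index on a (hypothetical name IX_tableB_a).
SELECT b FROM tableB WITH (INDEX(IX_tableB_a)) WHERE a = @a;

-- Force the clustered index; INDEX(1) refers to the clustered index when one exists.
SELECT b FROM tableB WITH (INDEX(1)) WHERE a = @a;
```

Comparing the duration and logical reads of the two statements shows which access path is actually cheaper for a given value. Hints like these are for diagnosis; leaving them in production code pins the plan even after the data changes.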

Composite and Covering index

What is the difference between a composite index and a covering index in SQL Server?
A covering index is a composite index that contains every column you are currently retrieving with your select statement and that participates in the where clause. It is one of the best ways to improve query performance substantially.
A covering index is a composite index that covers (hence the name) all columns that are necessary to fulfill a query or a join condition.
There is nothing special about SQL server here, these are generic designations.
A composite index is also a covering index when the index contains your search criteria and all the data your query is attempting to retrieve. In this example:
SELECT a,b,c FROM Foo WHERE a = 'FooFoo'
A covering index would contain column a (your search predicate) as well as the columns b and c.
In this case SQL Server can return the values directly from the index and does not need to make an additional lookup in the actual table. If b and c are frequently returned but rarely searched on, the index can be set up so that b and c are included in the index but are not part of the index key.
Before SQL Server 2005, DBAs would add additional 'covering' columns to the keys of their indexes to achieve this optimization. SQL Server 2005 added a feature that lets you store covering columns in the leaf nodes of the index without making them part of the index tree. When creating an index you can specify these additional 'covering' columns in the INCLUDE clause. The columns are not indexed; they are only added to the leaf level of the index, which saves SQL Server from looking up the additional data in the main table. Putting the data in the INCLUDE clause avoids the overhead of carrying it in the search tree while keeping the benefit that a covering index brings.
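For the Foo query above, the two approaches might look like this. The index names are made up for illustration, and in practice you would normally create only one of the two.

```sql
-- Pre-2005 style: b and c are key columns, so they are stored
-- at every level of the index tree.
CREATE NONCLUSTERED INDEX IX_Foo_a_b_c
    ON Foo (a, b, c);

-- SQL Server 2005 and later: only a is a key column; b and c are
-- stored at the leaf level only, and the index still covers
-- SELECT a, b, c FROM Foo WHERE a = 'FooFoo'.
CREATE NONCLUSTERED INDEX IX_Foo_a_incl_b_c
    ON Foo (a)
    INCLUDE (b, c);
```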

What is a Bookmark Lookup in Sql Server?

I'm in the process of trying to optimize a query that looks up historical data. I'm using the query analyzer to look at the Execution Plan and have found that the majority of my query cost is on something called a "Bookmark Lookup". I've never seen this node in an execution plan before and don't know what it means.
Is this a good thing or a bad thing in a query?
A bookmark lookup is the process of finding the actual data in the SQL table, based on an entry found in a non-clustered index.
When you search for a value in a non-clustered index, and your query needs more fields than are part of the index leaf node (all the index fields, plus any INCLUDE columns), then SQL Server needs to go and retrieve the actual data page(s). That's what's called a bookmark lookup.
In some cases, that's really the only way to go. But if your query requires just one more field (not a whole bunch of them), it might be a good idea to INCLUDE that field in the non-clustered index. In that case, the leaf-level node of the non-clustered index would contain all the fields needed to satisfy your query (a "covering" index), and a bookmark lookup would no longer be necessary.
Marc
It's a nested loops join that joins a non-clustered index with the table itself on a row pointer.
It happens for queries like this:
SELECT col1
FROM [table]
WHERE col2 BETWEEN 1 AND 10
when you have an index on col2.
The index on col2 contains pointers to the indexed rows.
So, in order to retrieve the value of col1, the engine needs to scan the index on col2 for the key values from 1 to 10 and, for each index leaf entry, go to the table itself using the pointer stored in that entry to find the value of col1.
This article points out that Bookmark Lookup is SQL Server 2000's term; in SQL Server 2005 and above it is shown as a Nested Loops join between the index and the table.
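One way to see the cost of the lookups is to compare logical reads before and after making the index covering. This is only a sketch: dbo.MyTable and the index name are hypothetical stand-ins for the table in the example, and the INCLUDE clause requires SQL Server 2005 or later.

```sql
SET STATISTICS IO ON;

-- With an index on col2 only, each matching row needs a lookup
-- into the table to fetch col1.
SELECT col1 FROM dbo.MyTable WHERE col2 BETWEEN 1 AND 10;

-- Add col1 to the leaf level of an index keyed on col2.
CREATE NONCLUSTERED INDEX IX_MyTable_col2_incl_col1
    ON dbo.MyTable (col2)
    INCLUDE (col1);

-- The same query can now be answered from the index alone.
SELECT col1 FROM dbo.MyTable WHERE col2 BETWEEN 1 AND 10;

SET STATISTICS IO OFF;
```

The Messages output lists the logical reads for each SELECT, so the two runs can be compared directly.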
From MSDN regarding Bookmark Lookups:
The Bookmark Lookup operator uses a bookmark (row ID or clustering key) to look up the corresponding row in the table or clustered index. The Argument column contains the bookmark label used to look up the row in the table or clustered index. The Argument column also contains the name of the table or clustered index in which the row is looked up. If the WITH PREFETCH clause appears in the Argument column, the query processor has determined that it is optimal to use asynchronous prefetching (read-ahead) when looking up bookmarks in the table or clustered index.

Index Seek with Bookmark Lookup Only Option for SQL Query?

I am working on optimizing a SQL query that goes against a very wide table in a legacy system. I am not able to narrow the table at this point for various reasons.
My query is running slowly because it does an Index Seek on an Index I've created, and then uses a Bookmark Lookup to find the additional columns it needs that do not exist in the Index. The bookmark lookup takes 42% of the query time (according to the query optimizer).
The table has 38 columns, some of which are nvarchars, so I cannot make a covering index that includes all the columns. I have tried to take advantage of index intersection by creating indexes that cover all the columns, however those "covering" indexes are not picked up by the execution plan and are not used.
Also, since 28 of the 38 columns are pulled out via this query, I'd have 28/38 of the columns in the table stored in these covering indexes, so I'm not sure how much this would help.
Do you think a Bookmark Lookup is as good as it is going to get, or what would another option be?
(I should specify that this is SQL Server 2000)
OH,
the covering index with include should work. Another option might be to create a clustered indexed view containing only the columns you need.
Regards,
Lieven
As another option, you could create an index with included columns.
Here is an example from BOL; this is for SQL Server 2005 and up:
CREATE NONCLUSTERED INDEX IX_Address_PostalCode
ON Person.Address (PostalCode)
INCLUDE (AddressLine1, AddressLine2, City, StateProvinceID);
To answer this part: "I have tried to take advantage of index intersection by creating indexes that cover all the columns, however those "covering" indexes are not picked up by the execution plan and are not used."
An index can only be used for a seek when the query is written so that it is sargable. In other words, if you apply a function to the indexed column on the left side of the operator, or leave the first column of the index out of your WHERE clause, the index won't be used for a seek. If the selectivity of the index is low, the index also won't be used.
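As a rough illustration of sargability, using the Person.Address table and the IX_Address_PostalCode index from the BOL example (the '981' prefix is a made-up value):

```sql
-- Not sargable: the function wraps the indexed column, so the optimizer
-- cannot seek on PostalCode and has to evaluate LEFT() for every row.
SELECT AddressLine1, City
FROM Person.Address
WHERE LEFT(PostalCode, 3) = '981';

-- Sargable rewrite of the same filter: a prefix LIKE leaves the column
-- bare, so a seek on an index keyed on PostalCode becomes possible.
SELECT AddressLine1, City
FROM Person.Address
WHERE PostalCode LIKE '981%';
```

Whether the optimizer actually picks the seek still depends on selectivity and on whether the index covers the selected columns.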
Check out SQL Server covering indexes for some more info

Resources