I have a update query that runs slow (see first query below). I have an index created on the table PhoneStatus and column PhoneID that is named IX_PhoneStatus_PhoneID. The Table PhoneStatus contains 20 million records. When I run the following query, the index is not used and a Clustered Index Scan is used and in-turn the update runs slow.
UPDATE PhoneStatus
SET RecordEndDate = GETDATE()
WHERE PhoneID = 126
If I execute the following query, which includes the new FROM, I still have the same problem with the index not used.
UPDATE PhoneStatus
SET RecordEndDate = GETDATE()
FROM Cust_Profile.PhoneStatus
WHERE PhoneID = 126
But if I add the HINT to force the use of the index on the FROM it works correctly, and an Index Seek is used.
UPDATE PhoneStatus
SET RecordEndDate = GETDATE()
FROM Cust_Profile.PhoneStatus WITH(INDEX(IX_PhoneStatus_PhoneID))
WHERE PhoneID = 126
Does anyone know why the first query would not use the Index?
Update
In the table of 20 million records, each phoneID could show up 10 times at the most
BarDev
How many distinct PhoneIDs are in the 20M table? If the condition where PhoneID=126 is not selective enough, you may be hitting the index tipping point. If this query and access condition is very frequent, PhoneID is a good candidate for a clustered index leftmost key.
Pablo is correct, SQL Server will use an index only if it thinks this will run the query more efficiently. But with 20 million rows it should have known to use the index. I would imagine that you simply need to update statistics on the database.
Form more information, see http://msdn.microsoft.com/en-us/library/aa260645(SQL.80).aspx.
Take a look at Is an index seek always better or faster than an index scan?
Sometimes a seek and a scan will be the exact same.
An index might be disregarded because your stats could be stale or the selectivity of the index is so low that SQL Server thinks a scan will be better
Turn on stats and see if there are any differences between the query with and without a seek
SET STATISTICS io ON
UPDATE PhoneStatus
SET RecordEndDate = GETDATE()
WHERE PhoneID = 126
UPDATE PhoneStatus
SET RecordEndDate = GETDATE()
FROM Cust_Profile.PhoneStatus WITH(INDEX(IX_PhoneStatus_PhoneID))
WHERE PhoneID = 126
Now look at the reads that came back
SQLServer (or any other SQL Server product for that matter) if not forced to use any index at all. It will use it, if it thinks will help running the query more efficiently.
So, in your case, SQLServer is thinking that it doesn't need using IX_PhoneStatus_PhoneID and by using its clustered index might get better results. It might be wrong though, that's what index hints are for: letting the Server know it would do a better job by using other index.
If your table was recently created and populated, it might be the case that statistics are somewhat outdated. So you might want to force a statistic update.
To restate:
You have table PhoneStatus
With a clustered index
And a non-clustered index on columns PhoneStatus and PhoneId, in that order
You are issuing an update with "...WHERE PhoneId = 126"
There are 20 million rows in the table (i.e. it's big and then some)
SQL will take your query and try to figure out how to do the work without working over
the whole table. For your non-clustered index, the data might look like:
PhoneStatus PhoneID
A 124
A 125
A 126
B 127
C 128
C 129
C 130
etc.
The thing is, SQL will check the first column first, before it checks the value
of the second column. As the first column is not specified in the update, SQL
cannot "shortcut" through the index search tree to the relevant entries, and so will have to scan the entire table. (No, SQL is not clever enough to say "eh, I'll just check
the second column first", and yes, they're right to have done it that way.)
Since the non-clustered index won't make the query faster, it defaults to a table
scan -- and since there is a clustered index, that means it instead becomse a clustered index scan. (If the clustered index is on PhoneId, then you'd have optimal performance on your query, but I'm guessing that's not the case here.)
When you use the hint, it forces the use the non-clustered index, and that will be faster
than the full table scan if the table has a lot more columns than the index (which
essentially has only the two), because there'd be that much less data to sift through.
Related
After running the following query:
SELECT [hour], count(*) as hits, avg(elapsed)
FROM myTable
WHERE [url] IS NOT NULL and floordate >= '2017-05-01'
group by [hour]
the execution plan is basically a clustered Index Scan on the PK (int, auto-increment, 97% of the work)
The thing is: URL has a index on it (regular index because i'm always searching for a exact match), floordate also has an index...
Why are they not being used? How can i speed up this query?
PS: table is 70M items long and this query takes about 9 min to run
Edit 1
If i don't use (select or filter) a column on my index, will it still be used? Usually i also filter-for/group-by clientId (approx 300 unique across the db) and hour (24 unique)...
In this scenario, two things affect how SQL Server will choose an index.
How selective is the index. A higher selectivity is better. NULL/NOT NULL filters generally have a very low selectivity.
Are all of the columns in the index, also known as a covering index.
In your example, if the index cannot cover the query, SQL will have to look up the other column values against the base table. If your URL/Floordate combination is not selective enough, SQL may determine it is cheaper to scan the base table rather than do an expensive lookup from the non-clustered index to the base table for a large number of rows.
Without knowing anything else about your schema, I'd recommend an index with the following columns:
floordate, url, hour; include elapsed
Date ranges scans are generally more selective than a NULL/NOT NULL test. Moving Floordate to the front may make this index more desirable for this query. If SQL determines the query is good for Floordate and URL, the Hour column can be used for the Group By action. Since Elapsed is included, this index can cover the query completely.
You can include ClientID after hour to see if that helps your other query as well.
As long as an index contains all of the columns to resolve the query, it is a candidate for use, even if there is no filtering needed. Generally speaking, a non-clustered index is skinnier than the base table, requiring less IO than scanning the full width base table.
I'm a bit unsure over the best index to use for a particular column in my table.
I have a [Deleted] column that is a DateTime, and represents the moment that the record was "deleted" from the system (it's a soft delete, so the physical record itself is not deleted).
Almost all queries hitting the table will have a 'WHERE [Deleted] IS NULL' filter to it. I am not worried about the performance of queries that do no have this filter.
What would be the best index to construct for this scenario? A filtered Index WHERE [Deleted] IS NULL? An index on a computed column of definition IsDeleted = Deleted IS NOT NULL ? I'm a bit unsure.
As you have said almost all the queries will have this clause in them. I would suggest to add a BIT column to your table, once a row is deleted (Soft deletion) set it to 1. Also keep this Datetime column probably as Deleted_Datetime to keep track of when records are being deleted.
Then Create a Filtered Index on that BIT field something like
CREATE NONCLUSTERED INDEX NCIX_Deleted_Data
ON dbo.Table(Deleted)
WHERE Deleted = 0
Bit being the smallest datatype, and having a filtered index on it will give you a massive performance boost.
You can create this filtered index on your Datetime column as well. But the logic is in that case sql server will have to go through 8 bytes of data on index to see if row is filtered or not.
Whereas if you have Bit column sql server will have to go through 1 byte of data to filter it out or to keep it.
Smaller data quick processing, Bigger data more time to process it :).
This is a lost battle. You can add a filter to all other indexes, see filtered indexes. But you won't be able to filter the clustered index.
A filter on Deleted column will be too unselective for NULLs and all queries would hit the index tipping point. And a bit Deleted column is worse, with a 0/1 selectivity it will be always ignored.
I would recommend you investigate partitioned views instead. You can store current data in one table, deleted data in another one, and operate over a view that unions the two. When properly designed (this is why reading the link is critical, to understand the 'proper' part) this scheme works fine and you never scan the 'cold' data.
We have a view that is used to lookup a record in a table by clustered index. The view also has a couple of subqueries in the select statement that lookup data in two large tables, also by clustered index.
To hugely simplify it would be something like this:
SELECT a,
(SELECT b FROM tableB where tableB.a=tableA.a) as b
(SELECT c FROM tableC where tableC.a=tableA.a) as c
FROM tableA
Most lookups to [tableB] correctly use a non-clustered index on [tableB] and work very efficiently. However, very occasionally SQL Server, in generating an execution plan, has instead used an index on [tableB] that doesn't contain the value being passed through. So, following the example above, although an index of column [a] exists on tableB, the plan instead does a scan of a clustered index that has column [z]. Using SQL's own language the plan's "predicate is not relevant to the object". I can't see why this would ever be practical. As a result, when SQL does this, it has to scan every record in the index, because it would never exist, taking up to 30 seconds. It just seems plain wrong, always.
Has any one seen this before, where an execution plan does something that looks like it could never be right? I am going to rewrite the query anyway, so my concern is less about the structure of the query, but more as to why SQL would ever get it that wrong.
I know sometimes SQL Server can choose a plan that worked once and it can become inefficient as the dataset changes but in this case it could never work.
Further information
[tableB] has 4 million records, and most values for [a] are null
I'm unable now to get hold of the initial query that generated the plan
These queries are run through Coldfusion but at this time I'm interested in anyone having seen this independently in SQL Server
It just seems plain wrong, always.
You might be interested in the First Rule of Programming.
So, following the example above, although an index of column [a]
exists on tableB, the plan instead does a scan of a clustered index
that has column [z].
A clustered index always includes all rows. It might be ordered by z, but it will still contain all other columns at the leaf level.
The reason SQL Server sometimes prefers a clustered scan over an index seek is this. When you do an index seek, you have to follow it up with a bookmark lookup to the clustered index to retrieve columns that are not in the index.
When you do a clustered index scan, you by definition find all columns. That means no bookmark lookup is required.
When SQL Server expects many rows, it tries to avoid the bookmark lookups. This is a time-tested choice. Nonclustered index seeks are routinely beaten by clustered index scans.
You can test this for your case by forcing either with the with (index(IX_YourIndex)) query hint.
I have read that one of the tradeoffs for adding table indexes in SQL Server is the increased cost of insert/update/delete queries to benefit the performance of select queries.
I can conceptually understand what happens in the case of an insert because SQL Server has to write entries into each index matching the new rows, but update and delete are a little more murky to me because I can't quite wrap my head around what the database engine has to do.
Let's take DELETE as an example and assume I have the following schema (pardon the pseudo-SQL)
TABLE Foo
col1 int
,col2 int
,col3 int
,col4 int
PRIMARY KEY (col1,col2)
INDEX IX_1
col3
INCLUDE
col4
Now, if I issue the statement
DELETE FROM Foo WHERE col1=12 AND col2 > 34
I understand what the engine must do to update the table (or clustered index if you prefer). The index is set up to make it easy to find the range of rows to be removed and do so.
However, at this point it also needs to update IX_1 and the query that I gave it gives no obvious efficient way for the database engine to find the rows to update. Is it forced to do a full index scan at this point? Does the engine read the rows from the clustered index first and generate a smarter internal delete against the index?
It might help me to wrap my head around this if I understood better what is going on under the hood, but I guess my real question is this. I have a database that is spending a significant amount of time in delete and I'm trying to figure out what I can do about it.
When I display the execution plan for the deletion, it just shows an entry for "Clustered Index Delete" on table Foo which lists in the details section the other indices that need to be updated but I don't get any indication of the relative cost of these other indices.
Are they all equal in this case? Is there some way that I can estimate the impact of removing one or more of these indices without having to actually try it?
Nonclustered indexes also store the clustered keys.
It does not have to do a full scan, since:
your query will use the clustered index to locate rows
rows contain the other index value (c3)
using the other index value (c3) and the clustered index values (c1,c2), it can locate matching entries in the other index.
(Note: I had trouble interpreting the docs, but I would imagine that IX_1 in your case could be defined as if it was also sorted on c1,c2. Since these are already stored in the index, it would make perfect sense to use them to more efficiently locate records for e.g. updates and deletes.)
All this, however has a cost. For each matching row:
it has to read the row, to find out the value for c3
it has to find the entry for (c3,c1,c2) in the nonclustered index
it has to delete the entry from there as well.
Furthermore, while the range query can be efficient on the clustered index in your case (linear access, after finding a match), maintenance of the other indexes will most likely result in random access to them for every matching row. Random access has a much higher cost than just enumerating B+ tree leaf nodes starting from a given match.
Given the above query, more time is spent on the non-clustered index maintenance - the amount depends heavily on the number of records selected by the col1 = 12 AND col2 > 34
predicate.
My guess is that the cost is conceptually the same as if you did not have a secondary index but had e.g. a separate table, holding (c3,c1,c2) as the only columns in a clustered key and you did a DELETE for each matching row using (c3,c1,c2). Obviously, index maintenance is internal to SQL Server and is faster, but conceptually, I guess the above is close.
The above would mean that maintenance costs of indexes would stay pretty close to each other, since the number of entries in each secondary index is the same (the number of records) and deletion can proceed only one-by-one on each index.
If you need the indexes, performance-wise, depending on the number of deleted records, you might be better off scheduling the deletes, dropping the indexes - that are not used during the delete - before the delete and adding them back after. Depending on the number of records affected, rebuilding the indexes might be faster.
I am working on optimizing a SQL query that goes against a very wide table in a legacy system. I am not able to narrow the table at this point for various reasons.
My query is running slowly because it does an Index Seek on an Index I've created, and then uses a Bookmark Lookup to find the additional columns it needs that do not exist in the Index. The bookmark lookup takes 42% of the query time (according to the query optimizer).
The table has 38 columns, some of which are nvarchars, so I cannot make a covering index that includes all the columns. I have tried to take advantage of index intersection by creating indexes that cover all the columns, however those "covering" indexes are not picked up by the execution plan and are not used.
Also, since 28 of the 38 columns are pulled out via this query, I'd have 28/38 of the columns in the table stored in these covering indexes, so I'm not sure how much this would help.
Do you think a Bookmark Lookup is as good as it is going to get, or what would another option be?
(I should specify that this is SQL Server 2000)
OH,
the covering index with include should work. Another option might be to create a clustered indexed view containing only the columns you need.
Regards,
Lieven
You could create an index with included columns as another option
example from BOL, this is for 2005 and up
CREATE NONCLUSTERED INDEX IX_Address_PostalCode
ON Person.Address (PostalCode)
INCLUDE (AddressLine1, AddressLine2, City, StateProvinceID);
To answer this part "I have tried to take advantage of index intersection by creating indexes that cover all the columns, however those "covering" indexes are not picked up by the execution plan and are not used."
An index can only be used when the query is created in a way that it is sargable, in other words if you use function on the left side of the operator or leave out the first column of the index in your WHERE clause then the index won't be used. If the selectivity of the index is low then also the index won't be used
Check out SQL Server covering indexes for some more info