We have a view that is used to look up a record in a table by clustered index. The view also has a couple of subqueries in the select statement that look up data in two large tables, also by clustered index.
To simplify hugely, it would be something like this:
SELECT a,
       (SELECT b FROM tableB WHERE tableB.a = tableA.a) AS b,
       (SELECT c FROM tableC WHERE tableC.a = tableA.a) AS c
FROM tableA
Most lookups to [tableB] correctly use a non-clustered index on [tableB] and work very efficiently. However, very occasionally SQL Server, in generating an execution plan, has instead used an index on [tableB] that doesn't contain the value being passed through. So, following the example above, although an index on column [a] exists on tableB, the plan instead does a scan of a clustered index keyed on column [z]. In the plan's own words, the "predicate is not relevant to the object". I can't see why this would ever be practical. As a result, when SQL Server does this, it has to scan every record in the index, because (as far as I can see) the value could never be found there, taking up to 30 seconds. It just seems plain wrong, always.
Has anyone seen this before, where an execution plan does something that looks like it could never be right? I am going to rewrite the query anyway, so my concern is less about the structure of the query and more about why SQL Server would ever get it that wrong.
I know that sometimes SQL Server can choose a plan that worked once and that becomes inefficient as the dataset changes, but in this case it could never have worked.
Further information
[tableB] has 4 million records, and most values for [a] are null
I'm now unable to get hold of the initial query that generated the plan
These queries are run through ColdFusion, but at this point I'm interested in whether anyone has seen this independently in SQL Server
It just seems plain wrong, always.
You might be interested in the First Rule of Programming.
So, following the example above, although an index on column [a] exists on tableB, the plan instead does a scan of a clustered index keyed on column [z].
A clustered index always includes all rows. It might be ordered by z, but it will still contain all other columns at the leaf level.
The reason SQL Server sometimes prefers a clustered index scan over an index seek is this: when you do an index seek, you have to follow it up with a bookmark lookup into the clustered index to retrieve the columns that are not in the nonclustered index.
When you do a clustered index scan, by definition you find all columns. That means no bookmark lookup is required.
When SQL Server expects many rows, it tries to avoid the bookmark lookups. This is a time-tested choice: nonclustered index seeks are routinely beaten by clustered index scans.
You can test this for your case by forcing either access path with the WITH (INDEX(IX_YourIndex)) query hint.
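For example, a minimal sketch against the simplified query from the question (the index name IX_tableB_a is made up):

SELECT a,
       (SELECT b
        FROM tableB WITH (INDEX(IX_tableB_a))   -- force the nonclustered index on [a]
        WHERE tableB.a = tableA.a) AS b
FROM tableA;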
Related
After I created the indexed view, I tried disabling all the indexes in the base tables, including the indexes on the foreign key columns (the constraints are still there), and the query plan for the view stays the same.
It is just like magic to me that the indexed view can optimize the query so much even without the base table being indexed. Even without any index on the view, SQL Server is able to do an index scan on the primary key index of the indexed view and retrieve data something like 1000 times faster than using the base table.
Something like:
SELECT * FROM MyView WITH (NOEXPAND) WHERE NotIndexedColumn = 5 ORDER BY NotIndexedColumn
So the first two questions are:
Is there any benefit to index base tables of indexed view?
What is SQL Server doing when it does an index scan on the PK while the constraint is on a non-indexed column?
Then I noticed that if I use full-text search + ORDER BY, I see a table spool (eager spool) in the query plan with a cost of around 95%.
The query looks like:
SELECT ID FROM View WITH(NOEXPAND) WHERE CONTAINS(IndexedColumn, '"SomeText*"') ORDER BY IndexedColumn
Question 3:
Is there any index I could add to get rid of that operation?
It's important to understand that an indexed view is a "materialized view": its results are stored on disk.
So the speedup you are seeing is because the actual result of the view's query is being read straight from disk.
To answer your questions:
1) Is there any benefit to index base tables of indexed view?
This is situational. If your view flattens out data or computes many extra aggregate columns, then an indexed view is better than the base table. If you are just using your indexed view like
SELECT * FROM foo WHERE createdDate > getDate()
then probably not. But if you are doing something like
SELECT sum(price), min(id) FROM x GROUP BY id, price
then the indexed view would probably be better. Granted, you are doing a more complex query with joins and other advanced options.
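As a rough sketch of what materializing such an aggregate looks like (the view name is made up; note that indexed views require SCHEMABINDING and COUNT_BIG(*), and do not permit MIN/MAX, so the min(id) above could not itself be indexed):

CREATE VIEW dbo.vPriceSummary
WITH SCHEMABINDING
AS
SELECT id, price,
       SUM(price) AS totalPrice,   -- assumes price is declared NOT NULL
       COUNT_BIG(*) AS rowCnt      -- required in an indexed view with GROUP BY
FROM dbo.x
GROUP BY id, price;
GO

-- This unique clustered index is what stores the view's results on disk
CREATE UNIQUE CLUSTERED INDEX IX_vPriceSummary ON dbo.vPriceSummary (id, price);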
2) What is SQL Server doing when it does an index scan on the PK while the constraint is on a non-indexed column?
First we need to understand how clustered indexes are stored. The index is stored as a B-tree, so when you search on a clustered index, SQL Server walks the tree to find all values that match your criteria. How your indexes are set up (i.e. covering vs. non-covering) and how your non-clustered indexes are defined will determine what the pages and extents look like. Without more knowledge of the table structure I can't help you understand what the scan is actually doing.
3) Is there any index I could add to get rid of that operation?
Just because something takes 95% of the query's cost doesn't make it a bad thing. The costs have to add up to 100%, so no matter what you do, something will always take up a large percentage. What you need to check is the IO reads and how much time the query itself takes.
To judge this, you need to understand that SQL Server caches data pages in memory. With this in mind, a query can take a long time the first time, but afterwards, since the data itself is cached, it can be much quicker. It all depends on the frequency of the query and how your system is set up.
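A hedged way to compare cold-cache and warm-cache behaviour on a test box (never clear the buffers on a production server; the query is the one from the question, with the view name bracketed since VIEW is a reserved word):

CHECKPOINT;
DBCC DROPCLEANBUFFERS;    -- clears the buffer pool so the next read comes from disk
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT ID FROM [View] WITH (NOEXPAND)
WHERE CONTAINS(IndexedColumn, '"SomeText*"')
ORDER BY IndexedColumn;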
For a more in-depth read on indexed views:
Imagine the Foo table has non-clustered indexes on ColA and ColB
and NO Indexes on ColC, ColD
SELECT colA, colB
FROM Foo
takes about 30 seconds.
SELECT colA, colB, colC, colD
FROM Foo
takes about 2 minutes.
The Foo table has more than 5 million rows.
Question:
Is it possible that including columns that are not part of the indexes can slow down the query?
If yes, why? Aren't they part of the pages that were already read?
If you write a query that uses a covering index, then the full data pages in the heap/clustered index are not accessed.
If you subsequently add more columns to the query, such that the index is no longer covering, then either additional lookups will occur (if the index is still used), or you force a different data access path entirely (such as using a table scan instead of using an index).
Since 2005, SQL Server has supported the concept of Included Columns in an index. This includes non-key columns in the leaf of an index - so they're of no use during the data-lookup phase of index usage, but still help to avoid performing an additional lookup back in the heap/clustered index, if they're sufficient to make the index a covering index.
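As a sketch against the table above (the index name is made up), a covering index for the second query could look like:

CREATE NONCLUSTERED INDEX IX_Foo_Covering
    ON Foo (colA, colB)       -- key columns, used for seeks and ordering
    INCLUDE (colC, colD);     -- non-key columns carried only at the leaf level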
Also, in the future, if you want to get a better understanding of why one query is fast and another is slow, look into generating Execution Plans, which you can then compare.
Even if you don't understand the terms used, you should at least be able to play "spot the difference" between them and then search on the terms (such as table scan, index seek, or lookup).
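For instance, one simple way to capture a text plan for each query and diff them (SET SHOWPLAN_TEXT must be the only statement in its batch, hence the GO separators; SSMS's graphical "Include Actual Execution Plan" works just as well):

SET SHOWPLAN_TEXT ON;
GO
SELECT colA, colB FROM Foo;
GO
SELECT colA, colB, colC, colD FROM Foo;
GO
SET SHOWPLAN_TEXT OFF;
GO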
The simple answer is: a non-clustered index is not stored in the same pages as the data, so SQL Server has to look up the actual data pages to pick up the rest.
Non-clustered indexes are stored in separate data structures, while clustered indexes are stored in the same place as the actual data. That's why you can have only one clustered index.
I am a complete beginner in SQL Server 2005 and I am learning it from an online tutorial; here are some of my questions:
1: What is the difference between SELECT * FROM XYZ and SELECT ALL * FROM XYZ?
2: The purpose of a clustered index is to make searching easier by physically sorting the table [as far as I know :-)]. Let's say we have a primary key column on a table; is it then good to create a clustered index on the table, since we already have a column which is sorted?
3: Why can we create 1 clustered index + 249 non-clustered indexes = 250 indexes on a table? I understand the requirement of 1 clustered index, but why 249? Why not more than 249?
No difference; SELECT ALL is the default, as opposed to SELECT DISTINCT.
Opinion varies. For performance reasons, clustered indexes should ideally be small, stable, unique, and monotonically increasing. Primary keys should also be stable and unique, so there is an obvious fit there. However, clustered indexes are well suited for range queries, and looking up individual records by PK can perform well even if the PK is non-clustered, so some authors suggest not "wasting" the clustered index on the PK.
In SQL Server 2008 you can create up to 999 non-clustered indexes on a table. I can't imagine ever doing so, but I think the limit was raised because, with filtered indexes, there might potentially be a viable case for that many. Indexes add a cost to data modification operations, though, as the changes need to be propagated to multiple places, so I would imagine that only largely read-only (e.g. reporting) databases would ever reach even double figures of non-clustered, non-filtered indexes.
For 3:
Every time you insert or delete a record in the table, ALL indexes must be updated. If you have too many indexes, that takes too long.
If your table has more than 5-6 indexes, I think you should take the time and check for yourself.
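A hedged way to do that check (this DMV exists since SQL Server 2005; its counters reset on instance restart, and the table name dbo.Foo is a placeholder):

SELECT i.name,
       s.user_seeks, s.user_scans, s.user_lookups,  -- reads that used the index
       s.user_updates                               -- writes that had to maintain it
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
       ON  s.object_id = i.object_id
       AND s.index_id = i.index_id
       AND s.database_id = DB_ID()
WHERE i.object_id = OBJECT_ID('dbo.Foo');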
I have an update query that runs slowly (see the first query below). I have an index named IX_PhoneStatus_PhoneID on the PhoneID column of the PhoneStatus table. The PhoneStatus table contains 20 million records. When I run the following query, the index is not used; a clustered index scan is used instead, and in turn the update runs slowly.
UPDATE PhoneStatus
SET RecordEndDate = GETDATE()
WHERE PhoneID = 126
If I execute the following query, which includes the new FROM clause, I still have the same problem with the index not being used.
UPDATE PhoneStatus
SET RecordEndDate = GETDATE()
FROM Cust_Profile.PhoneStatus
WHERE PhoneID = 126
But if I add a hint to force the use of the index in the FROM clause, it works correctly, and an index seek is used.
UPDATE PhoneStatus
SET RecordEndDate = GETDATE()
FROM Cust_Profile.PhoneStatus WITH(INDEX(IX_PhoneStatus_PhoneID))
WHERE PhoneID = 126
Does anyone know why the first query would not use the index?
Update
In the table of 20 million records, each PhoneID shows up at most 10 times.
BarDev
How many distinct PhoneIDs are in the 20M-row table? If the condition WHERE PhoneID = 126 is not selective enough, you may be hitting the index tipping point. If this query and access pattern are very frequent, PhoneID is a good candidate for the leftmost key of a clustered index.
Pablo is correct: SQL Server will use an index only if it thinks it will run the query more efficiently. But with 20 million rows it should have known to use the index, so I would imagine you simply need to update statistics on the database.
For more information, see http://msdn.microsoft.com/en-us/library/aa260645(SQL.80).aspx.
Take a look at Is an index seek always better or faster than an index scan?
Sometimes a seek and a scan will be the exact same.
An index might be disregarded because your stats are stale, or because the selectivity of the index is so low that SQL Server thinks a scan will be better.
Turn on statistics and see if there are any differences between the query with and without the seek:
SET STATISTICS io ON

-- Without the hint (the optimizer picks the access path):
UPDATE PhoneStatus
SET RecordEndDate = GETDATE()
WHERE PhoneID = 126

-- Forcing the non-clustered index:
UPDATE PhoneStatus
SET RecordEndDate = GETDATE()
FROM Cust_Profile.PhoneStatus WITH(INDEX(IX_PhoneStatus_PhoneID))
WHERE PhoneID = 126
Now look at the reads that came back
SQL Server (or any other RDBMS, for that matter) is not forced to use any index at all. It will use an index if it thinks it will help run the query more efficiently.
So, in your case, SQL Server thinks it doesn't need to use IX_PhoneStatus_PhoneID and that it might get better results using its clustered index. It might be wrong, though; that's what index hints are for: letting the server know it would do a better job by using another index.
If your table was recently created and populated, it may be that the statistics are somewhat outdated, so you might want to force a statistics update.
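For example, a minimal sketch (FULLSCAN is optional and heavier than the default sampling):

UPDATE STATISTICS Cust_Profile.PhoneStatus WITH FULLSCAN;

-- or just the statistics behind the index in question:
UPDATE STATISTICS Cust_Profile.PhoneStatus IX_PhoneStatus_PhoneID;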
To restate:
You have table PhoneStatus
With a clustered index
And a non-clustered index on columns PhoneStatus and PhoneId, in that order
You are issuing an update with "...WHERE PhoneId = 126"
There are 20 million rows in the table (i.e. it's big and then some)
SQL will take your query and try to figure out how to do the work without working over the whole table. For your non-clustered index, the data might look like:
PhoneStatus PhoneID
A 124
A 125
A 126
B 127
C 128
C 129
C 130
etc.
The thing is, SQL will check the first column first, before it checks the value of the second column. As the first column is not specified in the update, SQL cannot "shortcut" through the index search tree to the relevant entries, and so it will have to scan the entire table. (No, SQL is not clever enough to say "eh, I'll just check the second column first", and yes, they're right to have done it that way.)
Since the non-clustered index won't make the query faster, it defaults to a table scan; and since there is a clustered index, that means it instead becomes a clustered index scan. (If the clustered index were on PhoneId, then you'd have optimal performance on your query, but I'm guessing that's not the case here.)
When you use the hint, it forces the use of the non-clustered index, and that will be faster than the full table scan if the table has a lot more columns than the index (which essentially has only the two), because there'd be that much less data to sift through.
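To illustrate the key-order point under this answer's assumption (the index definition below is hypothetical; the question's actual index may well be keyed on PhoneID alone):

-- Leading key is PhoneStatus, so a predicate on PhoneID alone cannot seek
CREATE NONCLUSTERED INDEX IX_PhoneStatus_Composite
    ON Cust_Profile.PhoneStatus (PhoneStatus, PhoneID);

-- Can seek: the leading key is constrained
-- WHERE PhoneStatus = 'A' AND PhoneID = 126

-- Cannot seek on this index: the leading key is unconstrained
-- WHERE PhoneID = 126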
I have read that one of the tradeoffs for adding table indexes in SQL Server is the increased cost of insert/update/delete queries to benefit the performance of select queries.
I can conceptually understand what happens in the case of an insert because SQL Server has to write entries into each index matching the new rows, but update and delete are a little more murky to me because I can't quite wrap my head around what the database engine has to do.
Let's take DELETE as an example and assume I have the following schema:
CREATE TABLE Foo
(
    col1 int NOT NULL,
    col2 int NOT NULL,
    col3 int,
    col4 int,
    PRIMARY KEY (col1, col2)   -- becomes the clustered index by default
);

CREATE NONCLUSTERED INDEX IX_1 ON Foo (col3) INCLUDE (col4);
Now, if I issue the statement
DELETE FROM Foo WHERE col1=12 AND col2 > 34
I understand what the engine must do to update the table (or clustered index if you prefer). The index is set up to make it easy to find the range of rows to be removed and do so.
However, at this point it also needs to update IX_1, and the query I gave it offers no obvious efficient way for the database engine to find the rows to update. Is it forced to do a full index scan at this point? Does the engine read the rows from the clustered index first and generate a smarter internal delete against the index?
It might help me wrap my head around this if I understood better what is going on under the hood, but I guess my real question is this: I have a database that is spending a significant amount of time on deletes, and I'm trying to figure out what I can do about it.
When I display the execution plan for the deletion, it just shows an entry for "Clustered Index Delete" on table Foo, which lists in the details section the other indices that need to be updated, but I don't get any indication of the relative cost of these other indices.
Are they all equal in this case? Is there some way that I can estimate the impact of removing one or more of these indices without having to actually try it?
Nonclustered indexes also store the clustered keys.
It does not have to do a full scan, since:
your query will use the clustered index to locate the rows
each row contains the value for the other index (col3)
using that value (col3) and the clustered key values (col1, col2), it can locate the matching entries in the other index
(Note: I had trouble interpreting the docs, but I would imagine that IX_1 in your case could be defined as if it were also sorted on col1, col2. Since these values are already stored in the index, it would make perfect sense to use them to locate records more efficiently for e.g. updates and deletes.)
All this, however, has a cost. For each matching row:
it has to read the row to find out the value of col3
it has to find the entry for (col3, col1, col2) in the nonclustered index
it has to delete the entry from there as well
Furthermore, while the range query can be efficient on the clustered index in your case (linear access, after finding a match), maintenance of the other indexes will most likely result in random access to them for every matching row. Random access has a much higher cost than just enumerating B+ tree leaf nodes starting from a given match.
Given the above query, more time is spent on the non-clustered index maintenance; how much depends heavily on the number of records selected by the col1 = 12 AND col2 > 34 predicate.
My guess is that the cost is conceptually the same as if you did not have a secondary index but instead had, say, a separate table holding (col3, col1, col2) as the only columns in a clustered key, and you did a DELETE from it for each matching row using (col3, col1, col2). Obviously, index maintenance is internal to SQL Server and faster, but conceptually, I guess the above is close.
The above would mean that the maintenance costs of the secondary indexes stay pretty close to each other, since the number of entries in each secondary index is the same (the number of records) and deletion can proceed only one-by-one on each index.
If you need the indexes, then performance-wise, depending on the number of deleted records, you might be better off scheduling the deletes, dropping the indexes that are not used during the delete beforehand, and adding them back afterwards. Depending on the number of records affected, rebuilding the indexes in one pass may well be faster than maintaining them row by row, as sketched below.
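A minimal sketch of that pattern, using the schema from the question (and assuming IX_1 is not needed while the delete runs):

-- Drop the secondary index so the bulk delete only maintains the clustered index
DROP INDEX IX_1 ON Foo;

DELETE FROM Foo WHERE col1 = 12 AND col2 > 34;

-- Rebuild the secondary index in one pass afterwards
CREATE NONCLUSTERED INDEX IX_1 ON Foo (col3) INCLUDE (col4);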