Index scan on primary key(ID) - sql-server

Query
SELECT MAX(ID) FROM Product
https://www.brentozar.com/pastetheplan/?id=Skv5OqZBU
Why does the optimizer use an index scan even though the query only touches the primary key (ID)?

If you read the details inside the "Index Scan" node of your plan, you will find that it expects only 1 row to be returned. Scanning 1 row from the end of the index actually performs better than an index seek here.
The physical structure of a SQL Server index is a B+ tree. An index seek starts at the root of the tree and navigates down to locate an item in O(log N). An index scan starts directly at the leaf (data) level and reads items from there. Since this query reads only one row (the last one in index order), that read is O(1).
So your query is in fact already performing very fast.
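Conceptually, the optimizer treats the aggregate as a TOP (1) over the index in descending order, which is what the backward scan in the plan implements. A minimal sketch, using the table and column names from the question:

```sql
-- MAX(ID) on an index keyed on ID is answered by reading a single row
-- from the end of the index, equivalent to:
SELECT TOP (1) ID
FROM Product
ORDER BY ID DESC;  -- backward ordered scan with a row goal of 1
```

Both forms typically produce the same plan shape: a backward ordered scan that stops after the first row.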

Related

Is an Index seek operation more costly compared to index scan when the data volume is high?

I understand that a table scan looks at every record in a table, and a table seek looks at specific records.
An index scan/seek is the same concept, with the difference that the values are in sorted order.
Question: Is an index seek more costly than an index scan when the data volume of the item being searched is high? And why?
Example: Let's say statistics are stale, the estimated row count is 100, but the actual row count is 100,000. The engine decides to use an index seek. Will this be more costly than an index scan, and why?
SELECT StockItemID
FROM Examples.OrderLines
WHERE StockItemID = 1;
I am referring to the book "Exam Ref 70-762 Developing SQL Databases", which has this example; page 338 reads: "Because this (stale statistics) value is relatively low, the query optimizer generated a plan using an index seek, which could be less optimal than performing a scan when data volumes are high". I am trying to understand why a seek is considered expensive.
You will never see SQL Server choose a scan for this query if you have an index on StockItemID, as it covers the query and there is no "tipping point" issue.
It will always choose a seek, even if it estimates that 100% of the rows match.
Example
CREATE TABLE OrderLines
(
    OrderID     INT IDENTITY PRIMARY KEY,
    StockItemID INT INDEX IX1
);

INSERT INTO OrderLines (StockItemID)
SELECT 1
FROM sys.all_objects;

SELECT StockItemID
FROM OrderLines
WHERE StockItemID = 1;
In the case that the seek returns all the rows in the table, the only difference between a seek and an index-ordered scan is how the first row is located (by navigating the depth of the B-tree, or simply going to the first index page from metadata). This is likely to be negligible.
One edge case where a scan may perform better is when an allocation-ordered scan is preferable and you are running with a table lock or NOLOCK, so that becomes a viable option.
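To see that the seek and the ordered scan do essentially the same work on this example, you can compare logical reads for both plans. A hedged sketch against the OrderLines repro above (exact read counts depend on table size):

```sql
SET STATISTICS IO ON;

-- Seek on IX1 (the plan the optimizer picks on its own)
SELECT StockItemID
FROM OrderLines
WHERE StockItemID = 1;

-- Force an ordered scan of the same index for comparison
SELECT StockItemID
FROM OrderLines WITH (FORCESCAN, INDEX (IX1))
WHERE StockItemID = 1;

SET STATISTICS IO OFF;
```

Since every row matches, both statements read roughly the same number of leaf pages; the seek only adds the initial root-to-leaf navigation.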

Why does an index seek become more expensive than an index scan

I have a basic question about index scan/seek. An index scan is effective when a large number of rows must be fetched; its cost per row returned is inversely proportional to the number of rows, since it has to read all the pages regardless, which means more wasted IO when few rows qualify.
I have searched for the reason why seeks become more expensive than scans, but I cannot find it.
What confuses me is the index seek. Why does an index seek become expensive as more rows are returned? A seek directly touches the pages that contain the rows, so even with a large number of rows returned it should always be more efficient than a scan. But this is not what happens. I want to know exactly why, at some point, a seek becomes expensive.
select id, name, col1, col2
from TableA -- Will result in an index scan. Table has 10000 rows with a clustered index on the ID column. The query is covered.

select id, name, col1, col2
from TableA
where ID between 1 and 10 -- Optimizer will use an index seek.
Now, why does the query below become expensive when an index seek is forced?
select id,name,col1,col2
from TableA with (forceseek)
A seek can be more expensive than an index scan because each seek navigates the B-tree from the root node down to the leaf, and, when the index does not cover the query, each qualifying row additionally requires a lookup into the base table. Those per-row lookups are random page reads, while a scan reads the leaf pages sequentially in a single pass. So when many rows qualify, the optimizer chooses an index scan instead of an index seek; a seek plus lookups typically wins only when roughly 2 to 3% of the rows, or fewer, are returned.
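The tipping point is easiest to observe with a non-covering index, where each seeked row costs an extra lookup. A hedged sketch (the Orders table, ix_Status index, and column names are hypothetical, introduced only for illustration):

```sql
-- Hypothetical setup: clustered PK on OrderID, nonclustered index
-- ix_Status on (Status) only, so Amount must come from a key lookup.
SET STATISTICS IO ON;

-- Seek + one key lookup per qualifying row: read count grows
-- with the number of matches, and the lookups are random I/O.
SELECT OrderID, Status, Amount
FROM Orders WITH (FORCESEEK (ix_Status (Status)))
WHERE Status = 'OPEN';

-- Single sequential pass over the clustered index: read count is
-- flat regardless of how many rows match.
SELECT OrderID, Status, Amount
FROM Orders WITH (FORCESCAN)
WHERE Status = 'OPEN';

SET STATISTICS IO OFF;
```

As the share of matching rows grows, the first plan's logical reads climb past the second plan's fixed cost, which is exactly where the optimizer switches to a scan.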

using sql server index to better performance

I have 2 tables (UserLog and UserInfo), each with a nonclustered index on the User_UID column, which is a uniqueidentifier.
I have a lot of select queries that join these 2 tables on the User_UID column.
There is no clustered index on these tables, so to improve read performance I decided to create a new column, User_ID, and then create a clustered index on that column on each table.
I then tested the new architecture and obtained great results: logical reads decreased on both tables, because the query optimizer no longer uses a RID lookup to retrieve the remaining information; instead it uses only a clustered index seek.
However, I obtained these good results only when the pages were already in the memory cache, i.e. after 2 executions. If I clear the cache (DBCC DROPCLEANBUFFERS), the first execution of the select query still produces fewer logical reads, but its elapsed time is greater than it was with the old architecture (without the clustered index) just after clearing the cache.
So my question is: why does the elapsed time with the new architecture increase after clearing the cache? Is it because on the first execution all the data has to be brought into the memory cache, and since the clustered index holds more data than the nonclustered index, that takes more time?
Thanks in advance
Regardless, you should have a clustered index on your table. If you don't, you have a heap, which requires scans through the leaf level of the table. With a clustered index, your table is sorted into a B-tree that is used to navigate to the leaf level, which is more efficient.
After blowing out the buffer, whether you have a seek on a clustered index or a scan on a heap, the pages have to be pulled from disk, and that takes time.
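A repeatable way to measure the cold-cache penalty the question describes is to flush the buffer pool between runs and compare timings. A sketch, for dev/test servers only (DBCC DROPCLEANBUFFERS evicts cached pages server-wide); the join shown assumes the question's tables and User_UID join column:

```sql
-- Force subsequent reads to come from disk (test servers only).
CHECKPOINT;                -- flush dirty pages so they can be dropped
DBCC DROPCLEANBUFFERS;     -- empty the buffer pool

SET STATISTICS TIME ON;
SET STATISTICS IO ON;

-- Cold run: physical reads > 0, elapsed time dominated by disk I/O.
-- Run the same statement again for the warm case: physical reads
-- drop to 0 and only logical (in-memory) reads remain.
SELECT l.*, i.*
FROM UserLog AS l
JOIN UserInfo AS i ON i.User_UID = l.User_UID;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

Comparing the cold runs of the old and new schemas this way shows whether the extra elapsed time really comes from reading more pages from disk.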

Clustered Index Scan instead of Clustered Index Seek on left join

SELECT *
FROM tbl_transaction t
LEFT JOIN tbl_transaction_hsbc ht
    ON t.transactionid = ht.transactionid
transactionid is the primary key on both tables, so why no index seek?
Maybe it's the SELECT * ... and maybe because you're returning the entire table, there is no advantage to seeking. What do you want a seek to do, seek incrementally to every row? A scan is much more efficient.
I realize you've probably read, or been told, to avoid scans at all costs. I think there needs to be more context around that advice: sometimes a scan is the right answer and the most efficient path to the data. If the query is slow, perhaps you could show an actual execution plan and we can help pinpoint the problem. But the answer isn't going to be forcing this query to use a seek.
No index seek because you don't have a where clause.
Index seek means you check a range of values in the index.
As you have no where clause, there is no other choice but to scan all the index values.
Hence the name "index scan".
It's not a table scan, it's an index scan. If you don't have an index on this column in one of the tables, you'll have a table scan for the second table plus an index scan for the first table.
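The contrast is easy to reproduce: with no predicate the whole index must be read, but a predicate on the key gives the optimizer a range to seek. A sketch using the question's tables (the range values are illustrative):

```sql
-- No WHERE clause: every row qualifies, so the index is scanned end to end.
SELECT *
FROM tbl_transaction t
LEFT JOIN tbl_transaction_hsbc ht
    ON t.transactionid = ht.transactionid;

-- Restricting the outer side to a key range makes a seek worthwhile.
SELECT *
FROM tbl_transaction t
LEFT JOIN tbl_transaction_hsbc ht
    ON t.transactionid = ht.transactionid
WHERE t.transactionid BETWEEN 1000 AND 1010;  -- seek on the PK range
```

Whether the inner side is also seeked depends on the join strategy: nested loops gives one seek per outer row, while a merge join scans both sides in key order.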

SQL Server Index cost

I have read that one of the tradeoffs for adding table indexes in SQL Server is the increased cost of insert/update/delete queries to benefit the performance of select queries.
I can conceptually understand what happens in the case of an insert because SQL Server has to write entries into each index matching the new rows, but update and delete are a little more murky to me because I can't quite wrap my head around what the database engine has to do.
Let's take DELETE as an example and assume I have the following schema:
CREATE TABLE Foo
(
    col1 int,
    col2 int,
    col3 int,
    col4 int,
    PRIMARY KEY (col1, col2),
    INDEX IX_1 (col3) INCLUDE (col4)
);
Now, if I issue the statement
DELETE FROM Foo WHERE col1=12 AND col2 > 34
I understand what the engine must do to update the table (or clustered index if you prefer). The index is set up to make it easy to find the range of rows to be removed and do so.
However, at this point it also needs to update IX_1 and the query that I gave it gives no obvious efficient way for the database engine to find the rows to update. Is it forced to do a full index scan at this point? Does the engine read the rows from the clustered index first and generate a smarter internal delete against the index?
It might help me to wrap my head around this if I understood better what is going on under the hood, but I guess my real question is this. I have a database that is spending a significant amount of time in delete and I'm trying to figure out what I can do about it.
When I display the execution plan for the deletion, it just shows an entry for "Clustered Index Delete" on table Foo which lists in the details section the other indices that need to be updated but I don't get any indication of the relative cost of these other indices.
Are they all equal in this case? Is there some way that I can estimate the impact of removing one or more of these indices without having to actually try it?
Nonclustered indexes also store the clustered keys.
It does not have to do a full scan, since:
your query will use the clustered index to locate rows
rows contain the other index value (c3)
using the other index value (c3) and the clustered index values (c1,c2), it can locate matching entries in the other index.
(Note: I had trouble interpreting the docs, but I would imagine that IX_1 in your case could be defined as if it was also sorted on c1,c2. Since these are already stored in the index, it would make perfect sense to use them to more efficiently locate records for e.g. updates and deletes.)
All this, however has a cost. For each matching row:
it has to read the row, to find out the value for c3
it has to find the entry for (c3,c1,c2) in the nonclustered index
it has to delete the entry from there as well.
Furthermore, while the range query can be efficient on the clustered index in your case (linear access, after finding a match), maintenance of the other indexes will most likely result in random access to them for every matching row. Random access has a much higher cost than just enumerating B+ tree leaf nodes starting from a given match.
Given the above query, more time is spent on the non-clustered index maintenance; the amount depends heavily on the number of records selected by the col1 = 12 AND col2 > 34 predicate.
My guess is that the cost is conceptually the same as if you did not have a secondary index but instead had a separate table holding (c3,c1,c2) as its only columns, clustered on that key, and you did a DELETE against it for each matching row using (c3,c1,c2). Obviously, index maintenance is internal to SQL Server and is faster, but conceptually, I guess the above is close.
The above would mean that maintenance costs of indexes would stay pretty close to each other, since the number of entries in each secondary index is the same (the number of records) and deletion can proceed only one-by-one on each index.
If you need the indexes, then performance-wise, depending on the number of deleted records, you might be better off scheduling the deletes, dropping the indexes that are not used during the delete beforehand, and adding them back afterwards. Depending on the number of records affected, rebuilding the indexes may be faster than maintaining them row by row.
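The drop-and-recreate pattern from the last paragraph can be sketched like this, using the index and table names from the question (the batch size is illustrative):

```sql
-- Remove the secondary index so the delete only maintains the clustered index.
DROP INDEX IX_1 ON Foo;

-- Delete in batches to keep the log and lock footprint manageable.
DECLARE @rows INT = 1;
WHILE @rows > 0
BEGIN
    DELETE TOP (5000) FROM Foo
    WHERE col1 = 12 AND col2 > 34;
    SET @rows = @@ROWCOUNT;
END;

-- Rebuild the index afterwards; for a large delete, one bulk index build
-- is often cheaper than per-row maintenance during the delete itself.
CREATE NONCLUSTERED INDEX IX_1 ON Foo (col3) INCLUDE (col4);
```

Whether this wins depends on the fraction of the table being deleted; for small deletes, the rebuild cost will dominate.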
