Is a SqlProfiler Scan Started bad? - sql-server

If in SqlProfiler you can see that to execute a query a Scan is Started, does this mean a full table scan or can it just be a lookup? If it can be both, how can you tell which one of the two it is?

From the documentation:
The Scan:Started event class occurs when a table or index scan is started.
So it could be either one. The IndexID field will tell you if it is an index, and which one.
Not that it really matters very much. A clustered index scan basically is a table scan. A nonclustered index scan is better, but only a little. If you see any full scan, it means either (a) you're using non-sargable predicates or predicates on fields that aren't indexed, or (b) the predicate fields are indexed but the output columns aren't covered by the index, and the optimizer has decided that it is cheaper to perform a full scan than a bookmark/RID lookup.
Index scans aren't often much better than table scans, performance-wise, so you should try to eliminate whatever is leading to it, if possible.

Related

Query plan ignores index on large join

I have the following fairly simple query that returns about 1 million rows (I've left out columns as they are just for output), but the query plan doesn't seem to want to use the index and wants me to create one:
SELECT [SAU]
,nr.[Headend]
,[Source]
,[Destination]
,[FibreHop]
,[CableRef]
,[CableSectionRef]
,[nNGAFibres]
,[nEthFibres]
,[FromID]
,[ToID]
,[FromIDTerm]
,[ToIDTerm],Reversed
,#Now
FROM [NodeRouting] nr
join [TargetHeadends] tex ON nr.Headend=tex.Headend
The index is:
CREATE NONCLUSTERED INDEX [NodeRouting_Headend] ON [NodeRouting]
(
[Headend] ASC
)
the other table Headend is the PK
The query plan is this:
If I give it a hint to use the index already created (non-unique, non-clustered) on the id field:
join [TargetHeadends] tex ON nr.id=tex.id (index=NodeRouting_Headend)
It changes to this:
The estimated number of rows, btw, in reality is the first 966,000. The RID 761,000 is a few hundred thousand short and the operator cost seems a lot higher
One thing that is striking me as a little odd, is in the first example where it chose to not use the index it says this:
Missing Index (impact 99): CREATE NONCLUSTERED INDEX <NAME> ON NodeRouting(id) include (....)
CREATE NONCLUSTERED INDEX [<Name>]
ON [NodeRouting] ([Headend])
INCLUDE ([SAU],[Source],[Destination],[FibreHop],[CableRef],[CableSectionRef],[nNGAFibres],[nEthFibres],[FromID],[ToID],[FromIDTerm],[ToIDTerm],[Reversed])
I appreciate i'm returning more columns than in the index but would have thought the index would have still been used without the INCLUDE?
Indexes don't always help and they should not need to be forced into use. For example, for small tables a scan will be used because it's less work because of index overhead. Don't force the use of the index.
For a large table, an index helps when it is "selective" and the query is selective. It will get a few records quickly. It does not get a lot of records quickly. If the index is more than about 5% selective, then it might be used. If not, a scan might be faster than using the non-selective index.
If you are returning all the records, then there is no selectivity. A scan is going to be more efficient. For the join, other methods are more efficient than the lookup for a lot of records.
Using a phonebook analogy, just start at the front of the phone book and read it to the end. Don't start at the start of the index and lookup each name one at a time until you get to the end of the index.
A covered index can help because it can be scanned in place of scanning the original table (clustered index). For example, if you have a two phone books where one has address information and the other does not, then reading the one without address information will be faster if you are not interested in addresses.
FWI: Don't trust the order of the columns for the index suggestions. Also, the index suggested in this case might be a covering index to avoid reading unneeded columns - not for selectivity.

SQL makes key lookup instead of seek over indexed column

I have 1 large and 2 small tables inner joined. I added appropriate indexes over large table. Even if query is fast (most of the time) some times it is getting over 3 seconds. When I checked execution plan, seems like SQL goes with key lookup instead of index seek.
Here is my query;
and my execution plan;
and here execution details;
Am I missing something here?
A key lookup is a seek. It is looking up using the key.
The non clustered index always includes the clustered index key (or physical rid if the table isn't clustered in which case you get a "bookmark lookup" instead)
Because the index used in the previous index seek does not contain the CreateDate column it needs to use the clustered index key to seek into the clustered index to retrieve it. This type of seek to retrieve additional columns is called a key lookup.
If you wanted to get rid of the lookup you could consider adding CreateDate as an included column to the index on NewsCategoryUrlId.
Though as Hadi says in the comments your case sounds like parameter sniffing or outdated statistics. Often a plan with a non covering index seek and key lookups may be generated if the optimiser believes the parameter value to be selective and be problematic if it is not selective.
With parameter sniffing the problem can arise if a plan is compiled for a selective value and then cached and reused for a less selective value.
Outdated statistics may not reflect the true selectivity of the parameter value the plan is compiled for in the first place.
After suggestion from Martin Smith, I re-create my index as below;
and now, execution plan is mush satisfied for me;

Create more than one non clustered index on same column in SQL Server

What is the index creating strategy?
Is it possible to create more than one non-clustered index on the same column in SQL Server?
How about creating clustered and non-clustered on same column?
Very sorry, but indexing is very confusing to me.
Is there any way to find out the estimated query execution time in SQL Server?
The words are rather logical and you'll learn them quite quickly. :)
In layman's terms, SEEK implies seeking out precise locations for records, which is what the SQL Server does when the column you're searching in is indexed, and your filter (the WHERE condition) is accurrate enough.
SCAN means a larger range of rows where the query execution planner estimates it's faster to fetch a whole range as opposed to individually seeking each value.
And yes, you can have multiple indexes on the same field, and sometimes it can be a very good idea. Play out with the indexes and use the query execution planner to determine what happens (shortcut in SSMS: Ctrl + M). You can even run two versions of the same query and the execution planner will easily show you how much resources and time is taken by each, making optimization quite easy.
But to expand on these a bit, say you have an address table like so, and it has over 1 billion records:
CREATE TABLE ADDRESS
(ADDRESS_ID INT -- CLUSTERED primary key ADRESS_PK_IDX
, PERSON_ID INT -- FOREIGN KEY, NONCLUSTERED INDEX ADDRESS_PERSON_IDX
, CITY VARCHAR(256)
, MARKED_FOR_CHECKUP BIT
, **+n^10 different other columns...**)
Now, if you want to find all the address information for person 12345, the index on PERSON_ID is perfect. Since the table has loads of other data on the same row, it would be inefficient and space-consuming to create a nonclustered index to cover all other columns as well as PERSON_ID. In this case, SQL Server will execute an index SEEK on the index in PERSON_ID, then use that to do a Key Lookup on the clustered index in ADDRESS_ID, and from there return all the data in all other columns on that same row.
However, say you want to search for all the persons in a city, but you don't need other address information. This time, the most effective way would be to create an index on CITY and use INCLUDE option to cover PERSON_ID as well. That way, a single index seek / scan would return all the information you need without the need to resort to checking the CLUSTERED index for the PERSON_ID data on the same row.
Now, let's say both of those queries are required but still rather heavy because of the 1 billion records. But there's one special query that needs to be really really fast. That query wants all the persons on addresses that have been MARKED_FOR_CHECKUP, and who must live in New York (ignore whatever checkup means, that doesn't matter). Now you might want to create a third, filtered index on MARKED_FOR_CHECKUP and CITY, with INCLUDE covering PERSON_ID, and with a filter saying CITY = 'New York' and MARKED_FOR_CHECKUP = 1. This index would be insanely fast, as it only ever cover queries that satisfy those exact conditions, and therefore has a fraction of the data to go through compared to the other indexes.
(Disclaimer here, bear in mind that the query execution planner is not stupid, it can use multiple nonclustered indexes together to produce the correct results, so the examples above may not be the best ones available as it's very hard to imagine when you would need 3 different indexes covering the same column, but I'm sure you get the idea.)
The types of index, their columns, included columns, sorting orders, filters etc depend entirely on the situation. You will need to make covering indexes to satisfy several different types of queries, as well as customized indexes created specifically for singular, important queries. Each index takes up space on the HDD so making useless indexes is wasteful and requires extra maintenance whenever the data model changes, and wastes time in defragmentation and statistics update operations though... so you don't want to just slap an index on everything either.
Experiment, learn and work out which works best for your needs.
I'm not the expert on indexing either, but here is what I know.
You can have only ONE Clustered Index per table.
You can have up to a certain limit of non clustered indexes per table. Refer to http://social.msdn.microsoft.com/Forums/en-US/63ba3877-e0bd-4417-a04b-19c3bfb02ac9/maximum-number-of-index-per-table-max-no-of-columns-in-noncluster-index-in-sql-server?forum=transactsql
Indexes should just have different names, but its better not to use the same column(s) on a lot of different indexes as you will run into some performance problems.
A very important point to remember is that Indexes although it makes your select faster, influence your Insert/Update/Delete speed as the information needs to be added to the index, which means that the more indexes you have on a column that gets updated a lot, will drastically reduce the speed of the update.
You can include columns that is used on a CLUSTERED index in one or more NON-CLUSTERED indexes.
Here is some more reading material
http://www.sqlteam.com/article/sql-server-indexes-the-basics
http://www.programmerinterview.com/index.php/database-sql/what-is-an-index/
EDIT
Another point to remember is that an index takes up space just like the table. The more indexes you create the more space it uses, so try not to use char/varchar (or nchar/nvarchar) in an index. It uses to much space in the index, and on huge columns give basically no benefit. When your Indexes start to become bigger than your table, it also means that you have to relook your index strategy.

SQL Server Non Clustered Indexes with multiple columns

I cannot seem to find a straight answer. I have the following columns:
ZipCode
StateAbbreviation
CountyName
I will be querying the table by either ZipCode alone or StateAbbreviation/CountyName combination. I have a PK Id column defined, so this is my clustered index.
What is recommended approach to achieve better performance/efficiency? What kind of indexes should I create considering how I will be searching the table?
First - run your app and see if it's fast enough without any indexes.
Second - if it's slow in some places, find the SQL workload that this code generates - what kind of queries do you have? Try adding an index on either ZipCode or on (StateAbbreviation, CountyName) and then run your app again and see how it performs.
If that one index solved your issue -> go enjoy your spare time! :-)
If not: add the second index and see if that help.
Don't overdo indexes just because you can! One at a time, only when it hurts. Don't just add indexes ahead of time - only add when you have a perf problem. And to see if it helps, you need to measure, measure, and measure again and compare to a previous baseline of measurements.
Nonclustered indexes are being used surprisingly rarely - your queries need to be just right for a nonclustered index to even be considered by the SQL Server query optimizer. If the table is "too small", a table scan will be preferred. If you use SELECT * all the time - your nonclustered indexes are likely to be ignored, too.
If you do add indexes, the columns should be ordered in the index using the same order as your queries. If you don't do that, there's no guarantee the index will be used. Index 1) Zipcode; Index 2) State Abbreviation, CountyName.
As #marc_s as said, indexes should be used when they are necessary as they do create some additional overhead when rows are modified/inserted/deleted. If they are straight-up reference tables, though, the only negative impact an index would have is the disk space that it uses. I usually find that to be worth less than no index.

SQL Server: Index columns used in like?

Is it a good idea to index varchar columns only used in LIKE opertations? From what I can read from query analytics I get from the following query:
SELECT * FROM ClientUsers WHERE Email LIKE '%niels#bosmainter%'
I get an "Estimated subtree cost" of 0.38 without any index and 0.14 with an index. Is this a good metric to use for anlayzing if a query has been optimized with an index?
Given the data 'abcdefg'
WHERE Column1 LIKE '%cde%' --can't use an index
WHERE Column1 LIKE 'abc%' --can use an index
WHERE Column1 Like '%defg' --can't use an index, but see note below
Note: If you have important queries that require '%defg', you could use a persistent computed column where you REVERSE() the column and then index it. Your can then query on:
WHERE Column1Reverse Like REVERSE('defg')+'%' --can use the persistent computed column's index
In my experience the first %-sign will make any index useless, but one at the end will use the index.
To answer the metrics part of your question: The type of index/table scan/seek being performed is a good indicator for knowing if an index is being (properly) used. It's usually shown topmost in the query plan analyzer.
The following scan/seek types are sorted from worst (top) to best (bottom):
Table Scan
Clustered Index Scan
Index Scan
Clustered Index Seek
Index Seek
As a rule of thumb, you would normally try to get seeks over scans whenever possible. As always, there are exceptions depending on table size, queried columns, etc. I recommend doing a search on StackOverflow for "scan seek index", and you'll get a lot of good information about this subject.

Resources