SQL Server: Index columns used in like? - sql-server

Is it a good idea to index varchar columns only used in LIKE operations? From what I can read in the query analytics for the following query:
SELECT * FROM ClientUsers WHERE Email LIKE '%niels@bosmainter%'
I get an "Estimated subtree cost" of 0.38 without any index and 0.14 with an index. Is this a good metric to use for analyzing whether a query has been optimized with an index?

Given the data 'abcdefg'
WHERE Column1 LIKE '%cde%' --can't use an index
WHERE Column1 LIKE 'abc%' --can use an index
WHERE Column1 Like '%defg' --can't use an index, but see note below
Note: If you have important queries that require '%defg', you could use a persisted computed column where you REVERSE() the column and then index it. You can then query on:
WHERE Column1Reverse LIKE REVERSE('defg')+'%' --can use the persisted computed column's index
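A minimal sketch of the reversed-column approach from the note, applied to the question's ClientUsers table. The column name EmailReversed, the index name, and the search value are hypothetical, and it assumes Email is short enough to serve as an index key (for example, not varchar(max)). GO is the batch separator used by SSMS and sqlcmd.

```sql
-- Hypothetical names: EmailReversed, IX_ClientUsers_EmailReversed.
-- REVERSE() is deterministic, so the computed column can be persisted and indexed.
ALTER TABLE ClientUsers
    ADD EmailReversed AS REVERSE(Email) PERSISTED;
GO

CREATE NONCLUSTERED INDEX IX_ClientUsers_EmailReversed
    ON ClientUsers (EmailReversed);
GO

-- An "ends with" search becomes a prefix search on the reversed value,
-- which makes it eligible for an index seek.
SELECT Email
FROM ClientUsers
WHERE EmailReversed LIKE REVERSE('@example.com') + '%';
```

The cost is extra storage and extra work on every insert or update of Email, so this is probably only worthwhile for queries that run often.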

In my experience the first %-sign will make any index useless, but one at the end will use the index.

To answer the metrics part of your question: The type of index/table scan/seek being performed is a good indicator for knowing if an index is being (properly) used. It's usually shown topmost in the query plan analyzer.
The following scan/seek types are sorted from worst (top) to best (bottom):
Table Scan
Clustered Index Scan
Index Scan
Clustered Index Seek
Index Seek
As a rule of thumb, you would normally try to get seeks over scans whenever possible. As always, there are exceptions depending on table size, queried columns, etc. I recommend doing a search on StackOverflow for "scan seek index", and you'll get a lot of good information about this subject.
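Besides looking at the operator type, logical reads give a concrete number to compare before and after adding an index. The sketch below assumes an index on Email already exists; the search values are illustrative only.

```sql
-- Report logical/physical reads per table for each statement in this session.
SET STATISTICS IO ON;

-- Prefix search: eligible for an index seek on Email.
SELECT Email FROM ClientUsers WHERE Email LIKE 'niels%';

-- Leading wildcard: every index entry (or table row) has to be examined.
SELECT Email FROM ClientUsers WHERE Email LIKE '%niels%';

SET STATISTICS IO OFF;
```

The Messages output then shows the read counts for each statement, which can be compared alongside the seek/scan operators in the plan.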

Related

Query plan ignores index on large join

I have the following fairly simple query that returns about 1 million rows (I've left out columns as they are just for output), but the query plan doesn't seem to want to use the index and wants me to create one:
SELECT [SAU]
,nr.[Headend]
,[Source]
,[Destination]
,[FibreHop]
,[CableRef]
,[CableSectionRef]
,[nNGAFibres]
,[nEthFibres]
,[FromID]
,[ToID]
,[FromIDTerm]
,[ToIDTerm],Reversed
,@Now
FROM [NodeRouting] nr
join [TargetHeadends] tex ON nr.Headend=tex.Headend
The index is:
CREATE NONCLUSTERED INDEX [NodeRouting_Headend] ON [NodeRouting]
(
[Headend] ASC
)
the other table Headend is the PK
The query plan is this:
If I give it a hint to use the index already created (non-unique, non-clustered) on the id field:
join [TargetHeadends] tex ON nr.id=tex.id (index=NodeRouting_Headend)
It changes to this:
By the way, the actual number of rows matches the first estimate of 966,000. The RID lookup's estimate of 761,000 is a few hundred thousand short, and the operator cost seems a lot higher.
One thing that strikes me as a little odd is that in the first example, where it chose not to use the index, it says this:
Missing Index (impact 99): CREATE NONCLUSTERED INDEX <NAME> ON NodeRouting(id) include (....)
CREATE NONCLUSTERED INDEX [<Name>]
ON [NodeRouting] ([Headend])
INCLUDE ([SAU],[Source],[Destination],[FibreHop],[CableRef],[CableSectionRef],[nNGAFibres],[nEthFibres],[FromID],[ToID],[FromIDTerm],[ToIDTerm],[Reversed])
I appreciate I'm returning more columns than are in the index, but I would have thought the index would still have been used without the INCLUDE?
Indexes don't always help and they should not need to be forced into use. For example, for small tables a scan will be used because it's less work because of index overhead. Don't force the use of the index.
For a large table, an index helps when it is "selective" and the query is selective. It gets a few records quickly; it does not get a lot of records quickly. As a rough guide, the index might be used when the query selects only a small fraction of the rows (around 5% or less). Otherwise, a scan might be faster than using the non-selective index.
If you are returning all the records, then there is no selectivity. A scan is going to be more efficient. For the join, other methods are more efficient than the lookup for a lot of records.
Using a phonebook analogy, just start at the front of the phone book and read it to the end. Don't start at the start of the index and lookup each name one at a time until you get to the end of the index.
A covering index can help because it can be scanned in place of scanning the original table (clustered index). For example, if you have two phone books where one has address information and the other does not, then reading the one without address information will be faster if you are not interested in addresses.
FYI: Don't trust the order of the columns in the index suggestions. Also, the index suggested in this case might be a covering index to avoid reading unneeded columns, not one chosen for selectivity.

Why isn't a particular index being used in a query?

I have a table named Workflow. It has 37M rows in it. There is a primary key on the ID column (int) plus an additional column. The ID column is the first column in the index.
If I execute the following query, the PK is not used (unless I use an index hint)
Select Distinct(SubID) From Workflow Where ID >= @LastSeenWorkflowID
If I execute this query instead, the PK is used
Select Distinct(SubID) From Workflow Where ID >= 786400000
I suspect the problem is with using the parameter value in the query (which I have to do). I really don't want to use an index hint. Is there a workaround for this?
Please post the execution plan(s), as well as the exact table definition, including all indexes.
When you use a variable, the optimizer does not know what selectivity the query will have: @LastSeenWorkflowID may filter out all but the very last few rows in Workflow, or it may include them all. The generated plan has to work in both situations. There is a threshold at which a range seek over the clustered index becomes more expensive than a full scan over a non-clustered index, simply because the clustered index is so much wider (it includes every column in the leaf level) and thus has many more pages to iterate over. The plan generated for an unknown value of @LastSeenWorkflowID likely crosses that threshold when estimating the cost of the clustered index seek, and so it chooses the scan over the non-clustered index.
You could provide a narrow index that is aimed specifically at this query:
CREATE INDEX WorkflowSubId ON Workflow(ID, SubId);
or:
CREATE INDEX WorkflowSubId ON Workflow(ID) INCLUDE (SubId);
Such an index is too-good-to-pass for your query, no matter the value of @LastSeenWorkflowID.
Assuming your PK is an identity OR is always greater than 0, perhaps you could try this:
Select Distinct(SubID)
From Workflow
Where ID >= @LastSeenWorkflowID
And ID > 0
By adding the 2nd condition, it may cause the optimizer to use an index seek.
This is a classic example of local variable producing a sub-optimal plan.
You should use OPTION (RECOMPILE) in order to compile your query with the actual parameter value of ID.
See my blog post for more information:
http://www.sqlbadpractices.com/using-local-variables-in-t-sql-queries/
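A sketch of the OPTION (RECOMPILE) suggestion, using the question's table and its LastSeenWorkflowID variable. The int type and the assigned value are assumptions taken from the question's literal example.

```sql
DECLARE @LastSeenWorkflowID int;
SET @LastSeenWorkflowID = 786400000;  -- assumed value, copied from the question's second query

SELECT DISTINCT SubID
FROM Workflow
WHERE ID >= @LastSeenWorkflowID
OPTION (RECOMPILE);  -- compile this statement on each run, using the variable's current value
```

The trade-off is a compilation on every execution, which is usually acceptable for a query that runs occasionally over 37M rows but may matter for one that runs many times per second.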

Is a SqlProfiler Scan Started bad?

If in SqlProfiler you can see that to execute a query a Scan is Started, does this mean a full table scan or can it just be a lookup? If it can be both, how can you tell which one of the two it is?
From the documentation:
The Scan:Started event class occurs when a table or index scan is started.
So it could be either one. The IndexID field will tell you if it is an index, and which one.
Not that it really matters very much. A clustered index scan basically is a table scan. A nonclustered index scan is better, but only a little. If you see any full scan, it means either (a) you're using non-sargable predicates or predicates on fields that aren't indexed, or (b) the predicate fields are indexed but the output columns aren't covered by the index, and the optimizer has decided that it is cheaper to perform a full scan than a bookmark/RID lookup.
Index scans aren't often much better than table scans, performance-wise, so you should try to eliminate whatever is leading to it, if possible.
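To translate an IndexID from the trace into an index name, a lookup along these lines may help on SQL Server 2005 and later. It assumes the trace's IndexID corresponds to index_id in sys.indexes, and dbo.Workflow is a hypothetical table name.

```sql
-- index_id 0 = heap (no clustered index), 1 = clustered index, >1 = nonclustered index
SELECT i.index_id, i.name, i.type_desc
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID(N'dbo.Workflow')  -- hypothetical table name
ORDER BY i.index_id;
```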

Index Seek with Bookmark Lookup Only Option for SQL Query?

I am working on optimizing a SQL query that goes against a very wide table in a legacy system. I am not able to narrow the table at this point for various reasons.
My query is running slowly because it does an Index Seek on an Index I've created, and then uses a Bookmark Lookup to find the additional columns it needs that do not exist in the Index. The bookmark lookup takes 42% of the query time (according to the query optimizer).
The table has 38 columns, some of which are nvarchars, so I cannot make a covering index that includes all the columns. I have tried to take advantage of index intersection by creating indexes that cover all the columns, however those "covering" indexes are not picked up by the execution plan and are not used.
Also, since 28 of the 38 columns are pulled out via this query, I'd have 28/38 of the columns in the table stored in these covering indexes, so I'm not sure how much this would help.
Do you think a Bookmark Lookup is as good as it is going to get, or what would another option be?
(I should specify that this is SQL Server 2000)
OH,
the covering index with include should work. Another option might be to create a clustered indexed view containing only the columns you need.
Regards,
Lieven
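A rough sketch of the clustered indexed view idea mentioned above. All object and column names are hypothetical, and it assumes Id is unique in the base table. Creating an indexed view also requires particular session SET options, which are not shown here.

```sql
-- Hypothetical names throughout: dbo.WideTable, Id, ColA, ColB, dbo.vWideTableNarrow.
CREATE VIEW dbo.vWideTableNarrow
WITH SCHEMABINDING            -- required before a view can be indexed
AS
SELECT Id, ColA, ColB         -- list only the columns the query needs (no SELECT *)
FROM dbo.WideTable;           -- a two-part name is required with SCHEMABINDING
GO

-- The first index on a view must be unique and clustered; it materializes the view.
CREATE UNIQUE CLUSTERED INDEX IX_vWideTableNarrow_Id
    ON dbo.vWideTableNarrow (Id);
GO

-- NOEXPAND asks the optimizer to read the view's index directly.
SELECT Id, ColA, ColB
FROM dbo.vWideTableNarrow WITH (NOEXPAND)
WHERE Id = 42;
```

Since the view stores its own copy of the selected columns, every write to the base table also maintains the view, so this trades write cost and storage for cheaper reads.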
You could create an index with included columns as another option
example from BOL, this is for 2005 and up
CREATE NONCLUSTERED INDEX IX_Address_PostalCode
ON Person.Address (PostalCode)
INCLUDE (AddressLine1, AddressLine2, City, StateProvinceID);
To answer this part "I have tried to take advantage of index intersection by creating indexes that cover all the columns, however those "covering" indexes are not picked up by the execution plan and are not used."
An index can only be used when the query is written in a way that is sargable. In other words, if you apply a function to the column on the left side of the operator, or leave the first column of the index out of your WHERE clause, then the index won't be used. If the selectivity of the index is low, the index also won't be used.
Check out SQL Server covering indexes for some more info
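To illustrate the sargability point, here is a small before/after sketch. The table dbo.Orders, its columns, and an index on OrderDate are all hypothetical.

```sql
-- Hypothetical table dbo.Orders with an index on OrderDate.

-- Not sargable: the function wraps the column, so the predicate
-- has to be evaluated for every row.
SELECT OrderId
FROM dbo.Orders
WHERE YEAR(OrderDate) = 2008;

-- Sargable: the bare column is compared to a range, so an index seek is possible.
SELECT OrderId
FROM dbo.Orders
WHERE OrderDate >= '20080101'
  AND OrderDate <  '20090101';
```

Both queries return the same rows; only the second form lets the optimizer turn the predicate into a range on the index.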

What columns generally make good indexes?

As a follow up to "What are indexes and how can I use them to optimise queries in my database?" where I am attempting to learn about indexes, what columns are good index candidates? Specifically for an MS SQL database?
After some googling, everything I have read suggests that columns that are generally increasing and unique make a good index (things like MySQL's auto_increment), I understand this, but I am using MS SQL and I am using GUIDs for primary keys, so it seems that indexes would not benefit GUID columns...
Indexes can play an important role in query optimization and in retrieving results quickly from tables. The most important step is selecting which columns to index. There are two major places to consider: columns referenced in the WHERE clause and columns used in JOIN clauses. In short, index the columns you search on to find particular records. Suppose we have a table named buyers, and a SELECT query that could use indexes as below:
SELECT
buyer_id /* no need to index */
FROM buyers
WHERE first_name='Tariq' /* consider indexing */
AND last_name='Iqbal' /* consider indexing */
Since "buyer_id" is referenced only in the SELECT portion, MySQL will not use it to limit the chosen rows, so there is no great need to index it. Below is another example, slightly different from the one above:
SELECT
buyers.buyer_id, /* no need to index */
country.name /* no need to index */
FROM buyers LEFT JOIN country
ON buyers.country_id=country.country_id /* consider indexing */
WHERE
first_name='Tariq' /* consider indexing */
AND
last_name='Iqbal' /* consider indexing */
In the above queries, the first_name and last_name columns can be indexed because they appear in the WHERE clause. An additional field, country_id from the country table, can also be considered for indexing because it is in a JOIN clause. So indexing can be considered for every field in a WHERE clause or a JOIN clause.
The following list also offers a few tips to keep in mind when you intend to create indexes on your tables:
Only index those columns that are required in WHERE and ORDER BY clauses. Indexing too many columns has disadvantages of its own.
Try to take advantage of the "index prefix" or "multi-column index" feature of MySQL. If you create an index such as INDEX(first_name, last_name), don’t create INDEX(first_name). However, "index prefix" or "multi-column index" is not recommended in all search cases.
Use the NOT NULL attribute for those columns in which you consider the indexing, so that NULL values will never be stored.
Use the --log-long-format option to log queries that aren’t using indexes. In this way, you can examine this log file and adjust your queries accordingly.
The EXPLAIN statement reveals how MySQL will execute a query. It shows how and in what order tables are joined. This can be very useful for determining how to write optimized queries and whether columns need to be indexed.
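As a small sketch of the multi-column index and EXPLAIN tips above, in MySQL syntax to match this answer. The index name idx_buyers_name is hypothetical.

```sql
-- MySQL syntax; idx_buyers_name is a hypothetical index name.
CREATE INDEX idx_buyers_name ON buyers (first_name, last_name);

-- Ask MySQL how it would execute the query from the first example.
EXPLAIN
SELECT buyer_id
FROM buyers
WHERE first_name = 'Tariq'
  AND last_name = 'Iqbal';
```

If the optimizer picks the index, its name should appear in the key column of the EXPLAIN output.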
Update (23 Feb'15):
Any index (good or bad) increases insert and update time.
How results are searched depends on your indexes (their number and type). If an index makes your search time increase, it is a bad index.
As in any book, the "Index Page" could list where each chapter, topic, and sub-topic starts. Some detail in the index helps, but an overly detailed index might confuse or scare you. Indexes also consume memory.
Index selection should be done wisely. Keep in mind that not every column requires an index.
Some folks answered a similar question here: How do you know what a good index is?
Basically, it really depends on how you will be querying your data. You want an index that quickly identifies a small subset of your dataset that is relevant to a query. If you never query by datestamp, you don't need an index on it, even if it's mostly unique. If all you do is get events that happened in a certain date range, you definitely want one. In most cases, an index on gender is pointless -- but if all you do is get stats about all males, and separately, about all females, it might be worth your while to create one. Figure out what your query patterns will be, and access to which parameter narrows the search space the most, and that's your best index.
Also consider the kind of index you make -- B-trees are good for most things and allow range queries, but hash indexes get you straight to the point (but don't allow ranges). Other types of indexes have other pros and cons.
Good luck!
It all depends on what queries you expect to ask about the tables. If you ask for all rows with a certain value for column X, you will have to do a full table scan if an index can't be used.
Indexes will be useful if:
The column or columns have a high degree of uniqueness
You frequently need to look for a certain value or range of values for the column.
They will not be useful if:
You are selecting a large % (>10-20%) of the rows in the table
The additional space usage is an issue
You want to maximize insert performance. Every index on a table reduces insert and update performance because they must be updated each time the data changes.
Primary key columns are typically great for indexing because they are unique and are often used to lookup rows.
Any column that is going to be regularly used to extract data from the table should be indexed.
This includes:
foreign keys -
select * from tblOrder where status_id=:v_outstanding
descriptive fields -
select * from tblCust where Surname like 'O''Brian%'
The columns do not need to be unique. In fact you can get really good performance from a binary index when searching for exceptions.
select * from tblOrder where paidYN='N'
In general (I don't use mssql so can't comment specifically), primary keys make good indexes. They are unique and must have a value specified. (Also, primary keys make such good indexes that they normally have an index created automatically.)
An index is effectively a copy of the column which has been sorted to allow binary search (which is much faster than linear search). Database systems may use various tricks to speed up search even more, particularly if the data is more complex than a simple number.
My suggestion would be to not use any indexes initially and profile your queries. If a particular query (such as searching for people by surname, for example) is run very often, try creating an index over the relevant attributes and profile again. If there is a noticeable speed-up on queries and a negligible slow-down on insertions and updates, keep the index.
(Apologies if I'm repeating stuff mentioned in your other question, I hadn't come across it previously.)
It really depends on your queries. For example, if you almost only write to a table then it is best not to have any indexes, they just slow down the writes and never get used. Any column you are using to join with another table is a good candidate for an index.
Also, read about the Missing Indexes feature. It monitors the actual queries being used against your database and can tell you what indexes would have improved the performance.
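On SQL Server 2005 and later, the Missing Indexes information is exposed through dynamic management views. A sketch of a query over them follows; the TOP (20) limit and the ordering expression are arbitrary choices for illustration, not an official ranking.

```sql
-- Lists suggested indexes recorded since the last restart, roughly ordered by
-- how often they would have been used and the optimizer's estimated benefit.
SELECT TOP (20)
    mid.statement          AS table_name,
    mid.equality_columns,
    mid.inequality_columns,
    mid.included_columns,
    migs.user_seeks,
    migs.avg_user_impact   -- estimated % improvement reported by the optimizer
FROM sys.dm_db_missing_index_details     AS mid
JOIN sys.dm_db_missing_index_groups      AS mig
    ON mig.index_handle = mid.index_handle
JOIN sys.dm_db_missing_index_group_stats AS migs
    ON migs.group_handle = mig.index_group_handle
ORDER BY migs.user_seeks * migs.avg_user_impact DESC;
```

These are suggestions only; they do not account for the write cost of the new index or for overlap with existing indexes.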
Your primary key should always be an index. (I'd be surprised if it weren't automatically indexed by MS SQL, in fact.) You should also index columns you SELECT or ORDER by frequently; their purpose is both quick lookup of a single value and faster sorting.
The only real danger in indexing too many columns is slowing down changes to rows in large tables, as the indexes all need updating too. If you're really not sure what to index, just time your slowest queries, look at what columns are being used most often, and index them. Then see how much faster they are.
Numeric data types which are ordered in ascending or descending order are good indexes for multiple reasons. First, numbers are generally faster to evaluate than strings (varchar, char, nvarchar, etc). Second, if your values aren't ordered, rows and/or pages may need to be shuffled about to update your index. That's additional overhead.
If you're using SQL Server 2005 and are set on using uniqueidentifiers (GUIDs), and do NOT need them to be of a random nature, check out NEWSEQUENTIALID(), which generates sequential uniqueidentifier values.
Lastly, if you're talking about clustered indexes, you're talking about the sort of the physical data. If you have a string as your clustered index, that could get ugly.
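For the sequential GUID suggestion in this answer, a minimal sketch using NEWSEQUENTIALID() as a column default. The table, column, and constraint names are hypothetical.

```sql
-- Hypothetical table. NEWSEQUENTIALID() is only allowed in a DEFAULT constraint.
CREATE TABLE dbo.Customers
(
    CustomerId uniqueidentifier NOT NULL
        CONSTRAINT DF_Customers_CustomerId DEFAULT NEWSEQUENTIALID()
        CONSTRAINT PK_Customers PRIMARY KEY CLUSTERED,
    Surname    nvarchar(100) NOT NULL
);

-- CustomerId is filled in by the default, so new rows land near the end of the index.
INSERT INTO dbo.Customers (Surname) VALUES (N'O''Brian');
```

Because each generated value is greater than the previous one on the same machine, inserts mostly append to the clustered index rather than splitting pages in the middle of it.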
A GUID column is not the best candidate for indexing. Indexes are best suited to columns with a data type that can be given some meaningful order, i.e. sorted (integer, date, etc.).
It does not matter if the data in a column is generally increasing. If you create an index on the column, the index will create its own data structure that will simply reference the actual items in your table without concern for stored order (a non-clustered index). Then, for example, a binary search can be performed over your index data structure to provide fast retrieval.
It is also possible to create a "clustered index" that will physically reorder your data. However you can only have one of these per table, whereas you can have multiple non-clustered indexes.
The ol' rule of thumb was columns that are used a lot in WHERE, ORDER BY, and GROUP BY clauses, or any that seemed to be used in joins frequently. Keep in mind I'm referring to indexes, NOT the Primary Key.
Not to give a 'vanilla-ish' answer, but it truly depends on how you are accessing the data.
It should be even faster if you are using a GUID.
Suppose you have the records
100
200
3000
....
If you have an index (binary search), you can find the physical location of the record you are looking for in O(lg n) time, instead of searching sequentially in O(n) time. This is because you don't know what records you have in your table.
Best index depends on the contents of the table and what you are trying to accomplish.
Take as an example a member database with a Primary Key of the member's Social Security Number. We choose the S.S. number because the application primarily refers to the individual in this way, but you also want to create a search function that will use the member's first and last name. I would then suggest creating an index over those two fields.
You should first find out what data you will be querying and then make the determination of which data you need indexed.
