Indexes with included columns - SQL Server

I noticed in my database that I have a lot of indexes in a table that only differ in the included columns.
For example for table A I have this:
INDEX ON A(COLUMN_A) INCLUDE (COLUMN_B)
INDEX ON A(COLUMN_A) INCLUDE (COLUMN_C)
INDEX ON A(COLUMN_A) INCLUDE (COLUMN_D)
It seems to me it would be more efficient (for inserts/updates/deletes) to just have this:
INDEX ON A(COLUMN_A) INCLUDE (COLUMN_B, COLUMN_C, COLUMN_D)
Would there be any reason not to do this?
Thanks!

You are right, they should be combined into one. The non-key columns listed in INCLUDE are simply stored in the leaf nodes of the index so they can be read; unlike key columns, they don't form part of the index's hierarchy, so their order isn't important. Having fewer indexes is a good thing when the extra indexes aren't adding anything useful, as is the case with your redundant ones.
See also
https://stackoverflow.com/a/1308012/8479
http://msdn.microsoft.com/en-us/library/ms190806.aspx
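As a sketch of the consolidation (the index names here are hypothetical placeholders for whatever yours are actually called):
-- Drop the three redundant single-INCLUDE indexes...
DROP INDEX IX_A_ColA_InclB ON A;
DROP INDEX IX_A_ColA_InclC ON A;
DROP INDEX IX_A_ColA_InclD ON A;
-- ...and replace them with one index that covers everything they did.
CREATE INDEX IX_A_ColA_InclBCD ON A (COLUMN_A)
    INCLUDE (COLUMN_B, COLUMN_C, COLUMN_D);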

Most likely, this is a read optimization, not a write one. If there are multiple queries that use the same key column (COLUMN_A) but a different "data" column (COLUMN_B/C/D), this kind of index saves some I/O (there's no need to load unnecessary data from the table; you already have it in the index). Including all the data columns in one index would take less space overall (the key column wouldn't be stored three times), but each of the indices in your case is smaller than one "combined" index.
Hopefully, the indices were created this way for a reason, based on performance profiling and real performance issues. This is where documentation of the motivation comes in very handy: why were the indices created this way? Was it sloppiness? A limited understanding of how indices work? Automatic optimization based on query-plan analysis? Unless you know that, it might be a bad idea to tweak this kind of thing, especially if you don't actually have a performance problem.

Related

SQL Server almost duplicate index - which one to delete

I have a table with two non-clustered indexes. Both have the same key columns, and their INCLUDE columns are the same as well, except for one column; otherwise they are identical.
I need to drop one as having two almost identical indexes is unnecessary. I'm looking over the Usage Stats and Op Stats that are returned using sp_BlitzIndex.
Usage Stats
Index A: Reads: 652,366 (652,366 seek); Writes: 3,297,125
Index B (has an additional column in INCLUDE): Reads: 644,443 (640,332 seek, 4,111 scan); Writes: 3,897,213
Op Stats
Index A: 536,711 singleton lookups; 1,239,859 scans/seeks; 423,781 deletes; 5,125 updates
Index B (has an additional column in INCLUDE): 1,070,124 singleton lookups; 1,225,548 scans/seeks; 913,185 deletes; 5,127 updates
Index A has more seek reads but fewer singleton lookups. My first choice is to keep the index with the additional column (Index B), as I would think it would cover more queries in the long run. Or should I just be focusing on reads and keep Index A?
EDIT: The table in question has 22 indexes, including the two above, which were added by a prior DBA. It seems unnecessary to keep both indexes around when they are so similar, and I'm trying to reduce the overhead of having so many indexes on this table, as it seems to affect performance. I do realize that proper indexing can be tricky, so I guess I'm just asking whether the Usage and Op stats I provided should steer me toward eliminating one or the other, or perhaps neither.
Why do you "need" to drop one? Unless there is a performance hit from constantly updating the data, and thus the indexes, both could still provide a benefit for your selects with minimal impact.
Have you run a trace under normal use and analyzed it to see which indexes actually benefit you? You can learn a lot about what is going on by doing this.
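If you go the trace route, here is a rough sketch using Extended Events (the modern replacement for Profiler traces); the session and file names are placeholders:
-- Capture completed statements to a file for later analysis.
CREATE EVENT SESSION capture_workload ON SERVER
ADD EVENT sqlserver.sql_statement_completed
    (ACTION (sqlserver.sql_text))
ADD TARGET package0.event_file (SET filename = N'capture_workload.xel');
-- Start capturing; stop and inspect the .xel file after a representative window.
ALTER EVENT SESSION capture_workload ON SERVER STATE = START;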

Is there a benefit in eliminating the unique-ness of a redundant unique index on SQL Server?

While analyzing the database structure of a legacy application, I discovered that several tables have 2 unique indices covering exactly the same columns, just in a different order.
Having 2 unique indices covering the same columns is clearly redundant, so my first instinct was to drop one of them entirely. But then I thought some of the queries emitted by the application might be relying on the index I would delete, so I considered converting it into a regular index instead.
To the best of my knowledge, whenever a row is inserted or updated in a table with a unique index, SQL Server spends some time validating that each unique index/constraint still holds true. So by converting one of these indices into a non-unique one, I hope processing on this table might be sped up a bit; please confirm or dispel.
On the other hand, I don't understand what the benefit could be of having two unique indices covering the same columns on a table. Any ideas why this might have been done? Could something be lost if I convert one of them into a regular one?
Check the index usage stats (sys.dm_db_index_usage_stats) to see whether both indexes are being used. If not, delete the unused one.
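A minimal sketch of that check (dbo.A is a hypothetical table name; note the DMV resets on service restart, so a missing row only means "not used since then"):
-- Compare how each index on the table has been used.
SELECT i.name AS index_name,
       s.user_seeks, s.user_scans, s.user_lookups, s.user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
       ON s.object_id = i.object_id
      AND s.index_id = i.index_id
      AND s.database_id = DB_ID()
WHERE i.object_id = OBJECT_ID('dbo.A');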
Generally speaking, indexes are used for filtering first, then ordering. It is possible that you have queries that need to filter on the leading columns of both indexes. If that is the case, you'll reduce how deeply a query can be optimized by getting rid of one. That may not be a big deal, as the query may still be able to use the remaining index satisfactorily.
For example, if I have 2 indexes with four columns:
1: Columns A, B, C, D
2: Columns A, B, D, C
Any query that currently prefers #2 could still gain benefits by using #1 if #2 is not available. It would just limit the selectivity to column B rather than all the way down to column D.
If you're not sure, try disabling (not deleting) the less used index and see if you notice any problems. If something slows down, it is simple enough to enable it again.
As always, try it in a non-production environment first.
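A sketch of that disable-and-restore cycle (IX_Candidate and dbo.MyTable are hypothetical names); note that a disabled index is brought back by rebuilding it:
-- Disable the index without losing its definition.
ALTER INDEX IX_Candidate ON dbo.MyTable DISABLE;
-- If something slows down, rebuild to re-enable it.
ALTER INDEX IX_Candidate ON dbo.MyTable REBUILD;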
UPDATE
Yes, you can safely remove the uniqueness of one of the indexes. Uniqueness only needs to be enforced by one of them. The only concern would be if the vendor decided to do the same and chose the other index.
However, since this is from a vendor, I'd recommend you contact them if there are performance concerns. If you're not running into a performance issue worth a support request to them, then just leave it alone.
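There is no in-place ALTER that removes uniqueness; a sketch of the drop-and-recreate approach (all names here are hypothetical, and if the unique index actually backs a unique constraint, you would drop the constraint instead):
-- Recreate the redundant unique index as a plain nonclustered index.
DROP INDEX UQ_MyTable_B_A ON dbo.MyTable;
CREATE INDEX IX_MyTable_B_A ON dbo.MyTable (COLUMN_B, COLUMN_A);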

Putting indices on all columns of a read only table

I have a table in a database that will be generated from the start and probably never be written to again. Even if it were ever written to, it'll be in the form of batch processes run during a release, and write time is not important at all.
It's a relatively large table with about 80k rows and maybe about 10-12 columns.
The application is likely to retrieve data from this table often.
I was thinking, since it'll never be written to again, should I just put indices on all the columns? That way it'll always be quick to read no matter what type of query I form?
Is this a good idea? Is there any downside to this I should be aware of?
My understanding is that each index does require some (relatively small) amount of storage space. If you're tight on space, this could matter. Exactly how much impact it has may depend on which DB you are using.
It will depend on the table. If all of the columns will be used in search criteria, then it is not unreasonable to put indexes on them all. That is fairly unlikely, though. Also, there may be compound (multi-column) indexes that would be more beneficial than some of the simple (single-column) indexes.
Finally, the query optimizer has to review all the indexes present on the table when evaluating how to answer queries. It is hard to say when this becomes measurable overhead, but more indexes take more time to consider.
So, given the static nature of the table you describe, it is reasonable to index it more heavily than you might a more dynamic table. Indexing every column is probably not sensible. Choosing carefully which compound indexes to add may be important too.
Choose indexes for a table based on the queries you run against that table.
Indexes you never need for any query are just wasted space.
Individual indexes on each column aren't the full set of possible indexes. You can also create multi-column (compound) indexes, and these can be important for optimizing certain queries. The order of columns in a compound index matters.
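For instance, a minimal sketch (table and column names are hypothetical): an index on (country, city) supports filtering on country alone, or on country and city, but cannot seek on city alone.
CREATE INDEX IX_Customers_Country_City ON Customers (country, city);
-- Can seek on the index: the leading column is filtered.
SELECT name FROM Customers WHERE country = 'DE' AND city = 'Berlin';
-- Cannot seek on this index: the leading column is not filtered.
SELECT name FROM Customers WHERE city = 'Berlin';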
SQL Server 2008 supports only 999 nonclustered indexes per table, so if you try to create all possible indexes on a table of more than a few columns, you will reach the limit.
Sorry, but you actually need to learn some things before you can optimize effectively. If it were simply a matter of indexing every column, then the RDBMS would do this by default.

What are the methods for identifying unnecessary columns within a covering index?

What methods are there for identifying superfluous columns in covering indices: columns which are never searched against, and which therefore may be moved into INCLUDEs, or even removed completely, without affecting the applicability of the index?
To clarify things
The idea of a covering index is that it also includes columns that may not be searched on (used in the WHERE clause and such) but may be selected (part of the SELECT column list).
There doesn't seem to be any easy way to verify the existence of unused columns in a covering index. I can only think of the painstaking process below:
For a representative period of time, record all queries being run on the server (or against the table in question)
Filter out (through regular expressions) queries not involving the underlying table
For the remaining queries, obtain the query plan; discard queries not involving the index in question
For the remaining queries, or rather for each query "template" (many queries are identical except for the search criteria values), list the columns from the index that appear in the SELECT list or the WHERE clause (or in a JOIN...)
Any columns from the index not found in that list are positively good to go.
Now, there may be a few more [columns to remove], because the process above doesn't check the context in which the covering index is used (it is possible that it is used to resolve the WHERE, but that the underlying table is still accessed as well, for example to get columns not in the covering index...)
The above clinical approach is rather unattractive. An analytical approach may be preferable:
Find all the query "templates" that may be used across all the applications using the server. For each of these patterns, find the ones that may be using the covering index. These are (again, with a few holes...) queries that:
include a reference to the underlying table
do not cite in any way a column from the underlying table that is not a column in the index
do not use a search criterion from the underlying table that is more selective than the columns of the index (in their very order...)
Or... without even going to the applications: think of all the use cases, and whether the queries serving these cases would benefit or not from all the columns in the index. Doing so implies that you have a relatively good idea of the selectivity of the index with regard to its first few columns.
If you audit your use cases and data points, obviously anything that isn't used or caught by the audit is a candidate for removal. If the database lacks such a thorough audit, you can capture a time window's worth of queries hitting the database by running a trace and saving it. You can then analyze the trace to see what types of queries are hitting the database and, from there, intuit which columns can be dropped.
Trace analysis is typically used to find candidates for missing indices, but I'm guessing it could also be used to analyze usage trends.
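As one starting point, a rough sketch that searches the plan cache for plans referencing the index (IX_Covering is a hypothetical name); this only sees plans currently in cache, so absence here is not proof the index is unused:
-- Find cached plans whose XML mentions the index name.
SELECT TOP (50) st.text AS query_text, qp.query_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE CONVERT(nvarchar(max), qp.query_plan) LIKE N'%IX_Covering%';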

should nearly unique fields have indexes

I have a field in a database that is nearly unique: 98% of the time the values will be unique, but it may have a few duplicates. I won't be doing many searches on this field; say twice a month. The table currently has ~5000 records and will gain about 150 per month.
Should this field have an index?
I am using MySQL.
I think the 'nearly unique' part is probably a red herring. The data is either unique or it's not, but that doesn't determine whether you would want to index it for performance reasons.
Answer:
5,000 records is really not many at all, and regardless of whether you have an index, searches will still be fast. At that rate of inserts, it'll take you 3 years to reach 10,000 records, which is still not many.
I personally wouldn't bother with adding an index, but it wouldn't matter if you did.
Explanation:
What you have to think about when deciding to add an index is the trade-off between insertion speed, and selection speed.
Without an index, doing a select on that field means MySQL has to walk over every single row and check that field in each one. Adding an index prevents this.
The downside of the index is that each time data gets inserted, the DB has to update the index in addition to adding the data. This is usually a small overhead, but you'd really notice it if you had loads of indexes, and were doing a lot of writes.
By the time you have a lot of rows in your database you'd want an index anyway, as otherwise your selects would take all day; it's just something to be aware of so that you don't end up adding indexes on fields "just in case I need it".
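If you did decide to add it, a minimal sketch (table and column names are hypothetical):
-- Create the index, then confirm with EXPLAIN that the query uses it
-- (check the "key" column of the output).
CREATE INDEX idx_nearly_unique ON my_table (nearly_unique_field);
EXPLAIN SELECT * FROM my_table WHERE nearly_unique_field = 'some-value';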
That's not very many records at all; I wouldn't bother making any indexes on that table. The relative uniqueness of the field is irrelevant - even on years-old commodity hardware I'd expect a query on that table to take a fraction of a second.
You can use the general rule of thumb: optimize when it becomes a problem. Don't add an index until you notice you need one.
From what you say, it doesn't sound like an index is necessary. The rule of thumb is to index fields that are used a lot in SELECTs, to speed up searching, which in turn (can) slow down INSERTs and UPDATEs.
On a record set as small as yours, I don't think you'll see much of a real-world hit either way.
If you'll only be searching on it twice a month and it's that few rows, then I would say don't index it. It's all but useless.
No. There aren't many records and it's not going to be frequently queried. No need to index.
It's really a judgement call. With such a small table you can search reasonably quickly without an index, so you could get by without it.
On the other hand, the cost of creating an index you don't really need is pretty low, so you're not saving yourself much by not doing it.
Also, if you do create the index, you're covered for the future if you suddenly start getting 1,000 new records a week. Possibly you know enough about the situation to say for certain that will never happen, but requirements do have a way of changing when you least expect it.
EDIT: As far as changing requirements, the thing to consider is this: If the DB does grow and you find out later that you do need an index, can you simply create the index and be done? Or will you also need to change lots of code to make use of the new index?
It depends. As others have responded, there's a trade off between table update speed and selection speed. Table update includes inserts, updates, and deletes on the table.
One question you didn't address: does the table have a primary key, and a corresponding index? A table with no indexes usually benefits from having at least one. The most common way of getting that index is to declare a primary key and rely on the DBMS to generate an index accordingly.
If a table has no candidates for a primary key, that usually indicates a serious flaw in the table design. That's a separate issue and should get a separate discussion.
