SQL Server Non Clustered Indexes with multiple columns - sql-server

I cannot seem to find a straight answer. I have the following columns:
ZipCode
StateAbbreviation
CountyName
I will be querying the table by either ZipCode alone or StateAbbreviation/CountyName combination. I have a PK Id column defined, so this is my clustered index.
What is recommended approach to achieve better performance/efficiency? What kind of indexes should I create considering how I will be searching the table?

First - run your app and see if it's fast enough without any indexes.
Second - if it's slow in some places, find the SQL workload that this code generates - what kind of queries do you have? Try adding an index on either ZipCode or on (StateAbbreviation, CountyName) and then run your app again and see how it performs.
If that one index solved your issue -> go enjoy your spare time! :-)
If not: add the second index and see if that help.
Don't overdo indexes just because you can! One at a time, only when it hurts. Don't just add indexes ahead of time - only add when you have a perf problem. And to see if it helps, you need to measure, measure, and measure again and compare to a previous baseline of measurements.
Nonclustered indexes are being used surprisingly rarely - your queries need to be just right for a nonclustered index to even be considered by the SQL Server query optimizer. If the table is "too small", a table scan will be preferred. If you use SELECT * all the time - your nonclustered indexes are likely to be ignored, too.

If you do add indexes, the columns should be ordered in the index using the same order as your queries. If you don't do that, there's no guarantee the index will be used. Index 1) Zipcode; Index 2) State Abbreviation, CountyName.
As #marc_s as said, indexes should be used when they are necessary as they do create some additional overhead when rows are modified/inserted/deleted. If they are straight-up reference tables, though, the only negative impact an index would have is the disk space that it uses. I usually find that to be worth less than no index.

Related

SQL server suggested an index that basically has all columns from the table- is this a good idea?

I have a table with 7 columns:
ProductId
WarehouseId
StockDate
OnHandQuantity
InTransitQuantity
CreateDate
CreateUser
One of my user's is trying to improve the performance of a query and has created an index that was suggested by SSMS. The index looks like the following:
CREATE NONCLUSTERED INDEX [IDX1_FactStock2016] ON [dbo].[FactStock2016]
(
[WarehouseId] ASC
)
INCLUDE ([ProductId],[StockDate],[OnHandQuantity],[InTransitQuantity])
Now to me, it seems counter intuitive to create an index like this when it basically includes all columns from the underlying table (CreateDate and CreateUser are not relevant). Isn't this basically creating a copy of the table as an index (minus the two audit columns)? Is this actually going to have any impact on the speed?
Such an index would be a covering index for all queries on the table. For many, queries this could be the best index.
There are other options, such as a clustered index that has WarehouseId as the first key in the index.
Whether this is a good idea or not depends on your environment. If your sole concern is the performance of this one user's query, then it sounds like a good idea. Additional indexes do incur additional overhead for inserts, updates, and deletes (although indexes can also help the performance of updates and deletes depending on the conditions). Adding more indexes requires a balance of considerations. There is nothing a priori wrong, though, about having covering indexes for a query.
Yes, a covering index like this is a redundant copy of the data, It can also improve query performance by avoiding the key lookup.
Indexing decisions often involve trade-offs. Evaluate whether the additional costs of storage and maintenance outweigh the gains in performance. A frequently executed query that benefits from the index may very well outweigh the costs.
This index suggestion basically means that, for this particular query, it would be better if table's clustered index would have been built on the WarehouseId column. Since you cannot have more than 1 clustered index on any given table or view, sometimes this excess can be well worth the extra space (and some additional performance degradation of data modification statements).
Whether it is beneficial or not, however, you can only decide for yourself and your system - there are no general recommendations on this matter that would fit every situation except SQL Server index usage statistics, and what you can get from it. Shop around for index suggestion scripts, there are plenty of them everywhere.

How do I know if I should create an Non clustered Index on a Clustered Index or on a heap?

I have a DB containing some tables, no table has non-clustered index defined. The big application which uses this DB is slow(because the number of rows are close to a million). I want to optimize DB fetch operations by adding indexes. When I read about indexes I came across index names like:
Clustered Index
Non clustered Index on a Clustered Index
Non Clustered Index on a heap
Also, indexes need to be created only on some columns. How will I identify that in a table which kind of index need to be created and across which column(s)?
P.S. Execution plan while running query tells to create NCI on all columns. Can I blindly go ahead and create index as suggested by SQL Server?
A clustered index is a type of index which defines how the data of your table will be stored (more precisely, how the data is sorted). This is the reason why the clustered index columns should be chosen very carefully (sequentially inserted data is primordial or you will end up with fragmentation and performance issues over time, an integer "identity" column is a good pick for example).
I found out that it is a good practice to always have a clustered index on your permanent tables.
A table without a clustered index is a heap because data is not sorted in a particular way (it'll be added at the end of the file), data is therefore harder to retrieve. The only improvement you can get from using a heap without indexes is that data insertion will be faster.
A non-clustered index is a separate file that will help speed up your queries on the columns you choose (it will store values of the indexed data and their reference to the location in the main file). As the data of your table become more and more important, having those separate files can dramatically improve the performance of your queries because the db engine won't have to scan the entire table for the data you are looking for, but just look for the position of the rows to retrieve in the index file (which contains ordered data of the columns you've chosen).
Adding indexes will speed up your select queries, but slow down writing operations as the indexes have to be updated. So, don't create too many indexes on too many columns !
There are two types of tables: heap tables (which have no clustered index) and clustered tables (which do). Each of these can have any number of non-clustered indexes built on them.
When do you use a heap table? Realistically, in only one scenario: when you're doing parallel bulk imports. This specific scenario requires that the table have no clustered index. In all other scenarios, a heap table has worse performance than a table with a clustered index -- don't take my word for it, though: Microsoft has an article on this that, while dated, is still relevant. In other words, for most practical database work, you can ignore heap tables as a curiosity.
On what do you create your clustered index? Ideally, on a column with values that are ever increasing (or decreasing) and aren't changed in updates. Why? Because this has the least overhead for updating, as no data has to be moved. Because of these two requirements, surrogate keys in the form of IDENTITY columns are popular, since they neatly meet them. This is certainly not the only possible choice, though: indexing on an ever increasing timestamp is also popular (in big data warehouses, for example).
With that (mostly) out of the way, how do you decide what other columns to index? Now that's a great question, but not one I feel qualified to answer in all its glory here. I've gotten a lot of experience myself with index design over the years, but I'm not aware of specific books or articles that I could recommend (which is not to say they don't exist, and I hope other people can chime in with suggestions). For what it's worth, Microsoft itself has written a guide here, which is quite in-depth (perhaps too much so), but I haven't thoroughly read this myself.
Can you blindly go ahead and create the indexes as suggested by the query optimizer? If by that you mean "should I", then the answer is almost certainly no. The query optimizer is very eager to suggest and and all possible indexes that could speed up a query, but that doesn't mean they should all be created -- every index increases the overhead of performing inserts and updates on the table. If you followed the optimizer's advice, it's probable that you would eventually end up with indexes covering every possible combination of columns, which would be pretty terrible for anything that's not a SELECT query. Having said that, creating too many indexes is almost always not as awful as creating no indexes at all, since that quickly kills performance for most queries that involve tables with more than about 10.000 rows.
I could write books on this topic, but I haven't the time or (I fear) the skill. I hope this at least gets you started.

Create more than one non clustered index on same column in SQL Server

What is the index creating strategy?
Is it possible to create more than one non-clustered index on the same column in SQL Server?
How about creating clustered and non-clustered on same column?
Very sorry, but indexing is very confusing to me.
Is there any way to find out the estimated query execution time in SQL Server?
The words are rather logical and you'll learn them quite quickly. :)
In layman's terms, SEEK implies seeking out precise locations for records, which is what the SQL Server does when the column you're searching in is indexed, and your filter (the WHERE condition) is accurrate enough.
SCAN means a larger range of rows where the query execution planner estimates it's faster to fetch a whole range as opposed to individually seeking each value.
And yes, you can have multiple indexes on the same field, and sometimes it can be a very good idea. Play out with the indexes and use the query execution planner to determine what happens (shortcut in SSMS: Ctrl + M). You can even run two versions of the same query and the execution planner will easily show you how much resources and time is taken by each, making optimization quite easy.
But to expand on these a bit, say you have an address table like so, and it has over 1 billion records:
CREATE TABLE ADDRESS
(ADDRESS_ID INT -- CLUSTERED primary key ADRESS_PK_IDX
, PERSON_ID INT -- FOREIGN KEY, NONCLUSTERED INDEX ADDRESS_PERSON_IDX
, CITY VARCHAR(256)
, MARKED_FOR_CHECKUP BIT
, **+n^10 different other columns...**)
Now, if you want to find all the address information for person 12345, the index on PERSON_ID is perfect. Since the table has loads of other data on the same row, it would be inefficient and space-consuming to create a nonclustered index to cover all other columns as well as PERSON_ID. In this case, SQL Server will execute an index SEEK on the index in PERSON_ID, then use that to do a Key Lookup on the clustered index in ADDRESS_ID, and from there return all the data in all other columns on that same row.
However, say you want to search for all the persons in a city, but you don't need other address information. This time, the most effective way would be to create an index on CITY and use INCLUDE option to cover PERSON_ID as well. That way, a single index seek / scan would return all the information you need without the need to resort to checking the CLUSTERED index for the PERSON_ID data on the same row.
Now, let's say both of those queries are required but still rather heavy because of the 1 billion records. But there's one special query that needs to be really really fast. That query wants all the persons on addresses that have been MARKED_FOR_CHECKUP, and who must live in New York (ignore whatever checkup means, that doesn't matter). Now you might want to create a third, filtered index on MARKED_FOR_CHECKUP and CITY, with INCLUDE covering PERSON_ID, and with a filter saying CITY = 'New York' and MARKED_FOR_CHECKUP = 1. This index would be insanely fast, as it only ever cover queries that satisfy those exact conditions, and therefore has a fraction of the data to go through compared to the other indexes.
(Disclaimer here, bear in mind that the query execution planner is not stupid, it can use multiple nonclustered indexes together to produce the correct results, so the examples above may not be the best ones available as it's very hard to imagine when you would need 3 different indexes covering the same column, but I'm sure you get the idea.)
The types of index, their columns, included columns, sorting orders, filters etc depend entirely on the situation. You will need to make covering indexes to satisfy several different types of queries, as well as customized indexes created specifically for singular, important queries. Each index takes up space on the HDD so making useless indexes is wasteful and requires extra maintenance whenever the data model changes, and wastes time in defragmentation and statistics update operations though... so you don't want to just slap an index on everything either.
Experiment, learn and work out which works best for your needs.
I'm not the expert on indexing either, but here is what I know.
You can have only ONE Clustered Index per table.
You can have up to a certain limit of non clustered indexes per table. Refer to http://social.msdn.microsoft.com/Forums/en-US/63ba3877-e0bd-4417-a04b-19c3bfb02ac9/maximum-number-of-index-per-table-max-no-of-columns-in-noncluster-index-in-sql-server?forum=transactsql
Indexes should just have different names, but its better not to use the same column(s) on a lot of different indexes as you will run into some performance problems.
A very important point to remember is that Indexes although it makes your select faster, influence your Insert/Update/Delete speed as the information needs to be added to the index, which means that the more indexes you have on a column that gets updated a lot, will drastically reduce the speed of the update.
You can include columns that is used on a CLUSTERED index in one or more NON-CLUSTERED indexes.
Here is some more reading material
http://www.sqlteam.com/article/sql-server-indexes-the-basics
http://www.programmerinterview.com/index.php/database-sql/what-is-an-index/
EDIT
Another point to remember is that an index takes up space just like the table. The more indexes you create the more space it uses, so try not to use char/varchar (or nchar/nvarchar) in an index. It uses to much space in the index, and on huge columns give basically no benefit. When your Indexes start to become bigger than your table, it also means that you have to relook your index strategy.

SQL Server: How does the type of an index affect a join's performance?

If I'm am trying to squeeze every last drop of performance out of a query what affect does having these types of index's being used by my joins.
clustered index.
non-clustered index.
clustered or non-clustered index with extra columns that may not be involved in the join.
Will I gain any performance if I go through and create clustered index's that only contain the columns involved in my joins and nothing else?
(I realize I may have to move the clustered index from another index(making that index non-clustered) since it can only have one.)
In addition to Gareth Saul's answer a tiny clarification:
Non-clustered indexes repeat the
included fields, with pointer to the
rows that have that value.
This pointer to the actual data value is the column (or the set of columns) that are in your clustering key.
That's one of the main reasons why you should try and keep the clustering key small and static - small because otherwise you'll waste a lot of space, on disk and in your server's RAM, and static because otherwise, you'll have to update not just your clustering index, but also all your non-clustered indices as well, if your value changes.
This "lookup pointer is the clustering key" feature has been in SQL Server since version 7, as Kim Tripp will explain in great detail here:
What is a clustered index?
In SQL Server 7.0 and higher the
internal dependencies on the
clustering key CHANGED. (Yes, it's
important to know that things CHANGED
in 7.0... why? Because there are still
some folks out there that don't
realize how RADICAL of a change
occurred in the internals (wrt to the
clustering key) in SQL Server 7.0).
What changed is that the clustering
key gets used as the "lookup" value
from the nonclustered indexes.
Will I gain any performance if I go through and create clustered index's that only contain the columns involved in my joins and nothing else?
Not as I understand. The point of a clustered index is that it then sorts the data on disk around that index (hence why you can only have the one), so if your join data isn't being sorted by those exact columns as well, I don't think it'd make any difference. Plus by putting data that might change (as opposed to the key) into the clustered index, you make it more likely that things will need rebuilding peridically, slowing the overall database down.
Sorry if this sounds a daft question, but have you tried running your query through the index tuning wizard? Not foolproof by any stretch but I've had some decent improvements from it in the past.
You only get one clustered index - this is what controls the physical storage of the table on disk / in memory.
Non-clustered indexes repeat the included fields, with pointer to the rows that have that value. Having an index on the columns being used in your joins should improve performance. You can further optimise by using "included columns" in your index - this duplicates the row information directly into the index, which can remove the performance penalty of having to look up the row itself to perform the select.
It is useful to pay attention to the order in which your joins occur - the sequence of columns in your index should match up to this. Remember that the SQL engine may optimise and re-order your query internally - profiling may be helpful.
In most situations, you can just use the Database Engine Tuning Advisor - the recommendations it provides are pretty much spot on.
If you can your best bet is for a non-clustered index that has all the element of your join in it and if possible the field you are selecting.
This will create a spanning index meaning that all the fields SQL requires to perform are on one index.
If possible have an index which has no unnessasery field in it. Every field added makes the an individual index record larger, the smaller each index record the more you get in each Page. The more index items you get in each page the less you have to go to the Disk.
Clustered Index - Will mean the table is layed out in the order specified in the Index, this means that you will get better performance for select * from TABLE where INDEXFIELD = 3. Unless you are selecting lots of large data items this should not be required.

Database indexes: Only selects!

Good day,
I have about 4GB of data, separated in about 10 different tables. Each table has a lot of columns, and each column can be a search criteria in a query. I'm not a DBA at all, and I don't know much about indexes, but I want to speed up the search as much as possible. The important point is, there won't be any update, insert or delete at any moment (the tables are populated once every 4 months). Is it appropriate to create an index on each and every column? Remember: no insert, update or delete, only selects!
Also, if I can make all of these columns integer instead of varchar, would i make a difference in speed?
Thank you very much!
Answer: No. Indexing every column separately is not good design. Indexes need to comprise multiple columns in many cases, and there are different types of indexes for different requirements.
The tuning wizard mentioned in other answers is a good first cut (esp. for a learner).
Don't try to guess your way through it, or hope you understand complex analyses - get advice specific to your situation. We seem to have several threads going here that are quite active for specific situations and query optimization.
Have you looked at running the Index Tuning Wizard? Will give you suggestions of indexes based on a workload.
Absolutely not.
You have to understand how indexes work. If you have a table of say, 1000 records, but it's a BIT and there can be one of two values, if you index on that column and that column only, it will be worthless, because it will not be selective enough. When you index on a column, be very cognizant of what types of selects are going to be done on the table. When you create an index on a column, will that index be selective enough for the optimizer to use effectively?
To that point, you may very well find that a few carefully selected composite indexes will vastly outperform the solution of many single indexes on each column. The golden rule: how the database is queried will determine how you should make your indexes.
Two pieces of missing information: how many distinct values are in each column, and which DBMS you're using. If you're using Oracle and have less than a few thousand distinct values per column, you can create bitmap indexes. These are very space- and execution-efficient for exact matches.
Otherwise, it's a tradeoff: each index will add roughly the same amount of space as a one-column name containing the same data, so you'll essentially double (probably 2.5x) your space requirements. So maybe 10G, which isn't a whole lot of data.
Then there's the question of whether your DBMS will efficiently merge multiple index-based selects. It's quite possible that it won't, unless you do self-joins for every column that you're selecting against.
Best answer: try it on a smaller dataset (so that you're not spending all your time building the indexes) and see how it works.
If you are selecting a set of columns from the table greater than those covered by the columns in the selected indexes, then you will inevitably incur a bookmark lookup in the query plan, which is where the query processor has to retrieve the non-covered columns from the clustered index using the reference ID from leaf rows in the associated non-clustered index.
In my experience, bookmark lookups can really kill query performance, due to the volume of extra reads required and the fact that each row in the clustered index has to be resolved individually. This is why I try to make NC indexes covering wherever possible, which is easier on smaller tables where the required query plans are well-known, but if you have large tables with lots of columns with arbitrary queries expected then this probably won't be feasible.
This means you only get bang for your buck with an NC index of any kind, if the index is covering, or selects a small-enough data set that the cost of a bookmark lookup is mitigated - indeed, you may find that the query optimizer won't even look at your indexes if the cost is prohibitive compared to a clustered index scan, where all the columns are already available.
So there is no point in creating an index unless you know that index will optimize the result of a given query. The value of an index is therefore proportional to the percentage of queries that it can optimize for a given table, and this can only be determined by analyzing the queries that are being executed, which is exactly what the Index Tuning Wizard does for you.
so in summary:
1) Don't index every column. This is classic premature optimization. You cannot optimize a large table with indexes for all possible query plans in advance.
2) Don't index any column, until you have captured and run a base workload through the Index Tuning Wizard. This workload needs to be representative of the usage patterns of your application, so that the wizard can determine what indexes would actually help the performance of your queries.

Resources