Multiple column indexes optimization for multiple column queries on SQL Server - sql-server

I have one table [table] with two columns that need to be filtered: [column1] and [column2].
In my program I execute a query like:
select * from [table] where [column1] = 'foo' and [column2] = 'bar';
Which is faster:
Creating two indexes, one on each column. ([column1] and [column2])
Creating one index containing both columns. ([column1]+[column2])
This question has been bugging me for a while. I have no idea how query optimization works or how SQL Server uses the created indexes to speed up queries.

The second one is ALWAYS faster for this query, but you need to put the more selective column first in the index key order to benefit more. The only exception is if, for performance reasons, SQL Server decides to use the clustered index and so ignores the non-clustered one.
The combination of two values creates a much more selective criterion. It also helps performance when the index covers the query, since no BOOKMARK LOOKUP is required on a covering index.
Bookmark lookups are a source of major performance degradation, and that is why a covering index is always better than 2 indexes.
UPDATE
Bear in mind that if your index is on column1+column2, searches on just column2 cannot seek on this index, so you will need a separate index on column2 as well.
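As a rough sketch of the options discussed in this answer, the statements below create the composite index and the optional extra index on column2. The index names are hypothetical, and whether the optimizer actually picks either index depends on the data and its statistics.

```sql
-- Option 2 from the question: one composite index.
-- It supports seeks on column1 alone and on column1 + column2 together.
CREATE NONCLUSTERED INDEX IX_table_column1_column2   -- hypothetical name
    ON [table] ([column1], [column2]);

-- Only worth adding if some queries filter on column2 by itself,
-- because the composite index above cannot be used for a seek on column2 alone.
CREATE NONCLUSTERED INDEX IX_table_column2           -- hypothetical name
    ON [table] ([column2]);

-- The query from the question, unchanged.
SELECT * FROM [table] WHERE [column1] = 'foo' AND [column2] = 'bar';
```

Comparing the actual execution plans before and after creating the indexes is the reliable way to confirm which one is used.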

It depends!
It depends on the selectivity of those columns.
If you were not selecting all columns '*', you might be able to utilise a really fast covering index, comprised of the where clause columns and INCLUDE'ing the columns in the SELECT list.
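A minimal sketch of such a covering index, assuming the query only needs two extra columns. The names column3 and column4 and the index name are hypothetical, since the question does not list the other columns; the INCLUDE clause requires SQL Server 2005 or later.

```sql
-- Key columns match the WHERE clause; INCLUDE carries the SELECT list.
-- column3 and column4 are hypothetical placeholders for the columns actually needed.
CREATE NONCLUSTERED INDEX IX_table_column1_column2_covering   -- hypothetical name
    ON [table] ([column1], [column2])
    INCLUDE ([column3], [column4]);

-- This query can be answered from the index alone, with no lookup
-- into the clustered index or heap.
SELECT [column3], [column4]
FROM [table]
WHERE [column1] = 'foo' AND [column2] = 'bar';
```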

Related

SQL Server Non Clustered Index - Include values

I'm relatively new to the world of SQL Server Optimization and Indexes. I ran a query that recommends missing indexes (https://blog.sqlauthority.com/2011/01/03/sql-server-2008-missing-index-script-download/) and I'm having trouble understanding the differences of the Include clause.
The only difference in my two indexes is that Index1 contains the 'Email' column and Index2 does NOT. Would both of these indexes be required or will Index1 be sufficient? I believe only Index1 is needed but I'm not sure.
CREATE INDEX [Index1]
ON [ActiveDirectory].[dbo].[ActiveDirectory] ([MailEnabled], [Active])
INCLUDE ([EmployeeID], [DisplayName], [Email])
CREATE INDEX [Index2]
ON [ActiveDirectory].[dbo].[ActiveDirectory] ([MailEnabled], [Active])
INCLUDE ([EmployeeID], [DisplayName])
Thank you!
Griz
It depends on which columns you need in your select statement.
If you need all three [EmployeeID], [DisplayName], [Email] - create Index1
Otherwise, create Index2
Roughly - the fields that you use in your WHERE predicate should be in the ON part of the index, and the columns that you use in SELECT should be in the 'INCLUDE' (if they are not already mentioned in ON)
See Creating Indexes with Included Columns for an example
Both indexes at the same time are most definitely not needed.
Basically, the columns in the INCLUDE part of the index do not participate in the index structure, but rather are attached to the index so you don't have to go back to your table and look up the data via PK.
You are able to efficiently "look up" by the columns in the ON clause: [MailEnabled], [Active] , and you can include them in select without extra cost.
You are also able to select the columns in your INCLUDE clause at no extra cost [EmployeeID], [DisplayName], [Email] - but searching (filtering, joining, lookup) on them will not be fast
Only the first index is required.
If you got both suggestions, it's because one query required all three include columns (a 'covering index') to satisfy the query without having to do a lookup on the primary clustered index to grab the columns... and a second query only required two of them.
One covering index with all three include columns satisfies both queries.
Trying to create both would create a lot of duplicate data and slow down inserts more and use more disk space for no good reason.
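If both indexes had already been created from the two suggestions, a possible cleanup, sketched here under that assumption, is to keep Index1 and drop the narrower duplicate:

```sql
-- Assumes Index1 and Index2 both exist as defined in the question.
-- Index1 includes every column that Index2 does, so Index2 is redundant.
DROP INDEX [Index2] ON [ActiveDirectory].[dbo].[ActiveDirectory];
```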

Proper way of creating indexes for same column when "include" columns are different

Let's say I have 2 stored procedures and 1 table.
Table name: Table_A
Procedure names: proc1 and proc2
When I run proc1 with the execution plan enabled, it suggests creating an index on Table_A for the tblID column (which is NOT a primary key), including column_A and column_B.
proc2 also suggests creating an index on Table_A for the tblID column, but this time including column_B and column_C (column_C instead of column_A).
So my question is, if I create one index that includes all the suggested columns, like:
CREATE NONCLUSTERED INDEX indexTest
ON [dbo].[Table_A] ([tblID])
INCLUDE ([column_A],[column_B],[column_C])
Does that cause any performance issue?
Is there any disadvantage of gathering INCLUDE columns?
Or should I create 2 different indexes as:
CREATE NONCLUSTERED INDEX indexTest_1
ON [dbo].[Table_A] ([tblID])
INCLUDE ([column_A],[column_B])
CREATE NONCLUSTERED INDEX indexTest_2
ON [dbo].[Table_A] ([tblID])
INCLUDE ([column_B],[column_C])
UPDATE: I would like to add one more thing to this question.
If I do the same thing for primary fields as well:
I mean,
proc-1 suggested to create an index on tblID field. And proc-2 suggested to create an index on tblID and column_A.
If I gather them as :
CREATE NONCLUSTERED INDEX indexTest_3
ON [dbo].[Table_A] ([tblID],[column_A])
INCLUDE ([column_B])
Does that cause a performance issue? Or should I create 2 separate indexes for the suggested primary fields?
Definitely create one index that includes all three columns!
The fewer indexes you have, the better - index maintenance is a cost factor - more indices require more maintenance.
And the included columns are stored in the leaf level of the index only, so they have only a very marginal impact on performance.
Update: if you have a single index on (tblID, column_A), then you can use this for queries that use only tblID in their WHERE clause, or you can use it for queries that use both columns in their WHERE clause.
HOWEVER: this index is useless for queries that use only column_A in their WHERE clause. A compound index (index made up from multiple columns) is only ever useful if a given query uses the n left-most columns as specified in the index.
So in your case, one query seems to indicate tblID, while the other needs (tblID, column_A) - so yes, in this case, I would argue a single index on (tblID, column_A) would work for both queries.
It sounds like you're looking at the missing index DMVs. There are a couple of things to realize here. The DMVs are really telling you about specific queries, or groups of similar queries, where a specific index might help.
In this sense, you are right to combine the indexes. This is the right idea.
However, also remember that indexes have a cost, and it's not the job of this dmv to weigh that cost. You definitely don't want to just automatically create an index to cover every recommendation. You also want to examine these indexes: once you include columns A,B, and C, are you keeping the entire table (or nearly so) in the index? Could you perhaps get better results by changing the primary key to match this index? Be careful evaluating that last part, because changing the primary key could then leave the prior key as an even more important missing index.
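To review overlapping suggestions side by side before deciding what to merge, a query along these lines can list them for the current database. This is only a sketch: the column selection and ordering are one reasonable choice, and the contents of these views are reset when the server restarts.

```sql
-- List missing-index suggestions for the current database so that
-- suggestions sharing the same key columns can be compared and merged by hand.
SELECT
    mid.[statement]         AS table_name,
    mid.equality_columns,
    mid.inequality_columns,
    mid.included_columns,
    migs.user_seeks,
    migs.avg_user_impact
FROM sys.dm_db_missing_index_details AS mid
JOIN sys.dm_db_missing_index_groups AS mig
    ON mig.index_handle = mid.index_handle
JOIN sys.dm_db_missing_index_group_stats AS migs
    ON migs.group_handle = mig.index_group_handle
WHERE mid.database_id = DB_ID()
ORDER BY mid.[statement], mid.equality_columns;
```

Rows with the same table and equality columns but different included columns are the candidates for a single combined index, subject to the cost considerations described above.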

Can including columns into the SELECT from the same table slow down the query?

Imagine Foo table has non-clustered indexes on ColA and ColB
and NO Indexes on ColC, ColD
SELECT colA, colB
FROM Foo
takes about 30 seconds.
SELECT colA, colB, colC, colD
FROM Foo
takes about 2 minutes.
Foo table has more than 5 million rows.
Question:
Is it possible that including columns that are not part of the indexes can slow down the query?
If yes, WHY? Aren't they part of the PAGEs that were already read?
If you write a query that uses a covering index, then the full data pages in the heap/clustered index are not accessed.
If you subsequently add more columns to the query, such that the index is no longer covering, then either additional lookups will occur (if the index is still used), or you force a different data access path entirely (such as using a table scan instead of using an index)
Since 2005, SQL Server has supported the concept of Included Columns in an index. This includes non-key columns in the leaf of an index - so they're of no use during the data-lookup phase of index usage, but still help to avoid performing an additional lookup back in the heap/clustered index, if they're sufficient to make the index a covering index.
Also, in future, if you want to get a better understanding on why one query is fast and another is slow, look into generating Execution Plans, which you can then compare.
Even if you don't understand the terms used, you should at least be able to play "spot the difference" between them and then search on the terms (such as table scan, index seek, or lookup)
The simple answer is: because a non-clustered index is not stored in the same pages as the data, SQL Server has to look up the actual data pages to pick up the rest.
Non-clustered indexes are stored in separate data structures, while a clustered index is stored in the same place as the actual data. That’s why you can have only one clustered index.
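One way to see the difference described in these answers is to compare the page reads of the two queries, and then try an index that covers the wider one. The sketch below uses the table and column names from the question; the index name is hypothetical, and whether the extra index is worth its storage and maintenance cost is a separate decision.

```sql
-- Report logical reads and elapsed time for each statement.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT ColA, ColB FROM Foo;              -- narrow query
SELECT ColA, ColB, ColC, ColD FROM Foo;  -- wider query that needs the full rows

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;

-- One possible covering index for the wider query (hypothetical name).
-- ColB, ColC and ColD are stored only at the leaf level of the index.
CREATE NONCLUSTERED INDEX IX_Foo_ColA_covering
    ON Foo (ColA)
    INCLUDE (ColB, ColC, ColD);
```

The Messages tab in Management Studio shows the logical reads for each statement, which makes the gap between the two access paths visible.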

Single column/multiple column index, what is better?

I have one poorly performing procedure with a couple of queries in it.
I have identified a few queries that scan a temp table, so I decided to add an index on the temp table to avoid the table scans. Multiple columns of the temp table are used in the where clause. However, I am not sure whether I should include all the columns in a single (composite) index or create multiple indexes with one column each to gain the maximum performance.
Database is DB2
This all depends greatly on your queries and the data in your table. As a rule of thumb, you should include only the columns that greatly reduce the number of result rows.
If the where clause condition on the first limiting column already drops, for instance, 90% of the rows, and the next one would only remove a few hundred more, it is not worth the resources to include it in the index. Always keep in mind that the database engine works with the first column of a composite index first, and then proceeds to the next ones. If your queries do not filter on the columns in that order, starting with the leading column, the index may not help and can even slow your queries down.
Also, if you have a lot of data and using several indexed columns seems worth it, you might in some cases want separate indexes so that intra-parallelism can work. It is possible that running parallel index lookups on several CPUs performs better, if your server has CPUs to spare.
MySQL can use multiple-column indexes for queries that test all the columns in the index, or queries that test just the first column, the first two columns, the first three columns, and so on.
If you specify the columns in the right order in the index definition, a single composite index can speed up several kinds of queries on the same table.
Let's say that you have INDEX nameIdx (last_name,first_name) created on table test.
Then the nameIdx index is used for lookups in the following queries:
SELECT * FROM test WHERE last_name='Widenius';
SELECT * FROM test
WHERE last_name='Widenius' AND first_name='Michael';
SELECT * FROM test
WHERE last_name='Widenius'
AND (first_name='Michael' OR first_name='Monty');
whereas nameIdx is not used for lookups in the following queries:
SELECT * FROM test WHERE first_name='Michael';
SELECT * FROM test
WHERE last_name='Widenius' OR first_name='Michael';
for more detail refer URL
In summary: if the columns in your where clause follow the index order (from left to right), a composite index is better than single-column indexes.

Index Seek with Bookmark Lookup Only Option for SQL Query?

I am working on optimizing a SQL query that goes against a very wide table in a legacy system. I am not able to narrow the table at this point for various reasons.
My query is running slowly because it does an Index Seek on an Index I've created, and then uses a Bookmark Lookup to find the additional columns it needs that do not exist in the Index. The bookmark lookup takes 42% of the query time (according to the query optimizer).
The table has 38 columns, some of which are nvarchars, so I cannot make a covering index that includes all the columns. I have tried to take advantage of index intersection by creating indexes that cover all the columns, however those "covering" indexes are not picked up by the execution plan and are not used.
Also, since 28 of the 38 columns are pulled out via this query, I'd have 28/38 of the columns in the table stored in these covering indexes, so I'm not sure how much this would help.
Do you think a Bookmark Lookup is as good as it is going to get, or what would another option be?
(I should specify that this is SQL Server 2000)
OH,
the covering index with include should work. Another option might be to create a clustered indexed view containing only the columns you need.
Regards,
Lieven
You could create an index with included columns as another option
example from BOL, this is for 2005 and up
CREATE NONCLUSTERED INDEX IX_Address_PostalCode
ON Person.Address (PostalCode)
INCLUDE (AddressLine1, AddressLine2, City, StateProvinceID);
To answer this part "I have tried to take advantage of index intersection by creating indexes that cover all the columns, however those "covering" indexes are not picked up by the execution plan and are not used."
An index can only be used for a seek when the query is written so that it is sargable. In other words, if you apply a function to the column on the left side of the operator, or leave the first column of the index out of your WHERE clause, the index won't be used. If the selectivity of the index is low, the index won't be used either.
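To illustrate the sargability point, here is a small sketch against the Person.Address table from the BOL example above. The postal code prefix '981' is a hypothetical value, and the comments assume an index whose leading key column is PostalCode.

```sql
-- Not sargable: the function wraps the indexed column, so the optimizer
-- cannot seek on PostalCode and has to evaluate LEFT() for every row it reads.
SELECT AddressLine1, City
FROM Person.Address
WHERE LEFT(PostalCode, 3) = '981';

-- Sargable rewrite of the same filter: a prefix LIKE leaves the column bare,
-- so an index seek on PostalCode becomes possible.
SELECT AddressLine1, City
FROM Person.Address
WHERE PostalCode LIKE '981%';
```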
Check out SQL Server covering indexes for some more info

Resources