What is the Difference between these two indexes in SQL Server?

I have created these two indexes based on what the Index wizard suggested for different scenarios.
I am wondering whether they are the same or different.
Index (1):
CREATE NONCLUSTERED INDEX [IX_Rec_RecoveryDate_RecoveryDateTypeKey_CaseId]
ON [Rec].[RecoveryDate] ([RecoveryDateTypeKey], [CurrentFlag])
INCLUDE (CaseId)
Index (2):
CREATE NONCLUSTERED INDEX [IX_Rec_RecoveryDate_currentFlag]
ON [Rec].[RecoveryDate] ([CurrentFlag])
INCLUDE (CaseId, RecoveryDateTypeKey)

The question is, what queries are you trying to optimize?
These two indexes are very different. The first index would be great for a query like this:
SELECT CaseId
FROM Rec.RecoveryDate
WHERE RecoveryDateTypeKey = 5
AND CurrentFlag = 1 -- or whatever
The first one indexes the columns in the WHERE clause. SQL Server would be able to search for the specified RecoveryDateTypeKey and CurrentFlag. Since the CaseId is INCLUDEd in the index leaf nodes, SQL Server would not have to join back to the table to get it.
Using the same query, the second index would behave differently. If you are lucky, SQL Server would seek to all the records where CurrentFlag is 1. Then it would traverse those leaf nodes looking for the matching RecoveryDateTypeKey. On the other hand, if there are a lot of records where CurrentFlag is 1, SQL Server might choose to do an index scan instead.
Then again, if you want to optimize a query like this:
SELECT CaseId, RecoveryDateTypeKey
FROM Rec.RecoveryDate
WHERE CurrentFlag = 1
The first index would be useless, because the CurrentFlag is the second column in the index. SQL Server wouldn't be able to search it for CurrentFlag = 1, so it would probably do an index scan.
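If you want to see the difference for yourself, one option is to run the query under SET STATISTICS IO and force each index in turn with a table hint. This is only a sketch using the index names from the question; adjust the names and the CurrentFlag value to match your schema.
SET STATISTICS IO ON;
-- Seek on (RecoveryDateTypeKey, CurrentFlag); CaseId is read from the leaf level
SELECT CaseId
FROM Rec.RecoveryDate WITH (INDEX ([IX_Rec_RecoveryDate_RecoveryDateTypeKey_CaseId]))
WHERE RecoveryDateTypeKey = 5
  AND CurrentFlag = 1;
-- Seek on CurrentFlag only; RecoveryDateTypeKey is filtered at the leaf level
SELECT CaseId
FROM Rec.RecoveryDate WITH (INDEX ([IX_Rec_RecoveryDate_currentFlag]))
WHERE RecoveryDateTypeKey = 5
  AND CurrentFlag = 1;
SET STATISTICS IO OFF;
Comparing the logical reads reported for the two runs shows how much of the work each index does with a seek versus a residual filter.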

They're different. Index 1 indexes two columns; Index 2 indexes one column but includes two columns in its leaf nodes.
The difference basically means that if you search using the two columns, Index 1 could be much faster than Index 2.
Index 2 would be better than a plain index on the one column, because if you need the other columns in your result, Index 2 already has the values, so no lookup against the actual table is needed.

The indexes store the same information (one row per row in the Recovery table, with the columns CaseId, RecoveryDateTypeKey and CurrentFlag), but they are organized in a different order and hence can be used for different queries.
The first index can handle WHERE clauses such as
WHERE RecoveryDateTypeKey = @p1 --Prefix matching!
and
WHERE RecoveryDateTypeKey = @p1 AND CurrentFlag = @p2
The second index only handles
WHERE CurrentFlag = @p2
If CurrentFlag is a low cardinality column such as a bit, or a char(1) (Y/N), then I'd recommend filtered indexes.
CREATE INDEX IX_REC_Yes_fltr on Recovery (RecoveryDateTypeKey)
INCLUDE (CaseId)
WHERE (CurrentFlag = 'Y') --Assumes that CurrentFlag = 'Y' is the most used value
--Maybe even a second one, to handle CurrentFlag = 'N' as well.
CREATE INDEX IX_REC_No_fltr on Recovery (RecoveryDateTypeKey)
INCLUDE (CaseId)
WHERE (CurrentFlag = 'N')
Each filtered index only includes the values that meet the criteria, so combined they are the same size as the non-filtered index.
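If you want to verify the size claim on your own data, you could compare the page counts of the filtered and unfiltered indexes; a rough sketch (the object name comes from the question, so adjust it to your schema):
-- Approximate size, in 8 KB pages, of every nonclustered index on the table
SELECT i.name AS index_name,
       SUM(ps.used_page_count) AS used_pages
FROM sys.indexes AS i
JOIN sys.dm_db_partition_stats AS ps
  ON ps.object_id = i.object_id
 AND ps.index_id = i.index_id
WHERE i.object_id = OBJECT_ID('Rec.RecoveryDate')
  AND i.type_desc = 'NONCLUSTERED'
GROUP BY i.name;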

Related

Indexes make SQL query too slow

I'm having a huge issue with a SQL query after I added an index.
declare @DateFromCT date, @DateToCT date;
declare @DateFromCT2 date, @DateToCT2 date;
set dateformat dmy;
set @DateFromCT = '1/1/2015'; set @DateToCT = '31/3/2015';
set @DateFromCT2 = '1/4/2015'; set @DateToCT2 = '30/4/2015';
Select distinct CT.CodCliente, ct.codacesso FROM CT_Contabilidade CT
Inner join CD_PlanoContas PC ON CT.CodAcesso = PC.Cod
WHERE NOT exists (
SELECT 1 FROM ct_contabilidade CT2
WHERE CT2.CodAcesso = CT.CodAcesso
and CT2.Data between @DateFromCT2 and @DateToCT2
And ( CT2.CodEmpresa = 1) And CT2.codcliente = ct.codcliente )
and CT.Data between @DateFromCT and @DateToCT
AND PC.subgrupo = 'C'
And ( CT.CodEmpresa = 1 ) And ct.codCliente > 0
The CT_Contabilidade PK is a sequential bigint identity, and it is the clustered index.
It has 1.5 million records.
Without other non-clustered indexes, it performs well, took less than 1 second. That's OK for me.
I created an index on CodAcesso to match the CD_PlanoContas key (Cod).
The CD_PlanoContas PK (clustered index) is Cod.
It's still performing well. No notable difference...
So I created an index on CodCliente (since it also references another table).
... And after this, the query is TOO slow; it takes 7 or 8 MINUTES.
If I drop the CodAcesso index, it goes back to being OK.
If I drop the CodCliente index, it is OK too.
If I leave them both but change the query, taking out the INNER JOIN with CD_PlanoContas (and consequently the filter "AND PC.subgrupo = 'C'"), it is OK.
I can't imagine the indexes are causing the query to behave that way.
It's a HUGE difference, not just a "loss of performance". I tried some other things, such as taking out each filter... nothing changed.
The execution plan suggests an index:
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[CT_Contabilidade] ([CodEmpresa],[Data],[CodCliente])
INCLUDE ([CodAcesso])
I created it, and the query works fine, even with the 2 other indexes (codCliente and codAcesso)
But I don't like creating a specific index just for this query (it's just one of many queries that use these tables).
If it runs well without any extra index, I think it should run at least as well with those 2 indexes.
What causes the performance to change so drastically? What do I need to change to speed things up?
Try using an index optimizer hint to control which index is being used.
For example:
select *
from titles with (index (titleind))
where title = 'The Gourmet Microwave'
Use the SET STATISTICS IO ON command to see the number of pages being read with each query/index combination, and use the right-click "Show Execution Plan" option to see how the query is being executed.
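A minimal sketch of that measurement loop, using a cut-down query as a stand-in for the real one (wrap whatever query you are tuning the same way):
set statistics io on;   -- logical reads per table are reported in the Messages tab
set statistics time on; -- CPU and elapsed time
select count(*)
from CT_Contabilidade
where CodEmpresa = 1
  and CodCliente > 0;
set statistics io off;
set statistics time off;
Re-run the block after each index change and compare the logical reads and the plans.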
It is not always a good idea to follow the execution plan's index suggestion.
I suggest you compare the execution plans before and after adding the index and see the difference. Maybe that index causes the SQL engine to choose a bad plan.
Also try updating the statistics on your table and indexes and see how that affects things.
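For reference, updating the statistics might look like this (WITH FULLSCAN is thorough but can be slow on a 1.5 million row table; the index name in the second statement is hypothetical):
-- Refresh statistics for all indexes and column statistics on the table
UPDATE STATISTICS dbo.CT_Contabilidade WITH FULLSCAN;
-- Or refresh a single index's statistics (hypothetical index name)
UPDATE STATISTICS dbo.CT_Contabilidade IX_CT_Contabilidade_CodCliente;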

SQL Query is slow when ORDER BY statement added

I have a table [Documents] with the following columns:
Name (string)
Status (string)
DateCreated [datetime]
This table has around 1 million records. All three of these columns have an index (a single index for each one).
When I run this query:
select top 50 *
from [Documents]
where (Name = 'None' OR Name is null OR Name = '')
and Status = 'New';
Execution is really fast (300 ms.)
If I run the same query but with the ORDER BY clause, it's really slow (3000 ms)
select top 50 *
from [Documents]
where (Name = 'None' OR Name is null OR Name = '')
and Status = 'New'
order by DateCreated;
I understand that it's searching in another index (DateCreated), but should it really be that much slower? If so, why? Is there anything I can do to speed this query up (a composite index)?
Thanks
BTW: all indexes, including the one on DateCreated, have really low fragmentation; in fact I ran a reorganize and it didn't change a thing.
As far as why the query is slower, the query is required to return the rows "in order", so it either needs to do a sort, or it needs to use an index.
Using the index with a leading column of DateCreated, SQL Server can avoid a sort. But SQL Server would also have to visit the pages in the underlying table to evaluate whether each row is to be returned, looking at the values in the Status and Name columns.
If the optimizer chooses not to use the index with DateCreated as the leading column, then it needs to first locate all of the rows that satisfy the predicates, and then perform a sort operation to get those rows in order. Then it can return the first fifty rows from the sorted set. (SQL Server wouldn't necessarily need to sort the entire set, but it would need to go through that whole set and do sufficient sorting to guarantee that it's got the "first fifty" that need to be returned.)
NOTE: I suspect you already know this, but to clarify: SQL Server honors the ORDER BY before the TOP 50. If you wanted any 50 rows that satisfied the predicates, but not necessarily the 50 rows with the lowest values of DateCreated, you could restructure/rewrite your query to get (at most) 50 rows and then perform the sort of just those, as sketched below.
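A minimal sketch of that rewrite (note the changed semantics: you get some 50 matching rows, sorted among themselves, not the 50 oldest ones):
SELECT d.*
FROM ( SELECT TOP (50) *
       FROM Documents
       WHERE (Name = 'None' OR Name IS NULL OR Name = '')
         AND Status = 'New'
     ) AS d
ORDER BY d.DateCreated;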
A couple of ideas to improve performance
Adding a composite index (as other answers have suggested) may offer some improvement, for example:
ON Documents (Status, DateCreated, Name)
SQL Server might be able to use that index to satisfy the equality predicate on Status, and also return the rows in DateCreated order without a sort operation. SQL Server may also be able to satisfy the predicate on Name from the index, limiting the lookups to pages in the underlying table to just the rows that will actually be returned, which it still needs to do to get "all" of the columns for each row.
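Written out in full, the suggestion would look something like this (the index name is only illustrative):
CREATE NONCLUSTERED INDEX IX_Documents_Status_DateCreated_Name -- illustrative name
ON Documents (Status, DateCreated, Name);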
For SQL Server 2008 or later, I'd consider a filtered index... depending on the cardinality of Status = 'New' (that is, if the rows that satisfy the predicate Status = 'New' are a relatively small subset of the table).
CREATE NONCLUSTERED INDEX Documents_FIX
ON Documents (Status, DateCreated, Name)
WHERE Status = 'New'
I would also modify the query to specify ORDER BY Status, DateCreated, Name
so that the ORDER BY clause matches the index; it doesn't really change the order in which the rows are returned (see the sketch after this paragraph).
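Against the filtered index above, the query would then read roughly like this (the extra ORDER BY columns are constant or tie-breakers, so the result order is effectively the same):
SELECT TOP (50) *
FROM Documents
WHERE (Name = 'None' OR Name IS NULL OR Name = '')
  AND Status = 'New'
ORDER BY Status, DateCreated, Name;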
As a more complicated alternative, I would consider adding a persisted computed column and adding a filtered index on that
ALTER TABLE Documents
ADD new_none_date_created AS
CASE
WHEN Status = 'New' AND COALESCE(Name,'') IN ('','None') THEN DateCreated
ELSE NULL
END
PERSISTED
;
CREATE NONCLUSTERED INDEX Documents_FIXP
ON Documents (new_none_date_created)
WHERE new_none_date_created IS NOT NULL
;
Then the query could be re-written:
SELECT TOP 50 *
FROM Documents
WHERE new_none_date_created IS NOT NULL
ORDER BY new_none_date_created
;
If the DateCreated field represents the time the row was inserted into the table, you can create an integer identity column and order by that integer field instead.
You need an index on 2 columns: (Name, DateCreated). The order of the fields in the index is important, so replace your index on just Name with a new index on the two columns (Name, DateCreated).
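A sketch of that replacement; the index names are assumed, so substitute whatever your existing single-column index is actually called:
DROP INDEX IX_Documents_Name ON Documents;  -- assumed name of the existing index on Name
CREATE NONCLUSTERED INDEX IX_Documents_Name_DateCreated
ON Documents (Name, DateCreated);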

SQL Server index on group of columns doesn't perform well on individual column seek

I have a non-clustered index on a group of columns (a, b, c, d), and this index is already used by a common query we have which filters on all four columns in the WHERE clause.
On the other hand, when I try to search on column (a) alone by simply using:
select count(*)
from table
where a = value
here the performance is fine and execution plan shows it used my index.
But when I try to search for column (d) by simply using:
select count(*)
from table
where d = value
here the performance is bad, and the execution plan shows that it used the same index, but it also reports a missing-index hint with a 98% impact and suggests creating a new index on column (d).
Just for testing, I tried creating a new index on this column and the performance became very good.
I don't want to get stuck with redundant indexes, as the table is huge (30 GB) and has about 100 million rows.
Any idea why my main index didn't perform well with all columns?
Column a data type is INT
Column d data type is TINYINT
SQL Server version is 2014 Enterprise.
Thanks.
Abed
If you have a composite index on 4 columns (A, B, C, D), then it can support queries which filter on:
1) WHERE A = ...
2) WHERE A = ... AND B = ...
3) WHERE A = ... AND B = ... AND C = ...
4) WHERE A = ... AND B = ... AND C = ... AND D = ...
You CAN'T skip the leading portion of the index. If you filter like this:
WHERE B = ... AND C = ... AND D = ... (thus skipping A), performance will be BAD.
Try creating separate indexes on each column; they are more flexible.
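For the count(*) filtered on column (d) from the question, a narrow single-column index is essentially what the missing-index hint is asking for; a sketch with placeholder names:
-- A COUNT(*) WHERE d = @value can be answered entirely from this small index,
-- instead of scanning the wide composite index on (a, b, c, d).
CREATE NONCLUSTERED INDEX IX_table_d -- placeholder name
ON [table] (d);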

Best index(es) to use for an OR Statement in SQL Server

I have a table which has a bunch of columns but the two relevant ones are:
Due_Amount MONEY
Bounced_Due_Amount MONEY
I have a SQL query like the following
SELECT * FROM table WHERE (Due_Amount > 0 OR Bounced_Due_Amount > 0)
Would the best index to put on this table for SQL Server 2008 be a single index that includes both columns, or should I put a separate index on each column?
An index can't be used for an OR like that. Try this instead:
SELECT * FROM table WHERE Due_Amount > 0
UNION ALL
SELECT * FROM table WHERE Bounced_Due_Amount > 0
--use "UNION" instead of "UNION ALL" if Due_Amount and Bounced_Due_Amount could both be > 0 in the same row
and have an index on Due_Amount and another on Bounced_Due_Amount.
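A sketch of those two supporting indexes (the names are illustrative, and [table] stands in for your real table name):
CREATE NONCLUSTERED INDEX IX_table_Due_Amount -- illustrative name
ON [table] (Due_Amount);
CREATE NONCLUSTERED INDEX IX_table_Bounced_Due_Amount -- illustrative name
ON [table] (Bounced_Due_Amount);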
It might be better to redesign your table. Without knowing your business logic or table, I'm going to guess that you could have a "Bounced" Y/N or 1/0 char/bit column and just a "Due_Amount" column. Add an index on that "Due_Amount" and the query would just be:
SELECT * FROM table WHERE Due_Amount > 0
and you could still differentiate between bounced and non-bounced rows. This will not work if you need to have both a bounced and a non-bounced due amount at the same time.
My guess is that you would be better off with an index on each individual column. Having it on both won't help any more than having it on just the first column unless you have other queries that would use the compound index.
Your best bet is to try the query with an index on one column, an index on the other column, and two indexes - one on each column. Do some tests with each (on real data, not test data) and see which works best. Take a look at the query plans to understand why.
Depending on the specific data (both size and cardinality) SQL Server may end up using one, both, or possibly even neither index. The only way to know for sure is to test them each.
Technically, you can have an index on a persisted computed column and use the computed column instead of the OR condition in the query, see Creating Indexes on Computed Columns:
alter table [table] add Max_Due_Amount as
case
when Due_Amount > Bounced_Due_Amount then Due_Amount
else Bounced_Due_Amount
end
persisted;
go
create index idxTableMaxDueAmount on [table] (Max_Due_Amount);
go
SELECT * FROM [table] WHERE Max_Due_Amount > 0;
But in general I'd recommend using the UNION approach like KM suggested.
Specifically for this query, it would be best to create an index on both columns in the order they are used in the where clause. Otherwise the index might not be used.

Why does SQL choose an incorrect index in my case?

I have a table with two indices; one is a multi-column clustered index on 3 columns:
(
symbolid int16,
bartime int32,
typeid int8
)
The second is non clustered on
(
bartime int16
)
The select statement I'm trying to run is:
SELECT symbolID, vTrdBuy
FROM mvTrdHidUhd
WHERE typeID = 1
AND barDateTime = 44991
AND symbolid in (1010,1020,1030,1040,1050,1060)
I ran this query on SQL Server 2008 using the SQL Server Management Studio editor with the actual execution plan enabled. I found that SQL Server uses the second index and proposes creating a new index on the three columns (symbolid, bartime, typeid), but nonclustered!!! (I think it says nonclustered index because there is already a clustered one.)
This selection is wrong. I reran the same query and forced SQL Server to use the clustered index (using a WITH (INDEX(...)) hint, as sketched below), and performance is better, as it should be.
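For reference, forcing the clustered index with a table hint looks roughly like this (INDEX(1) always refers to the clustered index, so no name is needed):
SELECT symbolID, vTrdBuy
FROM mvTrdHidUhd WITH (INDEX(1)) -- force the clustered index
WHERE typeID = 1
  AND barDateTime = 44991
  AND symbolid IN (1010,1020,1030,1040,1050,1060);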
I have two questions here, one related to this behavior and the second about the query itself:
Why does SQL Server choose the wrong index and then propose essentially the same index?
Which of these should I use in the WHERE condition for better performance:
symbolid in (1010,1020,1030,1040,1050,1060)
(symbolid = 1010 or symbolid = 1020 ... etc)
(symbolid between 1010 and 1060)
After testing
I found that when I change the WHERE condition from using IN to using >= and <=, the nonclustered index on the bartime column gives better performance than the clustered index on the 3 columns.
So I have two cases: if the WHERE uses IN, it is better to use the clustered index; if it contains >= and <=, the second index works better.
SELECT symbolID, vTrdBuy
FROM mvTrdHidUhd
WHERE typeID = 1
AND barDateTime = 44991
AND symbolid IN (1010,1020,1030,1040,1050,1060)
This condition is not covered by a single contiguous range of your clustered index.
These rows:
1010, 44991, 1
1010, 50000, 1
1020, 44991, 1
will come in order in the index, but your query will select the first and the third one, skipping the second.
SQL Server can use Clustered Index Seek if there is a limited number of predicates, like in your IN case. In this case it uses a number of ranges:
SELECT symbolID, vTrdBuy
FROM mvTrdHidUhd
WHERE (typeID = 1
AND barDateTime = 44991
AND symbolid = 1010)
OR
(typeID = 1
AND barDateTime = 44991
AND symbolid = 1020)
OR …
But in the case of a BETWEEN range on symbolid it cannot construct such a limited number of predicates, which is why it reverts to the less efficient Clustered Index Scan (which scans on symbolid and just filters out the non-matching results).
In this case your nonclustered index performs better.
You could rewrite your query like this:
SELECT symbolID, vTrdBuy
FROM (
SELECT DISTINCT symbolid
FROM mvTrdHidUhd
WHERE symbolid BETWEEN 1010 AND 1050
) s
JOIN mvTrdHidUhd m
ON m.symbolid = s.symbolid
AND m.typeID = 1
AND m.barDateTime = 44991
This will use a Clustered Index Seek on your table as well, both to build the list of DISTINCT symbolid values and to join on this list.
Updating the statistics on the table / indexes may make it choose the correct index
Use symbolid BETWEEN 1010 AND 1050 if possible. The use of BETWEEN, =, >=, >, <, or <=, or a combination of these with AND, generally leads to better performance and better index selection than the use of OR or IN.
It is possible the order of the index columns affects whether the optimiser will choose your index. You indicate the index is (symbolid int16, bartime int32, typeid int8), but symbolid is the least selective of the values in your WHERE clause, and it would require 6 index lookups for the 6 values you have.
I would probably start with the BETWEEN statement, but only testing with your data, server, indexes etc. will prove the best case.
If you are going to create another index, try the 2 other orders for those columns.
And, as noted elsewhere, update your statistics.
You can also try out a covering index on (symbolid, bartime, typeid, vTrdBuy).
Your query references four columns:
symbolID
vTrdBuy
typeID
barDateTime
While the clustered index only covers three of them:
symbolID
barDateTime
typeID
The reason SQL Server ignores that index is that it's useless to it. The index is first sorted by symbolID, and you don't want a specific symbolID, but a bunch of random values. This means that it has to read all over the table.
The next column in the clustered index is barDateTime. This isn't used to help it to skip to the rows it actually wants either.
Looking at the query, two columns are very specific in limiting what rows you want to return:
WHERE typeID = 1
AND barDateTime = 44991
Creating an index that starts with typeID and barDateTime can really be useful in helping SQL Server jump to the rows that you are interested in.
First SQL Server can jump right to the rows that are
typeID = 1.
Once there, it can jump right to bars where
barDateTime = March 8, 2023
It can do this by seeking right through the index, since the index is ordered by the columns in it. This is very fast, and it's eliminated the majority of rows from being looked at.
If you were to create the index:
(
typeID
barDateTime
symbolID
)
it still might not be useful if the query returns a lot of rows. In order to finish the SELECT statement, SQL Server still needs the vTrdBuy value. It has to get it by jumping back into the table for each one of the rows that matches the criteria (called a bookmark lookup). If there are too many rows (say > 500), SQL Server will just forget the index and scan the entire table, because that would be faster.
You want to prevent the bookmark lookup by letting SQL Server avoid going back to the table for the missing value; to do that, include the value in the index:
CREATE INDEX IX_mvTrdHidUhd_FancyCovering ON mvTrdHidUhd
(
typeID, barDateTime, symbolID, vTrdBuy
)
Now you have an index that contains everything SQL Server wants, in the order that it wants, and you don't have to mess with the physical sort order (i.e. the clustering) of the physical table.
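A common variation, since vTrdBuy is only returned and never filtered or sorted on, is to carry it as an included column rather than a key column; a sketch (the index name is illustrative):
CREATE NONCLUSTERED INDEX IX_mvTrdHidUhd_TypeDateSymbol -- illustrative name
ON mvTrdHidUhd (typeID, barDateTime, symbolID)
INCLUDE (vTrdBuy);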
