Columns of an index contain all PKs. Not efficient?

For example,
TABLE_A has columns a, b, c.
Column a is the PK, indexed as PK_TABLE_A. There is also an index called IDX_TABLE_A that contains (b, a), in that order.
SELECT a, b
FROM TABLE_A
WHERE a = #P1 AND b = #P2
This query will use PK_TABLE_A, and the b predicate will not be used for the index seek.
SELECT a, b
FROM TABLE_A
WHERE b = #P2
This query will use IDX_TABLE_A. But a doesn't need to be a key column of that index; making it an included column would be more efficient.
Are there any reasonable cases where IDX_TABLE_A should keep a as a key column?

Including columns in an index that do not help with locating particular rows can still improve performance of a query, by allowing values for those columns to be retrieved directly from the index record, without following a reference from the index record to the table record to obtain them. Queries whose selected columns are all included in one (or more) indexes are called "covered" queries; an index "covers" all the desired columns and the database does not need to access the table rows themselves to build the query results.
The index on (b,a) in TABLE_A might exist to speed up a query that matches on b, or possibly both b and a (these could be exact matches, range matches or other kinds), and wants to quickly return only the values of b and a in the query results, but not the values of column c.
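For illustration, a minimal sketch in SQL Server syntax (the question doesn't name the platform, and the second index name below is just illustrative) comparing the two designs:
-- Included column: covers SELECT a, b ... WHERE b = #P2 without touching the base table,
-- but a cannot be used to seek or sort within the index.
CREATE INDEX IDX_TABLE_A ON TABLE_A (b) INCLUDE (a);
-- Key column: useful only if some query also needs to seek or order on a within a given b,
-- e.g. WHERE b = #P2 AND a > #P1 ORDER BY a.
CREATE INDEX IDX_TABLE_A_KEYED ON TABLE_A (b, a);
So keeping a as a key column is only reasonable when some query actually seeks or orders on a within b; otherwise INCLUDE is the cheaper choice.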

Related

Teradata query optimization

I run a query in Teradata. After a while the session is terminated with CPU time > 100000 s.
How can I optimize the query?
select a, b, c, d, e from table where (a = '55' or a='055') and date > '20180701'
Indexes are used to find rows with specific column values quickly. Without an index, the database must begin with the first row and then read through the entire table to find the relevant rows. In Teradata, a secondary index on a can be created like this:
CREATE INDEX (a) ON table;
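Beyond the index, a hedged sketch of two further steps, assuming Teradata syntax and that date is a DATE column (table and column names are the placeholders from the question): collect statistics so the optimizer can judge the selectivity of both predicates, and rewrite the OR as an IN list with an explicit DATE literal.
-- Give the optimizer statistics on the filtered columns.
COLLECT STATISTICS ON table COLUMN (a);
COLLECT STATISTICS ON table COLUMN ("date");

-- Same result set, but simpler predicates for the optimizer.
SELECT a, b, c, d, e
FROM table
WHERE a IN ('55', '055')
  AND "date" > DATE '2018-07-01';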

SQL index performance: which is better?

Is the following SQL good or bad practice from a performance perspective?
Two queries, searching by a common column:
CREATE INDEX tbl_idx ON tbl (a, b);
SELECT id, a, b
FROM tbl
WHERE a = #a
AND b = #b;
SELECT id, a, b
FROM tbl
WHERE b = #b;
This index
CREATE INDEX tbl_idx ON tbl (a, b);
will be useful for these queries:
where a = ... and b = ...
where a = ... and b > ...
where a like 'someval%' and b = ...
but not useful for these queries:
where b = ...
where a > ... and b = ...
where a like '%someval%' and b = ...
where isnull(a, '') = ... and b = ...
In summary, with a multicolumn index, if SQL Server is able to do a seek on the first key column then the index will be useful.
Coming to your question, the first query would benefit from the index you created, whereas the second query will tend to do a scan on this index.
There are many factors that dictate whether a seek is good or bad. In some cases SQL Server may choose not to use the available index, for example when the estimated bookmark lookup cost exceeds the cost of a scan.
References:
https://blogs.msdn.microsoft.com/craigfr/2006/07/07/seek-predicates/
https://blogs.msdn.microsoft.com/craigfr/2006/06/26/scans-vs-seeks/
https://www.youtube.com/watch?v=-m426WYclz8
If you reverse the index column order to (b, a), then the index may be useful to both queries. Furthermore, if id is the primary key implemented as a clustered index, the index will cover both queries because the clustering key is implicitly included as the row locator. Otherwise, id could be explicitly added as an included column to provide the best performance:
CREATE INDEX tbl_idx ON tbl (a, b)
INCLUDE(id);
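For reference, a sketch of the reversed-order variant described above (the index name is just illustrative):
CREATE INDEX tbl_idx_b_a ON tbl (b, a)
INCLUDE (id);
-- With b as the leading key column, the second query (WHERE b = #b) can seek,
-- and the first query (WHERE a = #a AND b = #b) can seek on b and filter on a
-- within the same index, while the included id keeps both queries covered.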

SQL Query is slow when ORDER BY statement added

I have a table [Documents] with the following columns:
Name (string)
Status (string)
DateCreated [datetime]
This table has around 1 million records. All three of these columns have an index (a single index for each one).
When I run this query:
select top 50 *
from [Documents]
where (Name = 'None' OR Name is null OR Name = '')
and Status = 'New';
Execution is really fast (300 ms).
If I run the same query but with an ORDER BY clause, it's really slow (3000 ms):
select top 50 *
from [Documents]
where (Name = 'None' OR Name is null OR Name = '')
and Status = 'New'
order by DateCreated;
I understand that it's searching in another index (DateCreated), but should it really be that much slower? If so, why? Is there anything I can do to speed this query up (a composite index)?
Thanks
BTW: All indexes, including the one on DateCreated, have really low fragmentation; in fact, I ran a reorganize and it didn't change a thing.
As far as why the query is slower, the query is required to return the rows "in order", so it either needs to do a sort, or it needs to use an index.
Using the index with a leading column of DateCreated, SQL Server can avoid a sort. But SQL Server would also have to visit the pages in the underlying table to evaluate whether each row is to be returned, looking at the values in the Status and Name columns.
If the optimizer chooses not to use the index with DateCreated as the leading column, then it needs to first locate all of the rows that satisfy the predicates, and then perform a sort operation to get those rows in order. Then it can return the first fifty rows from the sorted set. (SQL Server wouldn't necessarily need to sort the entire set, but it would need to go through that whole set and do sufficient sorting to guarantee that it's got the "first fifty" that need to be returned.)
NOTE: I suspect you already know this, but to clarify: SQL Server honors the ORDER BY before the TOP 50. If you wanted any 50 rows that satisfied the predicates, but not necessarily the 50 rows with the lowest values of DateCreated, you could restructure/rewrite your query to get (at most) 50 rows first, and then perform the sort on just those.
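A sketch of that restructuring, only appropriate if any 50 matching rows will do (not necessarily the 50 earliest by DateCreated):
-- Grab any 50 rows that satisfy the predicates, then sort only those 50.
SELECT *
FROM (
    SELECT TOP (50) *
    FROM [Documents]
    WHERE (Name = 'None' OR Name IS NULL OR Name = '')
      AND Status = 'New'
) AS d
ORDER BY d.DateCreated;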
A couple of ideas to improve performance
Adding a composite index (as other answers have suggested) may offer some improvement, for example:
ON Documents (Status, DateCreated, Name)
SQL Server might be able to use that index to satisfy the equality predicate on Status and also return the rows in DateCreated order without a sort operation. SQL Server may also be able to satisfy the predicate on Name from the index, limiting the lookups into the underlying table (needed to get "all" of the columns) to just the rows that are actually returned.
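Written out as a complete statement (the index name is only illustrative), that suggestion might look like this:
CREATE NONCLUSTERED INDEX IX_Documents_Status_DateCreated_Name
ON Documents (Status, DateCreated, Name);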
For SQL Server 2008 or later, I'd consider a filtered index, depending on the cardinality of Status = 'New' (that is, if the rows that satisfy the predicate Status = 'New' are a relatively small subset of the table).
CREATE NONCLUSTERED INDEX Documents_FIX
ON Documents (Status, DateCreated, Name)
WHERE Status = 'New'
I would also modify the query to specify ORDER BY Status, DateCreated, Name
so that the ORDER BY clause matches the index; it doesn't really change the order in which the rows are returned.
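A sketch of the query adjusted that way, assuming the filtered index above is in place:
SELECT TOP (50) *
FROM [Documents]
WHERE (Name = 'None' OR Name IS NULL OR Name = '')
  AND Status = 'New'
ORDER BY Status, DateCreated, Name;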
As a more complicated alternative, I would consider adding a persisted computed column and adding a filtered index on that
ALTER TABLE Documents
ADD new_none_date_created AS
CASE
WHEN Status = 'New' AND COALESCE(Name,'') IN ('','None') THEN DateCreated
ELSE NULL
END
PERSISTED
;
CREATE NONCLUSTERED INDEX Documents_FIXP
ON Documents (new_none_date_created)
WHERE new_none_date_created IS NOT NULL
;
Then the query could be re-written:
SELECT TOP 50 *
FROM Documents
WHERE new_none_date_created IS NOT NULL
ORDER BY new_none_date_created
;
If the DateCreated column represents the time the row was inserted into the table, you can create an integer id column and order by that integer column instead.
You need an index on two columns: (Name, DateCreated). The order of columns in the index is important, so replace your index on just Name with a new index on the two columns (Name, DateCreated).
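A sketch of that replacement (both index names are illustrative; the name of the existing Name-only index is an assumption):
DROP INDEX IX_Documents_Name ON Documents;  -- hypothetical name of the existing Name-only index
CREATE INDEX IX_Documents_Name_DateCreated ON Documents (Name, DateCreated);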

SQL Server index on group of columns doesn't perform well on individual column seek

I have a non-clustered index on a group of columns (a, b, c, d), and I already use this index with a common query we have that searches on those four columns in the WHERE clause.
On the other hand, when I try to search for column (a) by simply using:
select count(*)
from table
where a = value
here the performance is fine and the execution plan shows that it used my index.
But when I try to search for column (d) by simply using:
select count(*)
from table
where d = value
here the performance is bad; the execution plan uses the same index, but it shows a missing-index hint with an estimated impact of 98% and suggests creating a new index on column (d).
Just for testing, I tried creating a new index on this column and the performance became very good.
I don't want to end up with redundant indexes, as the table is very large (30 GB) and has about 100 million rows.
Any idea why my main index didn't perform well with all columns?
Column a data type is INT
Column d data type is TINYINT
SQL Server version is 2014 Enterprise.
Thanks.
Abed
If you have a composite index on four columns (A, B, C, D), then it can be used by queries that filter on:
1) WHERE A = ...
2) WHERE A = ... AND B = ...
3) WHERE A = ... AND B = ... AND C = ...
4) WHERE A = ... AND B = ... AND C = ... AND D = ...
You can't skip the leading portion of the index. If you filter like this:
WHERE B = ... AND C = ... AND D = ... (thus skipping A), performance will be bad.
Try creating separate indexes on each column; they are more flexible.
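For example, a hedged sketch for the count-by-d query from the question (the index name is illustrative, and [table] stands in for the real table name):
-- A narrow single-column index lets SQL Server seek on d instead of scanning
-- the wide (a, b, c, d) index.
CREATE NONCLUSTERED INDEX IX_table_d ON [table] (d);

SELECT COUNT(*)
FROM [table]
WHERE d = 1;  -- example literal value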

What is the Difference between these two indexes in SQL Server?

I have created these two indexes based on what the index tuning wizard suggested for different scenarios.
I am wondering whether they are the same or different.
Index (1):
CREATE NONClustered Index [IX_Rec_RecoveryDate_RecoveryDateTypeKey_CaseId]
ON [Rec].[RecoveryDate] ([RecoveryDateTypeKey], [CurrentFlag])
INCLUDE (CaseId)
Index (2):
CREATE NONClustered Index [IX_Rec_RecoveryDate_currentFlag]
ON [Rec].[RecoveryDate] ([CurrentFlag])
INCLUDE (CaseId, RecoveryDateTypeKey)
The question is, what queries are you trying to optimize?
These two indexes are very different. The first index would be great for a query like this:
SELECT CaseId
FROM Rec.RecoveryDate
WHERE RecoveryDateTypeKey = 5
AND CurrentFlag = 1 -- or whatever
The first one indexes the columns in the WHERE clause. SQL Server would be able to search for the specified RecoveryDateTypeKey and CurrentFlag. Since the CaseId is INCLUDEd in the index leaf nodes, SQL Server would not have to join back to the table to get it.
Using the same query, the second index would behave differently. If you are lucky, SQL Server would search for all records where CurrentFlag is 1. Then, it would traverse these leaf nodes looking for the matching RecoveryDateTypeKey. On the other hand, if there are a lot of records where the CurrentFlag is 1, SQL Server might choose to do an index scan instead.
Then again, if you are wanting to optimize a query like this:
SELECT CaseId, RecoveryDateTypeKey
FROM Rec.RecoveryDate
WHERE CurrentFlag = 1
The first index would be useless, because the CurrentFlag is the second column in the index. SQL Server wouldn't be able to search it for CurrentFlag = 1, so it would probably do an index scan.
They're different. Index 1 indexes two columns; Index 2 indexes one column but includes two columns in the leaf nodes.
The difference basically means that if you search using the two columns, Index 1 could be much faster than Index 2.
Index 2 would be better than a normal index on just the one column, because if you need the other columns in your result, Index 2 already has the values, so no lookup into the actual table would be needed.
The indexes store the same information (1 row per row in the Recovery table with the columns caseid, RecoveryDateTypeKey, CurrentFlag), but are organized in a different order and hence can be used for different queries.
The first index can handle WHERE clauses such as
WHERE RecoveryDateTypeKey = #p1 --Prefix matching!
and
WHERE RecoveryDateTypeKey = #p1 AND CurrentFlag = #p2
The second index only handles
WHERE CurrentFlag = #p2
If CurrentFlag is a low cardinality column such as a bit, or a char(1) (Y/N), then I'd recommend filtered indexes.
CREATE INDEX IX_REC_Yes_fltr ON Recovery (RecoveryDateTypeKey)
INCLUDE (CaseId)
WHERE (CurrentFlag = 'Y'); -- assumes that CurrentFlag = 'Y' is the most used value
-- Maybe even a second one, to handle CurrentFlag = 'N' as well:
CREATE INDEX IX_REC_No_fltr ON Recovery (RecoveryDateTypeKey)
INCLUDE (CaseId)
WHERE (CurrentFlag = 'N');
Each filtered index only includes the values that meet the criteria, so combined they are the same size as the non-filtered index.
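As a usage sketch, a query like the following could be satisfied entirely by the first filtered index; note that the query's predicate has to match the filter (CurrentFlag = 'Y') for the optimizer to consider it:
SELECT CaseId
FROM Recovery
WHERE CurrentFlag = 'Y'
  AND RecoveryDateTypeKey = 5;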
