Is the following SQL good or bad practice from a performance perspective?
Two queries, searching by a common column:
CREATE INDEX tbl_idx ON tbl (a, b);
SELECT id, a, b
FROM tbl
WHERE a = @a
AND b = @b;
SELECT id, a, b
FROM tbl
WHERE b = @b;
This index
CREATE INDEX tbl_idx ON tbl (a, b);
will be useful for these queries:
where a= and b =
where a= and b>
where a like 'someval%' and b=
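but not useful, or only partly useful, for these queries:
where b=   (a is not constrained, so no seek is possible)
where a> and b=   (range seek on a only; b is checked as a residual predicate)
where a like '%someval%' and b=   (the leading wildcard prevents a seek)
where isnull(a,'')= and b=   (the function wrapped around a prevents a seek)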
In summary, a multicolumn index is useful when SQL Server is able to do a seek on its first key column.
Coming to your question: the first query would benefit from the index you created, whereas the second query would tend to do a scan on this index.
There are many factors which dictate whether a seek is good or bad. In some cases SQL Server may not use an available index at all, for example when the estimated cost of the bookmark lookups exceeds the cost of scanning the table.
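To make the seek-versus-scan distinction concrete, here is a minimal sketch, assuming SQL Server and a hypothetical definition of tbl (the column types and the variables @a and @b are assumptions, not taken from the question). It puts the ISNULL case from the list above next to a seekable rewrite.

```sql
-- Hypothetical table; only the column names come from the question.
CREATE TABLE tbl (
    id INT IDENTITY(1, 1) PRIMARY KEY,
    a  VARCHAR(20) NULL,
    b  INT NOT NULL
);
CREATE INDEX tbl_idx ON tbl (a, b);

DECLARE @a VARCHAR(20) = 'someval', @b INT = 1;

-- Wrapping the leading key column in a function hides it from the
-- index, so the optimizer cannot seek on tbl_idx for this predicate.
SELECT id, a, b
FROM tbl
WHERE ISNULL(a, '') = @a
  AND b = @b;

-- Same result as long as @a is a non-empty string: both key columns
-- are compared directly, so a seek on (a, b) is possible.
SELECT id, a, b
FROM tbl
WHERE a = @a
  AND b = @b;
```

Comparing the actual execution plans of the two statements (Index Scan versus Index Seek on tbl_idx) is a quick way to confirm the behaviour on your own data.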
References:
https://blogs.msdn.microsoft.com/craigfr/2006/07/07/seek-predicates/
https://blogs.msdn.microsoft.com/craigfr/2006/06/26/scans-vs-seeks/
https://www.youtube.com/watch?v=-m426WYclz8
If you reverse the index column order (b,a), then the index may be useful to both queries. Furthermore, if id is the primary key implemented as a clustered index, the index will cover both queries because the clustering key is implicitly included as the row locator. Otherwise, id could be explicitly added as an included column to provide the best performance:
CREATE INDEX tbl_idx ON tbl (b, a)
INCLUDE(id);
I run a query in Teradata. After a while the session is terminated (CPU > 100000 s).
How can I optimize a query?
select a, b, c, d, e from table where (a = '55' or a='055') and date > '20180701'
Indexes are used to find rows with specific column values quickly. Without an index, MySQL must begin with the first row and then read through the entire table to find the relevant rows.
ALTER TABLE table ADD INDEX (a);
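As a rough sketch of how that advice could be applied to the query above: the statements below use MySQL syntax, matching the ALTER TABLE example, and the table name t and index name idx_a_date are placeholders. Teradata's index DDL and optimizer differ, so treat this as an illustration of the idea (put the filter columns in the index key, equality column first) rather than a tested Teradata fix.

```sql
-- MySQL syntax; t and idx_a_date are hypothetical names.
-- Equality/IN column first, range column second.
ALTER TABLE t ADD INDEX idx_a_date (a, `date`);

-- IN is equivalent to the OR of two equality tests on the same column.
SELECT a, b, c, d, e
FROM t
WHERE a IN ('55', '055')
  AND `date` > '20180701';
```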
For example,
TABLE_A has a, b, c columns.
a is the PK, indexed as PK_TABLE_A, and there is an index called IDX_TABLE_A on (b, a), in that order.
SELECT a, b
FROM TABLE_A
WHERE a = @P1 AND b = @P2
This query will use PK_TABLE_A, and the b predicate will be ignored.
SELECT a, b
FROM TABLE_A
WHERE b = @P2
This query will use IDX_TABLE_A. But a doesn't have to be a key column there; making it an included column would be more efficient.
Are there any reasonable cases where IDX_TABLE_A should index column a?
Including columns in an index that do not help with locating particular rows can still improve performance of a query, by allowing values for those columns to be retrieved directly from the index record, without following a reference from the index record to the table record to obtain them. Queries whose selected columns are all included in one (or more) indexes are called "covered" queries; an index "covers" all the desired columns and the database does not need to access the table rows themselves to build the query results.
The index on (b,a) in TABLE_A might exist to speed up a query that matches on b, or possibly both b and a (these could be exact matches, range matches or other kinds), and wants to quickly return only the values of b and a in the query results, but not the values of column c.
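A minimal sketch of the covering behaviour described above, assuming SQL Server syntax and made-up column types (the question does not state the database or the types):

```sql
-- Hypothetical definitions; only the object names come from the question.
CREATE TABLE TABLE_A (
    a INT NOT NULL CONSTRAINT PK_TABLE_A PRIMARY KEY,
    b INT NOT NULL,
    c VARCHAR(100) NULL
);
CREATE INDEX IDX_TABLE_A ON TABLE_A (b, a);

DECLARE @P2 INT = 42;

-- Covered: a and b are both stored in IDX_TABLE_A, so the result can
-- be built from the index alone.
SELECT a, b
FROM TABLE_A
WHERE b = @P2;

-- Not covered: c is not in IDX_TABLE_A, so each matching index entry
-- needs an extra lookup into the table (or the optimizer picks another plan).
SELECT a, b, c
FROM TABLE_A
WHERE b = @P2;
```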
I have a non-clustered columnstore index on all columns of a 40m-record, non-memory-optimized table on SQL Server 2016 Enterprise Edition.
A query forcing the use of the columnstore index will perform significantly faster but the optimizer continues to choose to use the clustered index and other non-clustered indexes. I have lots of available RAM and am using appropriate queries against a dimensional model.
Why won't the optimizer choose the columnstore index? And how can I encourage its use (without using a hint)?
Here is a sample query not using columnstore:
SELECT
COUNT(*),
SUM(TradeTurnover),
SUM(TradeVolume)
FROM DWH.FactEquityTrade e
--with (INDEX(FactEquityTradeNonClusteredColumnStoreIndex))
JOIN DWH.DimDate d
ON e.TradeDateId = d.DateId
JOIN DWH.DimInstrument i
ON i.instrumentid = e.instrumentid
WHERE d.DateId >= 20160201
AND i.instrumentid = 2
It takes 7 seconds without hint and a fraction of a second with the hint.
The query plan without the hint is here.
The query plan with the hint is here.
The create statement for the columnstore index is:
CREATE NONCLUSTERED COLUMNSTORE INDEX [FactEquityTradeNonClusteredColumnStoreIndex] ON [DWH].[FactEquityTrade]
(
[EquityTradeID],
[InstrumentID],
[TradingSysTransNo],
[TradeDateID],
[TradeTimeID],
[TradeTimestamp],
[UTCTradeTimeStamp],
[PublishDateID],
[PublishTimeID],
[PublishedDateTime],
[UTCPublishedDateTime],
[DelayedTradeYN],
[EquityTradeJunkID],
[BrokerID],
[TraderID],
[CurrencyID],
[TradePrice],
[BidPrice],
[OfferPrice],
[TradeVolume],
[TradeTurnover],
[TradeModificationTypeID],
[InColumnStore],
[TradeFileID],
[BatchID],
[CancelBatchID]
)
WHERE ([InColumnStore]=(1))
WITH (DROP_EXISTING = OFF, COMPRESSION_DELAY = 0) ON [PRIMARY]
GO
Update. Plan using Count(EquityTradeID) instead of Count(*)
and with hint included
You're asking SQL Server to choose a complicated query plan over a simple one. Note that when using the hint, SQL Server has to concatenate the columnstore index with a rowstore non-clustered index (IX_FactEquiteTradeInColumnStore). When using just the rowstore index, it can do a seek (I assume TradeDateId is the leading column on that index). It does still have to do a key lookup, but it's simpler.
I can see two options to get this behavior without a hint:
First, remove InColumnStore from the columnstore index definition and cover the entire table. That's what you're asking from the columnstore - to cover everything.
If that's not possible, you can use a UNION ALL to explicitly split the data:
WITH workaround
AS (
SELECT TradeDateId
, instrumentid
, TradeTurnover
, TradeVolume
FROM DWH.FactEquityTrade
WHERE InColumnStore = 1
UNION ALL
SELECT TradeDateId
, instrumentid
, TradeTurnover
, TradeVolume
FROM DWH.FactEquityTrade
WHERE InColumnStore = 0 -- Assuming this is a non-nullable BIT
)
SELECT COUNT(*)
, SUM(TradeTurnover)
, SUM(TradeVolume)
FROM workaround e
JOIN DWH.DimDate d
ON e.TradeDateId = d.DateId
JOIN DWH.DimInstrument i
ON i.instrumentid = e.instrumentid
WHERE d.DateId >= 20160201
AND i.instrumentid = 2;
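For the first option, a sketch of recreating the index without its filter might look like this. The column list is shortened here to the columns the sample query touches, which is an assumption made for brevity; the real statement would keep the full column list from the question.

```sql
-- Sketch only: shortened column list, and no WHERE filter this time.
DROP INDEX [FactEquityTradeNonClusteredColumnStoreIndex]
    ON [DWH].[FactEquityTrade];

CREATE NONCLUSTERED COLUMNSTORE INDEX [FactEquityTradeNonClusteredColumnStoreIndex]
ON [DWH].[FactEquityTrade]
(
    [InstrumentID],
    [TradeDateID],
    [TradeVolume],
    [TradeTurnover]
    -- ... remaining columns from the original definition ...
);
```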
Your index is a filtered index (it has a WHERE predicate).
The optimizer would use such an index only when the query's WHERE clause matches the index's WHERE clause. This is true for classic indexes and most likely true for columnstore indexes. There can be other limitations that stop the optimizer from using a filtered index.
So, either add WHERE ([InColumnStore]=(1)) to your query, or remove it from the index definition.
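For the first of those two options, the sample query from the question would only need one extra predicate. This is a sketch that reuses the question's own names; whether the optimizer then picks the columnstore index still depends on its cost estimates.

```sql
SELECT
    COUNT(*),
    SUM(e.TradeTurnover),
    SUM(e.TradeVolume)
FROM DWH.FactEquityTrade e
JOIN DWH.DimDate d
    ON e.TradeDateId = d.DateId
JOIN DWH.DimInstrument i
    ON i.instrumentid = e.instrumentid
WHERE d.DateId >= 20160201
  AND i.instrumentid = 2
  AND e.InColumnStore = 1; -- literal that matches the index filter
```

The predicate is written as a literal on purpose: a variable or parameter in its place would not be matched against the index filter at compile time.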
You said in the comments: "the InColumnStore filter is for efficiency when loading data. For all tests so far the filter covers 100% of all rows". Does "all rows" here mean "all rows of the whole table" or just "all rows of the result set"? Anyway, most likely the optimizer doesn't know that (even though it could have derived it from statistics), which means that a plan using such an index has to explicitly do extra checks/lookups, which the optimizer considers too expensive.
Here are a few articles on this topic:
Why isn’t my filtered index being used? by Rob Farley.
Optimizer Limitations with Filtered Indexes by Paul White.
An Unexpected Side-Effect of Adding a Filtered Index by Paul White.
How filtered indexes could be a more powerful feature by Aaron Bertrand, see the section Optimizer Limitations.
Try this one:
Bridge your query
Select *
Into #DimDate
From DWH.DimDate
WHERE DateId >= 20160201
Select COUNT(1), SUM(TradeTurnover), SUM(TradeVolume)
From DWH.FactEquityTrade e
Inner Join DWH.DimInstrument i ON i.instrumentid = e.instrumentid
And i.instrumentid = 2
Inner Join #DimDate d ON e.TradeDateId = d.DateId -- inner join, so the date filter actually restricts the fact rows
How fast does this query run?
I have a non-clustered index on a group of columns (a, b, c, d), and I already use this index in a common query that filters on all four columns in the WHERE clause.
On the other hand, when I search on column (a) alone, simply using:
select count(*)
from table
where a = value
the performance is fine and the execution plan shows that it used my index.
But when I search on column (d) alone, simply using:
select count(*)
from table
where d = value
the performance is bad. The execution plan still uses the same index, but it shows a missing-index hint with an impact of 98% and suggests creating a new index on column (d).
Just for testing, I created a new index on this column and the performance became very good.
I don't want to get stuck with redundant indexes, as the table is huge (30GB) and has about 100 million rows.
Any idea why my main index doesn't perform well for all of its columns?
Column a data type is INT
Column d data type is TINYINT
SQL Server version is 2014 Enterprise.
Thanks.
Abed
If you have a composite index on 4 columns (A, B, C, D), then it supports queries which filter like this:
1) WHERE A=...
2) WHERE A=... AND B =...
3) WHERE A=... AND B =... AND C=....
4) WHERE A=... AND B =... AND C=.... AND D=...
You CAN'T skip the leading portion of the index. If you filter like this:
WHERE B= ... AND C= ... AND D=... (thus skipping A), performance will be BAD.
TRY creating separate indexes on each column; they are more flexible.
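As a sketch of the narrower alternative the question already tested, assuming SQL Server and a hypothetical table name dbo.BigTable (the real name is not given): a single-column index on d lets the count be answered by a seek, and the index usage DMV gives some evidence about which indexes are actually read before you decide which ones are redundant.

```sql
-- dbo.BigTable and IX_BigTable_d are hypothetical names.
CREATE NONCLUSTERED INDEX IX_BigTable_d ON dbo.BigTable (d);

-- Reads versus writes per index since the counters were last reset
-- (for example, at instance restart).
SELECT i.name AS index_name,
       s.user_seeks,
       s.user_scans,
       s.user_lookups,
       s.user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
       ON s.object_id = i.object_id
      AND s.index_id = i.index_id
      AND s.database_id = DB_ID()
WHERE i.object_id = OBJECT_ID(N'dbo.BigTable');
```

An index with many user_updates but almost no seeks, scans or lookups is a candidate for removal; one on a TINYINT column stays small even at 100 million rows.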
I have created these two indexes based on how Index wizard suggested for different scenarios.
I am wondering if they are the same or are they different?
Index (1):
CREATE NONClustered Index [IX_Rec_RecoveryDate_RecoveryDateTypeKey_CaseId]
ON [Rec].[RecoveryDate] ([RecoveryDateTypeKey], [CurrentFlag])
INCLUDE (CaseId)
Index (2):
CREATE NONClustered Index [IX_Rec_RecoveryDate_currentFlag]
ON [Rec].[RecoveryDate] ([CurrentFlag])
INCLUDE (CaseId, RecoveryDateTypekey)
The question is, what queries are you trying to optimize?
These two indexes are very different. The first index would be great for a query like this:
SELECT CaseId
FROM Rec.RecoveryDate
WHERE RecoveryDateTypeKey = 5
AND CurrentFlag = 1 -- or whatever
The first one indexes the columns in the WHERE clause. SQL Server would be able to search for the specified RecoveryDateTypeKey and CurrentFlag. Since the CaseId is INCLUDEd in the index leaf nodes, SQL Server would not have to join back to the table to get it.
Using the same query, the second index would behave differently. If you are lucky, SQL Server would search for all records where CurrentFlag is 1. Then, it would traverse these leaf nodes looking for the matching RecoveryDateTypeKey. On the other hand, if there are a lot of records where the CurrentFlag is 1, SQL Server might choose to do an index scan instead.
Then again, if you are wanting to optimize a query like this:
SELECT CaseId, RecoveryDateTypeKey
FROM Rec.RecoveryDate
WHERE CurrentFlag = 1
The first index would be useless, because the CurrentFlag is the second column in the index. SQL Server wouldn't be able to search it for CurrentFlag = 1, so it would probably do an index scan.
They're different. Index 1 indexes two columns; Index 2 indexes one column but includes two columns in the leaf nodes.
The difference basically means that if you search using the two columns, Index 1 could be much faster than Index 2.
Index 2 would be better than a normal index of the one column, because if you need the other column in your result, Index 2 already has the value so no lookup with the actual table would be needed.
The indexes store the same information (1 row per row in the Recovery table with the columns caseid, RecoveryDateTypeKey, CurrentFlag), but are organized in a different order and hence can be used for different queries.
The first index can handle WHERE clauses such as
WHERE RecoveryDateTypeKey = @p1 --Prefix matching!
and
WHERE RecoveryDateTypeKey = @p1 AND CurrentFlag = @p2
The second index only handles
WHERE CurrentFlag = @p2
If CurrentFlag is a low cardinality column such as a bit, or a char(1) (Y/N), then I'd recommend filtered indexes.
CREATE INDEX IX_REC_Yes_fltr on Recovery (RecoveryDateTypeKey)
INCLUDE (CaseId) WHERE (CurrentFlag = 'Y'); --Assumes that CurrentFlag = 'Y' is the most used value
--Maybe even a second one.
CREATE INDEX IX_REC_No_fltr on Recovery (RecoveryDateTypeKey)
INCLUDE (CaseId) WHERE (CurrentFlag = 'N'); --Maybe handle CurrentFlag = 'N' as well.
Each filtered index only includes the values that meet the criteria, so combined they are the same size as the non-filtered index.
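A short sketch of how queries line up with the first filtered index, assuming the Recovery table and column names used above and a CHAR(1) CurrentFlag (the literal 5 and the variable @flag are made up for illustration):

```sql
-- Can use IX_REC_Yes_fltr: the literal predicate matches the index filter.
SELECT CaseId
FROM Recovery
WHERE RecoveryDateTypeKey = 5
  AND CurrentFlag = 'Y';

-- With a variable, a cached plan has to work for any value of @flag,
-- so the filtered index is normally not eligible. OPTION (RECOMPILE)
-- compiles the statement for the current value, which makes the
-- filtered index a candidate again.
DECLARE @flag CHAR(1) = 'Y';

SELECT CaseId
FROM Recovery
WHERE RecoveryDateTypeKey = 5
  AND CurrentFlag = @flag
OPTION (RECOMPILE);
```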