Indexes make SQL query too slow - sql-server

I'm having a huge issue with a SQL query after I added an index.
declare @DateFromCT date, @DateToCT date;
declare @DateFromCT2 date, @DateToCT2 date;
set dateformat dmy;
set @DateFromCT = '1/1/2015'; set @DateToCT = '31/3/2015';
set @DateFromCT2 = '1/4/2015'; set @DateToCT2 = '30/4/2015';
Select distinct CT.CodCliente, CT.CodAcesso
FROM CT_Contabilidade CT
Inner join CD_PlanoContas PC ON CT.CodAcesso = PC.Cod
WHERE NOT exists (
    SELECT 1 FROM CT_Contabilidade CT2
    WHERE CT2.CodAcesso = CT.CodAcesso
      and CT2.Data between @DateFromCT2 and @DateToCT2
      And (CT2.CodEmpresa = 1) And CT2.CodCliente = CT.CodCliente)
  and CT.Data between @DateFromCT and @DateToCT
  AND PC.subgrupo = 'C'
  And (CT.CodEmpresa = 1) And CT.CodCliente > 0
CT_Contabilidade's PK is a sequential bigint identity, and it is also the clustered index.
It has 1.5 million records.
Without other non-clustered indexes it performs well, taking less than 1 second. That's OK for me.
I created an index on CodAcesso to match the CD_PlanoContas key (Cod).
The CD_PlanoContas PK (clustered index) is Cod.
It's still performing well. No notable difference...
So I created an index on CodCliente (since it also references another table).
... And after this, the query is TOO slow; it is taking 7 or 8 MINUTES.
If I drop the CodAcesso index, it is OK again.
If I drop the CodCliente index, it is ok too.
If I keep both indexes but change the query, removing the Inner Join with CD_PlanoContas (and consequently the filter AND PC.subgrupo = 'C'), it is OK.
I can't imagine the indexes are causing the query to behave that way.
It's a HUGE difference, not just a loss of performance. I tried some other things, such as removing each filter in turn... nothing changed.
The execution plan suggests an index:
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[CT_Contabilidade] ([CodEmpresa],[Data],[CodCliente])
INCLUDE ([CodAcesso])
I created it, and the query works fine, even with the 2 other indexes (CodCliente and CodAcesso) in place.
But I don't want to create a specific index just for this query (it's only one of many queries that use these tables).
If it runs well with no extra indexes, I think it should run at least as well with these 2 indexes.
What causes the performance to change so drastically? What do I need to change to speed things up?

Try using an index hint to control which index is used.
For example:
select *
from titles with (index (titleind))
where title = 'The Gourmet Microwave'
Use the SET STATISTICS IO ON command to see the number of pages read for each query/index combination, and use the right-click / Show Execution Plan option to see how the query is being executed.
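For example, a hedged sketch of that workflow (the query is a simplified form of the one in the question):
set statistics io on;   -- reports logical/physical reads per table
set statistics time on; -- reports CPU and elapsed time

Select distinct CT.CodCliente, CT.CodAcesso
FROM CT_Contabilidade CT
Inner join CD_PlanoContas PC ON CT.CodAcesso = PC.Cod
WHERE CT.CodEmpresa = 1 AND PC.subgrupo = 'C' AND CT.CodCliente > 0;

set statistics io off;
set statistics time off;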

It is not always a good idea to follow the execution plan's index suggestions.
I suggest you compare the execution plans before and after adding the index and look at the difference. Maybe that index causes the SQL engine to choose a bad plan.
Also try updating statistics on your tables and indexes and see how it affects the plan.
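For example (table names come from the question; FULLSCAN is optional and just makes the sampling exhaustive):
UPDATE STATISTICS dbo.CT_Contabilidade WITH FULLSCAN;
UPDATE STATISTICS dbo.CD_PlanoContas WITH FULLSCAN;
-- or refresh statistics for every table in the database:
EXEC sp_updatestats;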

Related

Why is columnstore index not being used

I have a non-clustered columnstore index on all columns of a 40m-row, non-memory-optimized table on SQL Server 2016 Enterprise Edition.
A query forcing the use of the columnstore index performs significantly faster, but the optimizer continues to choose the clustered index and other non-clustered indexes. I have plenty of available RAM and am using appropriate queries against a dimensional model.
Why won't the optimizer choose the columnstore index? And how can I encourage its use (without using a hint)?
Here is a sample query not using columnstore:
SELECT
COUNT(*),
SUM(TradeTurnover),
SUM(TradeVolume)
FROM DWH.FactEquityTrade e
--with (INDEX(FactEquityTradeNonClusteredColumnStoreIndex))
JOIN DWH.DimDate d
ON e.TradeDateId = d.DateId
JOIN DWH.DimInstrument i
ON i.instrumentid = e.instrumentid
WHERE d.DateId >= 20160201
AND i.instrumentid = 2
It takes 7 seconds without hint and a fraction of a second with the hint.
The query plan without the hint is here.
The query plan with the hint is here.
The create statement for the columnstore index is:
CREATE NONCLUSTERED COLUMNSTORE INDEX [FactEquityTradeNonClusteredColumnStoreIndex] ON [DWH].[FactEquityTrade]
(
[EquityTradeID],
[InstrumentID],
[TradingSysTransNo],
[TradeDateID],
[TradeTimeID],
[TradeTimestamp],
[UTCTradeTimeStamp],
[PublishDateID],
[PublishTimeID],
[PublishedDateTime],
[UTCPublishedDateTime],
[DelayedTradeYN],
[EquityTradeJunkID],
[BrokerID],
[TraderID],
[CurrencyID],
[TradePrice],
[BidPrice],
[OfferPrice],
[TradeVolume],
[TradeTurnover],
[TradeModificationTypeID],
[InColumnStore],
[TradeFileID],
[BatchID],
[CancelBatchID]
)
WHERE ([InColumnStore]=(1))
WITH (DROP_EXISTING = OFF, COMPRESSION_DELAY = 0) ON [PRIMARY]
GO
Update: plan using Count(EquityTradeID) instead of Count(*), and with the hint included.
You're asking SQL Server to choose a complicated query plan over a simple one. Note that when using the hint, SQL Server has to concatenate the columnstore index with a rowstore non-clustered index (IX_FactEquiteTradeInColumnStore). When using just the rowstore index, it can do a seek (I assume TradeDateId is the leading column on that index). It does still have to do a key lookup, but it's simpler.
I can see two options to get this behavior without a hint:
First, remove InColumnStore from the columnstore index definition and cover the entire table. That's what you're asking from the columnstore - to cover everything.
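A sketch of that first option, assuming the column list stays as in the original definition and only the filter is dropped (DROP_EXISTING = ON rebuilds the index in place):
CREATE NONCLUSTERED COLUMNSTORE INDEX [FactEquityTradeNonClusteredColumnStoreIndex]
ON [DWH].[FactEquityTrade]
(
    [EquityTradeID], [InstrumentID], [TradeDateID], [TradeTimeID],
    [TradeVolume], [TradeTurnover]
    -- ...plus the remaining columns from the original definition...
)
-- no WHERE predicate, so the index covers every row in the table
WITH (DROP_EXISTING = ON, COMPRESSION_DELAY = 0) ON [PRIMARY];
GO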
If that's not possible, you can use a UNION ALL to explicitly split the data:
WITH workaround
AS (
SELECT TradeDateId
, instrumentid
, TradeTurnover
, TradeVolume
FROM DWH.FactEquityTrade
WHERE InColumnStore = 1
UNION ALL
SELECT TradeDateId
, instrumentid
, TradeTurnover
, TradeVolume
FROM DWH.FactEquityTrade
WHERE InColumnStore = 0 -- Assuming this is a non-nullable BIT
)
SELECT COUNT(*)
, SUM(TradeTurnover)
, SUM(TradeVolume)
FROM workaround e
JOIN DWH.DimDate d
ON e.TradeDateId = d.DateId
JOIN DWH.DimInstrument i
ON i.instrumentid = e.instrumentid
WHERE d.DateId >= 20160201
AND i.instrumentid = 2;
Your index is a filtered index (it has a WHERE predicate).
The optimizer will use such an index only when the query's WHERE matches the index's WHERE. This is true for classic indexes and most likely true for columnstore indexes as well. There are other limitations that can also prevent the optimizer from using a filtered index.
So, either add WHERE ([InColumnStore]=(1)) to your query, or remove it from the index definition.
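For instance, a sketch of the first option, re-using the sample query from the question with the filtered index's predicate added:
SELECT COUNT(*), SUM(TradeTurnover), SUM(TradeVolume)
FROM DWH.FactEquityTrade e
JOIN DWH.DimDate d ON e.TradeDateId = d.DateId
JOIN DWH.DimInstrument i ON i.instrumentid = e.instrumentid
WHERE d.DateId >= 20160201
  AND i.instrumentid = 2
  AND e.InColumnStore = 1; -- matches the index's WHERE ([InColumnStore]=(1))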
You said in the comments: "the InColumnStore filter is for efficiency when loading data. For all tests so far the filter covers 100% of all rows". Does "all rows" here mean "all rows of the whole table" or just "all rows of the result set"? Either way, the optimizer most likely doesn't know that (even though it could have derived it from statistics), which means the plan that uses such an index has to do extra checks/lookups explicitly, and the optimizer considers that too expensive.
Here are a few articles on this topic:
Why isn't my filtered index being used? by Rob Farley
Optimizer Limitations with Filtered Indexes by Paul White
An Unexpected Side-Effect of Adding a Filtered Index by Paul White
How filtered indexes could be a more powerful feature by Aaron Bertrand (see the section Optimizer Limitations)
Try this one:
Bridge your query through a temp table:
Select *
Into #DimDate
From DWH.DimDate
WHERE DateId >= 20160201
Select COUNT(1), SUM(TradeTurnover), SUM(TradeVolume)
From DWH.FactEquityTrade e
Inner Join DWH.DimInstrument i ON i.instrumentid = e.instrumentid
And i.instrumentid = 2
Left Join #DimDate d ON e.TradeDateId = d.DateId
How fast does this query run?

How do indexes work behind the scenes

I'm a beginner. I know indexes are necessary for performance boosts, but I want to know how they actually work behind the scenes. Previously, I thought we should simply create indexes on the columns that appear in the WHERE clause (which I have realized is wrong).
For example, SELECT * from MARKS where marks_obtained > 50
Consider that there's a clustered index on the primary key of this table, and I created a non-clustered index on the marks_obtained column since it's in my WHERE clause.
My perception: the leaf nodes of the non-clustered index will contain pointers to the clustered index, and since the clustered index points to the actual rows, it will return the entire rows (due to the asterisk in my query).
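For concreteness, a minimal sketch of that setup (the student_id column and the index name are illustrative, not from the original table):
-- the PK gets the clustered index by default
CREATE TABLE MARKS (
    student_id     INT NOT NULL PRIMARY KEY,
    marks_obtained INT NOT NULL
);

-- non-clustered index on the filtered column; its leaf level stores the
-- clustering key, so SELECT * still needs a lookup into the clustered index
CREATE NONCLUSTERED INDEX IX_MARKS_marks_obtained
    ON MARKS (marks_obtained);

SELECT * FROM MARKS WHERE marks_obtained > 50;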
Scenario
I came across the following query (on the AdventureWorks DB, where a non-clustered index had been created) which worked fine and took less than a second over 3,200,000 rows, until a new column was added to the table:
Query
SELECT x.*
INTO #X
FROM dbo.bigProduct AS p
CROSS APPLY
(
SELECT TOP 1000 *
FROM dbo.bigTransactionHistory AS bth
WHERE
bth.ProductId = p.ProductId
ORDER BY
TransactionDate DESC
) AS x
WHERE
p.ProductId BETWEEN 1000 AND 7500
GO
NEWLY ADDED COLUMN
ALTER TABLE dbo.bigTransactionHistory
ADD CustomerId INT NULL
After adding the above column, the query took 17 seconds! That is, 17 times slower. The non-clustered index was now missing the CustomerId column. As soon as I included CustomerId in the index, the problem was gone.
Question: CustomerId seemed to be the culprit until it was added to the index. BUT HOW???
The execution plan would answer this, but I'll make a guess: the non-clustered index was no longer enough to satisfy the query after the additional column had been added. That can cause the index to stop being used; it can also cause one clustered index lookup per row.
Learn to read execution plans. Turn on the "actual execution plan" feature routinely for each query that you test.
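For illustration, a hedged sketch of what "including CustomerId" could look like here (the index name is made up; the key columns are guessed from the query's join and ORDER BY):
-- key on ProductId (join/filter) and TransactionDate (ORDER BY ... DESC),
-- with CustomerId added as an included column; because the query selects *,
-- the real index would need to include every remaining column to stay covering
CREATE NONCLUSTERED INDEX IX_bigTransactionHistory_ProductId_TransactionDate
    ON dbo.bigTransactionHistory (ProductId, TransactionDate DESC)
    INCLUDE (CustomerId);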

Optimize SQL in MS SQL Server that returns more than 90% of records in the table

I have the SQL below:
SELECT Cast(Format(Sum(COALESCE(InstalledSubtotal, 0)), 'F') AS MONEY) AS TotalSoldNet,
BP.BoundProjectId AS ProjectId
FROM BoundProducts BP
WHERE ( BP.IsDeleted IS NULL
OR BP.IsDeleted = 0 )
GROUP BY BP.BoundProjectId
I already have an index on the BoundProducts table with the column order (BoundProjectId, IsDeleted).
Currently this query takes around 2-3 seconds to return the result. I am trying to reduce that to effectively zero.
This query returns 25,077 rows at present.
Please give me any ideas to improve the query.
Looking at this from a slightly different point of view, I think your OR condition is hurting the query, so why not rewrite it like this?
SELECT CAST(FORMAT(SUM(COALESCE(BP.InstalledSubtotal, 0)), 'F') AS MONEY) AS TotalSoldNet
, BP.BoundProjectId AS ProjectId
FROM (
SELECT BP.BoundProjectId, BP.InstalledSubtotal
FROM dbo.BoundProducts AS BP
WHERE BP.IsDeleted IS NULL
UNION ALL
SELECT BP.BoundProjectId, BP.InstalledSubtotal
FROM dbo.BoundProducts AS BP
WHERE BP.IsDeleted = 0
) AS BP
GROUP BY BP.BoundProjectId;
I've had better experience with UNION ALL than with OR.
It should return exactly the same results. On top of that, I'd create this index:
CREATE NONCLUSTERED INDEX idx_BoundProducts_IsDeleted_BoundProjectId_iInstalledSubTotal
ON dbo.BoundProducts (IsDeleted, BoundProjectId)
INCLUDE (InstalledSubTotal);
It should satisfy your query conditions and allow an index seek. I know it's usually not a good idea to index bit fields, but it's worth trying.
P.S. Why not default your IsDeleted column to 0 and make it NOT NULL? Then a simple WHERE IsDeleted = 0 check would be enough, and that would speed up your query too.
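A hedged sketch of that change (it assumes IsDeleted is a BIT and that NULL currently just means "not deleted"):
-- backfill existing NULLs, then make the column NOT NULL with a default of 0
UPDATE dbo.BoundProducts SET IsDeleted = 0 WHERE IsDeleted IS NULL;
ALTER TABLE dbo.BoundProducts ALTER COLUMN IsDeleted BIT NOT NULL;
ALTER TABLE dbo.BoundProducts ADD CONSTRAINT DF_BoundProducts_IsDeleted DEFAULT (0) FOR IsDeleted;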
If you really want to try an index seek, it should be possible using the FORCESEEK query hint, but I don't think it's going to make it any faster.
The options I suggested last time are still valid: remove FORMAT and/or create an indexed view.
You should also test whether the problem is the query itself or just displaying the results afterwards, for example by trying it with SELECT ... INTO #tmp. If that's fast, then the problem is not the query.
The index name in the screenshot is not the same as in the CREATE INDEX statement, but I assume that's just a name you changed for the question. If the scan is happening on another index, you should include that too.
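For the indexed view option, a hedged sketch (view and index names are illustrative; indexed views require SCHEMABINDING, two-part table names, COUNT_BIG(*) with GROUP BY, and ISNULL around summed nullable columns, and I haven't verified every restriction against this exact predicate):
-- the session creating an indexed view needs the standard SET options
-- (ANSI_NULLS, QUOTED_IDENTIFIER, etc.) turned ON
CREATE VIEW dbo.vw_BoundProductsTotals
WITH SCHEMABINDING
AS
SELECT BP.BoundProjectId,
       SUM(ISNULL(BP.InstalledSubtotal, 0)) AS TotalSoldNet,
       COUNT_BIG(*) AS RowCnt
FROM dbo.BoundProducts BP
WHERE BP.IsDeleted IS NULL OR BP.IsDeleted = 0
GROUP BY BP.BoundProjectId;
GO
CREATE UNIQUE CLUSTERED INDEX IX_vw_BoundProductsTotals
    ON dbo.vw_BoundProductsTotals (BoundProjectId);
GO
Queries could then read the pre-aggregated totals from the view (with the NOEXPAND hint on non-Enterprise editions) and apply FORMAT only to the small result set.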

Query slow for certain criteria on clustered index

I have a table called readings that has > 76 million rows in it that I'm running this query on:
declare @tunnel_id int = 13
SELECT TOP 1 local_time, recorded_time
FROM readings
WHERE tunnel_id = @tunnel_id
ORDER BY id DESC
The id column is a bigint, set as the primary key, and has a clustered index, and there is also an index on the tunnel_id field.
The query works great and returns in less than a second for about 16 of the 20 different tunnel_id's I'm trying. However, for the last 4 or so the query takes 40 seconds and uses hundreds of thousands of reads.
I tried modifying the query into this:
SELECT TOP (1) local_time, recorded_time
FROM readings
where id = (
SELECT TOP 1 id
FROM readings
WHERE tunnel_id = 13
ORDER BY id DESC
)
This, once again, is only slow for a few tunnel_id's.
What am I missing here that's making this query perform poorly?
Edit for comments:
Tunnel_id is not unique; each tunnel has multiple millions of rows. This is running on SQL Server 2012.
I included the actual execution plans from both the fast and slow runs and they are identical.
Fast:
Slow:
But as you can see, the first executes in less than a second while the second takes 51 seconds.
The plan basically scans the entire clustered index from start to end and looks for the first row with tunnel_id = #tunnel_id.
My educated guess is that the 'slow' tunnels don't have any rows in the beginning of the clustered index and so it has to scan more of it.
This non-clustered index should speed things up:
CREATE NONCLUSTERED INDEX [IX_FOO] ON [readings]
(
tunnel_id,
ID
)
INCLUDE
(
local_time,
recorded_time
)
This could replace the existing index on tunnel_id.
The interesting part here is that SQL Server isn't using the index on tunnel_id at all and is just scanning the whole table, which is slow when it's as big as 76 million rows.
I think the real reason it isn't using it is the ORDER BY id, since it would have to perform a lookup and then an additional sort. I doubt that parameter sniffing is the main problem here.
I would change the index instead and make it covering. If possible, include local_time, recorded_time and id in the index (not 100% sure the last one is needed, as it's the clustering key anyway).
CREATE NONCLUSTERED INDEX IX_tunnel_id ON dbo.readings (tunnel_id) INCLUDE (id, local_time, recorded_time)
Note that, while this can improve this particular query, it will make inserts and updates a little slower, and require additional storage space.
Just found that you can hint to use the tunnel_id index:
declare @tunnel_id int = 13
SELECT TOP 1 local_time, recorded_time
FROM readings
WITH (INDEX(idx_tunnel_id))
WHERE tunnel_id = @tunnel_id
ORDER BY id DESC
which works as expected and returns in less than 1 second.

UPDATE slow when setting column to NULL

I have a SQL Server 2008 table with 80,000 rows and am executing the following query:
UPDATE dbo.TableName WITH (ROWLOCK)
SET HelloWorldID = NULL
WHERE HelloWorldID = @helloWorldID
HelloWorldID is an int and the @helloWorldID parameter is also an int.
The query is taking too long and I'd like to optimize it. I created a nonclustered index on HelloWorldID but it didn't help. I may have to redesign this... maybe move HelloWorldID to another table that links it to the TableName table?
Since the command you're waiting on is a DELETE, I have to guess that there is a trigger on dbo.TableName and that it is performing additional work you do not expect. Or perhaps some CASCADE option is affecting other tables that have triggers on them.
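A hedged sketch for checking that guess, using standard catalog views (the table name comes from the question):
-- any triggers defined on the table?
SELECT t.name, t.is_disabled
FROM sys.triggers AS t
WHERE t.parent_id = OBJECT_ID('dbo.TableName');

-- any foreign keys referencing the table, and their cascade settings?
SELECT fk.name, fk.delete_referential_action_desc, fk.update_referential_action_desc
FROM sys.foreign_keys AS fk
WHERE fk.referenced_object_id = OBJECT_ID('dbo.TableName');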
It all depends on how many rows will be updated by this query.
If you're updating a lot of rows, say 30% of the table, then the index will actually slow down the query (the index has to be updated along with the table, and it won't help much with filtering the rows to update). Also ROWLOCK will slow it down, because the engine will issue a separate lock for each row (as opposed to the page locks that would occur normally).
Try removing the index and running this update using WITH (TABLOCK) just to see what happens.
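A minimal sketch of that experiment (the parameter value is illustrative):
DECLARE @helloWorldID int = 42;  -- illustrative value

-- TABLOCK takes a single table-level lock instead of thousands of row locks
UPDATE dbo.TableName WITH (TABLOCK)
SET HelloWorldID = NULL
WHERE HelloWorldID = @helloWorldID;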
I get this problem sometimes. Your query depends on simultaneously getting a write lock on every row in the table that meets the conditions of the WHERE clause. Depending on your need for full ACID behaviour, you could do something like this:
SELECT getdate() -- force @@rowcount = 1
while @@rowcount > 0
UPDATE TOP (1000) dbo.TableName
SET HelloWorldID = NULL
WHERE HelloWorldID = @helloWorldID
This will do the update in smaller chunks and help overcome locking issues. But remember, this method gives up on doing the query as a single transaction. You will need to tune the 1000 to a value that is right for your server.
