Query slow for certain criteria on clustered index - sql-server

I have a table called readings that has > 76 million rows in it that I'm running this query on:
declare #tunnel_id int = 13
SELECT TOP 1 local_time, recorded_time
FROM readings
WHERE tunnel_id = #tunnel_id
ORDER BY id DESC
The id column is a bigint, set as the primary key, and has a clustered index, and there is also an index on the tunnel_id field.
The works great and returns in less than a second for about 16 out of the 20 different tunnel_id's I'm trying. However, on the last 4 or so the query takes 40 seconds and uses hundreds of thousands of reads.
I tried modifying the query into this:
SELECT TOP (1) local_time, recorded_time
FROM readings
where id = (
SELECT TOP 1 id
FROM readings
WHERE tunnel_id = 13
ORDER BY id DESC
)
Which once again is only slow for a few tunnel_id's. What perplexes me more is that the inner select runs quickly for the slow id's and if I hardcode the maximum id instead of the subquery it also runs quickly.
What am I missing here that's making this query perform poorly?
Edit for comments:
Tunnel_id is not unique, each tunnel has multiple millions of rows. This is running on Sql Server 2012.
I included the actual execution plans from both the fast and slow runs and they are identical.
Fast:
Slow:
But as you can see, the first executes in less than a second while the second takes 51 seconds.

The plan basically scans the entire clustered index from start to end and looks for the first row with tunnel_id = #tunnel_id.
My educated guess is that the 'slow' tunnels don't have any rows in the beginning of the clustered index and so it has to scan more of it.
This non-clustered index should speed things up:
CREATE NONCLUSTERED INDEX [IX_FOO] ON [readings]
(
tunnel_id,
ID
)
INCLUDE
(
local_time,
recorded_time
)
This could replace the existing index on tunnel_id.

The interesting part here is that SQL isn't using the index in tunnel_id at all and is just scanning the table in whole, which is slow if it's big like 76 millions rows.
I think the real cause it isn't using it is because the ordering by id, as it must perform a lookup and then an additional sorting. I doubt at first that parameter sniffing is the main problem here.
I would try to change the index instead, and make it covering. If possible include in the index the local time, recorded time and the id (not 100% sure if it's needed as it's the cluster key anyway).
CREATE NONCLUSTERED INDEX IX_tunnel_id ON dbo.readings (tunnel_id) INCLUDE (id, local_time, recorded_time)
Note that, while this can improve this particular query, it will make inserts and updates a little slower, and require additional storage space.

Just found that you can hint to use the tunnel_id index:
declare #tunnel_id int = 13
SELECT TOP 1 local_time, recorded_time
FROM readings
WITH (INDEX(idx_tunnel_id))
WHERE tunnel_id = #tunnel_id
ORDER BY id DESC
which works as expected and returns in less than 1 second.

Related

Select query takes long time even for 10 records from a very big records table

I have a very big records table about 6.5 million records. When I try to select some records from it even 10 records I have to wait a long random time.
SELECT [Column1], [Column2], [Column3], [Column4], [Column5]
FROM [table]
WHERE deviceDataId = '640'
ORDER BY id ASC OFFSET 10 ROWS
FETCH NEXT 10 ROWS ONLY
My database is deployed on azure I also download and deploy it local system but it takes same time.
Query execution plan:
Now that we have your real query we can see that it appears that you have no index on your column deviceDataId; meaning that the entire table needs to be scanned. As such, even though you only want 10 rows, all 6.5M rows need to be scanned and the value of deviceDataId checked.
If you create an index on your column deviceDataId and minimally INCLUDE the others in your query, then you'll have a covering index which'll greatly help. This also assumes id is the column your CLUSTERED INDEX is ordered on.
CREATE NONCLUSTERED INDEX IX_Table_DeviceTableID_Cols1_5
ON dbo.[table] (deviceDataId)
INCLUDE ([Column1], [Column2], [Column3], [Column4], [Column5]);
Also, as I note in the comments, if deviceDataId is an int, use an int value in your WHERE clause. Don't wrap numerical values in single quotes, that is for literal strings.
WHERE deviceDataId = 640

SQL Actual Execution Plan with Sort took high cost

I have a table named DocumentItem with Id column was clustered index (primary key).
Please see these two query strings:
Query 1 (not use order by):
select *
from DocumentItem
where (HistoryCreateDate >= '2019-09-04 05:00:00' AND HistoryCreateDate <= '2019-12-04 05:00:00') and ActNodeState>140100
The result took: 00:00:09 with 168.357 rows.
Query 2 (used order by):
select *
from DocumentItem
where (HistoryCreateDate >= '2019-09-04 05:00:00' AND HistoryCreateDate <= '2019-12-04 05:00:00') and ActNodeState>140100 order by Id
The result took: 00:02:41 with 168.357 rows.
Here is the actual execution plan:
Why it took so long in the 2nd query?
SQL Server has decided that your index IX_HistoryCreateDate (not sure of the full name) is sufficiently selective that it will use it to find the rows that it needs. However, that index isn't sorted on the ID column. It does include the ID column already (whether you specified it or not) because it's the clustering key.
I'd suggest recreating your IX_HistoryCreateDate index like this:
CREATE INDEX IX_HistoryCreateDate ON DocumentItem
( HistoryCreateDate, ID)
INCLUDE (ActNodeState);
And I think you'll be fine. It's still not going to be great and it will have to do a large number of lookups, because your query uses SELECT *. Do you really need all columns returned? If so, and you do this all the time, you might consider reclustering the table in the order that you need.

Indexes turn SQL query too slow

I'm having a huge issue on a SQL query, after I added an index.
declare #DateFromCT date, #DateToCT date;
declare #DateFromCT2 date, #DateToCT2 date;
set dateformat dmy;
set #DateFromCT= '1/1/2015'; set #DateToCT= '31/3/2015';
set #DateFromCT2= '1/4/2015'; set #DateToCT2= '30/4/2015';
Select distinct CT.CodCliente,ct.codacesso FROM CT_Contabilidade CT
Inner join CD_PlanoContas PC ON CT.CodAcesso = PC.Cod
WHERE NOT exists (
SELECT 1 FROM ct_contabilidade CT2
WHERE CT2.CodAcesso = CT.CodAcesso
and CT2.Data between #DateFromCT2 and #DateToCT2
And ( CT2.CodEmpresa = 1) And CT2.codcliente = ct.codcliente )
and CT.Data between #DateFromCT and #DateToCT
AND PC.subgrupo = 'C'
And ( CT.CodEmpresa = 1 ) And ct.codCliente > 0
The CT_Contabilidade's PK is a Sequential (bigint identity), clustered index.
It has 1.5 million records.
Without other non-clustered indexes, it performs well, took less than 1 second. That's OK for me.
I create an index over the CodAcesso to match CD_PlanoContas key (cod);
The CD_PlanoContas PK (clustered index) is Cod.
It's still performing well. No notable difference...
So I create a index over the codCliente (since it also refers another table)
... And after this, the query is TOO slow; it is taking 7 or 8 MINUTES.
If I drop the CodAcesso index, it turn to be ok.
If I drop the CodCliente index, it is ok too.
If I let them both, but change the query , taking of the Inner Join with CD_Planocontas (and consequently , the filter "AND PC.subgrupo = 'C'") it is OK.
I can't imagine the indexes are causing the query to behave that way.
It's a HUGE difference, not just a "loss of performance". I try some other things, as take out each filter... not changed.
The execution plan suggests an index:
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[CT_Contabilidade] ([CodEmpresa],[Data],[CodCliente])
INCLUDE ([CodAcesso])
I created it, and the query works fine, even with the 2 other indexes (codCliente and codAcesso)
But I didn't like to create a specific index to this query (it's just one of many queries that uses these tables).
If runs well without no index, I think it should runs at least EQUAL with this 2 indexes.
What causes the performance to change so drastically? What do I need to change to speed things up?
try using an index optimizer hint to control which index is being used.
for example:
select *
from titles with (index (titleind))
where title = 'The Gourmet Microwave'
use the 'set statistics io on' command to see the number of pages being scanned with each query/index combo and use the 'rightclick/show execution plan' option to see how the query is being executed
It is not always good idea to follow the suggestion of execution plan.
I suggest you compare the execution plan before and after adding index and see the difference. Maybe that index cause SQL engine to choose a bad plan.
Also try update statistics on your table and index and see how if affects.

How do indexes work behind the scenes

Im a begginer. I know indexes are necessary for performance boosts, but i want to know how they actually work behind the scenes. Beforehand, I used to think that we should make indexes on those columns which are included in where clause (which I realized is wrong)
For example, SELECT * from MARKS where marks_obtained > 50
Consider that there's a clustered index on primary key of this table and I created a non-clustered index on marks_obtained column as its there in my where clause.
My perception: So the leaf nodes will be containing pointers to clustered index and as clustered index points to actual rows, it will select entire rows (due to asteric in my query)
Scenario
I came across following query (from AdventureWorks DB on which a non-clustered index was created) which works fine and took less than a second to execute 3200000 rows until a new column was inserted into it:
Query
SELECT x.*
INTO#X
FROM dbo.bigProduct AS p
CROSS APPLY
(
SELECT TOP 1000 *
FROM dbo.bigTransactionHistory AS bth
WHERE
bth.ProductId = p.bth.ProductId
ORDER BY
TransactionDate DESC
) AS x
WHERE
p.ProductId BETWEEN 1000 AND 7500
GO
NEW INSERTED COLUMN
ALTER TABLE dbo.bigTransactionHistory
ADD CustomerId INT NULL
After insertion of above column it took 17 seconds! means 17 times slower. A non-clusered index was now missing CustomerId column in the index. Just after including CustomerId, problem was gone.
Question CustomerId seemed to be the culprit until it was added to the index. BUT HOW???
The execution plan would answer this but I'll make a guess: The non-clustered index was no longer enough to satisfy the query after the additional column had been added. This can cause the index to not be used anymore. It also can cause one clustered index seek per row.
Learn to read execution plans. Turn on the "actual execution plan" feature routinely for each query that you test.

Why this query is running so slow?

This query runs very fast (<100 msec):
SELECT TOP (10)
[Extent2].[CompanyId] AS [CompanyId]
,[Extent1].[Id] AS [Id]
,[Extent1].[Status] AS [Status]
FROM [dbo].[SplittedSms] AS [Extent1]
INNER JOIN [dbo].[Sms] AS [Extent2]
ON [Extent1].[SmsId] = [Extent2].[Id]
WHERE [Extent2].[CompanyId] = 4563
AND ([Extent1].[NotifiedToClient] IS NULL)
If I add just a time filter, it takes too long (22 seconds!):
SELECT TOP (10)
[Extent2].[CompanyId] AS [CompanyId]
,[Extent1].[Id] AS [Id]
,[Extent1].[Status] AS [Status]
FROM [dbo].[SplittedSms] AS [Extent1]
INNER JOIN [dbo].[Sms] AS [Extent2]
ON [Extent1].[SmsId] = [Extent2].[Id]
WHERE [Extent2].Time > '2015-04-10'
AND [Extent2].[CompanyId] = 4563
AND ([Extent1].[NotifiedToClient] IS NULL)
I tried adding an index on the [Time] column of the Sms table, but the optimizer seems not using the index. Tried using With (index (Ix_Sms_Time)); but to my surprise, it takes even more time (29 seconds!).
Here is the actual execution plan:
The execution plan is same for both queries. Tables mentioned here have 5M to 8M rows (indices are < 1% fragmented and stats are updated). I am using MS SQL Server 2008R2 on a 16core 32GB memory Windows 2008 R2 machine)
Does it help when you force the time filter to kick in only after the client filter has run?
FI like in this example:
;WITH ClientData AS (
SELECT
[E2].[CompanyId]
,[E2].[Time]
,[E1].[Id]
,[E1].[Status]
FROM [dbo].[SplittedSms] AS [E1]
INNER JOIN [dbo].[Sms] AS [E2]
ON [E1].[SmsId] = [E2].[Id]
WHERE [E2].[CompanyId] = 4563
AND ([E1].[NotifiedToClient] IS NULL)
)
SELECT TOP 10
[CompanyId]
,[Id]
,[Status]
FROM ClientData
WHERE [Time] > '2015-04-10'
Create an index on Sms with the following Index Key Columns (in this order):
CompanyID
Time
You may or may not need to add Id as an Included Column.
What datatype is your Time column?
If it's datetime, try converting your '2015-04-10' into equivalent data-type, so that it can use the index.
Declare #test datetime
Set #test='2015-04-10'
Then modify your condition:
[Extent2].Time > #test
The sql server implicitly casts to matching data-type if there is a data-type mismatch. And any function or cast operation prevent using indexes.
I'm on the same track with #JonTirjan, the index with just Time results into a lot of key lookups, so you should try at least following:
create index xxx on Sms (Time, CompanyId) include (Id)
or
create index xxx on Sms (CompanyId, Time) include (Id)
If Id is your clustered index, then it's not needed in include clause. If significant part of your data belongs to CompanyID 4563, it might be ok to have it as include column too.
The percentages you see in actual plan are just estimates based on the row count assumptions, so those are sometimes totally wrong. Looking at actual number of rows / executions + statistics IO output should give you idea what's actually happening.
Two things come to mind:
By adding an extra restriction it will be 'harder' for the database to find the first 10 items that match your restrictions. Finding the first 10 rows from let's say 10.000 items (from a total of 1 milion) is a easier then finding the first 10 rows from maybe 100 items (from a total of 1 milion).
The index is not being used probably because the index is created on a datetime column, which is not very efficient if you are also storing the time in them. You might want to create a clustered index on the [time] column (but then you would have to remove the clustered index which is now on the [CompanyId] column or you could create a computed column which stores the date-part of the [time] column, create an index on this computed column and filter on this column.
I found out that there was no index on the foreign key column (SmsId) on the SplittedSms table. I made one and it seems the second query is almost as fast as the first one now.
The execution plan now:
Thanks everyone for the effort.

Resources