SQL Server grouping on NULLABLE column - sql-server

I have a situation in SQL Server (with a legacy DB) that I can't understand.
I have a table A (about 2 million rows) with a column CODE that allows NULL. Only a handful of rows (< 10) have CODE = NULL. When I run the query:
select code, sum(C1)
from A
-- where code is not null
group by code;
it runs forever. But when I un-comment the WHERE clause, it takes around 1.5 s (still too slow, right?).
Could anyone help me by pointing out the possible causes of this situation?
Execution plan added:

As a general rule, NULL values cannot be stored by a conventional index. So even if you have an index on code, your WHERE condition cannot benefit from that index.
If C1 is included in the index (which I assume is NOT NULL), things are different, because all the tuples (code=NULL, C1=(some value)) can and will be indexed. These are few, according to your question; so SQL Server can get a considerable speedup by just returning the rows for all these tuples.
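For illustration, such a covering index might look like the following (a sketch only; it assumes the table is dbo.A, and the index name is made up):

CREATE NONCLUSTERED INDEX IX_A_Code
ON dbo.A (CODE)
INCLUDE (C1);

With C1 included, the GROUP BY ... SUM(C1) query can be answered from the index alone instead of scanning the base table.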

First of all, a few words about performance. There are several options in your case.
Indexed view -
IF OBJECT_ID('dbo.t', 'U') IS NOT NULL
DROP TABLE dbo.t
GO
CREATE TABLE dbo.t (
ID INT IDENTITY PRIMARY KEY,
Code VARCHAR(10) NULL,
[Status] INT NULL
)
GO
CREATE VIEW dbo.v
WITH SCHEMABINDING
AS
SELECT Code, [Status] = SUM(ISNULL([Status], 0)), Cnt = COUNT_BIG(*)
FROM dbo.t
WHERE Code IS NOT NULL
GROUP BY Code
GO
CREATE UNIQUE CLUSTERED INDEX ix ON dbo.v (Code)
SELECT Code, [Status]
FROM dbo.v
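Note that only Enterprise edition matches indexed views automatically; on other editions you may need a NOEXPAND hint so the view's own index is used (a sketch against the view above):

SELECT Code, [Status]
FROM dbo.v WITH (NOEXPAND)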
Filtered Index -
CREATE NONCLUSTERED INDEX ix ON dbo.t (Code)
INCLUDE ([Status])
WHERE Code IS NOT NULL
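A query shaped like the original one should then be answerable entirely from this filtered index, for example (a sketch against the demo table dbo.t):

SELECT Code, [Status] = SUM(ISNULL([Status], 0))
FROM dbo.t
WHERE Code IS NOT NULL
GROUP BY Code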
I'll wait for your second execution plan.

Related

Why would a Persisted Computed Column that is stored in a Covering Index NOT be used for Updates in SQL Server?

I am seeing an odd behavior in SQL Server that doesn't make any sense. I have a PERSISTED computed column that is stored in a covering index. However, when this computed column in the covering index needs to be referenced in order to update another column in another table, the optimizer chooses not to use it at all and instead does a Key Lookup to get the values from the clustered index. Why?
The UPDATE statement at the very end of this demo script works as expected, but if you look closely at the execution plan for it, the update does NOT use the covering index IX_MyTable_VarcharValue1_ComputedColumn. Instead, it will do a Key Lookup and go back to the clustered index to get VarcharValue2, even though the ComputedColumn it needs for the update is literally already there! In my mind, PERSISTED means persisted to disk. So why isn't it using the value when it looks at the non-clustered index the first time to get VarcharValue1? Isn't the Key Lookup extra work?
CREATE TABLE dbo.MyTable (
[ID] INT NOT NULL
, [VarcharValue1] VARCHAR(50) NOT NULL
, [NotComputedColumn] VARCHAR(50) NULL
CONSTRAINT [PK_MyTable]
PRIMARY KEY CLUSTERED(ID ASC)
) ON [PRIMARY];
CREATE NONCLUSTERED INDEX IX_MyTable_VarcharValue1
ON dbo.MyTable ([VarcharValue1] ASC);
CREATE TABLE dbo.ComputedColumnTable (
[ID] INT NOT NULL
, [VarcharValue1] VARCHAR(50) NOT NULL
, [VarcharValue2] VARCHAR(50) NOT NULL
, [ComputedColumn] AS [VarcharValue1] + [VarcharValue2] PERSISTED NOT NULL
CONSTRAINT [PK_ComputedColumnTable]
PRIMARY KEY CLUSTERED(ID ASC)
) ON [PRIMARY];
CREATE NONCLUSTERED INDEX IX_MyTable_VarcharValue1_ComputedColumn
ON dbo.ComputedColumnTable ([VarcharValue1] ASC, [ComputedColumn] ASC);
INSERT INTO dbo.MyTable VALUES(1,'e',NULL)
INSERT INTO dbo.MyTable VALUES(2,'d',NULL)
INSERT INTO dbo.MyTable VALUES(3,'c',NULL)
INSERT INTO dbo.MyTable VALUES(4,'b',NULL)
INSERT INTO dbo.MyTable VALUES(5,'a',NULL)
INSERT INTO dbo.ComputedColumnTable VALUES(1,'a','b')
INSERT INTO dbo.ComputedColumnTable VALUES(2,'b','c')
INSERT INTO dbo.ComputedColumnTable VALUES(3,'c','d')
INSERT INTO dbo.ComputedColumnTable VALUES(4,'d','e')
INSERT INTO dbo.ComputedColumnTable VALUES(5,'e','f')
SELECT * FROM dbo.MyTable
SELECT * FROM dbo.ComputedColumnTable
-- uses a Key Lookup to get VarcharValue2 instead of the ComputedColumn in the covering index
UPDATE m
SET m.NotComputedColumn = c.ComputedColumn
FROM MyTable m
JOIN ComputedColumnTable c
ON m.VarcharValue1 = c.VarcharValue1
Edit: Adding link for Execution Plan: https://www.brentozar.com/pastetheplan/?id=Hkk_MZ8JK
The Clustered Index Update operator modifies the clustered index pages, so the optimizer decided to expand the definition of the persisted computed column.
"It can come as quite a shock to see SQL Server recomputing the underlying expression each time while ignoring the deliberately-provided stored value" (from Properly Persisted Computed Columns)
In this case, the optimizer needs the VarcharValue2 column, so it demands a fresh calculation from the Compute Scalar operator. When we look at the Compute Scalar operator, we see that it re-calculates the persisted computed column value; this shows up in its ScalarString attribute.
On the other hand, we can get rid of the Key Lookup by eliminating the Stream Aggregate, but this time the optimizer decides to perform a Clustered Index Scan on ComputedColumnTable, because it still requires the VarcharValue2 column; only the data access method changes.
-- Don't use these options in a production database
DBCC TRACEON (3604);
DBCC RULEOFF('GbAggToStrm')
GO
UPDATE m
SET m.NotComputedColumn = c.ComputedColumn
FROM MyTable m
JOIN ComputedColumnTable c
ON m.VarcharValue1 = c.VarcharValue1
DBCC RULEON('GbAggToStrm')
As a result, the requirement for the VarcharValue2 column did not change.
What we can do: to resolve this situation, we can add VarcharValue2 to the IX_MyTable_VarcharValue1_ComputedColumn index definition so that the Key Lookup is eliminated.
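A sketch of that widened index (reusing the name from the demo script; DROP_EXISTING rebuilds the existing index in place):

CREATE NONCLUSTERED INDEX IX_MyTable_VarcharValue1_ComputedColumn
ON dbo.ComputedColumnTable ([VarcharValue1] ASC, [ComputedColumn] ASC)
INCLUDE ([VarcharValue2])
WITH (DROP_EXISTING = ON);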

Does SQL Server allow including a computed column in a non-clustered index? If not, why not?

When a column is included in a non-clustered index, SQL Server copies the values for that column from the table into the index structure (B+ tree). Included columns don't require table lookups.
If the included column is essentially a copy of the original data, why doesn't SQL Server also allow including computed columns in the non-clustered index - applying the computation when it copies/updates the data from table to index structure? Or am I just not getting the syntax right here?
Assume:
DateOpened is datetime
PlanID is varchar(6)
This works:
create nonclustered index ixn_DateOpened_CustomerAccount
on dbo.CustomerAccount(DateOpened)
include(PlanID)
This does not work with left(PlanID, 3):
create nonclustered index ixn_DateOpened_CustomerAccount
on dbo.CustomerAccount(DateOpened)
include(left(PlanID, 3))
or
create nonclustered index ixn_DateOpened_CustomerAccount
on dbo.CustomerAccount(DateOpened)
include(left(PlanID, 3) as PlanType)
My use case is somewhat like below query.
select
case
when left(PlanID, 3) = '100' then 'Basic'
else 'Professional'
end as 'PlanType'
from
CustomerAccount
where
DateOpened between '2016-01-01 00:00:00.000' and '2017-01-01 00:00:00.000'
The query cares only about the left 3 characters of PlanID, and instead of computing them every time the query runs, I was wondering whether I could include left(PlanID, 3) in the non-clustered index so the computation is done when the index is built/updated (infrequently) instead of at query time (frequently).
EDIT: We use SQL Server 2014.
As Laughing Vergil stated, you CAN index computed columns provided that they are persisted. You have a few options; here are a couple:
Option 1: Create the column as PERSISTED then index it
(or, in your case, include it in the index)
First the sample data:
CREATE TABLE dbo.CustomerAccount
(
PlanID int PRIMARY KEY,
DateOpened datetime NOT NULL,
First3 AS LEFT(PlanID,3) PERSISTED
);
INSERT dbo.CustomerAccount (PlanID, DateOpened)
VALUES (100123, '20160114'), (100999, '20151210'), (255657, '20150617');
and here's the index:
CREATE NONCLUSTERED INDEX nc_CustomerAccount ON dbo.CustomerAccount(DateOpened)
INCLUDE (First3);
Now let's test:
-- Note: IIF is available for SQL Server 2012+ and is cleaner
SELECT PlanID, PlanType = IIF(First3 = 100, 'Basic', 'Professional')
FROM dbo.CustomerAccount;
Execution Plan:
As you can see, the optimizer picked the nonclustered index.
Option #2: Perform the CASE logic inside your table DDL
First the updated table structure:
DROP TABLE dbo.CustomerAccount;
CREATE TABLE dbo.CustomerAccount
(
PlanID int PRIMARY KEY,
DateOpened datetime NOT NULL,
PlanType AS
CASE -- NOTE: casting as varchar(12) will make the column a varchar(12) column:
WHEN LEFT(PlanID,3) = 100 THEN CAST('Basic' AS varchar(12))
ELSE 'Professional'
END
PERSISTED
);
INSERT dbo.CustomerAccount (PlanID, DateOpened)
VALUES (100123, '20160114'), (100999, '20151210'), (255657, '20150617');
Notice that I use CAST to assign the data type; the table will be created with this column as varchar(12).
Now the index:
CREATE NONCLUSTERED INDEX nc_CustomerAccount ON dbo.CustomerAccount(DateOpened)
INCLUDE (PlanType);
Let's test again:
SELECT DateOpened, PlanType FROM dbo.CustomerAccount;
Execution plan:
... again, it used the nonclustered index
A third option, which I don't have time to go into, would be to create an indexed view. This would be a good option for you if you were unable to change your existing table structure.
SQL Server 2014 allows creating indexes on computed columns, but you're not doing that -- you're attempting to create the index directly on an expression. This is not allowed. You'll have to make PlanType a column first:
ALTER TABLE dbo.CustomerAccount ADD PlanType AS LEFT(PlanID, 3);
And now creating the index will work just fine (if your SET options are all correct, as outlined here):
CREATE INDEX ixn_DateOpened_CustomerAccount ON CustomerAccount(DateOpened) INCLUDE (PlanType)
It is not required that you mark the column PERSISTED. This is required only if the column is not precise, which does not apply here (this is a concern only for floating-point data).
Incidentally, the real benefit of this index is not so much that LEFT(PlanID, 3) is precalculated (the calculation is inexpensive), but that no clustered index lookup is needed to get at PlanID. With an index only on DateOpened, a query like
SELECT PlanType FROM CustomerAccount WHERE DateOpened >= '2012-01-01'
will result in an index seek on CustomerAccount, followed by a clustered index lookup to get PlanID (so we can calculate PlanType). If the index does include PlanType, the index is covering and the extra lookup disappears.
This benefit is relevant only if the index is truly covering, however. If you select other columns from the table, an index lookup is still required and the included computed column is only taking up space for little gain. Likewise, suppose that you had multiple calculations on PlanID or you needed PlanID itself as well -- in this case it would make much more sense to include PlanID directly rather than PlanType.
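For example, a sketch of that alternative index (same name as in the question, but including the base column instead of the computed one):

CREATE NONCLUSTERED INDEX ixn_DateOpened_CustomerAccount
ON dbo.CustomerAccount (DateOpened)
INCLUDE (PlanID);

Any expression over PlanID, including LEFT(PlanID, 3), can then be evaluated from the covering index without a lookup.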
Computed columns are only allowed in indexes if they are Persisted - that is, if the data is written to the table. If the information is not persisted, then the information isn't even calculated / available until the field is queried.

High Sort Cost on Merge Operation

I am using the MERGE feature to insert data into a table using a bulk import table as source. (as described here)
This is my query:
DECLARE @InsertMapping TABLE (BulkId int, TargetId int);
MERGE dbo.Target T
USING dbo.Source S
ON 0=1 WHEN NOT MATCHED THEN
INSERT (Data) VALUES (Data)
OUTPUT S.Id BulkId, inserted.Id INTO @InsertMapping;
When evaluating the performance by displaying the actual execution plan, I saw that there is a high-cost sort on the primary key index. I don't get it: the primary key should already be sorted ascending, so there shouldn't be a need for additional sorting.
Because of this sort cost the query takes several seconds to complete. Is there a way to speed up the inserting? Maybe some index hinting or additional indices? Such an insert shouldn't take that long, even if there are several thousand entries.
I can reproduce this issue with the following
CREATE TABLE dbo.TargetTable(Id int IDENTITY PRIMARY KEY, Value INT)
CREATE TABLE dbo.BulkTable(Id int IDENTITY PRIMARY KEY, Value INT)
INSERT INTO dbo.BulkTable
SELECT TOP (1000000) 1
FROM sys.all_objects o1, sys.all_objects o2
DECLARE @TargetTableMapping TABLE (BulkId INT,TargetId INT);
MERGE dbo.TargetTable T
USING dbo.BulkTable S
ON 0 = 1
WHEN NOT MATCHED THEN
INSERT (Value)
VALUES (Value)
OUTPUT S.Id AS BulkId,
inserted.Id AS TargetId
INTO @TargetTableMapping;
This gives a plan with a sort before the clustered index merge operator.
The sort is on Expr1011 and Action1010, which are both computed columns output by previous operators.
Expr1011 is the result of calling the internal and undocumented function getconditionalidentity to produce an id column for the identity column in TargetTable.
Action1010 is a flag indicating insert, update, delete. It is always 4 in this case as the only action this MERGE statement can perform is INSERT.
The reason the sort is in the plan is because the clustered index merge operator has the DMLRequestSort property set.
The DMLRequestSort property is set based on the number of rows expected to be inserted. Paul White explains in the comments here
[DMLRequestSort] was added to support the ability to minimally-log INSERT statements in 2008. One of the preconditions for minimal logging is that the rows are presented to the Insert operator in clustered key order.
Inserting into tables in clustered index key order can be more efficient anyway as it reduces random IO and fragmentation.
If the function getconditionalidentity returns generated identity values in ascending order (as would seem reasonable), then the input to the sort will already be in the desired order. The sort in the plan would in that case be logically redundant (there was previously a similar issue with unnecessary sorts with NEWSEQUENTIALID).
It is possible to get rid of the sort by making the expression a bit more opaque.
DECLARE @TargetTableMapping TABLE (BulkId INT,TargetId INT);
DECLARE @N BIGINT = 0x7FFFFFFFFFFFFFFF
MERGE dbo.TargetTable T
USING (SELECT TOP(@N) * FROM dbo.BulkTable) S
ON 1=0
WHEN NOT MATCHED THEN
INSERT (Value)
VALUES (Value)
OUTPUT S.Id AS BulkId,
inserted.Id AS TargetId
INTO @TargetTableMapping;
This reduces the estimated row count and the plan no longer has a sort. You will need to test whether or not this actually improves performance though. Possibly it might make things worse.
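One simple way to compare the two variants side by side is to enable I/O and time statistics in the session and run each MERGE (a sketch; the numbers to compare are the logical reads and CPU/elapsed times in the Messages output):

SET STATISTICS IO ON;
SET STATISTICS TIME ON;
-- run the original MERGE here, then the TOP(@N) rewrite,
-- and compare logical reads and CPU / elapsed time for each
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;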

Querying minimum value in SQL Server is a lot longer than querying all the rows

I'm currently confronted with strange behaviour in my database when querying the minimum ID for a specific date in a table containing about a hundred million rows. The query is quite simple:
SELECT MIN(Id) FROM Connection WITH(NOLOCK) WHERE DateConnection = '2012-06-26'
This query never ends; at least, I let it run for hours. The DateConnection column is neither indexed nor included in any index, so I would understand this query taking quite a while. But I tried the following query, which runs in a few seconds:
SELECT Id FROM Connection WITH(NOLOCK) WHERE DateConnection = '2012-06-26'
It returns 300k rows.
My table is defined as this :
CREATE TABLE [dbo].[Connection](
[Id] [bigint] IDENTITY(1,1) NOT NULL,
[DateConnection] [datetime] NOT NULL,
[TimeConnection] [time](7) NOT NULL,
[Hour] AS (datepart(hour,[TimeConnection])) PERSISTED NOT NULL,
CONSTRAINT [PK_Connection] PRIMARY KEY CLUSTERED
(
[Hour] ASC,
[Id] ASC
)
)
And it has the following index :
CREATE UNIQUE NONCLUSTERED INDEX [IX_Connection_Id] ON [dbo].[Connection]
(
[Id] ASC
)ON [PRIMARY]
One solution I found, exploiting this strange behaviour, is the following code. But it seems quite heavy for such a simple query.
create table #TempId
(
[Id] bigint
)
go
insert into #TempId
select id from Connection with(nolock) where dateconnection = '2012-06-26'
declare @displayId bigint
select @displayId = min(Id) from #TempId
print @displayId
go
drop table #TempId
go
Has anybody been confronted with this behaviour, and what is the cause of it? Is the minimum aggregate scanning the entire table? And if that's the case, why doesn't the simple select do so?
The root cause of the problem is the non-aligned nonclustered index, combined with the statistical limitation Martin Smith points out (see his answer to another question for details).
Your table is partitioned on [Hour] along these lines:
CREATE PARTITION FUNCTION PF (integer)
AS RANGE RIGHT
FOR VALUES (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23);
CREATE PARTITION SCHEME PS
AS PARTITION PF ALL TO ([PRIMARY]);
-- Partitioned
CREATE TABLE dbo.Connection
(
Id bigint IDENTITY(1,1) NOT NULL,
DateConnection datetime NOT NULL,
TimeConnection time(7) NOT NULL,
[Hour] AS (DATEPART(HOUR, TimeConnection)) PERSISTED NOT NULL,
CONSTRAINT [PK_Connection]
PRIMARY KEY CLUSTERED
(
[Hour] ASC,
[Id] ASC
)
ON PS ([Hour])
);
-- Not partitioned
CREATE UNIQUE NONCLUSTERED INDEX [IX_Connection_Id]
ON dbo.Connection
(
Id ASC
)ON [PRIMARY];
-- Pretend there are lots of rows
UPDATE STATISTICS dbo.Connection WITH ROWCOUNT = 200000000, PAGECOUNT = 4000000;
The query and execution plan are:
SELECT
MinID = MIN(c.Id)
FROM dbo.Connection AS c WITH (READUNCOMMITTED)
WHERE
c.DateConnection = '2012-06-26';
The optimizer takes advantage of the index (ordered on Id) to transform the MIN aggregate to a TOP (1) - since the minimum value will by definition be the first value encountered in the ordered stream. (If the nonclustered index were also partitioned, the optimizer would not choose this strategy since the required ordering would be lost).
The slight complication is that we also need to apply the predicate in the WHERE clause, which requires a lookup to the base table to fetch the DateConnection value. The statistical limitation Martin mentions explains why the optimizer estimates it will only need to check 119 rows from the ordered index before finding one with a DateConnection value that will match the WHERE clause. The hidden correlation between DateConnection and Id values means this estimate is a very long way off.
In case you are interested, the Compute Scalar calculates which partition to perform the Key Lookup into. For each row from the nonclustered index, it computes an expression like [PtnId1000] = Scalar Operator(RangePartitionNew([dbo].[Connection].[Hour] as [c].[Hour],(1),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23))), and this is used as the leading key of the lookup seek. There is prefetching (read-ahead) on the nested loops join, but this needs to be an ordered prefetch to preserve the sorting required by the TOP (1) optimization.
Solution
We can avoid the statistical limitation (without using query hints) by finding the minimum Id for each Hour value, and then taking the minimum of the per-hour minimums:
-- Global minimum
SELECT
MinID = MIN(PerHour.MinId)
FROM
(
-- Local minimums (for each distinct hour value)
SELECT
MinID = MIN(c.Id)
FROM dbo.Connection AS c WITH(READUNCOMMITTED)
WHERE
c.DateConnection = '2012-06-26'
GROUP BY
c.[Hour]
) AS PerHour;
The execution plan is:
If parallelism is enabled, you will see a plan more like the following, which uses parallel index scan and multi-threaded stream aggregates to produce the result even faster:
Although it might be wise to fix the problem in a way that doesn't require index hints, a quick solution is this:
SELECT MIN(Id) FROM Connection WITH(NOLOCK, INDEX(PK_Connection)) WHERE DateConnection = '2012-06-26'
This forces a table scan.
Alternatively, try this although it probably produces the same problem:
select top 1 Id
from Connection
WHERE DateConnection = '2012-06-26'
order by Id
It makes sense that finding the minimum takes longer than going through all the records. Finding the minimum of an unsorted structure takes much longer than traversing it once (unsorted because MIN() doesn't take advantage of the identity column). What you could do, since you're using an identity column, is have a nested select, where you take the first record from the set of records with the specified date.
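A sketch of that nested-select idea (same table and date as in the question; whether the optimizer actually produces a better plan for it needs to be checked against the real data):

SELECT TOP (1) Id
FROM (
    SELECT Id
    FROM Connection WITH (NOLOCK)
    WHERE DateConnection = '2012-06-26'
) AS FilteredRows
ORDER BY Id;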
The NC index scan is the issue in your case. It uses the unique non-clustered index scan, and then for each row (and that is up to a hundred million rows) it traverses the clustered index, which causes millions of IOs (say your index height is 4, then it might cause 100 million * 4 IOs plus the scan of the nonclustered index leaf pages). The optimizer must have chosen this index to avoid a Stream Aggregate for getting the minimum. There are three main techniques to find a minimum: the first is using an index on the column we want the MIN of (efficient if such an index exists, because no calculation is required; the first row reached is the answer); the second is a hash aggregate (which usually happens when you have a GROUP BY); the third is a stream aggregate, which scans through all qualifying rows, always keeping the current minimum, and returns it once all rows have been scanned.
However, the query without MIN used a clustered index scan and is fast, because it has to read fewer pages and therefore does fewer IOs.
Now the question is why the optimizer picked the scan on the non-clustered index. I am sure it is to avoid the computation involved in finding the minimum with a stream aggregate, but in this case not using the stream aggregate is much more costly. This depends on estimates, so I guess the statistics on the table are not up to date.
So first of all, check whether your stats are up to date. When were the stats last updated?
Thus, to avoid the issue, do the following (sketched below):
1. First, update the table statistics; I am sure that should remove your issue.
2. If you cannot update the stats, or updating them does not change the plan and it still uses the NC index scan, then force the clustered index scan so that it does fewer IOs, followed by a stream aggregate to get the MIN value.
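A sketch of both suggestions in T-SQL (the statistics option and the index hint are illustrative; PK_Connection is the clustered primary key from the question's DDL):

-- 1. Refresh statistics on the table (FULLSCAN is optional; a sampled update may be enough)
UPDATE STATISTICS dbo.Connection WITH FULLSCAN;

-- 2. If the plan still scans the nonclustered index, force the clustered index instead
SELECT MIN(Id)
FROM dbo.Connection WITH (NOLOCK, INDEX(PK_Connection))
WHERE DateConnection = '2012-06-26';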

Big Table Advice (SQL Server)

I'm experiencing massive slowness when accessing one of my tables and I need some re-factoring advice. Sorry if this is not the correct area for this sort of thing.
I'm working on a project that aims to report on server performance statistics for our internal servers. I'm processing windows performance logs every night (12 servers, 10 performance counters and logging every 15 seconds). I'm storing the data in a table as follows:
CREATE TABLE [dbo].[log](
[id] [int] IDENTITY(1,1) NOT NULL,
[logfile_id] [int] NOT NULL,
[test_id] [int] NOT NULL,
[timestamp] [datetime] NOT NULL,
[value] [float] NOT NULL,
CONSTRAINT [PK_log] PRIMARY KEY CLUSTERED
(
[id] ASC
)WITH FILLFACTOR = 90 ON [PRIMARY]
) ON [PRIMARY]
There's currently 16,529,131 rows and it will keep on growing.
I access the data to produce reports and create graphs from coldfusion like so:
SET NOCOUNT ON
CREATE TABLE ##RowNumber ( RowNumber int IDENTITY (1, 1), log_id char(9) )
INSERT ##RowNumber (log_id)
SELECT l.id
FROM log l, logfile lf
WHERE lf.server_id = #arguments.server_id#
and l.test_id = #arguments.test_id#
and l.timestamp >= #arguments.report_from#
and l.timestamp < #arguments.report_to#
and l.logfile_id = lf.id
order by l.timestamp asc
select rn.RowNumber, l.value, l.timestamp
from log l, logfile lf, ##RowNumber rn
where lf.server_id = #arguments.server_id#
and l.test_id = #arguments.test_id#
and l.logfile_id = lf.id
and rn.log_id = l.id
and ((rn.rownumber % #modu# = 0) or (rn.rownumber = 1))
order by l.timestamp asc
DROP TABLE ##RowNumber
SET NOCOUNT OFF
(for non-CF devs: #value# inserts a value and ## maps to #)
I basically create a temporary table so that I can use the row number to select every x-th row. In this way I'm only selecting the number of rows I can display. This helps, but it's still very slow.
SQL Server Management Studio tells me my indexes are as follows (I have pretty much no knowledge about using indexes properly):
IX_logfile_id (Non-Unique, Non-Clustered)
IX_test_id (Non-Unique, Non-Clustered)
IX_timestamp (Non-Unique, Non-Clustered)
PK_log (Clustered)
I would be very grateful to anyone who could give some advice that could help me speed things up a bit. I don't mind re-organising things and I have complete control of the project (perhaps not over the server hardware though).
Cheers (sorry for the long post)
Your problem is that you chose a bad clustered key. Nobody is ever interested in retrieving one particular log value by ID. If your system is like anything else I've seen, then all queries are going to ask for:
all counters for all servers over a range of dates
specific counter values over all servers for a range of dates
all counters for one server over a range of dates
specific counter for specific server over a range of dates
Given the size of the table, all your non-clustered indexes are useless. They are all going to hit the index tipping point, guaranteed, so they might just as well not exist. I assume all your non-clustered indexes are defined as a simple index over the field in the name, with no included fields.
I'm going to pretend I actually know your requirements. You must forget common sense about storage and actually duplicate all your data in every non-clustered index. Here is my advice (a rough T-SQL sketch follows the list):
Drop the clustered index on [id]; it is as useless as it gets.
Organize the table with a clustered index on (logfile_id, test_id, timestamp).
Non-clustered index on (test_id, logfile_id, timestamp) include (value)
NC index on (logfile_id, timestamp) include (value)
NC index on (test_id, timestamp) include (value)
NC index on (timestamp) include (value)
Add maintenance tasks to reorganize all indexes periodically, as they are prone to fragmentation.
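A rough T-SQL sketch of that structure (the index names are made up, the existing PK_log clustered primary key would have to be dropped or recreated as nonclustered first, and INCLUDE requires SQL Server 2005 or later):

CREATE CLUSTERED INDEX IX_log_cluster ON dbo.[log] (logfile_id, test_id, [timestamp]);
CREATE NONCLUSTERED INDEX IX_log_test_logfile ON dbo.[log] (test_id, logfile_id, [timestamp]) INCLUDE ([value]);
CREATE NONCLUSTERED INDEX IX_log_logfile_time ON dbo.[log] (logfile_id, [timestamp]) INCLUDE ([value]);
CREATE NONCLUSTERED INDEX IX_log_test_time ON dbo.[log] (test_id, [timestamp]) INCLUDE ([value]);
CREATE NONCLUSTERED INDEX IX_log_time ON dbo.[log] ([timestamp]) INCLUDE ([value]);
-- periodic maintenance, e.g. ALTER INDEX ALL ON dbo.[log] REORGANIZE; (2005+)
-- on SQL Server 2000, use DBCC INDEXDEFRAG or DBCC DBREINDEX instead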
The clustered index covers the query 'history of a specific counter value at a specific machine'. The non-clustered indexes cover various other possible queries (all counters at a machine over time, a specific counter across all machines over time, etc.).
You'll notice I did not comment on your query script at all. That is because there isn't anything in the world you can do to make the queries run faster over the table structure you have.
Now one thing you shouldn't do is actually implement my advice. I said I'm going to pretend I know your requirements. But I actually don't. I just gave an example of a possible structure. What you really should do is study the topic and figure out the correct index structure for your requirements:
General Index Design Guidelines.
Index Design Basics
Index with Included Columns
Query Types and Indexes
Also a google on 'covering index' will bring up a lot of good articles.
And of course, at the end of the day storage is not free, so you'll have to balance the requirement to have a non-clustered index on every possible combination with the need to keep the size of the database in check. Luckily you have a very small and narrow table, so duplicating it over many non-clustered indexes is no big deal. Also, I wouldn't be concerned about insert performance: 120 counters every 15 seconds means 8-9 inserts per second, which is nothing.
A couple things come to mind.
Do you need to keep that much data? If not, consider creating an archive table for the old data you want to keep (but don't create it just to join it with the primary table every time you run a query).
I would avoid using a temp table with so much data. See this article on temp table performance and how to avoid using them.
http://www.sql-server-performance.com/articles/per/derived_temp_tables_p1.aspx
It looks like you are missing an index on the server_id field. I would consider creating a covering index using this field and others. Here is an article on that as well.
http://www.sql-server-performance.com/tips/covering_indexes_p1.aspx
Edit
With that many rows in the table over such a short time frame, I would also check the indexes for fragmentation which may be a cause for slowness. In SQL Server 2000 you can use the DBCC SHOWCONTIG command.
See this link for info http://technet.microsoft.com/en-us/library/cc966523.aspx
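For example (a sketch; on SQL Server 2005 and later, sys.dm_db_index_physical_stats replaces this command):

DBCC SHOWCONTIG ('log') WITH ALL_INDEXES;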
Once, when still working with SQL Server 2000, I needed to do some paging, and I came across a method of paging that really blew my mind. Have a look at this method:
DECLARE @Table TABLE(
TimeVal DATETIME
)
DECLARE @StartVal INT
DECLARE @EndVal INT
SELECT @StartVal = 51, @EndVal = 100
SELECT *
FROM (
SELECT TOP (@EndVal - @StartVal + 1)
*
FROM (
--select up to end number
SELECT TOP (@EndVal)
*
FROM @Table
ORDER BY TimeVal ASC
) PageReversed
ORDER BY TimeVal DESC
) PageVals
ORDER BY TimeVal ASC
As an example
SELECT *
FROM (
SELECT TOP (@EndVal - @StartVal + 1)
*
FROM (
SELECT TOP (@EndVal)
l.id,
l.timestamp
FROM log l, logfile lf
WHERE lf.server_id = #arguments.server_id#
and l.test_id = #arguments.test_id#
and l.timestamp >= #arguments.report_from#
and l.timestamp < #arguments.report_to#
and l.logfile_id = lf.id
order by l.timestamp asc
) PageReversed ORDER BY timestamp DESC
) PageVals
ORDER BY timestamp ASC
