table needs indexes to improve performance - sql-server

I was getting timeouts when the query below was given a long DateTime range (the query runs from a C# application). The table had 30 million rows with a non-clustered index on ID (not a primary key).
I found that there was no primary key, so I recently made ID the primary key, and the timeouts have stopped. Can anyone help me create an index on more than one column for the query below, for the future? Should I also remove the non-clustered index from this table and create one on more than one column instead? The data is growing rapidly and I need to improve performance.
select
ID, ReferenceNo, MinNo, DateTime, DataNo from tbl1
where
DateTime BETWEEN '04/09/2013' AND '20/11/2013'
and ReferenceNo = 4 and MinNo = 3 and DataNo = 14 Order by ID
This is the create script:
CREATE TABLE [dbo].[tbl1](
[ID] [int] IDENTITY(1,1) NOT NULL,
[ReferenceNo] [int] NOT NULL,
[MinNo] [int] NOT NULL,
[DateTime] [datetime] NOT NULL,
[DataNo] [int] NOT NULL,
CONSTRAINT [tbl1_pk] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

It's hard to tell which index you should use without knowing more about your database and how it's used.
You may want to change the ID column to a clustered index. If ID is an identity column you will get very few page splits while inserting new data. It will however require you to rebuild the table and that may be a problem depending on your usage of the database. You will be looking at some downtime.
If you want a covering index it should look something like this:
CREATE NONCLUSTERED INDEX [MyCoveringIndex] ON tbl1
(
[ReferenceNo] ASC,
[MinNo] ASC,
[DataNo] ASC,
[DateTime] ASC
)
There is no need to include ID as a column, as it's already in the clustered index (clustered index columns are included in all other indexes). This will however use up a whole lot of space (somewhere in the range of 1 GB if the columns above are of the types int and datetime). It will also affect insert, update and delete performance on the table, in most cases negatively.
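If you want to check the actual space used rather than rely on the estimate, a query along these lines should work once the index exists. It is only a sketch: it assumes the table is dbo.tbl1 as in the question, and the column aliases are just illustrative.

```sql
-- Space used per index on dbo.tbl1 (data pages are 8 KB each).
SELECT  i.name                               AS index_name,
        i.type_desc                          AS index_type,
        SUM(ps.used_page_count) * 8 / 1024.0 AS used_mb,
        SUM(ps.row_count)                    AS row_count
FROM    sys.dm_db_partition_stats AS ps
JOIN    sys.indexes               AS i
        ON  i.object_id = ps.object_id
        AND i.index_id  = ps.index_id
WHERE   ps.object_id = OBJECT_ID(N'dbo.tbl1')
GROUP BY i.name, i.type_desc
ORDER BY used_mb DESC;
```

Running it before and after creating the covering index shows how much space the new index really costs on your data.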
You can create the index in online mode if you are using the Enterprise Edition of SQL Server. In all other cases the table will be locked while the index is being created.
It's also hard to know what other queries are run against the table. You may want to tweak the order of the columns in the index to better match other queries.

Indexing all fields would be fastest, but would likely waste a ton of space. I would guess that a date index would provide the most benefit for the least storage cost, because the data is probably evenly spread out over a large period of time. If the MIN() and MAX() dates are close together, then this will not be as effective:
CREATE NONCLUSTERED INDEX [IDX_1] ON [dbo].[tbl1] (
[DateTime] ASC
)
GO
As a side note, you can use SSMSE's "Display Estimated Execution Plan" which will show you what the DB needs to do to get your data. It will suggest missing indexes and also provide CREATE INDEX statements. These suggestions can be quite wasteful, but they will give you an idea of what is taking so long. This option is in the Standard Toolbar, four icons to the right from "Execute".

Related

How can a query with ORDER BY run faster than the same query without ordering?

I have a SQL Server database with EventJournal table with the following columns:
Ordering (bigint, primary key)
PersistenceID (nvarchar(255))
SequenceNr (bigint)
Payload (varbinary(max))
Other columns are omitted for clarity. In addition to the primary key on Ordering there is a unique constraint on PersistenceID+SequenceNr.
If I run a query
select top 100 * from EventJournal where PersistenceID like 'msc:%'
... it takes a very long time to execute (the table contains more than 100M rows).
But if I add ordering to results:
select top 100 * from EventJournal where PersistenceID like 'msc:%' order by Ordering
... then it returns the result immediately.
The execution plans for both queries are the same: in essence, a clustered index scan on the PK. Then why does the first query take a long time to execute?
Here's the table definition:
CREATE TABLE [dbo].[EventJournal](
[PersistenceID] [nvarchar](255) NOT NULL,
[SequenceNr] [bigint] NOT NULL,
[IsDeleted] [bit] NOT NULL,
[Manifest] [nvarchar](500) NOT NULL,
[Payload] [varbinary](max) NOT NULL,
[Timestamp] [bigint] NOT NULL,
[Tags] [nvarchar](100) NULL,
[Ordering] [bigint] IDENTITY(1,1) NOT NULL,
[SerializerId] [int] NULL,
CONSTRAINT [PK_EventJournal] PRIMARY KEY CLUSTERED
(
[Ordering] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY],
CONSTRAINT [QU_EventJournal] UNIQUE NONCLUSTERED
(
[PersistenceID] ASC,
[SequenceNr] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
This is the 1st plan:
https://www.brentozar.com/pastetheplan/?id=SJ3kCo-Fv
And here's the 2nd one:
https://www.brentozar.com/pastetheplan/?id=Hy1B0ibtP
As I mentioned in my comments, the plans are different. The difference is in the access method:
The first plan uses an unordered scan.
The second plan uses an ordered scan.
By the way, the other answer suggests a useless index.
SQL Server will NOT use this index, as it's equivalent to the non-clustered index already in place. Just as the index QU_EventJournal on ([PersistenceID], [SequenceNr]) was not used, the index on (PersistenceID, Ordering) will not be used either.
Both of these indexes have PersistenceID and Ordering in them: Ordering is the clustered index key, so it is present in the index on ([PersistenceID], [SequenceNr]) even if you don't see it in the definition. The suggested index will also be bigger, as it is not defined as unique, and the sizes of the other fields are the same: Ordering is bigint, SequenceNr is bigint.
It's wrong to think that in an index on 2 fields the second field (Ordering) can be used to avoid the sort in the second query.
For example the index on PersistenceID, Ordering can have rows like these:
msc:123, 100
msc:124, 5
msc:124, 6
msc:125, 1
I hope you clearly see that the index is ordered by PersistenceID, Ordering,
but the result of the second query is expected to be
msc:125, 1
msc:124, 5
msc:124, 6
msc:123, 100
So the SORT operator is needed, so this index will not be used.
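You can see the two orderings for yourself with a small scratch script. This is just a sketch: #IndexRows is a made-up temp table holding the four sample rows above, not part of the real schema.

```sql
-- Scratch table holding the four sample rows from the text above.
CREATE TABLE #IndexRows
(
    PersistenceID nvarchar(255) NOT NULL,
    Ordering      bigint        NOT NULL
);

INSERT INTO #IndexRows (PersistenceID, Ordering)
VALUES (N'msc:123', 100),
       (N'msc:124', 5),
       (N'msc:124', 6),
       (N'msc:125', 1);

-- The order in which an index on (PersistenceID, Ordering) keeps the rows.
SELECT PersistenceID, Ordering
FROM   #IndexRows
ORDER BY PersistenceID, Ordering;

-- The order the second query asks for: by Ordering alone.
SELECT PersistenceID, Ordering
FROM   #IndexRows
WHERE  PersistenceID LIKE N'msc:%'
ORDER BY Ordering;

DROP TABLE #IndexRows;
```

The two result sets come back in different orders, which is why reading the index front to back cannot replace the sort.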
Now to your question:
shouldn't lack of ORDER BY be used by the query analyzer as an
opportunity to build more efficient execution plan without ordering
constraints
Yes, you are correct: without ORDER BY the server is free to choose either the ordered or the unordered scan. And yes, you are right about this:
I also don't understand why using TOP without ORDER BY is a bad
practice in case I want ANY N rows from the result
When you don't need the top N rows in a particular order, because you just want to see what kind of records have 'msc:' in them, you should not add ORDER BY, because it could cause a SORT in your plan.
And to your main question:
Then why does the first query take a long time to execute?
The answer is: this was pure coincidence.
Your data happens to be laid out so that the rows with 'msc:' in them come first in the order defined by Ordering. If you scan the index out of order, those rows are somewhere in the middle or at the end of the table.
If you search for another pattern in PersistenceID, the unordered scan may well be faster.

Speeding up SQL query with index on identity column or the searched column?

When I extract my table I get this. The table has an ID column which is an identity column (autoinc).
Then there is still the readable Customer number which is theoretically unique, but the table does not enforce anything so far.
My customers are searching for the customer number not the Id.
My question now: should I still add an index (and if so, clustered or nonclustered?) on the CUSTOMERNUMBER column to speed up the search?
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE Customer
(
[ID] [int] IDENTITY(1,1) NOT NULL,
[CUSTOMERNUMBER] [nvarchar](50) NULL,
-- other columns
CONSTRAINT [PK_Customer]
PRIMARY KEY CLUSTERED ([ID] ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
If the ID field is referenced in other tables as a foreign key then leave that as the clustered index for sure, and create a non-clustered index on CUSTOMERNUMBER.
Consider not just creating an index on CUSTOMERNUMBER but going further and creating a Unique Constraint on it (which comes with an index). This will prevent the business rule requiring unique CUSTOMERNUMBER from being violated, and also gives additional information to the database that it can use to make operations more efficient.
As always, test first:
Alter Table Customer Add Constraint uCUSTOMERNUMBER Unique (CUSTOMERNUMBER);
(A downside of a unique constraint is that the unique index it creates can't include additional columns. If having includes was a requirement then a unique, non-clustered index is an option.)
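A minimal sketch of the unique, non-clustered index option with an included column is below. Everything in it is illustrative: dbo.CustomerDemo is a scratch table, and CustomerName is a hypothetical column standing in for the "other columns" of the real table. Because CUSTOMERNUMBER is nullable, and a unique index allows only one NULL, the sketch filters NULLs out; drop the WHERE clause if you don't need that.

```sql
-- Scratch copy of the table; CustomerName is a hypothetical extra column.
CREATE TABLE dbo.CustomerDemo
(
    ID             int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    CUSTOMERNUMBER nvarchar(50)  NULL,
    CustomerName   nvarchar(100) NULL
);

-- Unique index that also carries CustomerName, so lookups by customer
-- number can be answered from the index alone. The filter keeps rows
-- with a NULL customer number out of the uniqueness check.
CREATE UNIQUE NONCLUSTERED INDEX UX_CustomerDemo_CUSTOMERNUMBER
    ON dbo.CustomerDemo (CUSTOMERNUMBER)
    INCLUDE (CustomerName)
    WHERE CUSTOMERNUMBER IS NOT NULL;

INSERT INTO dbo.CustomerDemo (CUSTOMERNUMBER, CustomerName)
VALUES (N'C-1001', N'First customer'),
       (NULL,      N'No number yet'),
       (NULL,      N'Also no number yet');

-- Typical search by the readable customer number.
SELECT ID, CUSTOMERNUMBER, CustomerName
FROM   dbo.CustomerDemo
WHERE  CUSTOMERNUMBER = N'C-1001';

DROP TABLE dbo.CustomerDemo;
```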
If the majority of searches on your table are being done on the Customer Number, it might be a great idea. You can also create several test queries before you create your index and run them again after you create it, to see whether the index actually improves performance.
When deciding whether to make the index clustered or non-clustered, you should determine whether you already have a clustered index (one is created automatically on the primary key) and whether this new index would be better used as the primary key. If so, you may have to create some constraints so that the customer number has to be unique, to guarantee search correctness.
If you are interested in learning more about indexes, feel free to check out this article I wrote:
https://dataschool.com/learn/how-indexing-works
Creating a unique non-clustered index on customer number is a good idea. But we should be careful not to create too many indexes, especially if the table is huge with lots of DML happening.

Cluster index on varchar on small table

Hi guys,
I inherited a database with the following table with only 200 rows:
CREATE TABLE [MyTable](
[Id] [uniqueidentifier] NOT NULL,
[Name] [varchar](255) NULL,
[Value] [varchar](8000) NULL,
[EffectiveStartDate] [datetime] NULL,
[EffectiveEndDate] [datetime] NULL,
[Description] [varchar](2000) NOT NULL DEFAULT (''),
CONSTRAINT [PK_MyTable] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]
As you can see there is a Clustered PK on a UniqueIdentifier column. I was doing some performance checks and the most expensive query so far (CPU and IO) is the following:
SELECT @Result = Value
FROM MyTable
WHERE @EffectiveDate BETWEEN EffectiveStartDate AND EffectiveEndDate
AND Name=@VariableName
The query above is encapsulated in a UDF. The UDF is usually not called in a select list or WHERE clause; instead, its return value is usually assigned to a variable.
The execution plan shows a Clustered Index Scan
Our system is based on a large number of aggregations and math processing in real time. Every time our Web Application refreshes the main page, it calls a bunch of Stored Procedures and UDFs and the query above is run around 500 times per refresh per user.
My question is: Should I change the PK to nonclustered and create a clustered index on Name, EffectiveStartDate, EffectiveEndDate on such a small table?
No you should not. You can just add another index which will be covering index:
CREATE INDEX [IDX_Covering] ON dbo.MyTable(Name, EffectiveStartDate, EffectiveEndDate)
INCLUDE(Value)
If @VariableName and @EffectiveDate are variables with the correct types, you should now see an index seek.
I am not sure this will help, but you need to try: an index scan of 200 rows is nothing in itself, but running it 500 times may be a problem. By the way, if those 200 rows fit in one page, I suspect this will not help. The problem may be somewhere else, like opening a connection 500 times or something like that...
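To illustrate what "correct types" means here, a small scratch script follows. It is a sketch under assumptions: #MyTable is a cut-down temp copy of the table, and the 'TaxRate' row is made up. The point is that the variables are declared with the same types as the columns they are compared to (varchar(255) and datetime). If @VariableName were declared as nvarchar, the varchar column would have to be implicitly converted, which can get in the way of a seek depending on the collation.

```sql
-- Cut-down scratch copy of the table with the covering index from above.
CREATE TABLE #MyTable
(
    Id                 uniqueidentifier NOT NULL PRIMARY KEY CLUSTERED,
    Name               varchar(255)  NULL,
    Value              varchar(8000) NULL,
    EffectiveStartDate datetime      NULL,
    EffectiveEndDate   datetime      NULL
);

CREATE INDEX IDX_Covering
    ON #MyTable (Name, EffectiveStartDate, EffectiveEndDate)
    INCLUDE (Value);

-- Hypothetical sample row.
INSERT INTO #MyTable (Id, Name, Value, EffectiveStartDate, EffectiveEndDate)
VALUES (NEWID(), 'TaxRate', '0.2', '20000101', '20991231');

-- Variable types match the column types exactly.
DECLARE @VariableName  varchar(255) = 'TaxRate';
DECLARE @EffectiveDate datetime     = GETDATE();
DECLARE @Result        varchar(8000);

SELECT @Result = Value
FROM   #MyTable
WHERE  Name = @VariableName
  AND  @EffectiveDate BETWEEN EffectiveStartDate AND EffectiveEndDate;

SELECT @Result AS Result;

DROP TABLE #MyTable;
```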

Partitioning in SQL Server Standard Edition with billion of rows

Hi, I would like to ask how to partition the following table (see below). The problem I'm having is not the retrieval of history records, which was resolved by the clustered index. As you can see, the index is on HistoryParameterID and then the timestamp; this is needed because rows are retrieved by those columns.
The problem is that once the table reaches ~1 billion records, inserts slow down. The scenario is that 15k rows per second (this can be 30k - 100k) are inserted, and each row corresponds to a HistoryParameterID.
Basically, HistoryParameterID is not unique; it has a one-to-many relationship with the other columns of the table below.
My hunch is that the index slows down the inserts, because inserts do not always go to the end of the table: it is ordered by HistoryParameterID.
I did some testing using the timestamp as the index, but to no avail, since query performance was unacceptable.
Is there any way to partition this by HistoryParameterID? I tried creating 15k tables for a partitioned view, but when I created the view it didn't finish executing. Any tips? Or is there any other way to partition? Please note that I'm using Standard Edition and using Enterprise Edition is not an option.
CREATE TABLE [dbo].[HistorySampleValues]
(
[HistoryParameterID] [int] NOT NULL,
[SourceTimeStamp] [datetime2](7) NOT NULL,
[ArchiveTimestamp] [datetime2](7) NOT NULL CONSTRAINT [DF__HistorySa__Archi__2A164134] DEFAULT (getutcdate()),
[ValueStatus] [int] NOT NULL,
[ArchiveStatus] [int] NOT NULL,
[IntegerValue] [bigint] SPARSE NULL,
[DoubleValue] [float] SPARSE NULL,
[StringValue] [varchar](100) SPARSE NULL,
[EnumNamedSetName] [varchar](100) SPARSE NULL,
[EnumNumericValue] [int] SPARSE NULL,
[EnumTextualValue] [varchar](256) SPARSE NULL
) ON [PRIMARY]
CREATE CLUSTERED INDEX [Source_HistParameterID_Index] ON [dbo].[HistorySampleValues]
(
[HistoryParameterID] ASC,
[SourceTimeStamp] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO
I was trying it so i created 15k Tables for partition view. But when
i created the view it didn't finish executing. Any tips? or is there
any way to partition ? Please note that i'm using Standard edition and
using enterprise edition is not an option.
If you go down the partitioned view path (http://technet.microsoft.com/en-us/library/ms190019.aspx), I suggest fewer tables (under one hundred). Without partitioned tables, the optimizer must go through a lot of work since each table of the view could be indexed differently.
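For reference, a partitioned view with just two member tables could be sketched like this. All object names and the ID ranges are made up for illustration, and the column list is cut down. The CHECK constraints on HistoryParameterID are what let the optimizer skip member tables that cannot contain the requested ID. This sketch only covers reads; inserts would go directly to the right member table.

```sql
-- Two member tables, each restricted to its own HistoryParameterID range.
CREATE TABLE dbo.HistorySampleValues_P1
(
    HistoryParameterID int          NOT NULL CHECK (HistoryParameterID BETWEEN 1 AND 1000),
    SourceTimeStamp    datetime2(7) NOT NULL,
    DoubleValue        float        NULL
);
CREATE TABLE dbo.HistorySampleValues_P2
(
    HistoryParameterID int          NOT NULL CHECK (HistoryParameterID BETWEEN 1001 AND 2000),
    SourceTimeStamp    datetime2(7) NOT NULL,
    DoubleValue        float        NULL
);

CREATE CLUSTERED INDEX CIX_HistorySampleValues_P1
    ON dbo.HistorySampleValues_P1 (HistoryParameterID, SourceTimeStamp);
CREATE CLUSTERED INDEX CIX_HistorySampleValues_P2
    ON dbo.HistorySampleValues_P2 (HistoryParameterID, SourceTimeStamp);
GO

-- The view simply glues the member tables together.
CREATE VIEW dbo.HistorySampleValuesView
AS
SELECT HistoryParameterID, SourceTimeStamp, DoubleValue FROM dbo.HistorySampleValues_P1
UNION ALL
SELECT HistoryParameterID, SourceTimeStamp, DoubleValue FROM dbo.HistorySampleValues_P2;
GO

-- Only the P2 range can contain ID 1500.
SELECT HistoryParameterID, SourceTimeStamp, DoubleValue
FROM   dbo.HistorySampleValuesView
WHERE  HistoryParameterID = 1500;
GO

-- Clean up the scratch objects.
DROP VIEW dbo.HistorySampleValuesView;
DROP TABLE dbo.HistorySampleValues_P1;
DROP TABLE dbo.HistorySampleValues_P2;
```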
I would not expect inserts to slow down with table size if HistoryParameterID is incremental. However, in the case of a random value, inserts will become progressively slower as the table size grows due to lower buffer cache efficiency. That problem will exist with a single table, partitioned table, or partitioned view. See http://www.dbdelta.com/improving-uniqueidentifier-performance/ for an example using a guid but the issue applies to any random key value.
You might try a single table with SourceTimeStamp alone as the clustered index key and a non-clustered index on HistoryParameterID and SourceTimeStamp. That would provide the best insert performance, and the non-clustered index (maybe with included columns) might be good enough for your select queries.
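As a rough sketch of that layout, on a cut-down scratch table: the index names, the included columns and the sample ID and date range are assumptions for illustration, and it presumes that new rows arrive in roughly ascending SourceTimeStamp order.

```sql
-- Cut-down scratch version of the history table.
CREATE TABLE #HistorySampleValues
(
    HistoryParameterID int          NOT NULL,
    SourceTimeStamp    datetime2(7) NOT NULL,
    ValueStatus        int          NOT NULL,
    DoubleValue        float        NULL
);

-- Clustered on the timestamp alone: with roughly ascending timestamps,
-- new rows are appended at the end instead of landing all over the table.
CREATE CLUSTERED INDEX CIX_HistorySampleValues_SourceTimeStamp
    ON #HistorySampleValues (SourceTimeStamp);

-- Non-clustered index for reads by parameter and time range; the included
-- columns let such reads avoid lookups into the clustered index.
CREATE NONCLUSTERED INDEX IX_HistorySampleValues_Param_Time
    ON #HistorySampleValues (HistoryParameterID, SourceTimeStamp)
    INCLUDE (ValueStatus, DoubleValue);

-- Typical read: one parameter over a time range.
SELECT HistoryParameterID, SourceTimeStamp, ValueStatus, DoubleValue
FROM   #HistorySampleValues
WHERE  HistoryParameterID = 42
  AND  SourceTimeStamp >= '20140101'
  AND  SourceTimeStamp <  '20140201';

DROP TABLE #HistorySampleValues;
```

The trade-off is that every insert now maintains two indexes, so it is worth measuring insert throughput against your real load before committing to it.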
Everything you need is here. I hope you can figure it out.
http://msdn.microsoft.com/en-us/library/ms188730.aspx
For Standard Edition, alternative solutions exist, like this answer.
This is an interesting article too.
We also implemented this in our enterprise automation application, with custom indexing around the users table, and it worked well.
Here are the pros and cons of a custom implementation:
Pros:
Higher performance than a partitioned table, because the application is aware of its own logic.
Cons:
You have to implement the routing method and update the indexes yourself.
Data is decentralized.

Sql Server Delete and Merge performance

I have a table that contains some buy/sell data, with around 8M records in it:
CREATE TABLE [dbo].[Transactions](
[id] [int] IDENTITY(1,1) NOT NULL,
[itemId] [bigint] NOT NULL,
[dt] [datetime] NOT NULL,
[count] [int] NOT NULL,
[price] [float] NOT NULL,
[platform] [char](1) NOT NULL
) ON [PRIMARY]
Every X minutes my program gets new transactions for each itemId and I need to update the table. My first solution is a two-step DELETE+INSERT:
delete from Transactions where platform=@platform and itemid=@itemid
insert into Transactions (platform,itemid,dt,count,price) values (@platform,@itemid,@dt,@count,@price)
[...]
insert into Transactions (platform,itemid,dt,count,price) values (@platform,@itemid,@dt,@count,@price)
The problem is that this DELETE statement takes 5 seconds on average. That's much too long.
The second solution I found is to use MERGE. I've created a stored procedure that takes a table-valued parameter:
CREATE PROCEDURE [dbo].[sp_updateTransactions]
@Table dbo.tp_Transactions readonly,
@itemId bigint,
@platform char(1)
AS
BEGIN
MERGE Transactions AS TARGET
USING @Table AS SOURCE
ON (
TARGET.[itemId] = SOURCE.[itemId] AND
TARGET.[platform] = SOURCE.[platform] AND
TARGET.[dt] = SOURCE.[dt] AND
TARGET.[count] = SOURCE.[count] AND
TARGET.[price] = SOURCE.[price] )
WHEN NOT MATCHED BY TARGET THEN
INSERT VALUES (SOURCE.[itemId],
SOURCE.[dt],
SOURCE.[count],
SOURCE.[price],
SOURCE.[platform])
WHEN NOT MATCHED BY SOURCE AND TARGET.[itemId] = @itemId AND TARGET.[platform] = @platform THEN
DELETE;
END
This procedure takes around 7 seconds on a table with 70k records, so with 8M it would probably take a few minutes. The bottleneck is "When not matched": when I commented this line out, the procedure ran in 0.01 seconds on average.
So the question is: how can I improve the performance of the delete statement?
The delete is needed to make sure that the table doesn't contain transactions that have been removed in the application. But in a real scenario that happens very rarely, and records truly need deleting in fewer than 1 in 10,000 transaction updates.
My theoretical workaround is to create an additional column like "transactionDeleted bit" and use UPDATE instead of DELETE, and then clean up the table with a batch job every X minutes or hours that executes
delete from transactions where transactionDeleted=1
It should be faster, but I would need to update all SELECT statements in other parts of the application to use only transactionDeleted=0 records, so it may also affect application performance.
Do you know any better solution?
UPDATE: Current indexes:
CREATE NONCLUSTERED INDEX [IX1] ON [dbo].[Transactions]
(
[platform] ASC,
[ItemId] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 50) ON [PRIMARY]
CONSTRAINT [IX2] UNIQUE NONCLUSTERED
(
[ItemId] DESC,
[count] ASC,
[dt] DESC,
[platform] ASC,
[price] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
OK, here is another approach also. For a similar problem (large scan WHEN NOT MATCHED BY SOURCE then DELETE) I reduced the MERGE execute time from 806ms to 6ms!
One issue with the problem above is that the "WHEN NOT MATCHED BY SOURCE" clause is scanning the whole TARGET table.
It is not that obvious but Microsoft allows the TARGET table to be filtered (by using a CTE) BEFORE doing the merge. So in my case the TARGET rows were reduced from 250K to less than 10 rows. BIG difference.
Assuming that the above problem works with the TARGET being filtered by @itemId and @platform, the MERGE code would look like this. The changes above to the indexes would help this logic too.
WITH Transactions_CTE (itemId
,dt
,count
,price
,platform
)
AS
-- Define the CTE query that will reduce the size of the TARGET table.
(
SELECT itemId
,dt
,count
,price
,platform
FROM Transactions
WHERE itemId = @itemId
AND platform = @platform
)
MERGE Transactions_CTE AS TARGET
USING @Table AS SOURCE
ON (
TARGET.[itemId] = SOURCE.[itemId]
AND TARGET.[platform] = SOURCE.[platform]
AND TARGET.[dt] = SOURCE.[dt]
AND TARGET.[count] = SOURCE.[count]
AND TARGET.[price] = SOURCE.[price]
)
WHEN NOT MATCHED BY TARGET THEN
INSERT
VALUES (
SOURCE.[itemId]
,SOURCE.[dt]
,SOURCE.[count]
,SOURCE.[price]
,SOURCE.[platform]
)
WHEN NOT MATCHED BY SOURCE THEN
DELETE;
Using a BIT field for IsDeleted (or IsActive as many people do) is valid but it does require modifying all code plus creating a separate SQL Job to periodically come through and remove the "deleted" records. This might be the way to go but there is something less intrusive to try first.
I noticed in your set of 2 indexes that neither is CLUSTERED. Can I assume that the IDENTITY field is? You might consider making the [IX2] UNIQUE index the CLUSTERED one and changing the PK (again, I assume the IDENTITY field is a CLUSTERED PK) to be NONCLUSTERED. I would also reorder the IX2 fields to put [Platform] and [ItemID] first. Since your main operation is looking for [Platform] and [ItemID] as a set, physically ordering them this way might help. And since this index is unique, that is a good candidate for being CLUSTERED. It is certainly worth testing as this will impact all queries against the table.
Also, if changing the indexes as I have suggested helps, it still might be worth trying both ideas and hence doing the IsDeleted field as well to see if that increases performance even more.
EDIT:
I forgot to mention, by making the IX2 index CLUSTERED and moving the [Platform] field to the top, you should get rid of the IX1 index.
EDIT2:
Just to be very clear, I am suggesting something like:
CREATE UNIQUE CLUSTERED INDEX [IX2] ON [dbo].[Transactions]
(
[ItemId] DESC,
[platform] ASC,
[count] ASC,
[dt] DESC,
[price] ASC
)
And to be fair, changing which index is CLUSTERED could also negatively impact queries where JOINs are done on the [id] field which is why you need to test thoroughly. In the end you need to tune the system for your most frequent and/or expensive queries and might have to accept that some queries will be slower as a result but that might be worth this operation being much faster.
See this https://stackoverflow.com/questions/3685141/how-to-....
would the update be the same cost as a delete? No. The update would be
a much lighter operation, especially if you had an index on the PK
(errrr, that's a guid, not an int). The point being that an update to
a bit field is much less expensive. A (mass) delete would force a
reshuffle of the data.
In light of this information, your idea to use a bit field is very valid.
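A minimal sketch of the bit-field approach, run against a scratch temp table rather than the real one: the sample rows, the @batchSize value and the default on transactionDeleted are assumptions for illustration. On the real table the column would be added with ALTER TABLE, and the cleanup loop would live in the scheduled job.

```sql
-- Scratch copy of the table with the extra soft-delete flag.
CREATE TABLE #Transactions
(
    id                 int IDENTITY(1,1) NOT NULL,
    itemId             bigint   NOT NULL,
    dt                 datetime NOT NULL,
    [count]            int      NOT NULL,
    price              float    NOT NULL,
    platform           char(1)  NOT NULL,
    transactionDeleted bit      NOT NULL DEFAULT (0)
);

-- Hypothetical sample rows.
INSERT INTO #Transactions (itemId, dt, [count], price, platform)
VALUES (1, '20130101', 2, 9.5, 'A'),
       (1, '20130102', 1, 9.7, 'A'),
       (2, '20130101', 5, 3.0, 'B');

DECLARE @itemId bigint = 1, @platform char(1) = 'A';

-- Step 1 (frequent, cheap): flag the rows instead of deleting them.
UPDATE #Transactions
SET    transactionDeleted = 1
WHERE  platform = @platform
  AND  itemId   = @itemId;

-- Step 2 (scheduled job): physically remove flagged rows in small batches
-- so that each delete holds its locks only briefly.
DECLARE @batchSize int = 5000;
WHILE 1 = 1
BEGIN
    DELETE TOP (@batchSize)
    FROM   #Transactions
    WHERE  transactionDeleted = 1;

    IF @@ROWCOUNT = 0 BREAK;
END;

-- Only the itemId = 2 row is left at this point.
SELECT id, itemId, platform, transactionDeleted
FROM   #Transactions;

DROP TABLE #Transactions;
```

Between the two steps, every other query has to filter on transactionDeleted = 0, which is the cost mentioned in the question.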

Resources