I had previously posted a question about my query speed with an XML column. After some further investigation I have found that the problem is not with the XML as previously thought. The table schema and query are very simple. There are over 800K rows; everything was running smoothly, but now, with the increase in records, the query takes almost a minute to run.
The Table:
/****** Object: Table [dbo].[Audit] Script Date: 08/14/2009 09:49:01 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Audit](
[ID] [int] IDENTITY(1,1) NOT NULL,
[PID] [int] NOT NULL,
[Page] [int] NOT NULL,
[ObjectID] [int] NOT NULL,
[Data] [nvarchar](max) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[Created] [datetime] NULL,
CONSTRAINT [PK_Audit] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
The query:
SELECT *
FROM Audit
WHERE PID = 158
AND Page = 2
AND ObjectID = 93
The query only returns 26 records, and the interesting thing is that if I add "TOP 26" the query executes in less than a second, but if I change it to "TOP 27" it takes a minute. Changing the query to SELECT ID makes no difference either.
Any help is appreciated.
You have an index on ID, but your query is using other columns instead. Therefore, you're probably getting a full table scan. Changing to SELECT ID makes no difference because it's not anywhere in the WHERE clause.
It's quick when you ask for TOP 26 because it can quit once it finds 26 rows because you don't have any ORDER BY clause. Changing it to TOP 27 means that, once it finds the first 26 (which are all the matches, according to your post), it can't quit looking; it has to continue to search until it either finds a 27th matching row or reaches the end of the data.
Looking at the plan (SHOWPLAN) would have shown you the problem pretty quickly.
Add an index for the PID, Page and ObjectID fields.
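For example, a minimal sketch of such an index on the table from the question (the index name is just illustrative):
CREATE NONCLUSTERED INDEX IX_Audit_PID_Page_ObjectID
ON dbo.Audit (PID, Page, ObjectID);
With that in place, the query can seek straight to the 26 matching rows (plus key lookups for the remaining columns) instead of scanning the whole table.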
Why not add a covering index on the Page and ObjectID columns and call it a day?
I think you should add non-unique indexes on the columns you want to search. Indexing will certainly reduce the search time. Whether you request a single column or multiple columns in the SELECT list makes no difference; the time spent comparing rows individually is what needs to be reduced, and indexing does that.
The 26 rows are probably near the start of the table: when you scan, you find them fast and abort the rest of the scan. When looking for a 27th row that doesn't exist, you scan the entire table, which is slow!
When looking into this type of problem, try this from SQL Server Management Studio:
Run: SET SHOWPLAN_ALL ON
Then run your query and look for the word "SCAN"; that is most likely where your query is running slow. Figure out why no index is being used.
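For example, against the Audit table from the question you could capture the estimated plan like this (SET SHOWPLAN_ALL must be the only statement in its batch, hence the GO separators; nothing is actually executed while it is ON):
SET SHOWPLAN_ALL ON;
GO
SELECT * FROM dbo.Audit WHERE PID = 158 AND Page = 2 AND ObjectID = 93;
GO
SET SHOWPLAN_ALL OFF;
GO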
In this case you need to add an index. I generally add an index based on how I query the data: if you always filter on one of the three columns (PID, Page, ObjectID), put that column first in the index, then add the columns you filter on only sometimes, and so on.
Here is my table, which has an order_number column. The table has fewer than 500 rows at the moment. A non-clustered index on order_number has been created.
CREATE TABLE [outbound_service].[shipment_line]
(
[id] [uniqueidentifier] NOT NULL,
[shipment_id] [uniqueidentifier] NOT NULL,
[order_number] [varchar](255) NOT NULL,
.... 18 other columns
CONSTRAINT [PK_SHIPMENT_LINE]
PRIMARY KEY CLUSTERED ([id] ASC)
WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY],
CONSTRAINT [uk_order_order_line_number]
UNIQUE NONCLUSTERED ([order_number] ASC, [order_line_number] ASC)
WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX IX_shipment_line_order
ON outbound_service.shipment_line(order_number ASC)
Here is my simple equality-check query, which returns at most 5 rows.
DECLARE @P0 nvarchar(400) = 'LG-ORD-002';
SELECT TOP 1 sl.order_number
FROM outbound_service.shipment_line sl
WHERE sl.order_number = @P0
I expected a nonclustered index seek, but I see an index scan happening, even though the data is very limited: at most 5 rows per order_number.
If I run the query without bind parameters, I see the index seek.
I have another database where I expect millions of rows, and I am worried about this scan: it is driving the CPU to 100% on this query under high concurrency and slowing down the rest of the workflows.
What could be the reason that SQL Server seems to prefer a scan over a seek here, when the data to return from the index is so minimal?
There is a CONVERT_IMPLICIT in your first execution plan. I would always align the parameter type with the column type and size. In addition, if you want to pick the result with TOP, I would suggest adding ORDER BY order_number, for two reasons:
The non-clustered index IX_shipment_line_order can then be used without any sorting cost.
Without ORDER BY, a TOP query does not guarantee which row you get back.
Like this:
DECLARE @P0 [varchar](255) = 'LG-ORD-002';
SELECT TOP 1 sl.order_number
FROM outbound_service.shipment_line sl
WHERE sl.order_number = @P0
ORDER BY sl.order_number
Here is the final answer, after digging a bit into Hibernate and the SQL driver code and using SQL Profiler to understand the impact on the queries. The Hibernate logs give no clue about the VARCHAR to NVARCHAR conversion. It is the Microsoft SQL Server driver that, by default, passes all String variables of a JPA entity as nvarchar parameters into the SQL WHERE clause, even though the underlying mapped columns are plain varchar. This was forcing the SQL engine to implicitly convert the column data to nvarchar as well.
Solution:
Passing sendStringParametersAsUnicode=false in the connection URL changes the driver's default behavior: all entity Strings are then sent as varchar in the query WHERE clause. If a specific field does need nvarchar and the driver does not take that into account automatically, the field in the entity can be annotated with the @Nationalized annotation provided by Hibernate, which sets the right expectation for the SQL driver.
Setting sendStringParametersAsUnicode to false is helpful when you have many varchar columns that never need to store international characters and you refer to such columns heavily in WHERE clauses. This helps avoid the implicit conversions and the resulting index scans.
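For reference, a hedged sketch of what such a connection URL might look like (server and database names are placeholders):
jdbc:sqlserver://dbserver:1433;databaseName=OutboundService;sendStringParametersAsUnicode=false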
I need to create statistics from several log tables, most of the time every hour, but sometimes more frequently, every 5 minutes.
Selecting rows only by datetime isn't fast enough for larger logs, so I thought I would select only the rows that are new since the last query, by storing the max Id and reusing it next time:
SELECT TOP(1000) * -- so that it's not too much
FROM [dbo].[Log]
WHERE Id > lastId AND [Timestamp] >= timestampMin
ORDER BY [Id] DESC
My question: is SQL Server smart enough to:
filter the rows by Id first and then by Timestamp, regardless of the order of the conditions (or does the condition order matter), or
do I need a subquery to first select the rows by Id and then filter them by Timestamp?
With a subquery:
SELECT *
FROM (
SELECT TOP(1000) * FROM [dbo].[Log]
WHERE Id > lastId
ORDER BY [Id] DESC
) t
WHERE t.[TimeStamp] >= timestampMin
The table schema is:
CREATE TABLE [dbo].[Log](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Timestamp] [datetime2](7) NOT NULL,
-- other columns
CONSTRAINT [PK_dbo_Log] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
I tried to use the query plan to find out how it works but it turns out that I cannot read it and I don't understand it.
In your case you don't have an index on Timestamp, so SQL Server will always use the clustered index (Id) first (the clustered index seek you see in the query plan) to find the first row matching Id > lastId, and then scan the remaining rows with the predicate [Timestamp] >= timestampMin (actually it's the other way around, since you are sorting in reverse order with DESC).
If you were to add an index on Timestamp, SQL Server might use it based on:
the cardinality of the predicate [Timestamp] >= timestampMin. Please note that cardinality is always an estimate based on statistics (see https://msdn.microsoft.com/en-us/library/ms190397.aspx) and the cardinality estimator (which changed from SQL Server 2012 to 2014+, see https://msdn.microsoft.com/en-us/library/dn600374.aspx).
how covering the non-clustered index is (since you are using the wildcard, it hardly matters anyway). If the non-clustered index is not covering, SQL Server would have to add a Key Lookup operator (see https://technet.microsoft.com/en-us/library/bb326635(v=sql.105).aspx) in order to retrieve all the fields (or perform a join). This will likely make the index not worthwhile for this query.
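For illustration, a minimal sketch of such an index on the schema shown above (to be covering for SELECT * it would additionally need to INCLUDE every remaining column, which is rarely worthwhile):
CREATE NONCLUSTERED INDEX IX_Log_Timestamp
ON [dbo].[Log] ([Timestamp] ASC);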
Also note that your two queries, the one with the subquery and the one without, are functionally different. The first will give you the first 1000 rows that have both Id > lastId AND [Timestamp] >= timestampMin. The second will give you only the rows having [Timestamp] >= timestampMin from the first 1000 rows having Id > lastId. So, for example, you might get 1000 rows from the first query but fewer from the second one.
Hi, I would like to ask how to partition the following table (see below). The problem I'm having is not in the retrieval of history records, which was resolved by the clustered index. But as you can see, the index is based on HistoryParameterID and then SourceTimeStamp; this is needed because rows are retrieved by those columns.
The problem is that once the table reaches around 1 billion records, inserts slow down. The workload is about 15k rows/second inserted (it can be 30k-100k), and each row corresponds to a HistoryParameterID.
Basically, HistoryParameterID is not unique; it has a one-to-many relationship with the other columns of the table below.
My hunch is that the index slows down the inserts because they don't always land at the end of the index; the data is arranged by HistoryParameterID.
I did some testing using the timestamp as the index key, but to no avail, since query performance became unacceptable.
Is there any way to partition this by HistoryParameterID? I tried creating 15k tables for a partitioned view, but when I created the view it never finished executing. Any tips? Or is there any other way to partition? Please note that I'm using Standard Edition and Enterprise Edition is not an option.
CREATE TABLE [dbo].[HistorySampleValues]
(
[HistoryParameterID] [int] NOT NULL,
[SourceTimeStamp] [datetime2](7) NOT NULL,
[ArchiveTimestamp] [datetime2](7) NOT NULL CONSTRAINT [DF__HistorySa__Archi__2A164134] DEFAULT (getutcdate()),
[ValueStatus] [int] NOT NULL,
[ArchiveStatus] [int] NOT NULL,
[IntegerValue] [bigint] SPARSE NULL,
[DoubleValue] [float] SPARSE NULL,
[StringValue] [varchar](100) SPARSE NULL,
[EnumNamedSetName] [varchar](100) SPARSE NULL,
[EnumNumericValue] [int] SPARSE NULL,
[EnumTextualValue] [varchar](256) SPARSE NULL
) ON [PRIMARY]
CREATE CLUSTERED INDEX [Source_HistParameterID_Index] ON [dbo].[HistorySampleValues]
(
[HistoryParameterID] ASC,
[SourceTimeStamp] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO
I tried creating 15k tables for a partitioned view, but when I created the view it never finished executing. Any tips? Or is there any other way to partition? Please note that I'm using Standard Edition and Enterprise Edition is not an option.
If you go down the partitioned view path (http://technet.microsoft.com/en-us/library/ms190019.aspx), I suggest fewer tables (under one hundred). Without partitioned tables, the optimizer must go through a lot of work since each table of the view could be indexed differently.
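A minimal sketch of the idea, assuming hypothetical range boundaries on HistoryParameterID and showing only a few columns for brevity (real member tables would carry the full column list and the same clustered index):
CREATE TABLE dbo.HistorySampleValues_0001 (
    HistoryParameterID int NOT NULL CHECK (HistoryParameterID BETWEEN 1 AND 100000),
    SourceTimeStamp datetime2(7) NOT NULL,
    ValueStatus int NOT NULL
    -- remaining columns as in the original table
);
CREATE TABLE dbo.HistorySampleValues_0002 (
    HistoryParameterID int NOT NULL CHECK (HistoryParameterID BETWEEN 100001 AND 200000),
    SourceTimeStamp datetime2(7) NOT NULL,
    ValueStatus int NOT NULL
);
GO
CREATE VIEW dbo.HistorySampleValues_All
AS
SELECT HistoryParameterID, SourceTimeStamp, ValueStatus FROM dbo.HistorySampleValues_0001
UNION ALL
SELECT HistoryParameterID, SourceTimeStamp, ValueStatus FROM dbo.HistorySampleValues_0002;
GO
The CHECK constraints on the partitioning column are what let the optimizer skip member tables that cannot contain the requested HistoryParameterID.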
I would not expect inserts to slow down with table size if HistoryParameterID is incremental. However, in the case of a random value, inserts will become progressively slower as the table size grows due to lower buffer cache efficiency. That problem will exist with a single table, partitioned table, or partitioned view. See http://www.dbdelta.com/improving-uniqueidentifier-performance/ for an example using a guid but the issue applies to any random key value.
You might try a single table with SourceTimeStamp alone as the clustered index key and a non-clustered index on HistoryParameterID and SourceTimeStamp. That would provide the best insert performance, and the non-clustered index (maybe with included columns) might be good enough for your SELECT queries.
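A hedged sketch of that layout (index names are illustrative; the existing clustered index Source_HistParameterID_Index would have to be dropped first):
CREATE CLUSTERED INDEX CIX_HistorySampleValues_SourceTimeStamp
ON dbo.HistorySampleValues (SourceTimeStamp ASC);

CREATE NONCLUSTERED INDEX IX_HistorySampleValues_HistParamID_SourceTS
ON dbo.HistorySampleValues (HistoryParameterID ASC, SourceTimeStamp ASC)
INCLUDE (ValueStatus, IntegerValue, DoubleValue, StringValue); -- illustrative INCLUDE list; trim it to what your queries actually read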
Everything you need is here; I hope you can figure it out.
http://msdn.microsoft.com/en-us/library/ms188730.aspx
And for Standard Edition, alternative solutions exist, like this answer.
And this is an interesting article too.
We also implemented this in our enterprise automation application, with custom indexing around a table of users, and it worked well.
Here are the pros and cons of a custom implementation:
Pros:
Higher performance than a partitioned table, because the application is aware of its own logic and data layout.
Cons:
You have to implement the routing logic and maintain the indexes yourself.
The data is no longer centralized.
I was having a timeout issue when passing a long DateTime range to the query below (the query runs from a C# application). The table had 30 million rows, with a non-clustered index on ID (not a primary key).
I found that there was no primary key, so I recently made ID the primary key, and it no longer times out. For the query below, can anyone help me create an index on more than one column for the future? Should I remove the non-clustered index from this table and create one on several columns instead? Data is increasing rapidly and the performance needs to improve.
SELECT ID, ReferenceNo, MinNo, DateTime, DataNo
FROM tbl1
WHERE DateTime BETWEEN '04/09/2013' AND '20/11/2013'
AND ReferenceNo = 4 AND MinNo = 3 AND DataNo = 14
ORDER BY ID
This is the create script:
CREATE TABLE [dbo].[tbl1](
[ID] [int] IDENTITY(1,1) NOT NULL,
[ReferenceNo] [int] NOT NULL,
[MinNo] [int] NOT NULL,
[DateTime] [datetime] NOT NULL,
[DataNo] [int] NOT NULL,
CONSTRAINT [tbl1_pk] PRIMARY KEY CLUSTERED ([ID] ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
It's hard to tell which index you should use without knowing more about your database and how it's used.
You may want to make the ID column a clustered index. If ID is an identity column you will get very few page splits while inserting new data. It will, however, require you to rebuild the table, and that may be a problem depending on your usage of the database; you will be looking at some downtime.
If you want a covering index it should look something like this:
CREATE NONCLUSTERED INDEX [MyCoveringIndex] ON tbl1
(
[ReferenceNo] ASC,
[MinNo] ASC,
[DataNo] ASC,
[DateTime] ASC
)
There's no need to include ID as a column, since it's already in the clustered index (clustered index key columns are included in all other indexes). This will, however, use up quite a lot of space (somewhere in the range of 1 GB if the columns above are of types int and datetime). It will also affect your insert, update, and delete performance on the table, in most cases negatively.
You can create the index in ONLINE mode if you are using the Enterprise Edition of SQL Server. In all other cases there will be a lock on the table while creating the index.
It's also hard to know what other queries are made against the table. You may want to tweak the order of the columns in the index to better match other queries.
Indexing all fields would be fastest, but would likely waste a ton of space. I would guess that a date index would provide the most benefit at the least storage cost, because the data is probably spread evenly over a long period of time. If the MIN() and MAX() dates are close together, then this will not be as effective:
CREATE NONCLUSTERED INDEX [IDX_1] ON [dbo].[tbl1] (
[DateTime] ASC
)
GO
As a side note, you can use SSMSE's "Display Estimated Execution Plan", which will show you what the database needs to do to get your data. It will suggest missing indexes and also provide CREATE INDEX statements. These suggestions can be quite wasteful, but they will give you an idea of what is taking so long. This option is in the Standard toolbar, four icons to the right of "Execute".
I've got a table that contains some buy/sell data, with around 8M records in it:
CREATE TABLE [dbo].[Transactions](
[id] [int] IDENTITY(1,1) NOT NULL,
[itemId] [bigint] NOT NULL,
[dt] [datetime] NOT NULL,
[count] [int] NOT NULL,
[price] [float] NOT NULL,
[platform] [char](1) NOT NULL
) ON [PRIMARY]
Every X minutes my program gets new transactions for each itemId, and I need to update the table. My first solution is a two-step DELETE+INSERT:
delete from Transactions where platform=@platform and itemid=@itemid
insert into Transactions (platform,itemid,dt,count,price) values (@platform,@itemid,@dt,@count,@price)
[...]
insert into Transactions (platform,itemid,dt,count,price) values (@platform,@itemid,@dt,@count,@price)
The problem is that this DELETE statement takes 5 seconds on average, which is much too long.
The second solution I found is to use MERGE. I've created the following stored procedure, which takes a table-valued parameter:
CREATE PROCEDURE [dbo].[sp_updateTransactions]
@Table dbo.tp_Transactions readonly,
@itemId bigint,
@platform char(1)
AS
BEGIN
MERGE Transactions AS TARGET
USING @Table AS SOURCE
ON (
TARGET.[itemId] = SOURCE.[itemId] AND
TARGET.[platform] = SOURCE.[platform] AND
TARGET.[dt] = SOURCE.[dt] AND
TARGET.[count] = SOURCE.[count] AND
TARGET.[price] = SOURCE.[price] )
WHEN NOT MATCHED BY TARGET THEN
INSERT VALUES (SOURCE.[itemId],
SOURCE.[dt],
SOURCE.[count],
SOURCE.[price],
SOURCE.[platform])
WHEN NOT MATCHED BY SOURCE AND TARGET.[itemId] = @itemId AND TARGET.[platform] = @platform THEN
DELETE;
END
This procedure takes around 7 seconds with a table of 70k records, so with 8M it would probably take a few minutes. The bottleneck is the "WHEN NOT MATCHED BY SOURCE" clause: when I commented that line out, the procedure ran in 0.01 seconds on average.
So the question is: how do I improve the performance of the delete part?
The delete is needed to make sure the table doesn't contain transactions that have been removed in the application. In a real scenario this happens really rarely; the true need to delete records arises in fewer than 1 in 10,000 transaction updates.
My theoretical workaround is to create an additional column like "transactionDeleted bit" and use UPDATE instead of DELETE, then clean up the table with a batch job every X minutes or hours that executes
delete from transactions where transactionDeleted=1
It should be faster, but I would need to update all SELECT statements in other parts of the application to use only transactionDeleted=0 records, which may also affect application performance.
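As a side note on the cleanup job idea, here is a minimal sketch of a batched cleanup, assuming the hypothetical transactionDeleted column exists (the batch size is arbitrary):
WHILE 1 = 1
BEGIN
    DELETE TOP (5000) FROM dbo.Transactions WHERE transactionDeleted = 1;
    IF @@ROWCOUNT = 0 BREAK;
END
Deleting in small batches keeps the transaction log and lock footprint of each statement small.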
Do you know any better solution?
UPDATE: Current indexes:
CREATE NONCLUSTERED INDEX [IX1] ON [dbo].[Transactions]
(
[platform] ASC,
[ItemId] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 50) ON [PRIMARY]
CONSTRAINT [IX2] UNIQUE NONCLUSTERED
(
[ItemId] DESC,
[count] ASC,
[dt] DESC,
[platform] ASC,
[price] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
OK, here is another approach. For a similar problem (a large scan for WHEN NOT MATCHED BY SOURCE ... THEN DELETE) I reduced the MERGE execution time from 806 ms to 6 ms!
One issue with the problem above is that the "WHEN NOT MATCHED BY SOURCE" clause is scanning the whole TARGET table.
It is not that obvious, but Microsoft allows the TARGET table to be filtered (by using a CTE) before doing the merge. So in my case the TARGET rows were reduced from 250K to fewer than 10 rows. Big difference.
Assuming that the above problem works with the TARGET filtered by @itemId and @platform, the MERGE code would look like this. The index changes mentioned above would help this logic too.
WITH Transactions_CTE (itemId
,dt
,count
,price
,platform
)
AS
-- Define the CTE query that will reduce the size of the TARGET table.
(
SELECT itemId
,dt
,count
,price
,platform
FROM Transactions
WHERE itemId = @itemId
AND platform = @platform
)
MERGE Transactions_CTE AS TARGET
USING @Table AS SOURCE
ON (
TARGET.[itemId] = SOURCE.[itemId]
AND TARGET.[platform] = SOURCE.[platform]
AND TARGET.[dt] = SOURCE.[dt]
AND TARGET.[count] = SOURCE.[count]
AND TARGET.[price] = SOURCE.[price]
)
WHEN NOT MATCHED BY TARGET THEN
INSERT
VALUES (
SOURCE.[itemId]
,SOURCE.[dt]
,SOURCE.[count]
,SOURCE.[price]
,SOURCE.[platform]
)
WHEN NOT MATCHED BY SOURCE THEN
DELETE;
Using a BIT field for IsDeleted (or IsActive as many people do) is valid but it does require modifying all code plus creating a separate SQL Job to periodically come through and remove the "deleted" records. This might be the way to go but there is something less intrusive to try first.
I noticed in your set of 2 indexes that neither is CLUSTERED. Can I assume that the IDENTITY field is? You might consider making the [IX2] UNIQUE index the CLUSTERED one and changing the PK (again, I assume the IDENTITY field is a CLUSTERED PK) to be NONCLUSTERED. I would also reorder the IX2 fields to put [Platform] and [ItemID] first. Since your main operation is looking for [Platform] and [ItemID] as a set, physically ordering them this way might help. And since this index is unique, that is a good candidate for being CLUSTERED. It is certainly worth testing as this will impact all queries against the table.
Also, if changing the indexes as I have suggested helps, it still might be worth trying both ideas and hence doing the IsDeleted field as well to see if that increases performance even more.
EDIT:
I forgot to mention, by making the IX2 index CLUSTERED and moving the [Platform] field to the top, you should get rid of the IX1 index.
EDIT2:
Just to be very clear, I am suggesting something like:
CREATE UNIQUE CLUSTERED INDEX [IX2] ON [dbo].[Transactions]
(
[ItemId] DESC,
[platform] ASC,
[count] ASC,
[dt] DESC,
[price] ASC
)
And to be fair, changing which index is CLUSTERED could also negatively impact queries where JOINs are done on the [id] field which is why you need to test thoroughly. In the end you need to tune the system for your most frequent and/or expensive queries and might have to accept that some queries will be slower as a result but that might be worth this operation being much faster.
See this https://stackoverflow.com/questions/3685141/how-to-....
would the update be the same cost as a delete? No. The update would be a much lighter operation, especially if you had an index on the PK (errrr, that's a guid, not an int). The point being that an update to a bit field is much less expensive. A (mass) delete would force a reshuffle of the data.
In light of this information, your idea to use a bit field is very valid.