Why would a temp table make this query so much faster?


While trying to dissect a SQL Server stored proc that's been running slow, we found that simply using a temp table instead of a real table had a drastic impact on performance. The table we're swapping out (ds_location) has only 173 rows.
This query completes in 1 second:
IF OBJECT_ID('tempdb..#Location') IS NOT NULL DROP TABLE #Location
SELECT * INTO #Location FROM ds_location
SELECT COUNT(*)
FROM wip_cubs_hc m
INNER JOIN ds_scenario sc ON sc.Scenario = m.Scenario
INNER JOIN ds_period pe ON pe.Period = m.ReportingPeriod
INNER JOIN #Location l ON l.Location = m.Sh_Location
Compare that to the original, which takes 7 seconds:
SELECT COUNT(*)
FROM wip_cubs_hc m
INNER JOIN ds_scenario sc ON sc.Scenario = m.Scenario
INNER JOIN ds_period pe ON pe.Period = m.ReportingPeriod
INNER JOIN ds_location l ON l.Location = m.Sh_Location
Here's the definition of wip_cubs_hc. It contains 1.7 million rows:
CREATE TABLE wip_cubs_hc(
Scenario varchar(16) NOT NULL,
ReportingPeriod varchar(50) NOT NULL,
Sh_Location varchar(50) NOT NULL,
Department varchar(50) NOT NULL,
ProductName varchar(75) NOT NULL,
Account varchar(50) NOT NULL,
Balance varchar(50) NOT NULL,
Source varchar(50) NOT NULL,
Data numeric(18, 6) NOT NULL,
CONSTRAINT PK_wip_cubs_hc PRIMARY KEY CLUSTERED
(
Scenario ASC,
ReportingPeriod ASC,
Sh_Location ASC,
Department ASC,
ProductName ASC,
Account ASC,
Balance ASC,
Source ASC
)
)
CREATE NONCLUSTERED INDEX IX_wip_cubs_hc_Balance
ON [dbo].[wip_cubs_hc] ([Scenario],[Sh_Location],[Department],[Balance])
INCLUDE ([ReportingPeriod],[ProductName],[Account],[Source])
I'd love to know HOW to determine what's causing the slowdown, too.

I can answer the "How to determine the slowdown" question...
Take a look at the execution plans of both queries. You do this by going to the "Query" menu > "Display Estimated Execution Plan" (the default keyboard shortcut is Ctrl+L). You can display the plans for multiple queries at once as well. Look at the type of operation being performed: on large tables you generally want to see an Index Seek rather than an Index Scan.
This article explains some of the other things to look for.
Without knowing the schema/indexes of all the tables involved, this is where I would suggest starting.
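As a complement to the graphical plan, here is a minimal sketch of how the two variants could be compared numerically. It uses only the standard SET STATISTICS IO and SET STATISTICS TIME session options and the original query from the question; the results appear in the Messages tab, not in the result grid.

```sql
-- Report logical reads per table and CPU/elapsed time for each statement.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Original (slow) query against the permanent table.
SELECT COUNT(*)
FROM wip_cubs_hc m
INNER JOIN ds_scenario sc ON sc.Scenario = m.Scenario
INNER JOIN ds_period pe ON pe.Period = m.ReportingPeriod
INNER JOIN ds_location l ON l.Location = m.Sh_Location;

-- Switch the reporting off again for the rest of the session.
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Running the temp-table variant the same way and comparing the logical reads reported for wip_cubs_hc should show whether the two plans access the large table differently.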
Best of Luck!

Related

UPDATE with SELECT leads to deadlock

I have a very simple update statement within one job step:
UPDATE [Table]
SET
[Flag] = 1
WHERE [ID] = (SELECT MAX([ID]) FROM [Table] WHERE [Name] = 'DEV')
Normally there are no issues with this code, but sometimes it ends in a deadlock.
Is it possible in general for such a stand-alone piece of code to cause a deadlock?
Table schema:
CREATE TABLE [Table]
(
[ID] [bigint] IDENTITY(1,1) NOT NULL,
[Name] [varchar](100) NOT NULL,
[Flag] [bit] NULL,
CONSTRAINT [Table_ID] PRIMARY KEY CLUSTERED ([ID] ASC)
)

The cause of the deadlock is quite obvious: there is no index on Name, so the subquery has to scan the whole table. There is also no UPDLOCK hint on it, which makes deadlocks more likely.
Create an index on Name:
CREATE NONCLUSTERED INDEX IX_Name ON [Table] (Name) INCLUDE (ID);
And make sure you use UPDLOCK on the subquery:
UPDATE [Table]
SET Flag = 1
WHERE ID = (
SELECT MAX(ID)
FROM [Table] t2 WITH (UPDLOCK)
WHERE t2.Name = 'DEV')
This query is much more efficiently written without a self-join, like this:
UPDATE t
SET Flag = 1
FROM (
SELECT TOP (1)
*
FROM [Table] t
WHERE t.Name = 'DEV'
ORDER BY ID DESC
) t;
Even though the optimizer can often transform the query into this form, it's better to just write it like this anyway.
This version does not need an UPDLOCK hint; the lock will be taken automatically. You still need the index above, though.

SQL Server query optimizer performing an unnecessary join

I was wondering if someone could shed some light on why SQL Server (2016 RTM in my case, but I suspect this is not version-specific) is performing this seemingly unnecessary INNER JOIN.
Consider the following two tables joined by a foreign key:
CREATE TABLE [dbo].[batches](
[Id] [smallint] IDENTITY(1,1) PRIMARY KEY,
[Date] [date] NOT NULL,
[Run] [tinyint] NOT NULL,
[Clean] [bit] NOT NULL)
CREATE TABLE [dbo].[batch_values](
[Batch_Id] [smallint] NOT NULL,
[Key] [int] NOT NULL,
[Value] [int] NOT NULL,
CONSTRAINT [PK_batch_values] PRIMARY KEY CLUSTERED
( [Batch_Id] ASC, [Key] ASC))
GO
ALTER TABLE [dbo].[batch_values] WITH CHECK
ADD CONSTRAINT [FK_batch_values_batches] FOREIGN KEY([Batch_Id])
REFERENCES [dbo].[batches] ([Id])
GO
ALTER TABLE [dbo].[batch_values] CHECK CONSTRAINT [FK_batch_values_batches]
GO
Populate the tables with some data:
SET NOCOUNT ON;
DECLARE
@BatchCount int,
@BatchId smallint,
@KeyCount int;
SET @BatchCount = 1;
WHILE @BatchCount <= 100
BEGIN
INSERT INTO dbo.[batches]
VALUES (DATEADD(dd, @BatchCount / 10, '2016-01-01'), @BatchCount % 10, @BatchCount % 2);
SET @BatchId = SCOPE_IDENTITY();
SET @KeyCount = 1;
WHILE @KeyCount <= 1000
BEGIN
INSERT INTO dbo.batch_values
VALUES (@BatchId, @KeyCount, RAND() * 1000000 - 500000);
SET @KeyCount = @KeyCount + 1;
END;
SET @BatchCount = @BatchCount + 1;
END;
Now, if I run the following query the execution plan shows that the SQL Server is performing the INNER JOIN to the [batches] table, even though no columns are selected from it, and no records could be dropped from [batch_values] as a result of the join due to the foreign key constraint.
screenshot of query and execution plan
It seems to me that Query Optimizer should discard the INNER JOIN as unnecessary and simply do a primary key seek on [batch_values], but it doesn't.
This is material because if I develop views that join multiple tables to present a "bigger picture" of the underlying data for ease of use, when querying those views I will be taking a performance hit.

There are many limitations on when the SQL Server optimizer can use join elimination.
For example, it is not used if the foreign key has multiple columns, or if the constraint is not trusted or is marked as 'not for replication'.
SQL Server may also skip join elimination if the WHERE clause has a predicate on the foreign key column.
Remove the WHERE clause, or remove "Batch_Id = 100" from it, and you should see the optimizer use join elimination.
The documentation on this topic is limited, so I can't provide a link as proof, but many people have reported this behaviour over the past 5-7 years on different versions and agreed that it is by design. If it is critical for your system, my recommendation is to raise an incident with Microsoft and ask them directly.
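The "not trusted" and "not for replication" conditions mentioned above can be checked in the catalog. The following is a small sketch against the tables from the question, using the standard sys.foreign_keys catalog view; whether re-validating the constraint changes the plan in this particular case is not something the answer confirms.

```sql
-- Inspect the flags that can prevent join elimination.
SELECT fk.name,
       fk.is_not_trusted,
       fk.is_disabled,
       fk.is_not_for_replication
FROM sys.foreign_keys AS fk
WHERE fk.parent_object_id = OBJECT_ID(N'dbo.batch_values');

-- If is_not_trusted = 1, re-validate existing rows so the constraint becomes trusted.
ALTER TABLE dbo.batch_values WITH CHECK CHECK CONSTRAINT FK_batch_values_batches;
```

If all three flags are 0 and the join is still present in the plan, the remaining candidates are the other limitations listed above, such as the WHERE predicate on the foreign key column.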

Slow Performance when ORDER BY in SQL Server

I'm working on a project (Microsoft SQL Server 2012) in which I need to store quite a lot of data.
Currently my table contains 1,441,352 records in total.
The structure of the table is as follows:
RecordIdentifier (int, not null)
GlnCode (PK, nvarchar(100), not null)
Description (nvarchar(MAX), not null)
VendorId (nvarchar(100), not null)
VendorName (nvarchar(100), not null)
ItemNumber (PK, nvarchar(100), not null)
ItemUOM (PK, nvarchar(128), not null)
My table is indexed on the following fields:
NonClustered - GlnCode, Ascending
NonClustered - ItemNumber, Ascending
NonClustered - ItemUOM, Ascending
NonClustered - VendorID, Ascending
Clustered - Unique (The above 4 columns together).
Now I'm writing an API to return the records in the table.
The API exposes methods that execute this query:
SELECT TOP (51)
[GlnCode] AS [GlnCode],
[VendorId] AS [VendorId],
[ItemNumber] AS [ItemNumber],
[ItemUOM] AS [ItemUOM],
[RecordIdentitifer] AS [RecordIdentitifer],
[Description] AS [Description],
[VendorName] AS [VendorName]
FROM [dbo].[T_GENERIC_ARTICLE]
Performance of this query is good.
But it doesn't guarantee that the same set is always returned, so I need to apply an ORDER BY clause, meaning the query being executed looks like this:
SELECT TOP (51)
[GlnCode] AS [GlnCode],
[VendorId] AS [VendorId],
[ItemNumber] AS [ItemNumber],
[ItemUOM] AS [ItemUOM],
[RecordIdentitifer] AS [RecordIdentitifer],
[Description] AS [Description],
[VendorName] AS [VendorName]
FROM [dbo].[T_GENERIC_ARTICLE]
ORDER BY [GlnCode] ASC, [ItemNumber] ASC, [ItemUOM] ASC, [VendorId] ASC
Now the query takes a few seconds to return, which I can't afford.
Does anyone have any idea how to solve this issue?

Your table index definitions are not optimal. You also don't need to create the additional individual indexes, because they are covered by the composite nonclustered index. You will get better performance by structuring your indexes as follows:
Table definition:
CREATE TABLE [dbo].[T_GENERIC_ARTICLE]
(
RecordIdentifier int IDENTITY(1,1) PRIMARY KEY NOT NULL,
GlnCode nvarchar(100) NOT NULL,
Description nvarchar(MAX) NOT NULL,
VendorId nvarchar(100) NOT NULL,
VendorName nvarchar(100) NOT NULL,
ItemNumber nvarchar(100) NOT NULL,
ItemUOM nvarchar(128) NOT NULL
)
GO
CREATE UNIQUE NONCLUSTERED INDEX [UniqueNonClusteredIndex-Composite2]
ON [dbo].[T_GENERIC_ARTICLE](GlnCode, ItemNumber,ItemUOM,VendorId ASC);
GO
Revised Query
SELECT TOP (51)
[RecordIdentifier] AS [RecordIdentitifer],
[GlnCode] AS [GlnCode],
[VendorId] AS [VendorId],
[ItemNumber] AS [ItemNumber],
[ItemUOM] AS [ItemUOM],
[Description] AS [Description],
[VendorName] AS [VendorName]
FROM [dbo].[T_GENERIC_ARTICLE]
ORDER BY [GlnCode], [ItemNumber], [ItemUOM], [VendorId]
With this structure, the plan should read the nonclustered index in key order (so no sort is needed) and perform a key lookup on the primary key for each returned row. This is where you want the majority of the work to be done.
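Before dropping or recreating anything, it may help to list the indexes that currently exist on the table together with their key columns. This is a sketch that uses only the standard sys.indexes, sys.index_columns and sys.columns catalog views; the table name is the one from the question.

```sql
-- List every index on the table with its key and included columns.
SELECT i.name AS index_name,
       i.type_desc,
       i.is_unique,
       c.name AS column_name,
       ic.key_ordinal,
       ic.is_included_column
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
  ON ic.object_id = i.object_id AND ic.index_id = i.index_id
JOIN sys.columns AS c
  ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID(N'dbo.T_GENERIC_ARTICLE')
ORDER BY i.index_id, ic.key_ordinal;
```

An index whose key columns are a leading prefix of another index's key is usually a candidate for removal, since the wider index can serve the same seeks.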
Reference:
Indexes in SQL Server
Hope This helps

How to improve large table paging speed in SQL Server 2008

I have a table called Users with 10 million records in it. This is the table structure:
CREATE TABLE [dbo].[Users](
[UsersID] [int] IDENTITY(100000,1) NOT NULL,
[LoginUsersName] [nvarchar](50) NOT NULL,
[LoginUsersPwd] [nvarchar](50) NOT NULL,
[Email] [nvarchar](80) NOT NULL,
[IsEnable] [int] NOT NULL,
[CreateTime] [datetime] NOT NULL,
[LastLoginTime] [datetime] NOT NULL,
[LastLoginIp] [nvarchar](50) NOT NULL,
[UpdateTime] [datetime] NOT NULL,
CONSTRAINT [PK_Users] PRIMARY KEY CLUSTERED
(
[UsersID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
I have a nonclustered index on the UpdateTime column.
The paging sql:
;WITH UserCTE AS (
SELECT * FROM
(SELECT
ROW_NUMBER() OVER (ORDER BY UpdateTime DESC) AS row,UsersID as rec_id -- select primary key only
FROM
dbo.Users WITH (NOLOCK)
) A WHERE row BETWEEN 9700000 AND 9700020
)
SELECT
*
FROM
dbo.Users WITH (NOLOCK) WHERE UsersID IN (SELECT UserCTE.rec_id FROM UserCTE)
The query above:
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 3 ms.
(21 row(s) affected)
SQL Server Execution Times:
CPU time = 2574 ms, elapsed time = 3549 ms.
I would appreciate any suggestions on how to improve the paging speed. Thanks!

That looks about as good as it is going to get without changing the way it works or doing some sort of pre-calculation.
The index used to locate the UsersID values on the page is as narrow as it can be (the leaf pages will contain just UpdateTime and the clustered index key, UsersID). You could make the index slightly narrower by changing the column to datetime2, but this won't make a significant difference. You could also check that this index isn't excessively fragmented.
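The fragmentation check mentioned above could look roughly like this. It is a sketch built on the standard sys.dm_db_index_physical_stats function in its cheapest ('LIMITED') mode, run in the database that holds dbo.Users.

```sql
-- Fragmentation and size of every index on dbo.Users.
SELECT i.name AS index_name,
       ps.index_type_desc,
       ps.avg_fragmentation_in_percent,
       ps.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.Users'), NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id AND i.index_id = ps.index_id
ORDER BY ps.avg_fragmentation_in_percent DESC;
```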
If you had an indexed sequential integer column of UpdateTimeOrder then you could just do
SELECT *
FROM dbo.Users
WHERE UpdateTimeOrder BETWEEN 9700000 AND 9700020
But maintaining such a column along with concurrent INSERTS/UPDATES/DELETES will be difficult. One easier but less effective precalculation would be to create an indexed view.
CREATE VIEW dbo.UserCount
WITH SCHEMABINDING
AS
SELECT COUNT_BIG(*) AS Count
FROM [dbo].[Users]
GO
CREATE UNIQUE CLUSTERED INDEX IX ON dbo.UserCount(Count)
Then retrieve the pre-calculated count and, when looking for rows more than halfway through the index, call a different query that uses ROW_NUMBER() OVER (ORDER BY UpdateTime ASC) (subtracting the original row numbers from the count, of course).
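A rough sketch of that reversed paging query follows. The variable names are illustrative assumptions, and the mapping relies on row r in descending order being row (count - r + 1) in ascending order, which only holds exactly if UpdateTime values are unique (ties can be numbered differently in the two directions).

```sql
-- Hypothetical variable names; the page bounds are the ones from the question.
DECLARE @Total bigint, @First bigint = 9700000, @Last bigint = 9700020;

-- Read the pre-calculated count from the indexed view.
SELECT @Total = [Count] FROM dbo.UserCount WITH (NOEXPAND);

;WITH UserCTE AS (
    SELECT UsersID AS rec_id,
           ROW_NUMBER() OVER (ORDER BY UpdateTime ASC) AS rev_row
    FROM dbo.Users
)
SELECT u.*
FROM dbo.Users AS u
JOIN UserCTE AS c ON c.rec_id = u.UsersID
-- Rows @First..@Last counted from the newest are rows
-- (@Total - @Last + 1)..(@Total - @First + 1) counted from the oldest.
WHERE c.rev_row BETWEEN @Total - @Last + 1 AND @Total - @First + 1
ORDER BY u.UpdateTime DESC;
```

With 10 million rows, the page starting at row 9,700,000 is then reached after numbering roughly 300,000 rows from the other end of the index instead of 9.7 million.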
But why do you actually need this anyway? Do you actually get people visiting page 485,000?

Please help me with this query (sql server 2008)

ALTER PROCEDURE ReadNews
@CategoryID INT,
@Culture TINYINT = NULL,
@StartDate DATETIME = NULL,
@EndDate DATETIME = NULL,
@Start BIGINT, -- for paging
@Count BIGINT -- for paging
AS
BEGIN
SET NOCOUNT ON;
--ItemType for news is 0
;WITH Paging AS
(
SELECT news.ID,
news.Title,
news.Description,
news.Date,
news.Url,
news.Vote,
news.ResourceTitle,
news.UserID,
ROW_NUMBER() OVER(ORDER BY news.rank DESC) AS RowNumber, TotalCount = COUNT(*) OVER()
FROM dbo.News news
JOIN ItemCategory itemCat ON itemCat.ItemID = news.ID
WHERE itemCat.ItemType = 0 -- news item
AND itemCat.CategoryID = @CategoryID
AND (
(@StartDate IS NULL OR news.Date >= @StartDate) AND
(@EndDate IS NULL OR news.Date <= @EndDate)
)
AND news.Culture = @Culture
and news.[status] = 1
)
SELECT * FROM Paging WHERE RowNumber >= @Start AND RowNumber <= (@Start + @Count - 1)
OPTION (OPTIMIZE FOR (@CategoryID UNKNOWN, @Culture UNKNOWN))
END
Here is the structure of News and ItemCategory tables:
CREATE TABLE [dbo].[News](
[ID] [bigint] NOT NULL,
[Url] [varchar](300) NULL,
[Title] [nvarchar](300) NULL,
[Description] [nvarchar](3000) NULL,
[Date] [datetime] NULL,
[Rank] [smallint] NULL,
[Vote] [smallint] NULL,
[Culture] [tinyint] NULL,
[ResourceTitle] [nvarchar](200) NULL,
[Status] [tinyint] NULL,
CONSTRAINT [PK_News] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
CREATE TABLE [ItemCategory](
[ID] [bigint] IDENTITY(1,1) NOT NULL,
[ItemID] [bigint] NOT NULL,
[ItemType] [tinyint] NOT NULL,
[CategoryID] [int] NOT NULL,
CONSTRAINT [PK_ItemCategory] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
This query reads news of a specific category (sport, politics, ...).
The @Culture parameter specifies the language of the news, e.g. 0 (English), 1 (French), etc.
The ItemCategory table relates a news record to one or more categories.
The ItemType column in ItemCategory specifies what type of item ItemID refers to. For now, we have only ItemType 0, indicating that ItemID refers to a record in the News table.
Currently, I have the following index on ItemCategory table:
CREATE NONCLUSTERED INDEX [IX_ItemCategory_ItemType_CategoryID__ItemID] ON [ItemCategory]
(
[ItemType] ASC,
[CategoryID] ASC
)
INCLUDE ( [ItemID])
and the following index for News table (suggested by query analyzer):
CREATE NONCLUSTERED INDEX [_dta_index_News_8_1734000549__K1_K7_K13_K15] ON [dbo].[News]
(
[ID] ASC,
[Date] ASC,
[Culture] ASC,
[Status] ASC
)
With these indexes, the query executes in less than a second for some parameters, but for other parameters (e.g. a different @Culture or @CategoryID) it may take up to 2 minutes! I have used OPTIMIZE FOR (@CategoryID UNKNOWN, @Culture UNKNOWN) to prevent parameter sniffing for the @CategoryID and @Culture parameters, but it does not seem to work for some parameters.
There are currently around 2,870,000 records in News table and 4,740,000 in ItemCategory table.
Now I greatly appreciate any advice on how to optimize this query or its indexes.
update:
execution plan: (in this image, ItemNetwork is what I referred to as ItemCategory. they are the same)

Have you had a look at some of the inbuilt SQL tools to help you with this:
I.e. from the management studio menu:
'Query'->'Display Estimated Execution Plan'
'Query'->'Include Actual Execution Plan'
'Tools'->'Database Engine Tuning Advisor'

Shouldn't the OPTION OPTIMIZE clause be part of the inner SQL, rather than of the SELECT on the CTE?

You should look at indexing the culture field in the news table, and the itemid and categoryid fields in the item category table. You may not need all these indexes - I would try them one at a time, then in combination until you find something that works. Your existing indexes do not seem to help your query very much.

I really need to see the query plan. One thing of note: you put the clustered index for News on News.ID, but it is not an identity field; it is the FK for the ItemCategory table. This will result in some fragmentation of the News table over time, so it is less than ideal.
I suspect the underlying problem is that your paging is causing a table scan.
Updated:
Those Sorts account for 68% of the query execution cost in the plan, and that makes sense: at least one of them must be there to support the ranking function you are using, which is based on news.rank DESC, but you have no index that can support that ranking natively.
Getting an index in to support that will be interesting. You can try a simple NC index on news.rank first; SQL Server may choose to join indexes and avoid the sort, but it will take some experimentation.

Try a nonclustered index on (ItemID, CategoryID) for the ItemCategory table, and a nonclustered index on (Rank, Culture) for the News table.

I have finally come up with the following indexes which are working great and the stored procedure executes in less than a second. I have just removed TotalCount = COUNT(*) OVER() from the query and I couldn't find any good index for that. Maybe I write a separate stored procedure to calculate the total number of records. I may even decide to use a "more" button like in Twitter and Facebook without pagination buttons.
for news table:
CREATE NONCLUSTERED INDEX [IX_News_Rank_Culture_Status_Date] ON [dbo].[News]
(
[Rank] DESC,
[Culture] ASC,
[Status] ASC,
[Date] ASC
)
for ItemNetwork table:
CREATE NONCLUSTERED INDEX [IX_ItemNetwork_ItemID_NetworkID] ON ItemNetwork
(
[ItemID] ASC,
[NetworkID] ASC
)
I just don't know whether ItemNetwork needs a clustered index on ID column. I am never retrieving a record from this table using the ID column. Do you think it's better to have a clustered index on (ItemID, NetworkID) columns?
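The separate stored procedure for the total count mentioned above might look something like the sketch below. The procedure name ReadNewsCount is a hypothetical choice, the filters are copied from the ReadNews procedure in the question, and the table is referred to as ItemCategory (which the question says is the same table as ItemNetwork).

```sql
-- Hypothetical companion procedure: returns only the total number of matching rows.
CREATE PROCEDURE ReadNewsCount
    @CategoryID INT,
    @Culture TINYINT = NULL,
    @StartDate DATETIME = NULL,
    @EndDate DATETIME = NULL
AS
BEGIN
    SET NOCOUNT ON;

    SELECT COUNT_BIG(*) AS TotalCount
    FROM dbo.News news
    JOIN ItemCategory itemCat ON itemCat.ItemID = news.ID
    WHERE itemCat.ItemType = 0 -- news item
      AND itemCat.CategoryID = @CategoryID
      AND (@StartDate IS NULL OR news.Date >= @StartDate)
      AND (@EndDate IS NULL OR news.Date <= @EndDate)
      AND news.Culture = @Culture
      AND news.[status] = 1;
END
```

Because the count does not depend on the requested page, a caller could fetch it once per filter combination and reuse it while paging.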

Please try changing
FROM dbo.News news
JOIN ItemCategory itemCat ON itemCat.ItemID = news.ID
to
FROM dbo.News news
INNER HASH JOIN ItemCategory itemCat ON itemCat.ItemID = news.ID
or
FROM dbo.News news
INNER LOOP JOIN ItemCategory itemCat ON itemCat.ItemID = news.ID
(T-SQL requires the join type, here INNER, to be spelled out when a join hint is used.)
I don't really know what is in your data, but joining these tables may be the bottleneck.
