Slow performance with ORDER BY in SQL Server - sql-server

I'm working on a project (Microsoft SQL Server 2012) in which I need to store quite a lot of data.
Currently my table contains 1,441,352 records in total.
The structure of the table is as follows:
RecordIdentifier (int, not null)
GlnCode (PK, nvarchar(100), not null)
Description (nvarchar(MAX), not null)
VendorId (nvarchar(100), not null)
VendorName (nvarchar(100), not null)
ItemNumber (PK, nvarchar(100), not null)
ItemUOM (PK, nvarchar(128), not null)
My table is indexed on the following fields:
NonClustered - GlnCode, Ascending
NonClustered - ItemNumber, Ascending
NonClustered - ItemUOM, Ascending
NonClustered - VendorID, Ascending
Clustered - Unique (The above 4 columns together).
Now I'm writing an API to return the records in the table.
The API exposes methods that execute this query:
SELECT TOP (51)
[GlnCode] AS [GlnCode],
[VendorId] AS [VendorId],
[ItemNumber] AS [ItemNumber],
[ItemUOM] AS [ItemUOM],
[RecordIdentifier] AS [RecordIdentifier],
[Description] AS [Description],
[VendorName] AS [VendorName]
FROM [dbo].[T_GENERIC_ARTICLE]
If I look at the performance, this is good.
But this doesn't guarantee that the same set is always returned, so I need to apply an ORDER BY clause, meaning the query being executed looks like this:
SELECT TOP (51)
[GlnCode] AS [GlnCode],
[VendorId] AS [VendorId],
[ItemNumber] AS [ItemNumber],
[ItemUOM] AS [ItemUOM],
[RecordIdentifier] AS [RecordIdentifier],
[Description] AS [Description],
[VendorName] AS [VendorName]
FROM [dbo].[T_GENERIC_ARTICLE]
ORDER BY [GlnCode] ASC, [ItemNumber] ASC, [ItemUOM] ASC, [VendorId] ASC
Now the query takes a few seconds to return, which I can't afford.
Does anyone have any idea how to solve this issue?

Your table index definitions are not optimal. You also don't need to create the additional individual indexes, because they are covered by the nonclustered index. You will get better performance by structuring your indexes as follows:
Table definition:
CREATE TABLE [dbo].[T_GENERIC_ARTICLE]
(
RecordIdentifier int IDENTITY(1,1) PRIMARY KEY NOT NULL,
GlnCode nvarchar(100) NOT NULL,
Description nvarchar(MAX) NOT NULL,
VendorId nvarchar(100) NOT NULL,
VendorName nvarchar(100) NOT NULL,
ItemNumber nvarchar(100) NOT NULL,
ItemUOM nvarchar(128) NOT NULL
)
GO
CREATE UNIQUE NONCLUSTERED INDEX [UniqueNonClusteredIndex-Composite2]
ON [dbo].[T_GENERIC_ARTICLE] (GlnCode ASC, ItemNumber ASC, ItemUOM ASC, VendorId ASC);
GO
Revised Query
SELECT TOP (51)
[RecordIdentifier] AS [RecordIdentifier],
[GlnCode] AS [GlnCode],
[VendorId] AS [VendorId],
[ItemNumber] AS [ItemNumber],
[ItemUOM] AS [ItemUOM],
[Description] AS [Description],
[VendorName] AS [VendorName]
FROM [dbo].[T_GENERIC_ARTICLE]
ORDER BY [GlnCode], [ItemNumber], [ItemUOM], [VendorId]
The plan first scans the nonclustered index in order, then performs key lookups against the clustered primary key for the remaining columns. This is where you want the majority of the work to be done.
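If the key lookups still dominate the plan, one further option (my addition, not part of the original answer; the index name is illustrative) is to make the index covering by adding the two remaining columns as INCLUDE columns, so the query can be answered from the index alone:
CREATE UNIQUE NONCLUSTERED INDEX [UniqueNonClusteredIndex-Covering]
ON [dbo].[T_GENERIC_ARTICLE] (GlnCode ASC, ItemNumber ASC, ItemUOM ASC, VendorId ASC)
INCLUDE ([Description], VendorName);
GO
The trade-off is that the included columns make the index close to a full copy of the table, which costs storage and write performance, so it is only worth it if this query is hot.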
Reference:
Indexes in SQL Server
Hope this helps.

Related

T-SQL compound index sufficient for query on subset of columns?

Is a compound index sufficient for queries against a subset of columns?
CREATE TABLE [FILE_STATUS_HISTORY]
(
[FILE_ID] [INT] NOT NULL,
[STATUS_ID] [INT] NOT NULL,
[TIMESTAMP_UTC] [DATETIME] NOT NULL,
CONSTRAINT [PK_FILE_STATUS_HISTORY]
PRIMARY KEY CLUSTERED ([FILE_ID] ASC, [STATUS_ID] ASC)
) ON [PRIMARY]
CREATE UNIQUE NONCLUSTERED INDEX [IX_FILE_STATUS_HISTORY]
ON [FILE_STATUS_HISTORY] ([FILE_ID] ASC,
[STATUS_ID] ASC,
[TIMESTAMP_UTC] ASC) ON [PRIMARY]
GO
SELECT TOP (1) *
FROM [FILE_STATUS_HISTORY]
WHERE [FILE_ID] = 382748
ORDER BY [TIMESTAMP_UTC] DESC
A composite index on (File_Id, Timestamp_UTC DESC) should optimize handling of the WHERE and TOP/ORDER BY clauses. The actual execution plan will show whether the query optimizer agrees.
A covering index would also have Status_Id as an included column, so that the index could satisfy the entire query in a single seek.
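A sketch of that covering index (the index name is mine):
CREATE NONCLUSTERED INDEX [IX_FILE_STATUS_HISTORY_LATEST]
ON [FILE_STATUS_HISTORY] ([FILE_ID] ASC, [TIMESTAMP_UTC] DESC)
INCLUDE ([STATUS_ID]);
-- the DESC key matches ORDER BY [TIMESTAMP_UTC] DESC, so TOP (1) becomes a single seek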

Why would a temp table make this query so much faster?

While trying to dissect a SQL Server stored proc that's been running slowly, we found that simply using a temp table instead of a real table had a drastic impact on performance. The table we're swapping out (ds_location) has only 173 rows:
This query will complete in 1 second:
IF OBJECT_ID('tempdb..#Location') IS NOT NULL DROP TABLE #Location
SELECT * INTO #Location FROM ds_location
SELECT COUNT(*)
FROM wip_cubs_hc m
INNER JOIN ds_scenario sc ON sc.Scenario = m.Scenario
INNER JOIN ds_period pe ON pe.Period = m.ReportingPeriod
INNER JOIN #Location l ON l.Location = m.Sh_Location
Compare that to the original, which takes 7 seconds:
SELECT COUNT(*)
FROM wip_cubs_hc m
INNER JOIN ds_scenario sc ON sc.Scenario = m.Scenario
INNER JOIN ds_period pe ON pe.Period = m.ReportingPeriod
INNER JOIN ds_location l ON l.Location = m.Sh_Location
Here's the definition of wip_cubs_hc. It contains 1.7 million rows:
CREATE TABLE wip_cubs_hc(
Scenario varchar(16) NOT NULL,
ReportingPeriod varchar(50) NOT NULL,
Sh_Location varchar(50) NOT NULL,
Department varchar(50) NOT NULL,
ProductName varchar(75) NOT NULL,
Account varchar(50) NOT NULL,
Balance varchar(50) NOT NULL,
Source varchar(50) NOT NULL,
Data numeric(18, 6) NOT NULL,
CONSTRAINT PK_wip_cubs_hc PRIMARY KEY CLUSTERED
(
Scenario ASC,
ReportingPeriod ASC,
Sh_Location ASC,
Department ASC,
ProductName ASC,
Account ASC,
Balance ASC,
Source ASC
)
)
CREATE NONCLUSTERED INDEX IX_wip_cubs_hc_Balance
ON [dbo].[wip_cubs_hc] ([Scenario],[Sh_Location],[Department],[Balance])
INCLUDE ([ReportingPeriod],[ProductName],[Account],[Source])
I'd love to know HOW to determine what's causing the slowdown, too.
I can answer the "How to determine the slowdown" question...
Take a look at the execution plans of both queries. You do this by going to the "Query" menu > "Display Estimated Execution Plan". The default keyboard shortcut is Ctrl+L. You can view the plans for multiple queries at once as well. Look at the type of operation being done: what you want to see are things like Index Seek instead of Index Scan, etc.
This article explains some of the other things to look for.
Without knowing the schema/indexes of all the tables involved, this is where I would suggest starting.
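One complementary check (my addition, not from the original answer): SET STATISTICS IO and SET STATISTICS TIME report per-table logical reads and CPU time in the Messages tab, which lets you quantify the difference between the two variants:
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- run the slow variant (and then the #Location variant) and compare the output
SELECT COUNT(*)
FROM wip_cubs_hc m
INNER JOIN ds_scenario sc ON sc.Scenario = m.Scenario
INNER JOIN ds_period pe ON pe.Period = m.ReportingPeriod
INNER JOIN ds_location l ON l.Location = m.Sh_Location;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;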
Best of Luck!

How to improve my query performance by indexing

I just want to know how I should index this table for optimal performance. It will potentially hold around 20M rows.
CREATE TABLE [dbo].[Table1](
[ID] [bigint] NOT NULL,
[Col1] [varchar](100) NULL,
[Col2] [varchar](100) NULL,
[Description] [varchar](100) NULL
) ON [PRIMARY]
Basically, this table will be queried ONLY in this manner.
SELECT ID FROM Table1
WHERE Col1 = 'exactVal1' AND Col2 = 'exactVal2' AND [Description] = 'exactDesc'
This is what I did:
CREATE NONCLUSTERED INDEX IX_ID
ON Table1(ID)
GO
CREATE NONCLUSTERED INDEX IX_Col1
ON Table1(Col1)
GO
CREATE NONCLUSTERED INDEX IX_Col2
ON Table1(Col2)
GO
CREATE NONCLUSTERED INDEX IX_Description
ON Table1([Description])
GO
Am I right to index all these columns? I'm not really confident yet; I'm new to SQL, so please let me know if I'm on the right track.
Again, a lot of data will be put into this table. Unfortunately, I cannot test the performance yet since there is no data available, but I will soon generate some dummy data to test with. It would be great if there were already another suggestion available that I could compare results against.
Thanks,
jack
I would combine these indexes into one index, instead of having three separate indexes. For example:
CREATE INDEX ix_cols ON dbo.Table1 (Col1, Col2, Description)
If this combination of columns is unique within the table, then you should add the UNIQUE keyword to make the index unique. This helps performance but, more importantly, enforces uniqueness. It may also be created as a primary key, if that is appropriate.
Placing all of the columns in one index gives better performance because SQL Server does not need multiple passes to find the rows you are seeking.
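For example, assuming the combination really is unique, the same index with the UNIQUE keyword would be (a sketch):
CREATE UNIQUE INDEX ix_cols ON dbo.Table1 (Col1, Col2, [Description]);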
Try this -
CREATE TABLE dbo.Table1
(
ID BIGINT NOT NULL
, Col1 VARCHAR(100) NULL
, Col2 VARCHAR(100) NULL
, [Description] VARCHAR(100) NULL
)
GO
CREATE CLUSTERED INDEX IX_Table1 ON dbo.Table1
(
Col1
, Col2
, [Description]
)
Or this -
CREATE TABLE dbo.Table1
(
ID BIGINT PRIMARY KEY NOT NULL
, Col1 VARCHAR(100) NULL
, Col2 VARCHAR(100) NULL
, [Description] VARCHAR(100) NULL
)
GO
CREATE UNIQUE NONCLUSTERED INDEX IX_Table1 ON dbo.Table1
(
Col1
, Col2
, [Description]
)

TSQL How to select employees with skills in an xml column

Given a table schema like the one below:
CREATE TABLE [dbo].[Employee](
[EmployeeId] [uniqueidentifier] NOT NULL,
[Name] [nvarchar](50) NOT NULL,
[Location] [nvarchar](50) NOT NULL,
[Skills] [xml] NOT NULL,
CONSTRAINT [PK_Employee] PRIMARY KEY CLUSTERED ([EmployeeId] ASC)
)
How would I get employees having C# (case-insensitive) programming skills, assuming the XML saved in the Skills column is as below?
Could you also advise on other functions that would help me filter and sort when using xml data type columns?
<Skills><Skill>C#</Skill><Skill>ASP.NET</Skill><Skill>VB.NET</Skill></Skills>
The comparison is case sensitive, so you need to compare against both c# and C#. In SQL Server 2008 you can use the XQuery upper-case function instead.
declare @T table
(
ID int identity,
Skills XML
)
insert into @T values
('<Skills><Skill>C#</Skill><Skill>ASP.NET</Skill><Skill>VB.NET</Skill></Skills>')
insert into @T values
('<Skills><Skill>CB.NET</Skill><Skill>ASP.NET</Skill><Skill>c#</Skill></Skills>')
insert into @T values
('<Skills><Skill>F#</Skill><Skill>ASP.NET</Skill><Skill>VB.NET</Skill></Skills>')
select ID
from @T
where Skills.exist('/Skills/Skill[contains(., "C#") or contains(., "c#")]') = 1
Result:
ID
-----------
1
2
Update:
This will also work.
select T.ID
from @T as T
cross apply T.Skills.nodes('/Skills/Skill') as X(N)
where X.N.value('.', 'nvarchar(50)') like '%C#%'
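Building on the upper-case note above, here is a sketch of a pure-XQuery case-insensitive variant (SQL Server 2008+; the (text())[1] step is there to satisfy XQuery's singleton typing rules):
select ID
from @T
where Skills.exist('/Skills/Skill[upper-case((text())[1]) = "C#"]') = 1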

Please help me with this query (SQL Server 2008)

ALTER PROCEDURE ReadNews
@CategoryID INT,
@Culture TINYINT = NULL,
@StartDate DATETIME = NULL,
@EndDate DATETIME = NULL,
@Start BIGINT, -- for paging
@Count BIGINT -- for paging
AS
BEGIN
SET NOCOUNT ON;
--ItemType for news is 0
;WITH Paging AS
(
SELECT news.ID,
news.Title,
news.Description,
news.Date,
news.Url,
news.Vote,
news.ResourceTitle,
news.UserID,
ROW_NUMBER() OVER(ORDER BY news.Rank DESC) AS RowNumber, TotalCount = COUNT(*) OVER()
FROM dbo.News news
JOIN ItemCategory itemCat ON itemCat.ItemID = news.ID
WHERE itemCat.ItemType = 0 -- news item
AND itemCat.CategoryID = @CategoryID
AND (
(@StartDate IS NULL OR news.Date >= @StartDate) AND
(@EndDate IS NULL OR news.Date <= @EndDate)
)
AND news.Culture = @Culture
AND news.[Status] = 1
)
SELECT * FROM Paging WHERE RowNumber >= @Start AND RowNumber <= (@Start + @Count - 1)
OPTION (OPTIMIZE FOR (@CategoryID UNKNOWN, @Culture UNKNOWN))
END
Here is the structure of the News and ItemCategory tables:
CREATE TABLE [dbo].[News](
[ID] [bigint] NOT NULL,
[Url] [varchar](300) NULL,
[Title] [nvarchar](300) NULL,
[Description] [nvarchar](3000) NULL,
[Date] [datetime] NULL,
[Rank] [smallint] NULL,
[Vote] [smallint] NULL,
[Culture] [tinyint] NULL,
[ResourceTitle] [nvarchar](200) NULL,
[Status] [tinyint] NULL,
CONSTRAINT [PK_News] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
CREATE TABLE [ItemCategory](
[ID] [bigint] IDENTITY(1,1) NOT NULL,
[ItemID] [bigint] NOT NULL,
[ItemType] [tinyint] NOT NULL,
[CategoryID] [int] NOT NULL,
CONSTRAINT [PK_ItemCategory] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
This query reads news of a specific category (sport, politics, ...).
The @Culture parameter specifies the language of the news, like 0 (English), 1 (French), etc.
The ItemCategory table relates a news record to one or more categories.
The ItemType column in the ItemCategory table specifies what type of item ItemID refers to. For now, we have only ItemType 0, indicating that ItemID refers to a record in the News table.
Currently, I have the following index on the ItemCategory table:
CREATE NONCLUSTERED INDEX [IX_ItemCategory_ItemType_CategoryID__ItemID] ON [ItemCategory]
(
[ItemType] ASC,
[CategoryID] ASC
)
INCLUDE ( [ItemID])
and the following index on the News table (suggested by the query analyzer):
CREATE NONCLUSTERED INDEX [_dta_index_News_8_1734000549__K1_K7_K13_K15] ON [dbo].[News]
(
[ID] ASC,
[Date] ASC,
[Culture] ASC,
[Status] ASC
)
With these indexes, when I execute the query, it completes in less than a second for some parameters, while for other parameters (e.g. a different @Culture or @CategoryID) it may take up to 2 minutes! I have used OPTIMIZE FOR (@CategoryID UNKNOWN, @Culture UNKNOWN) to prevent parameter sniffing on the @CategoryID and @Culture parameters, but it doesn't seem to work for some parameter values.
There are currently around 2,870,000 records in the News table and 4,740,000 in the ItemCategory table.
I would greatly appreciate any advice on how to optimize this query or its indexes.
Update:
Execution plan: (in this image, ItemNetwork is what I referred to as ItemCategory; they are the same)
Have you had a look at some of the built-in SQL tools that can help you with this? From the Management Studio menu:
'Query' -> 'Display Estimated Execution Plan'
'Query' -> 'Include Actual Execution Plan'
'Tools' -> 'Database Engine Tuning Advisor'
Shouldn't the OPTION OPTIMIZE clause be part of the inner SQL, rather than of the SELECT on the CTE?
You should look at indexing the Culture field in the News table, and the ItemID and CategoryID fields in the ItemCategory table. You may not need all of these indexes; I would try them one at a time, then in combination, until you find something that works. Your existing indexes do not seem to help your query very much.
We really need to see the query plan. One thing of note: you put the clustered index for News on News.ID, but ID is not an identity field; it is the key referenced by the ItemCategory table. This will result in some fragmentation on the News table over time, so it is less than ideal.
I suspect the underlying problem is that your paging is causing the table to scan.
Updated:
Those Sorts are costing you 68% of the query execution time according to the plan, and that makes sense: at least one of them must support the ranking function you are using, which is based on news.rank DESC, but you have no index that supports that ordering natively.
Getting an index in to support it will be interesting. You can try a simple nonclustered index on news.rank first; SQL Server may choose to combine indexes and avoid the sort, but it will take some experimentation.
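A sketch of that first experiment (the index name is illustrative):
CREATE NONCLUSTERED INDEX IX_News_Rank
ON dbo.News ([Rank] DESC);
If the optimizer uses it, the Sort feeding ROW_NUMBER() OVER(ORDER BY news.Rank DESC) should drop out of the plan.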
Try a nonclustered index on (ItemID, CategoryID) for the ItemCategory table, and a nonclustered index on (Rank, Culture) for the News table.
I have finally come up with the following indexes, which are working great: the stored procedure now executes in less than a second. I removed TotalCount = COUNT(*) OVER() from the query because I couldn't find any good index for it. Maybe I'll write a separate stored procedure to calculate the total number of records, or I may decide to use a "more" button like Twitter and Facebook do, without pagination buttons.
For the News table:
CREATE NONCLUSTERED INDEX [IX_News_Rank_Culture_Status_Date] ON [dbo].[News]
(
[Rank] DESC,
[Culture] ASC,
[Status] ASC,
[Date] ASC
)
For the ItemNetwork table:
CREATE NONCLUSTERED INDEX [IX_ItemNetwork_ItemID_NetworkID] ON ItemNetwork
(
[ItemID] ASC,
[NetworkID] ASC
)
I just don't know whether ItemNetwork needs a clustered index on the ID column. I never retrieve a record from this table using the ID column. Do you think it would be better to have a clustered index on the (ItemID, NetworkID) columns?
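For what it's worth, a sketch of what that change could look like (names follow the DDL above, with ItemNetwork standing in for ItemCategory as noted; try it on a copy of the table first):
-- drop the existing clustered primary key on ID
ALTER TABLE ItemNetwork DROP CONSTRAINT PK_ItemCategory;
-- cluster on the columns the query actually joins and filters on
CREATE CLUSTERED INDEX CIX_ItemNetwork_ItemID_NetworkID
ON ItemNetwork (ItemID, NetworkID);
-- keep ID unique by re-adding the primary key as nonclustered
ALTER TABLE ItemNetwork ADD CONSTRAINT PK_ItemNetwork PRIMARY KEY NONCLUSTERED (ID);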
Please try to change
FROM dbo.News news
JOIN ItemCategory itemCat ON itemCat.ItemID = news.ID
to
FROM dbo.News news
INNER HASH JOIN ItemCategory itemCat ON itemCat.ItemID = news.ID
or
FROM dbo.News news
INNER LOOP JOIN ItemCategory itemCat ON itemCat.ItemID = news.ID
(Note that T-SQL requires the INNER keyword when a join hint is specified.) I don't really know what is in your data, but the join between these tables may be the bottleneck.
