Is a compound index sufficient for queries against a subset of its columns?
CREATE TABLE [FILE_STATUS_HISTORY]
(
[FILE_ID] [INT] NOT NULL,
[STATUS_ID] [INT] NOT NULL,
[TIMESTAMP_UTC] [DATETIME] NOT NULL,
CONSTRAINT [PK_FILE_STATUS_HISTORY]
PRIMARY KEY CLUSTERED ([FILE_ID] ASC, [STATUS_ID] ASC)
) ON [PRIMARY]
CREATE UNIQUE NONCLUSTERED INDEX [IX_FILE_STATUS_HISTORY]
ON [FILE_STATUS_HISTORY] ([FILE_ID] ASC,
[STATUS_ID] ASC,
[TIMESTAMP_UTC] ASC) ON [PRIMARY]
GO
SELECT TOP (1) *
FROM [FILE_STATUS_HISTORY]
WHERE [FILE_ID] = 382748
ORDER BY [TIMESTAMP_UTC] DESC
A composite index on (FILE_ID, TIMESTAMP_UTC DESC) should handle both the WHERE clause and the TOP/ORDER BY efficiently. The actual execution plan will show whether the query optimizer agrees.
A covering index would also have STATUS_ID as an included column, so that the index could satisfy the entire query without key lookups into the clustered index.
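A minimal sketch of such an index (the index name is hypothetical):
-- DESC on TIMESTAMP_UTC lets TOP (1) ... ORDER BY TIMESTAMP_UTC DESC read the first
-- matching row without a sort; INCLUDE makes the index covering for this SELECT *.
CREATE NONCLUSTERED INDEX [IX_FILE_STATUS_HISTORY_FILE_ID_TIMESTAMP_UTC]
ON [FILE_STATUS_HISTORY] ([FILE_ID] ASC, [TIMESTAMP_UTC] DESC)
INCLUDE ([STATUS_ID])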
While trying to dissect a SQL Server stored proc that's been running slowly, we found that simply using a temp table instead of a real table had a drastic impact on performance. The table we're swapping out (ds_location) has only 173 rows.
This version completes in 1 second:
IF OBJECT_ID('tempdb..#Location') IS NOT NULL DROP TABLE #Location
SELECT * INTO #Location FROM ds_location
SELECT COUNT(*)
FROM wip_cubs_hc m
INNER JOIN ds_scenario sc ON sc.Scenario = m.Scenario
INNER JOIN ds_period pe ON pe.Period = m.ReportingPeriod
INNER JOIN #Location l ON l.Location = m.Sh_Location
Compare that to the original, which takes 7 seconds:
SELECT COUNT(*)
FROM wip_cubs_hc m
INNER JOIN ds_scenario sc ON sc.Scenario = m.Scenario
INNER JOIN ds_period pe ON pe.Period = m.ReportingPeriod
INNER JOIN ds_location l ON l.Location = m.Sh_Location
Here's the definition of wip_cubs_hc. It contains 1.7 million rows:
CREATE TABLE wip_cubs_hc(
Scenario varchar(16) NOT NULL,
ReportingPeriod varchar(50) NOT NULL,
Sh_Location varchar(50) NOT NULL,
Department varchar(50) NOT NULL,
ProductName varchar(75) NOT NULL,
Account varchar(50) NOT NULL,
Balance varchar(50) NOT NULL,
Source varchar(50) NOT NULL,
Data numeric(18, 6) NOT NULL,
CONSTRAINT PK_wip_cubs_hc PRIMARY KEY CLUSTERED
(
Scenario ASC,
ReportingPeriod ASC,
Sh_Location ASC,
Department ASC,
ProductName ASC,
Account ASC,
Balance ASC,
Source ASC
)
)
CREATE NONCLUSTERED INDEX IX_wip_cubs_hc_Balance
ON [dbo].[wip_cubs_hc] ([Scenario],[Sh_Location],[Department],[Balance])
INCLUDE ([ReportingPeriod],[ProductName],[Account],[Source])
I'd love to know HOW to determine what's causing the slowdown, too.
I can answer the "How to determine the slowdown" question...
Take a look at the execution plans of both queries. You do this from the "Query" menu > "Display Estimated Execution Plan" (the default keyboard shortcut is Ctrl+L); you can display the plans for multiple queries at once as well. Look at the types of operations being performed: what you want to see are things like Index Seek instead of Index Scan, etc.
This article explains some of the other things to look for.
Without knowing the schema/indexes of all the tables involved, this is where I would suggest starting.
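Alongside the plans, it can also help to compare the two queries' I/O and CPU numbers directly; a minimal sketch:
SET STATISTICS IO ON
SET STATISTICS TIME ON
-- Run both COUNT(*) queries here and compare the logical reads and
-- CPU time reported on the Messages tab.
SET STATISTICS IO OFF
SET STATISTICS TIME OFF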
Best of Luck!
I just want to know how I should index this table for optimal performance. It will potentially hold around 20M rows.
CREATE TABLE [dbo].[Table1](
[ID] [bigint] NOT NULL,
[Col1] [varchar](100) NULL,
[Col2] [varchar](100) NULL,
[Description] [varchar](100) NULL
) ON [PRIMARY]
Basically, this table will be queried ONLY in this manner:
SELECT ID FROM Table1
WHERE Col1 = 'exactVal1' AND Col2 = 'exactVal2' AND [Description] = 'exactDesc'
This is what I did:
CREATE NONCLUSTERED INDEX IX_ID
ON Table1(ID)
GO
CREATE NONCLUSTERED INDEX IX_Col1
ON Table1(Col1)
GO
CREATE NONCLUSTERED INDEX IX_Col2
ON Table1(Col2)
GO
CREATE NONCLUSTERED INDEX IX_Description
ON Table1([Description])
GO
Am I right to index all these columns? I'm not really confident yet; I'm new to SQL, so please let me know if I'm on the right track.
Again, a lot of data will be put into this table. Unfortunately, I cannot test the performance yet since there is no data available, but I will soon generate some dummy data to test with. Still, it would be great if there were already another option (suggestion) available that I could compare results against.
Thanks,
jack
I would combine the indexes on the three queried columns into one composite index, instead of having three separate indexes. For example:
CREATE INDEX ix_cols ON dbo.Table1 (Col1, Col2, Description)
If this combination of columns is unique within the table, then you should add the UNIQUE keyword to make the index unique. That helps performance but, more importantly, enforces the uniqueness. The combination may also be created as the primary key if that is appropriate.
Placing all of the columns into one index gives better performance because SQL Server does not need multiple passes (one per single-column index) to find the rows you are seeking.
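If the primary key route is appropriate, a sketch (note that the columns are declared NULL in the original table, and primary key columns must be NOT NULL):
-- Primary key columns must be NOT NULL, so alter them first (the table is still empty).
ALTER TABLE dbo.Table1 ALTER COLUMN Col1 VARCHAR(100) NOT NULL
ALTER TABLE dbo.Table1 ALTER COLUMN Col2 VARCHAR(100) NOT NULL
ALTER TABLE dbo.Table1 ALTER COLUMN [Description] VARCHAR(100) NOT NULL
ALTER TABLE dbo.Table1 ADD CONSTRAINT PK_Table1 PRIMARY KEY (Col1, Col2, [Description])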
Try this -
CREATE TABLE dbo.Table1
(
ID BIGINT NOT NULL
, Col1 VARCHAR(100) NULL
, Col2 VARCHAR(100) NULL
, [Description] VARCHAR(100) NULL
)
GO
CREATE CLUSTERED INDEX IX_Table1 ON dbo.Table1
(
Col1
, Col2
, [Description]
)
Or this -
CREATE TABLE dbo.Table1
(
ID BIGINT PRIMARY KEY NOT NULL
, Col1 VARCHAR(100) NULL
, Col2 VARCHAR(100) NULL
, [Description] VARCHAR(100) NULL
)
GO
CREATE UNIQUE NONCLUSTERED INDEX IX_Table1 ON dbo.Table1
(
Col1
, Col2
, [Description]
)
Given a table schema like the one below:
CREATE TABLE [dbo].[Employee](
[EmployeeId] [uniqueidentifier] NOT NULL,
[Name] [nvarchar](50) NOT NULL,
[Location] [nvarchar](50) NOT NULL,
[Skills] [xml] NOT NULL,
CONSTRAINT [PK_Employee] PRIMARY KEY CLUSTERED ([EmployeeId] ASC)
)
How would I get Employees having C# (case-insensitive) programming skills, assuming the XML saved in the Skills column is as below?
<Skills><Skill>C#</Skill><Skill>ASP.NET</Skill><Skill>VB.NET</Skill></Skills>
Could you also advise on other functions that would help me filter and sort when using xml data type columns?
The comparison is case sensitive, so you need to compare against both c# and C#. In SQL Server 2008 you can use the XQuery upper-case() function instead.
declare @T table
(
ID int identity,
Skills XML
)
insert into @T values
('<Skills><Skill>C#</Skill><Skill>ASP.NET</Skill><Skill>VB.NET</Skill></Skills>')
insert into @T values
('<Skills><Skill>CB.NET</Skill><Skill>ASP.NET</Skill><Skill>c#</Skill></Skills>')
insert into @T values
('<Skills><Skill>F#</Skill><Skill>ASP.NET</Skill><Skill>VB.NET</Skill></Skills>')
select ID
from @T
where Skills.exist('/Skills/Skill[contains(., "C#") or contains(., "c#")]') = 1
Result:
ID
-----------
1
2
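A sketch of the upper-case() approach mentioned above, assuming SQL Server 2008 or later (where the XQuery upper-case() function is available):
-- upper-case() normalizes the text node before comparing, so one literal suffices.
select ID
from @T
where Skills.exist('/Skills/Skill[upper-case(text()[1]) = "C#"]') = 1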
Update:
This will also work.
select T.ID
from @T as T
cross apply T.Skills.nodes('/Skills/Skill') as X(N)
where X.N.value('.', 'nvarchar(50)') like '%C#%'
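Note that whether LIKE '%C#%' also matches c# depends on the column's collation; under the common case-insensitive default collations it does. To make that explicit regardless of collation, a sketch:
select T.ID
from @T as T
cross apply T.Skills.nodes('/Skills/Skill') as X(N)
-- Force a case-insensitive collation for the comparison.
where X.N.value('.', 'nvarchar(50)') collate Latin1_General_CI_AS like '%C#%'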
ALTER PROCEDURE ReadNews
@CategoryID INT,
@Culture TINYINT = NULL,
@StartDate DATETIME = NULL,
@EndDate DATETIME = NULL,
@Start BIGINT, -- for paging
@Count BIGINT -- for paging
AS
BEGIN
SET NOCOUNT ON;
--ItemType for news is 0
;WITH Paging AS
(
SELECT news.ID,
news.Title,
news.Description,
news.Date,
news.Url,
news.Vote,
news.ResourceTitle,
news.UserID,
ROW_NUMBER() OVER(ORDER BY news.Rank DESC) AS RowNumber,
TotalCount = COUNT(*) OVER()
FROM dbo.News news
JOIN ItemCategory itemCat ON itemCat.ItemID = news.ID
WHERE itemCat.ItemType = 0 -- news item
AND itemCat.CategoryID = @CategoryID
AND (
(@StartDate IS NULL OR news.Date >= @StartDate) AND
(@EndDate IS NULL OR news.Date <= @EndDate)
)
AND news.Culture = @Culture
AND news.[Status] = 1
)
SELECT * FROM Paging WHERE RowNumber >= @Start AND RowNumber <= (@Start + @Count - 1)
OPTION (OPTIMIZE FOR (@CategoryID UNKNOWN, @Culture UNKNOWN))
END
Here is the structure of News and ItemCategory tables:
CREATE TABLE [dbo].[News](
[ID] [bigint] NOT NULL,
[Url] [varchar](300) NULL,
[Title] [nvarchar](300) NULL,
[Description] [nvarchar](3000) NULL,
[Date] [datetime] NULL,
[Rank] [smallint] NULL,
[Vote] [smallint] NULL,
[Culture] [tinyint] NULL,
[ResourceTitle] [nvarchar](200) NULL,
[Status] [tinyint] NULL,
CONSTRAINT [PK_News] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
CREATE TABLE [ItemCategory](
[ID] [bigint] IDENTITY(1,1) NOT NULL,
[ItemID] [bigint] NOT NULL,
[ItemType] [tinyint] NOT NULL,
[CategoryID] [int] NOT NULL,
CONSTRAINT [PK_ItemCategory] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
This query reads news of a specific category (sport, politics, ...).
The @Culture parameter specifies the language of the news, e.g. 0 (English), 1 (French), etc.
The ItemCategory table relates a news record to one or more categories.
The ItemType column in the ItemCategory table specifies which type of item ItemID refers to; for now, we have only ItemType 0, indicating that ItemID refers to a record in the News table.
Currently, I have the following index on ItemCategory table:
CREATE NONCLUSTERED INDEX [IX_ItemCategory_ItemType_CategoryID__ItemID] ON [ItemCategory]
(
[ItemType] ASC,
[CategoryID] ASC
)
INCLUDE ( [ItemID])
and the following index for News table (suggested by query analyzer):
CREATE NONCLUSTERED INDEX [_dta_index_News_8_1734000549__K1_K7_K13_K15] ON [dbo].[News]
(
[ID] ASC,
[Date] ASC,
[Culture] ASC,
[Status] ASC
)
With these indexes, the query executes in less than a second for some parameter values, while for others (e.g. a different @Culture or @CategoryID) it may take up to 2 minutes! I have used OPTIMIZE FOR (@CategoryID UNKNOWN, @Culture UNKNOWN) to prevent parameter sniffing on @CategoryID and @Culture, but it does not seem to work for some parameters.
There are currently around 2,870,000 records in News table and 4,740,000 in ItemCategory table.
I would greatly appreciate any advice on how to optimize this query or its indexes.
Update:
Execution plan (in the image, ItemNetwork is what I referred to as ItemCategory; they are the same):
Have you had a look at some of the built-in SQL Server tools to help you with this?
I.e. from the Management Studio menus:
'Query'->'Display Estimated Execution Plan'
'Query'->'Include Actual Execution Plan'
'Tools'->'Database Engine Tuning Advisor'
Shouldn't the OPTION OPTIMIZE clause be part of the inner SQL, rather than of the SELECT on the CTE?
You should look at indexing the Culture field in the News table, and the ItemID and CategoryID fields in the ItemCategory table. You may not need all of these indexes; I would try them one at a time, then in combination, until you find something that works. Your existing indexes do not seem to help your query very much.
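A sketch of those candidate indexes (the index names are hypothetical):
CREATE NONCLUSTERED INDEX IX_News_Culture ON dbo.News (Culture)
CREATE NONCLUSTERED INDEX IX_ItemCategory_ItemID_CategoryID ON ItemCategory (ItemID, CategoryID)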
We really need to see the query plan. One thing of note: you put the clustered index for News on News.ID, but that is not an identity field; it is the key that ItemCategory references, so the News table will suffer some fragmentation over time, which is less than ideal.
I suspect the underlying problem is that your paging is causing the table to scan.
Updated:
Those Sorts are costing you 68% of the query execution time in that plan, and that makes sense: at least one of them must be there to support the ranking function you are using, which is based on news.Rank DESC, but you have no index that can support that ordering natively.
Getting an index in to support it will be interesting. You can try a simple nonclustered index on news.Rank first; SQL Server may choose to combine indexes and avoid the sort, but it will take some experimentation.
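A sketch of that first experiment (the index name is hypothetical):
-- DESC matches the ORDER BY news.Rank DESC in the ranking function.
CREATE NONCLUSTERED INDEX IX_News_Rank ON dbo.News ([Rank] DESC)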
Try a nonclustered index on (ItemID, CategoryID) for the ItemCategory table, and a nonclustered index on (Rank, Culture) for the News table.
I have finally come up with the following indexes, which are working great; the stored procedure now executes in less than a second. I removed TotalCount = COUNT(*) OVER() from the query because I couldn't find any good index for it. Maybe I will write a separate stored procedure to calculate the total number of records, or I may decide to use a "more" button like Twitter and Facebook do, without pagination buttons.
For the News table:
CREATE NONCLUSTERED INDEX [IX_News_Rank_Culture_Status_Date] ON [dbo].[News]
(
[Rank] DESC,
[Culture] ASC,
[Status] ASC,
[Date] ASC
)
For the ItemNetwork table:
CREATE NONCLUSTERED INDEX [IX_ItemNetwork_ItemID_NetworkID] ON ItemNetwork
(
[ItemID] ASC,
[NetworkID] ASC
)
I just don't know whether ItemNetwork needs a clustered index on the ID column; I never retrieve a record from this table by ID. Do you think it would be better to have the clustered index on the (ItemID, NetworkID) columns instead?
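For reference, switching the clustering would look roughly like this (a sketch; the constraint and index names are hypothetical, and the existing clustered primary key must be dropped before a new clustered index can be created):
-- Assumes the existing clustered primary key is named PK_ItemNetwork.
ALTER TABLE ItemNetwork DROP CONSTRAINT PK_ItemNetwork
CREATE CLUSTERED INDEX IX_ItemNetwork_Clustered ON ItemNetwork (ItemID, NetworkID)
-- Keep ID unique via a nonclustered primary key.
ALTER TABLE ItemNetwork ADD CONSTRAINT PK_ItemNetwork PRIMARY KEY NONCLUSTERED (ID)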
Please try to change
FROM dbo.News news
JOIN ItemCategory itemCat ON itemCat.ItemID = news.ID
to
FROM dbo.News news
INNER HASH JOIN ItemCategory itemCat ON itemCat.ItemID = news.ID
or
FROM dbo.News news
INNER LOOP JOIN ItemCategory itemCat ON itemCat.ItemID = news.ID
I don't really know what is in your data, but the join between these tables may be the bottleneck.