SQL Server 2012 row_number ASC DESC performance

On SQL Server 2012 (version 11.0.5058) I have a query like this:
SELECT TOP 30
row_number() OVER (ORDER BY SequentialNumber ASC) AS [row_number],
o.Oid, StopAzioni
FROM
tmpTestPerf O
INNER JOIN
Stati s on O.Stato = s.Oid
WHERE
StopAzioni = 0
When I use ORDER BY SequentialNumber ASC it takes 400 ms.
When I use ORDER BY SequentialNumber DESC in the row_number function it takes only 2 ms.
(This is in a test environment; in production it is 7 seconds vs 15 ms!)
Analyzing the execution plans, I found that they are the same for both queries. The interesting difference is that the slower one works with all the rows that pass the StopAzioni = 0 condition, 117k rows,
while the faster one uses only 53 rows.
There is a primary key on the tmpTestPerf table and an ascending index on the SequentialNumber column.
How could this be explained?
Regards.
Daniele
This is the script for the tmpTestPerf and Stati tables with their indexes:
CREATE TABLE [dbo].[tmpTestPerf]
(
[Oid] [uniqueidentifier] NOT NULL,
[SequentialNumber] [bigint] NOT NULL,
[Anagrafica] [uniqueidentifier] NULL,
[Stato] [uniqueidentifier] NULL,
CONSTRAINT [PK_tmpTestPerf]
PRIMARY KEY CLUSTERED ([Oid] ASC)
)
CREATE NONCLUSTERED INDEX [IX_2]
ON [dbo].[tmpTestPerf]([SequentialNumber] ASC)
CREATE TABLE [dbo].[Stati]
(
[Oid] [uniqueidentifier] ROWGUIDCOL NOT NULL,
[Descrizione] [nvarchar](100) NULL,
[StopAzioni] [bit] NOT NULL,
CONSTRAINT [PK_Stati]
PRIMARY KEY CLUSTERED ([Oid] ASC)
) ON [PRIMARY]
CREATE NONCLUSTERED INDEX [iStopAzioni_Stati]
ON [dbo].[Stati]([StopAzioni] ASC)
GO

The query plans are not exactly the same.
Select the Index Scan operator.
Press F4 to view the properties and have a look at Scan Direction.
When you order ascending the Scan Direction is FORWARD and when you order descending it is BACKWARD.
The difference in the number of rows arises because, scanning backward, only 53 rows have to be read to find 30 matching rows, whereas scanning forward, 117k rows have to be read before 30 matching rows are found.
Note that without an ORDER BY clause on the outer query there is no guarantee which 30 rows you will get. In this case it just happens to be the first thirty or the last thirty, depending on the ORDER BY used in row_number().
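The effect of scan direction can be reproduced with a toy Python model. It is not how SQL Server is implemented: it hypothetically assumes that the rows passing the StopAzioni = 0 join are concentrated at the high end of SequentialNumber, and it collapses the join into a simple predicate (`passes_filter` and `rows_read_for_top` are made-up names). Walking the same sorted keys forward or backward and stopping after TOP 30 matches shows both the difference in rows read and why the two directions return different rows.

```python
from typing import Callable, List, Tuple


def rows_read_for_top(index_keys: List[int], matches: Callable[[int], bool],
                      top: int, backward: bool) -> Tuple[int, List[int]]:
    """Walk index keys in one direction and stop once `top` keys pass the
    filter, like TOP (n) with a residual predicate.
    Returns (rows read, rows returned)."""
    order = reversed(index_keys) if backward else iter(index_keys)
    read = 0
    found: List[int] = []
    for key in order:
        read += 1
        if matches(key):
            found.append(key)
            if len(found) == top:
                break  # TOP satisfied: the scan stops here
    return read, found


def passes_filter(seq_number: int) -> bool:
    # Hypothetical data distribution: only the newest rows pass the filter.
    return seq_number > 117_000


keys = list(range(1, 120_001))  # SequentialNumber values in index order
fwd_read, fwd_rows = rows_read_for_top(keys, passes_filter, 30, backward=False)
bwd_read, bwd_rows = rows_read_for_top(keys, passes_filter, 30, backward=True)
print(fwd_read, bwd_read)  # 117030 30
```

The forward scan returns keys 117001 to 117030 and the backward scan returns 120000 down to 119971, which is the "first thirty or last thirty" effect: without an outer ORDER BY, which rows you get follows from the access path.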

Related

Indexing columns in SQL Server

I have the following table:
CREATE TABLE [dbo].[ActiveHistory]
(
[ID] [INT] IDENTITY(1,1) NOT NULL,
[Date] [VARCHAR](250) NOT NULL,
[ActiveID] [INT] NOT NULL,
[UserID] [INT] NOT NULL,
CONSTRAINT [PK_ActiveHistory]
PRIMARY KEY CLUSTERED ([ID] ASC)
)
About 600,000 rows are inserted into the table per day, which means about 300,000 distinct actives per date and about 500 distinct users. I would like to keep about 5 years of history in one table, which means more than a billion rows; overall, about 4,000 distinct UserIDs and 1,000,000 distinct ActiveIDs would be in the 5-year table. It is very important for me that queries against this table run fast.
In the past, most queries joined on Date and UserID, but recently I have had to include ActiveID quite often, and sometimes only two of the three columns are used (any pair).
I never use ID in joins.
Currently I have a nonclustered index with UserID and Date as key columns and ID and ActiveID as included columns. My question is how best to arrange the indexes for this table given the new requirements. Simply adding an index for every combination may use a huge amount of space, and sometimes an application that uses the same server suffers because CPU usage goes to 99%; I am not sure how new indexes would affect that.

T-SQL compound index sufficient for query on subset of columns?

Is a compound index sufficient for queries against a subset of columns?
CREATE TABLE [FILE_STATUS_HISTORY]
(
[FILE_ID] [INT] NOT NULL,
[STATUS_ID] [INT] NOT NULL,
[TIMESTAMP_UTC] [DATETIME] NOT NULL,
CONSTRAINT [PK_FILE_STATUS_HISTORY]
PRIMARY KEY CLUSTERED ([FILE_ID] ASC, [STATUS_ID] ASC)
) ON [PRIMARY]
CREATE UNIQUE NONCLUSTERED INDEX [IX_FILE_STATUS_HISTORY]
ON [FILE_STATUS_HISTORY] ([FILE_ID] ASC,
[STATUS_ID] ASC,
[TIMESTAMP_UTC] ASC) ON [PRIMARY]
GO
SELECT TOP (1) *
FROM [FILE_STATUS_HISTORY]
WHERE [FILE_ID] = 382748
ORDER BY [TIMESTAMP_UTC] DESC
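A composite index on (File_Id, Timestamp_UTC DESC) should let SQL Server handle both the WHERE clause and the TOP/ORDER BY: it can seek to the rows for the given File_Id, which are already sorted by Timestamp_UTC descending, and read just the first one. The actual execution plan will show whether the query optimizer agrees.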
A covering index would also have Status_Id as an included column so that the index could satisfy the entire query in a single lookup.
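A sketch of why (File_Id, Timestamp_UTC DESC) suits this query, modeled in Python with a sorted list and binary search standing in for the B-tree. Timestamps are simplified to integers, the rows are made up, and `build_index`/`latest_status` are hypothetical helpers. Because the entries for one FILE_ID are stored newest first, TOP (1) becomes one seek plus one read with no sort; keeping STATUS_ID in each entry plays the role of the included column.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

# One entry of a hypothetical index on (FILE_ID, TIMESTAMP_UTC DESC) INCLUDE (STATUS_ID).
# Negating the timestamp makes ascending tuple order equal to DESC timestamp order.
IndexEntry = Tuple[int, int, int]  # (file_id, -timestamp, status_id)


def build_index(rows: List[Tuple[int, int, int]]) -> List[IndexEntry]:
    """rows are (file_id, status_id, timestamp), in table column order."""
    return sorted((f, -ts, s) for f, s, ts in rows)


def latest_status(index: List[IndexEntry], file_id: int) -> Optional[Tuple[int, int]]:
    """Model of SELECT TOP (1) ... WHERE FILE_ID = @id ORDER BY TIMESTAMP_UTC DESC:
    a binary-search 'seek' to the first entry for file_id, then a single read."""
    pos = bisect_left(index, (file_id,))  # (file_id,) sorts before any (file_id, ...)
    if pos == len(index) or index[pos][0] != file_id:
        return None
    _, neg_ts, status_id = index[pos]
    return status_id, -neg_ts


rows = [(382748, 1, 1000), (382748, 2, 2000), (382748, 3, 1500), (999, 1, 5000)]
idx = build_index(rows)
print(latest_status(idx, 382748))  # (2, 2000)
```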

Selecting rows first by Id then by datetime - with or without a subquery?

I need to compute statistics from several log tables, usually every hour but sometimes more frequently, every 5 minutes.
Selecting rows only by datetime isn't fast enough for larger logs, so I thought I would select only the rows that are new since the last query by storing the max Id and reusing it next time:
SELECT TOP(1000) * -- so that it's not too much
FROM [dbo].[Log]
WHERE Id > lastId AND [Timestamp] >= timestampMin
ORDER BY [Id] DESC
My question: is SQL Server smart enough to first filter the rows by Id and then by Timestamp, even if I change the order of the conditions (or does the order of the conditions matter)?
Or do I need a subquery that first selects the rows by Id and then filters them by Timestamp?
With a subquery:
SELECT *
FROM (
SELECT TOP(1000) * FROM [dbo].[Log]
WHERE Id > lastId
ORDER BY [Id] DESC
) t
WHERE t.[TimeStamp] >= timestampMin
The table schema is:
CREATE TABLE [dbo].[Log](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Timestamp] [datetime2](7) NOT NULL,
-- other columns
CONSTRAINT [PK_dbo_Log] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
I tried to use the query plan to find out how it works, but I can't read it and don't understand it.
In your case you don't have an index on Timestamp, so SQL Server will always use the clustered index (Id) first (the Clustered Index Seek you see in the query plan) to find the first row matching Id > lastId, and then scan the remaining rows, applying the predicate [Timestamp] >= timestampMin. (Actually it is the other way around, since you are sorting in reverse order with DESC: the scan starts at the highest Id and moves backward toward lastId.)
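The seek-plus-residual behaviour described above can be sketched in Python, assuming the log is a list of (Id, Timestamp) pairs sorted by Id (standing in for the clustered index) with timestamps simplified to integers; `top_desc` and the rows are hypothetical. Id > lastId only decides where the backward scan stops, while the Timestamp condition is checked on every row read; the model has no notion of the order in which the two conditions were written.

```python
from bisect import bisect_right
from typing import List, Tuple

Row = Tuple[int, int]  # (Id, Timestamp); Timestamp simplified to an int


def top_desc(log: List[Row], last_id: int, ts_min: int, top: int) -> Tuple[List[Row], int]:
    """Toy model of
        SELECT TOP(top) * FROM Log
        WHERE Id > last_id AND Timestamp >= ts_min
        ORDER BY Id DESC
    with `log` sorted by a unique Id. Returns (rows returned, rows read)."""
    lower = bisect_right(log, (last_id, float("inf")))  # first position with Id > last_id
    result: List[Row] = []
    rows_read = 0
    for pos in range(len(log) - 1, lower - 1, -1):  # backward (DESC) range scan
        rows_read += 1
        if log[pos][1] >= ts_min:  # residual predicate on Timestamp
            result.append(log[pos])
            if len(result) == top:
                break
    return result, rows_read


log = [(i, i * 10) for i in range(1, 11)]  # hypothetical rows
print(top_desc(log, last_id=3, ts_min=95, top=3))  # ([(10, 100)], 7)
```

When few rows pass the Timestamp filter, the scan keeps reading until it reaches the Id > lastId boundary, which is why that boundary is what keeps the work small.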
If you were to add an index on Timestamp, SQL Server might use it depending on:
the cardinality of the predicate [Timestamp] >= timestampMin. Note that cardinality is always an estimate based on statistics (see https://msdn.microsoft.com/en-us/library/ms190397.aspx) and on the cardinality estimator (which changed from SQL Server 2012 to 2014 and later, see https://msdn.microsoft.com/en-us/library/dn600374.aspx).
how well the nonclustered index covers the query (since you are selecting with the * wildcard, it would hardly be covering anyway). If the nonclustered index is not covering, SQL Server has to add a Key Lookup operator (see https://technet.microsoft.com/en-us/library/bb326635(v=sql.105).aspx) to retrieve all the fields (or perform a join). This will likely make the index not worthwhile for this query.
Also note that your two queries, the one with the subquery and the one without, are functionally different. The first returns the first 1000 rows (in Id DESC order) that have both Id > lastId AND [Timestamp] >= timestampMin. The second returns only those rows having [Timestamp] >= timestampMin out of the first 1000 rows having Id > lastId. So, for example, you might get 1000 rows from the first query but fewer from the second.
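The difference between the two queries can be shown with two small Python functions over made-up (Id, Timestamp) rows; the function names are hypothetical. One applies both conditions before taking the top rows, the other takes the top rows by Id and only then filters on Timestamp.

```python
from typing import List, Tuple

Row = Tuple[int, int]  # (Id, Timestamp); Timestamp simplified to an int


def without_subquery(log: List[Row], last_id: int, ts_min: int, top: int) -> List[Row]:
    # Both predicates filter first, then TOP picks from the survivors.
    by_id_desc = sorted(log, key=lambda r: r[0], reverse=True)
    return [r for r in by_id_desc if r[0] > last_id and r[1] >= ts_min][:top]


def with_subquery(log: List[Row], last_id: int, ts_min: int, top: int) -> List[Row]:
    # TOP is applied inside the subquery; the Timestamp filter runs afterwards.
    by_id_desc = sorted(log, key=lambda r: r[0], reverse=True)
    inner = [r for r in by_id_desc if r[0] > last_id][:top]
    return [r for r in inner if r[1] >= ts_min]


log = [(1, 5), (2, 50), (3, 8), (4, 60), (5, 7), (6, 70)]  # hypothetical rows
print(without_subquery(log, 0, 10, 3))  # [(6, 70), (4, 60), (2, 50)]
print(with_subquery(log, 0, 10, 3))     # [(6, 70), (4, 60)]
```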

Create an index to speed up query in SQL Server

I took a SQL assessment test this week, and this question in particular is one I did not understand, since I am not yet familiar with clustered and nonclustered indexes.
The SQL Server table below is used to manage a company's product purchases. The table contains 17 million rows. Which of the following SQL statements can be used to create an index so that calculating the total purchases for a given date runs in the shortest amount of time?
CREATE TABLE [Production].[TransactionHistory]
(
[TransactionID][int] IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
[ProductionID][int] NOT NULL,
[TransactionType][nchar](1) NOT NULL,
[Quantity][int] NOT NULL,
[ActualCost][money] NOT NULL,
[ProductionDate][dateTime] NOT NULL,
)
Which of the following indexes lets the query return data in the shortest amount of processing time? This will give me a good understanding of how indexes work. There can be up to 3 valid answers to this question. Thanks in advance, I appreciate the help.
Option 1
CREATE COVERING INDEX IX_TranHistory_Covered
ON [Production].[TransactionHistory]
(
[ProductionDate] ASC,
[ActualCost] ASC,
[Quantity] ASC
);
Option 2
CREATE NONCLUSTERED INDEX IX_TranHistory_Covered
ON [Production].[TransactionHistory]
(
[ActualCost] ASC,
[ProductionDate] ASC,
[Quantity] ASC
);
Option 3
CREATE NONCLUSTERED INDEX IX_TranHistory_Covered
ON [Production].[TransactionHistory]
(
[Quantity]
)
INCLUDE
(
[ProductionDate],
[ActualCost]
);
Option 4
CREATE NONCLUSTERED INDEX IX_TranHistory_Covered
ON [Production].[TransactionHistory]
(
[ProductionDate]
)
INCLUDE
(
[ActualCost] ASC,
[Quantity] ASC
);
Last option
CREATE INDEX IX_TranHistory_Covered
ON [Production].[TransactionHistory]
(
[ActualCost] ASC,
[Quantity] ASC,
[ProductionDate] ASC
);
You want option 4. The key (ProductionDate) allows index seeks, and because the index covers the query, the information needed to satisfy it is right there in the index tree, so SQL Server does not have to retrieve the entire row to calculate the result. You don't want ASC in the INCLUDE part of the index: included columns are not sorted, so SQL Server does not accept ASC or DESC there.
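A rough Python model of option 4, assuming "total purchases" means summing ActualCost and Quantity for one calendar day (the question does not say exactly which aggregate is wanted). `DateIndex` is a hypothetical class and the rows are invented. The key list is sorted by ProductionDate only, while ActualCost and Quantity are merely carried in each entry, as INCLUDE columns are; the day's rows are found with two binary searches, which corresponds to a range seek such as ProductionDate >= @day AND ProductionDate < DATEADD(day, 1, @day).

```python
from bisect import bisect_left
from datetime import datetime, timedelta
from decimal import Decimal
from typing import List, Tuple

Entry = Tuple[datetime, Decimal, int]  # (ProductionDate, ActualCost, Quantity)


class DateIndex:
    """Hypothetical model of: key (ProductionDate) INCLUDE (ActualCost, Quantity)."""

    def __init__(self, rows: List[Entry]) -> None:
        self.entries = sorted(rows, key=lambda e: e[0])  # ordered by the key column only
        self.keys = [e[0] for e in self.entries]

    def total_for_day(self, day: datetime) -> Tuple[Decimal, int]:
        """Seek to [day, day + 1 day) and sum the included columns;
        the base table is never touched because the index covers the query."""
        start = bisect_left(self.keys, day)
        end = bisect_left(self.keys, day + timedelta(days=1))
        cost = sum((e[1] for e in self.entries[start:end]), Decimal("0"))
        qty = sum(e[2] for e in self.entries[start:end])
        return cost, qty


rows = [
    (datetime(2024, 1, 1, 9, 30), Decimal("10.50"), 2),
    (datetime(2024, 1, 2, 8, 0), Decimal("4.00"), 1),
    (datetime(2024, 1, 1, 17, 45), Decimal("3.25"), 5),
    (datetime(2024, 1, 3, 0, 0), Decimal("99.00"), 1),
]
idx = DateIndex(rows)
print(idx.total_for_day(datetime(2024, 1, 1)))  # (Decimal('13.75'), 7)
```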

How to improve large table paging speed in SQL Server 2008

I have a table called Users with 10 million records in it. This is the table structure:
CREATE TABLE [dbo].[Users](
[UsersID] [int] IDENTITY(100000,1) NOT NULL,
[LoginUsersName] [nvarchar](50) NOT NULL,
[LoginUsersPwd] [nvarchar](50) NOT NULL,
[Email] [nvarchar](80) NOT NULL,
[IsEnable] [int] NOT NULL,
[CreateTime] [datetime] NOT NULL,
[LastLoginTime] [datetime] NOT NULL,
[LastLoginIp] [nvarchar](50) NOT NULL,
[UpdateTime] [datetime] NOT NULL,
CONSTRAINT [PK_Users] PRIMARY KEY CLUSTERED
(
[UsersID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
I have a nonclustered index on the UpdateTime column.
The paging SQL:
;WITH UserCTE AS (
SELECT * FROM
(SELECT
ROW_NUMBER() OVER (ORDER BY UpdateTime DESC) AS row,UsersID as rec_id -- select primary key only
FROM
dbo.Users WITH (NOLOCK)
) A WHERE row BETWEEN 9700000 AND 9700020
)
SELECT
*
FROM
dbo.Users WITH (NOLOCK) WHERE UsersID IN (SELECT UserCTE.rec_id FROM UserCTE)
Statistics for the query above:
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 3 ms.
(21 row(s) affected)
SQL Server Execution Times:
CPU time = 2574 ms, elapsed time = 3549 ms.
Any suggestions on how to improve paging speed would be appreciated. Thanks!
That looks about as good as it is going to get without changing the way it works or doing some sort of pre-calculation.
The index used to locate the UsersIDs on the page is about as narrow as it can be (its leaf pages contain just UpdateTime and the clustered index key, UsersID). You could make the index slightly narrower by changing UpdateTime to a lower-precision datetime2 (datetime2(4) or less takes 7 bytes or fewer, versus 8 bytes for datetime), but this won't make a significant difference. You could also check that this index doesn't have excessive fragmentation.
If you had an indexed sequential integer column UpdateTimeOrder, then you could just do:
SELECT *
FROM dbo.Users
WHERE UpdateTimeOrder BETWEEN 9700000 AND 9700020
But maintaining such a column in the face of concurrent INSERTs/UPDATEs/DELETEs would be difficult. An easier but less effective precalculation would be to create an indexed view that maintains the row count:
CREATE VIEW dbo.UserCount
WITH SCHEMABINDING
AS
SELECT COUNT_BIG(*) AS Count
FROM [dbo].[Users]
GO
CREATE UNIQUE CLUSTERED INDEX IX ON dbo.UserCount(Count)
Then retrieve the precalculated count and, if you are looking for rows more than halfway through the index, run a different query with ROW_NUMBER() OVER (ORDER BY UpdateTime ASC) instead, translating the requested row numbers against the count, of course.
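The row-number translation is simple arithmetic: with Count rows, row r under ORDER BY UpdateTime DESC is row Count - r + 1 under ORDER BY UpdateTime ASC (assuming no ties in UpdateTime). A hypothetical Python helper, `plan_page`, that picks the numbering direction and the translated BETWEEN bounds:

```python
from typing import Tuple


def plan_page(count: int, first_row: int, last_row: int) -> Tuple[str, int, int]:
    """Given the precalculated row count and a requested range of row numbers
    under ORDER BY UpdateTime DESC, return (direction, low, high) for the
    ROW_NUMBER() ... BETWEEN low AND high query that numbers fewer rows."""
    if first_row > count // 2:  # more than halfway through the index
        # Row r in DESC order is row count - r + 1 in ASC order; the bounds swap.
        return "ASC", count - last_row + 1, count - first_row + 1
    return "DESC", first_row, last_row


print(plan_page(10_000_000, 9_700_000, 9_700_020))  # ('ASC', 299981, 300001)
```

For the page in the question (rows 9,700,000 to 9,700,020 of roughly 10 million), the ascending query only has to number about 300,000 rows instead of 9.7 million. The 21 rows come back in ascending order, so they need to be re-sorted by UpdateTime DESC for display.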
But why do you actually need this anyway? Do you actually get people visiting page 485,000?
