Create an index to speed up query in SQL Server - sql-server

I took a SQL assessment test this week, and there was one question in particular that I did not understand, since I am not yet familiar with clustered and non-clustered indexes.
The SQL Server table below is used to manage a company's product purchases. The table contains 17 million rows. Which of the following SQL statements can be used to create an index such that a query calculating the total purchases for a given date will run in the shortest amount of time?
CREATE TABLE [Production].[TransactionHistory]
(
[TransactionID][int] IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
[ProductionID][int] NOT NULL,
[TransactionType][nchar](1) NOT NULL,
[Quantity][int] NOT NULL,
[ActualCost][money] NOT NULL,
[ProductionDate][dateTime] NOT NULL,
)
Which of the following statements lets the query return data in the shortest amount of processing time? Knowing this will give me a good understanding of how indexes work. There can be up to 3 valid answers to this question. Thanks in advance, I appreciate the help.
Option 1
CREATE COVERING INDEX IX_TranHistory_Covered
ON [Production].[TransactionHistory]
(
[ProductionDate] ASC,
[ActualCost] ASC,
[Quantity] ASC
);
Option 2
CREATE NONCLUSTERED INDEX IX_TranHistory_Covered
ON [Production].[TransactionHistory]
(
[ActualCost] ASC,
[ProductionDate] ASC,
[Quantity] ASC
);
Option 3
CREATE NONCLUSTERED INDEX IX_TranHistory_Covered
ON [Production].[TransactionHistory]
(
[Quantity]
)
INCLUDE
(
[ProductionDate],
[ActualCost]
);
Option 4
CREATE NONCLUSTERED INDEX IX_TranHistory_Covered
ON [Production].[TransactionHistory]
(
[ProductionDate]
)
INCLUDE
(
[ActualCost] ASC,
[Quantity] ASC
);
Last option
CREATE INDEX IX_TranHistory_Covered
ON [Production].[TransactionHistory]
(
[ActualCost] ASC,
[Quantity] ASC,
[ProductionDate] ASC
);

You want option 4. The key (ProductionDate) allows index seeks, and because the index covers the query, the information needed to satisfy it is right there in the index, so SQL Server does not have to retrieve the entire row to calculate the result. Note that you don't want ASC in the INCLUDE part of the index: included columns are not sorted and do not accept a sort direction, so the ASC keywords have to be removed for the statement to parse.
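As a minimal sketch, option 4 with the ASC keywords dropped from the INCLUDE list, together with a query it is meant to serve. The quiz does not show the actual query, so the SELECT below, the @Day variable, the sample date, and the choice of ActualCost * Quantity as the "total" are all assumptions made for illustration.

```sql
-- Option 4 with the sort directions removed from the INCLUDE list.
CREATE NONCLUSTERED INDEX IX_TranHistory_Covered
ON [Production].[TransactionHistory]
(
    [ProductionDate]
)
INCLUDE
(
    [ActualCost],
    [Quantity]
);

-- Hypothetical query: total purchases for one day.
-- The half-open range keeps the predicate on ProductionDate seekable.
DECLARE @Day datetime = '20150601';

SELECT SUM([ActualCost] * [Quantity]) AS TotalPurchases
FROM [Production].[TransactionHistory]
WHERE [ProductionDate] >= @Day
  AND [ProductionDate] < DATEADD(DAY, 1, @Day);
```

With this index the seek on ProductionDate finds the day's rows, and both columns needed for the sum are read from the index itself.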

Related

Should I explicitly list the partitioning column as part of the index key, or is it enough to specify it in the ON clause with the partition scheme?

I have SQL Server 2019 where I want to partition one of my tables. Let's say we have a simple table like so:
IF OBJECT_ID('dbo.t') IS NOT NULL
DROP TABLE t;
CREATE TABLE t
(
PKID INT NOT NULL,
PeriodId INT NOT NULL,
ColA VARCHAR(10),
ColB INT
);
Let's also say that I have defined a partition function and a partition scheme. The scheme is called [PS_PartitionKey].
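For readers who want to reproduce the setup, a partition function and scheme of the kind assumed here might look like the sketch below. Only the scheme name [PS_PartitionKey] comes from the question; the function name PF_PartitionKey, the boundary values, and the mapping of every partition to the PRIMARY filegroup are hypothetical.

```sql
-- Hypothetical partition function on the INT PeriodId column.
-- The boundary values are made up for illustration.
CREATE PARTITION FUNCTION PF_PartitionKey (INT)
AS RANGE RIGHT FOR VALUES (202001, 202002, 202003);

-- Scheme named as in the question; all partitions go to PRIMARY here.
CREATE PARTITION SCHEME PS_PartitionKey
AS PARTITION PF_PartitionKey
ALL TO ([PRIMARY]);
```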
Now I can partition this table by building a clustered index in a couple of ways.
Like this:
CREATE CLUSTERED INDEX IX_1 ON t ([PKId] ASC )
ON [PS_PartitionKey]([PeriodID])
Or like this:
CREATE CLUSTERED INDEX IX_1 ON t ([PKId] ASC, [PeriodId] ASC )
ON [PS_PartitionKey]([PeriodID])
As you can see, in the first case I did not explicitly specify my partitioning column as part of the index key, but in the second case I did. Both of these work, but what's the difference?
A similar question would apply if I were building these as non-clustered indexes. Using the same table as an example, let's say I start by creating a clustered PK:
ALTER TABLE [dbo].t
ADD CONSTRAINT PK_t
PRIMARY KEY CLUSTERED ([PKId] ASC, [PeriodId]) ON [PS_PartitionKey]([PeriodID])
Now I want to define an additional non-clustered index. Once again, I can do it in two ways:
CREATE NONCLUSTERED INDEX IX_1 ON t ([ColA] ASC)
ON [PS_PartitionKey]([PeriodID])
or:
CREATE NONCLUSTERED INDEX IX_1 ON t ([ColA] ASC, [PeriodId] ASC)
ON [PS_PartitionKey]([PeriodID])
What difference would it make?
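One way to look at the difference directly is to compare what the catalog records for each variant. The query below is a sketch that lists, per index on dbo.t, which columns are key columns, which are included columns, and which one is the partitioning column. The Microsoft documentation on partitioned indexes describes SQL Server adding the partitioning column on its own when it is not listed (to the key of a non-unique clustered index, or as an included column of a non-unique nonclustered index), so it is worth checking the output for both definitions rather than assuming.

```sql
-- Inspect key, included, and partitioning columns for every index on dbo.t.
SELECT
    i.name                AS index_name,
    c.name                AS column_name,
    ic.key_ordinal,           -- 0 means the column is not a key column
    ic.partition_ordinal,     -- greater than 0 marks the partitioning column
    ic.is_included_column
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
    ON ic.object_id = i.object_id
   AND ic.index_id  = i.index_id
JOIN sys.columns AS c
    ON c.object_id = ic.object_id
   AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID(N'dbo.t')
ORDER BY i.index_id, ic.key_ordinal, ic.partition_ordinal;
```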

T-SQL compound index sufficient for query on subset of columns?

Is a compound index sufficient for queries against a subset of its columns?
CREATE TABLE [FILE_STATUS_HISTORY]
(
[FILE_ID] [INT] NOT NULL,
[STATUS_ID] [INT] NOT NULL,
[TIMESTAMP_UTC] [DATETIME] NOT NULL,
CONSTRAINT [PK_FILE_STATUS_HISTORY]
PRIMARY KEY CLUSTERED ([FILE_ID] ASC, [STATUS_ID] ASC)
) ON [PRIMARY]
CREATE UNIQUE NONCLUSTERED INDEX [IX_FILE_STATUS_HISTORY]
ON [FILE_STATUS_HISTORY] ([FILE_ID] ASC,
[STATUS_ID] ASC,
[TIMESTAMP_UTC] ASC) ON [PRIMARY]
GO
SELECT TOP (1) *
FROM [FILE_STATUS_HISTORY]
WHERE [FILE_ID] = 382748
ORDER BY [TIMESTAMP_UTC] DESC
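The existing index on (File_Id, Status_Id, Timestamp_UTC) can seek on File_Id, but within a single File_Id its rows are ordered by Status_Id first, so the ORDER BY still needs a sort. A composite index on (File_Id, Timestamp_UTC DESC) should optimize handling of the WHERE and TOP/ORDER BY clauses. The actual execution plan will show whether the query optimizer agrees.
A covering index would also have Status_Id as an included column so that the index could satisfy the entire query in a single seek, without a key lookup. In this particular table Status_Id is part of the clustered key and is therefore carried by every nonclustered index anyway, so listing it is harmless and mainly documents the intent.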
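A sketch of the index described above. The index name is hypothetical; the table and column names come from the question.

```sql
-- Seek on FILE_ID, then read the newest TIMESTAMP_UTC first.
-- STATUS_ID is included so SELECT * is answered from the index alone.
CREATE NONCLUSTERED INDEX [IX_FILE_STATUS_HISTORY_FILE_TIME]
ON [FILE_STATUS_HISTORY] ([FILE_ID] ASC, [TIMESTAMP_UTC] DESC)
INCLUDE ([STATUS_ID]);
```

After creating it, rerun the TOP (1) query and check the actual execution plan for an index seek on this index with no Sort operator.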

Create Clustered Index on (Date + Key)

There is a transaction table that has 40 million rows and 100 columns.
For simplicity, there are 3 important columns (HeaderID, HeaderLineID, OrderDate), and the unique identifier is (HeaderID, HeaderLineID).
CREATE TABLE [dbo].[T_Table](
[HeaderID] [nvarchar](4) NOT NULL,
[HeaderLineID] [nvarchar](10) NOT NULL,
[OrderDate] [datetime] NOT NULL
) ON [FG_Index]
GO
CREATE CLUSTERED INDEX [OrderDate] ON [dbo].[T_Table]
(
[OrderDate] ASC
)
GO
CREATE NONCLUSTERED INDEX [Key] ON [dbo].[T_Table]
(
[HeaderID] ASC,
[HeaderLineID] ASC
)
GO
In normal usage, we select the data based on a date range:
select * from T_Table
where OrderDate between '2015-01-01' and '2015-12-31'
Is it a better approach to drop the current indexes and create a clustered index on Date + Key instead? That is,
CREATE CLUSTERED INDEX [NewKey] ON [dbo].[T_Table]
(
[OrderDate] ASC,
[HeaderID] ASC,
[HeaderLineID] ASC
)
GO
Replies to questions from the comments:
Explain what HeaderID and HeaderLineID are. Is the combination of HeaderID and HeaderLineID unique?
HeaderID is the Order Number and HeaderLineID is the Order Line Number.
The combination of HeaderID + HeaderLineID is unique.
Which will be used most frequently in searches? What is the selectivity of OrderDate compared with the selectivity of HeaderID and HeaderLineID?
OrderDate appears in filter conditions.
HeaderLineID appears in join conditions to other tables.
HeaderID, HeaderLineID, and OrderDate appear in the output.
Your index will not perform well for the queries you have if OrderDate is not unique and you have more queries like the one below:
select * from T_Table
where OrderDate between '2015-01-01' and '2015-12-31'
I suggest creating a nonclustered index with the definition below:
create index nci_somename on t_table(orderdate)
include(HeaderID, HeaderLineID)
Having a clustered index is good, but I don't recommend this one if it won't satisfy your queries.
i) What is the volume of transactions per date?
ii) You must have read this example where a table scan was done instead of a clustered index seek because the optimizer decided the table scan was the more cost-effective way. Yours could be a similar case.
iii) Critical error: having 100 columns in a single table is itself wrong. How many columns would you include in a nonclustered covering index? At most 20-25 columns are common and important across all requirements; the rest are area-specific and hence mostly sparse. Putting all the columns in a single table is not an example of denormalization.
iv) Is the data really normalized? I mean, do you have repeating rows? For example, suppose two items were ordered under a single order ID; how are they stored in this scenario? If the two items are stored in the same table, then it is not an example of denormalization.
v) Create the clustered index on a unique sequential column. Create a nonclustered index on OrderDate and include (*some common important columns).
*since I have no idea about the rest of the columns and the details.
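Point v) could be sketched roughly as follows. The RowID column and both new index names are hypothetical and not part of the original table, and the INCLUDE list only uses the columns named in the question because the rest are unknown. On a 40-million-row table each of these steps is expensive, so treat this as an illustration of the layout, not a migration script.

```sql
-- Hypothetical unique sequential column to carry the clustered index.
ALTER TABLE [dbo].[T_Table]
ADD [RowID] bigint IDENTITY(1,1) NOT NULL;
GO

-- Replace the existing clustered index on OrderDate.
DROP INDEX [OrderDate] ON [dbo].[T_Table];
GO
CREATE UNIQUE CLUSTERED INDEX [CIX_T_Table_RowID]
ON [dbo].[T_Table] ([RowID] ASC);
GO

-- Nonclustered index for the date-range filter, covering the key columns.
CREATE NONCLUSTERED INDEX [IX_T_Table_OrderDate]
ON [dbo].[T_Table] ([OrderDate] ASC)
INCLUDE ([HeaderID], [HeaderLineID]);
GO
```

Note that a query using select * over all 100 columns would still need lookups into the clustered index; the nonclustered index only covers queries that return the included columns.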

SQL Server 2012 row_number ASC DESC performance

On SQL Server 2012 (version 11.0.5058) I have a query like this:
SELECT TOP 30
row_number() OVER (ORDER BY SequentialNumber ASC) AS [row_number],
o.Oid, StopAzioni
FROM
tmpTestPerf O
INNER JOIN
Stati s on O.Stato = s.Oid
WHERE
StopAzioni = 0
When I use ORDER BY SequentialNumber ASC, it takes 400 ms.
When I use ORDER BY SequentialNumber DESC in the row_number function, it takes only 2 ms.
(This is in a test environment; in production it is 7000 ms, i.e. 7 seconds, vs 15 ms!)
Analyzing the execution plans, I found that they are the same for both queries. The interesting difference is that the slower one works through all the rows filtered by the StopAzioni = 0 condition, 117k rows,
while the faster one uses only 53 rows.
There is a primary key on the tmpTestPerf table and an ascending index on the SequentialNumber column.
How can this be explained?
Regards.
Daniele
This is the script for the tmpTestPerf and Stati tables with their indexes:
CREATE TABLE [dbo].[tmpTestPerf]
(
[Oid] [uniqueidentifier] NOT NULL,
[SequentialNumber] [bigint] NOT NULL,
[Anagrafica] [uniqueidentifier] NULL,
[Stato] [uniqueidentifier] NULL,
CONSTRAINT [PK_tmpTestPerf]
PRIMARY KEY CLUSTERED ([Oid] ASC)
)
CREATE NONCLUSTERED INDEX [IX_2]
ON [dbo].[tmpTestPerf]([SequentialNumber] ASC)
CREATE TABLE [dbo].[Stati]
(
[Oid] [uniqueidentifier] ROWGUIDCOL NOT NULL,
[Descrizione] [nvarchar](100) NULL,
[StopAzioni] [bit] NOT NULL,
CONSTRAINT [PK_Stati]
PRIMARY KEY CLUSTERED ([Oid] ASC)
) ON [PRIMARY]
CREATE NONCLUSTERED INDEX [iStopAzioni_Stati]
ON [dbo].[Stati]([StopAzioni] ASC)
GO
The query plans are not exactly the same.
Select the Index Scan operator.
Press F4 to view the properties and have a look at Scan Direction.
When you order ascending the Scan Direction is FORWARD and when you order descending it is BACKWARD.
The difference in number of rows is there because it takes only 53 rows to find 30 rows when scanning backwards and it takes 117k rows to find 30 matching rows scanning forwards in the index.
Note that without an ORDER BY clause on the main query there is no guarantee as to which 30 rows you will get from your query. In this case it just happens to be the first thirty or the last thirty, depending on the ORDER BY used in row_number().
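To make the result deterministic, the outer query can carry its own ORDER BY. This is a sketch of the question's query with that clause added and the columns qualified with their table aliases; it addresses which rows are returned, not the timing difference.

```sql
SELECT TOP (30)
    ROW_NUMBER() OVER (ORDER BY o.SequentialNumber ASC) AS [row_number],
    o.Oid,
    s.StopAzioni
FROM dbo.tmpTestPerf AS o
INNER JOIN dbo.Stati AS s
    ON o.Stato = s.Oid
WHERE s.StopAzioni = 0
ORDER BY o.SequentialNumber ASC;  -- TOP now picks the 30 lowest SequentialNumber values
```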

indexes that appear to be redundant with clustered PK

I am working on a database at a client with the following table:
CREATE TABLE [Example] (
[ID] INT IDENTITY (1, 1) NOT NULL,
....
[AddressID] INT NULL,
[RepName] VARCHAR(50) NULL,
....
CONSTRAINT [PK_Example] PRIMARY KEY CLUSTERED ([ID] ASC)
)
And it has the following indexes:
CREATE NONCLUSTERED INDEX [IDX_Example_Address]
ON [example]( [ID] ASC, [AddressId] ASC);
CREATE NONCLUSTERED INDEX [IDX_Example_Rep]
ON [example]( [ID] ASC, [RepName] ASC);
To me these appear to be redundant with the clustered index. I cannot imagine any scenario where these would be beneficial. If anyone can come up with a situation where these would be useful, let me know.
Here is another example:
CREATE NONCLUSTERED INDEX [IDX_Example_IsDeleted]
ON [example]( [IsDeleted] ASC)
INCLUDE( [ID], [SomeNumber]);
Why would you need to INCLUDE [ID]? My understanding is that the clustered index key is already present in every non-clustered index, so why would they do that? I would just INCLUDE ([SomeNumber])
You are correct that the clustered index key is already included in every non-clustered index, but not in the same sense as your example non-clustered indices suggest.
For example, if you have a non-clustered index as in your example for IDX_Example_Rep, and you run this query:
SELECT [RepName], [Id] FROM [Example] WHERE [RepName] = 'some_value';
The IDX_Example_Rep index will be used, but it will be an index scan (every row will be checked). This is because the [Id] column was specified as the first column in the index.
If the index is instead specified as follows:
CREATE NONCLUSTERED INDEX [IDX_Example_Rep]
ON [example]([RepName] ASC);
Then when you run the same sample query, the IDX_Example_Rep index is used and the operation is an index seek - the engine knows exactly where to find the records by [RepName] within the IDX_Example_Rep index and, because the only other field being returned by the SELECT is the [Id] field, which is the key of the clustered index and therefore included in the non-clustered index, no further operations are necessary.
If the SELECT list were expanded to include, say, the [AddressId] field, then you'll find the engine still performs the index seek against IDX_Example_Rep to find the correct records, but then also has to do a key lookup against the clustered index to get the "other" fields (the [AddressId] in this example).
So, no - you probably don't want to repeat the [Id] column as part of the non-clustered indices in general, but when it comes to non-clustered indices you definitely want to pay attention to your SELECTed fields and know whether or not you're covering the fields you're going to need.
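As a sketch of what "covering" means for the expanded query, the index below seeks on [RepName] and carries [AddressID] as an included column, so the key lookup is no longer needed. The index name is hypothetical; [ID] is not listed because, as the clustered key, it is already stored in the non-clustered index.

```sql
-- Hypothetical covering index for queries that filter on RepName
-- and return ID and AddressID.
CREATE NONCLUSTERED INDEX [IDX_Example_Rep_Covering]
ON [Example] ([RepName] ASC)
INCLUDE ([AddressID]);

-- Can be answered by an index seek alone, with no key lookup.
SELECT [RepName], [ID], [AddressID]
FROM [Example]
WHERE [RepName] = 'some_value';
```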
