Is there a possibility to solve this with an index in SQL Server?

The problem is that the index is fixed as-is; I can't alter it or add a new one.
Can I do something to get a better query plan?
The index is on 2 columns: pid, Date.
But the select filters only on Date...
The deal table is very big (>1,000,000 rows).
create table deal
(
Id int NOT NULL PRIMARY KEY NONCLUSTERED,
pid int NOT NULL,
Date smalldatetime NOT NULL
)
create clustered index pk ON deal (pid, Date)
select *
from deal
where Date between #d1 and #d2

I would recommend using Id as your clustered index and creating a second index on Date including Id and pid (a covering index). If you do batch inserts, drop the Date index and recreate it afterwards to improve insert performance.
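The drop-and-recreate pattern described in this answer could be sketched as follows; the index name `ix_deal_Date` is an assumption, only the column names come from the original schema:

```sql
-- Drop the Date index before a batch load (ix_deal_Date is a hypothetical name)
DROP INDEX ix_deal_Date ON deal;

-- ... perform the batch inserts here ...

-- Recreate the covering index afterwards
CREATE NONCLUSTERED INDEX ix_deal_Date
    ON deal ([Date])
    INCLUDE (Id, pid);
```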

I would create my clustered index over Id and create a nonclustered index over Date, like this:
CREATE NONCLUSTERED INDEX noncluxidxdate ON deal (Date)
INCLUDE (id, pid);
This post will help you understand:
What do Clustered and Non clustered index actually mean?

Related

Create Clustered Index on (Date + Key)

There is a transaction table that has 40 million rows. There are 100 columns in the table.
For simplicity, there are 3 important columns (HeaderID, HeaderLineID, OrderDate), and the unique identifier is (HeaderID, HeaderLineID).
CREATE TABLE [dbo].[T_Table](
[HeaderID] [nvarchar](4) NOT NULL,
[HeaderLineID] [nvarchar](10) NOT NULL,
[OrderDate] [datetime] NOT NULL
) ON [FG_Index]
GO
CREATE CLUSTERED INDEX [OrderDate] ON [dbo].[T_Table]
(
[OrderDate] ASC
)
GO
CREATE NONCLUSTERED INDEX [Key] ON [dbo].[T_Table]
(
[HeaderID] ASC,
[HeaderLineID] ASC
)
GO
For normal usage, we select the data based on a date range:
select * from T_Table
where OrderDate between '2015-01-01' and '2015-12-31'
Would it be a better approach to drop the current keys and create a clustered index on Date + Key instead? That is,
CREATE CLUSTERED INDEX [NewKey] ON [dbo].[T_Table]
(
[OrderDate] ASC,
[HeaderID] ASC,
[HeaderLineID] ASC
)
GO
Replies from comments
Please explain what HeaderID and HeaderLineID are. Is the combination of HeaderID & HeaderLineID unique?
HeaderID is the Order Number and HeaderLineID is the Order Line Number.
Combination of HeaderID+HeaderLineID is unique.
Which will be used most frequently in searches? The selectivity of OrderDate vs. the selectivity of HeaderID & HeaderLineID?
OrderDate could be found in filter condition
HeaderLineID could be found in joining condition to other tables
HeaderID, HeaderLineID, OrderDate could be found in output result
Your index will not perform well for the queries you have if OrderDate is not unique and you have more queries like the one below:
select * from T_Table
where OrderDate between '2015-01-01' and '2015-12-31'
I suggest creating a nonclustered index with the definition below:
create index nci_somename on t_table(orderdate)
include(HeaderID, HeaderLineID)
Having a clustered index is good, but I don't recommend it if it won't satisfy your queries.
i) What is the volume of transactions per date?
ii) You must have seen examples where a table scan was done instead of a clustered index seek because the optimizer felt the table scan was the more cost-effective option. Yours could be a similar case.
iii) Critical issue: 100 columns in a single table is itself a problem. How many columns would you include in a nonclustered covering index? At most 20-25 columns are common and important across all requirements; the rest are area-specific and hence mostly sparse. Putting all columns in a single table is not an example of denormalization.
iv) Is the data really normalized? I mean, do you have repetitive rows? For example, suppose two items were ordered under a single order ID; how is that stored in this scenario? If two items are stored in the same table, that is not an example of denormalization.
v) Create the clustered index on a unique sequential column. Create a nonclustered index on OrderDate INCLUDE (*some common important columns).
*Since there is no information about the rest of the columns and details.
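Point v) could be sketched as follows; the surrogate column `RowID` and the index names are assumptions, and the existing clustered index on OrderDate would have to be dropped first:

```sql
-- Assumed surrogate key for the clustered index (point v)
ALTER TABLE dbo.T_Table ADD RowID bigint IDENTITY(1,1) NOT NULL;

-- Requires dropping the existing clustered index [OrderDate] first
CREATE CLUSTERED INDEX cix_T_Table_RowID
    ON dbo.T_Table (RowID);

-- Covering nonclustered index for the date-range query
CREATE NONCLUSTERED INDEX nci_T_Table_OrderDate
    ON dbo.T_Table (OrderDate)
    INCLUDE (HeaderID, HeaderLineID);
```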

Does SQL Server allow including a computed column in a non-clustered index? If not, why not?

When a column is included in a nonclustered index, SQL Server copies the values for that column from the table into the index structure (B+ tree), so included columns don't require a table lookup.
If the included column is essentially a copy of the original data, why doesn't SQL Server also allow including computed columns in the nonclustered index, applying the computation when it copies/updates the data from the table into the index structure? Or am I just not getting the syntax right here?
Assume:
DateOpened is datetime
PlanID is varchar(6)
This works:
create nonclustered index ixn_DateOpened_CustomerAccount
on dbo.CustomerAccount(DateOpened)
include(PlanID)
This does not work with left(PlanID, 3):
create nonclustered index ixn_DateOpened_CustomerAccount
on dbo.CustomerAccount(DateOpened)
include(left(PlanID, 3))
or
create nonclustered index ixn_DateOpened_CustomerAccount
on dbo.CustomerAccount(DateOpened)
include(left(PlanID, 3) as PlanType)
My use case is somewhat like the query below.
select
case
when left(PlanID, 3) = '100' then 'Basic'
else 'Professional'
end as 'PlanType'
from
CustomerAccount
where
DateOpened between '2016-01-01 00:00:00.000' and '2017-01-01 00:00:00.000'
The query cares only about the left 3 characters of PlanID, and I was wondering whether, instead of computing it every time the query runs, I could include left(PlanID, 3) in the nonclustered index so the computation is done when the index is built/updated (infrequently) instead of at query time (frequently).
EDIT: We use SQL Server 2014.
As Laughing Vergil stated, you CAN index computed columns provided that they are persisted. You have a few options; here are a couple:
Option 1: Create the column as PERSISTED then index it
(or, in your case, include it in the index)
First the sample data:
CREATE TABLE dbo.CustomerAccount
(
PlanID int PRIMARY KEY,
DateOpened datetime NOT NULL,
First3 AS LEFT(PlanID,3) PERSISTED
);
INSERT dbo.CustomerAccount (PlanID, DateOpened)
VALUES (100123, '20160114'), (100999, '20151210'), (255657, '20150617');
and here's the index:
CREATE NONCLUSTERED INDEX nc_CustomerAccount ON dbo.CustomerAccount(DateOpened)
INCLUDE (First3);
Now let's test:
-- Note: IIF is available for SQL Server 2012+ and is cleaner
SELECT PlanID, PlanType = IIF(First3 = 100, 'Basic', 'Professional')
FROM dbo.CustomerAccount;
Execution Plan:
As you can see, the optimizer picked the nonclustered index.
Option #2: Perform the CASE logic inside your table DDL
First the updated table structure:
DROP TABLE dbo.CustomerAccount;
CREATE TABLE dbo.CustomerAccount
(
PlanID int PRIMARY KEY,
DateOpened datetime NOT NULL,
PlanType AS
CASE -- NOTE: casting as varchar(12) will make the column a varchar(12) column:
WHEN LEFT(PlanID,3) = 100 THEN CAST('Basic' AS varchar(12))
ELSE 'Professional'
END
PERSISTED
);
INSERT dbo.CustomerAccount (PlanID, DateOpened)
VALUES (100123, '20160114'), (100999, '20151210'), (255657, '20150617');
Notice that I use CAST to assign the data type, the table will be created with this column as varchar(12).
Now the index:
CREATE NONCLUSTERED INDEX nc_CustomerAccount ON dbo.CustomerAccount(DateOpened)
INCLUDE (PlanType);
Let's test again:
SELECT DateOpened, PlanType FROM dbo.CustomerAccount;
Execution plan:
... again, it used the nonclustered index
A third option, which I don't have time to go into, would be to create an indexed view. This would be a good option for you if you were unable to change your existing table structure.
SQL Server 2014 allows creating indexes on computed columns, but you're not doing that -- you're attempting to create the index directly on an expression. This is not allowed. You'll have to make PlanType a column first:
ALTER TABLE dbo.CustomerAccount ADD PlanType AS LEFT(PlanID, 3);
And now creating the index will work just fine (if your SET options are all correct, as outlined here):
CREATE INDEX ixn_DateOpened_CustomerAccount ON CustomerAccount(DateOpened) INCLUDE (PlanType)
It is not required that you mark the column PERSISTED. This is required only if the column is not precise, which does not apply here (this is a concern only for floating-point data).
Incidentally, the real benefit of this index is not so much that LEFT(PlanID, 3) is precalculated (the calculation is inexpensive), but that no clustered index lookup is needed to get at PlanID. With an index only on DateOpened, a query like
SELECT PlanType FROM CustomerAccount WHERE DateOpened >= '2012-01-01'
will result in an index seek on CustomerAccount, followed by a clustered index lookup to get PlanID (so PlanType can be calculated). If the index does include PlanType, the index is covering and the extra lookup disappears.
This benefit is relevant only if the index is truly covering, however. If you select other columns from the table, an index lookup is still required and the included computed column is only taking up space for little gain. Likewise, suppose that you had multiple calculations on PlanID or you needed PlanID itself as well -- in this case it would make much more sense to include PlanID directly rather than PlanType.
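The last suggestion could be sketched like this; the index name is hypothetical, the table and column names come from the question:

```sql
-- Include the base column PlanID itself, so any number of derived values
-- (LEFT(PlanID, 3) among them) can be computed from the covering index
CREATE NONCLUSTERED INDEX ixn_DateOpened_inc_PlanID
    ON dbo.CustomerAccount (DateOpened)
    INCLUDE (PlanID);
```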
Computed columns are only allowed in indexes if they are persisted, that is, if the data is written to the table. If the information is not persisted, it isn't even calculated or available until the column is queried.

How indexes work for below queries?

I have created the table below with a primary key:
create table test2
(id int primary key,name varchar(20))
insert into test2 values
(1,'mahesh'),(2,'ram'),(3,'sham')
Then I created a nonclustered index on it:
create nonclustered index ind_non_name on test2(name)
When I write the queries below, the execution plan always uses the nonclustered index:
select COUNT(*) from test2
select id from test2
select * from test2
Could you please help me understand why it always uses the nonclustered index even though we have a clustered index on the table?
Thanks in advance.
Basically, when you create a nonclustered index on name, the index actually contains name and id, so it more or less contains the whole table itself.
If you add another field like this:
create table test4
(id int primary key clustered,name varchar(20), name2 varchar(20))
insert into test4 values
(1,'mahesh','mahesh'),(2,'ram','mahesh'),(3,'sham','mahesh')
create nonclustered index ind_non_name on test4(name)
You'll see that some of the queries will start using the clustered index.
In your case the indexes are pretty much the same thing: since a clustered index also contains the data, your clustered index is (id, name), and nonclustered indexes contain the clustering key, so the nonclustered index is (name, id).
You don't have any search criteria, so no matter which index is used it must be scanned completely anyhow; why should it prefer the clustered index?
If you add a third field to your table, then at least select * will use the clustered index.
You are confusing Primary Keys with clustering keys. They are not the same. You will need to explicitly create the clustering key.
To create the clustering key on the primary key in the create statement:
create table test2
(id int ,name varchar(20)
constraint PK_ID_test2 primary key clustered(id))
To add the clustering key to what you have already:
ALTER TABLE test2
ADD CONSTRAINT PK_ID_test2 primary key clustered(id)

SQL Server Unique Key Constraint Date/Time Span

How can I set a unique key constraint for the following table to ensure the date/time spans between Date/BeginTime and Date/EndTime do not overlap with another record? If I need to add a computed column, what data type and calculation should I use?
Column Name Data Type
Date date
BeginTime time(7)
EndTime time(7)
Thanks.
I don't believe that you can do that using a UNIQUE constraint in SQL Server. Postgres has this capability, but to implement it in SQL Server you must use a trigger. Since your question was "how can I do this using a unique key constraint", the correct answer is "you can't". If you had asked "how can I enforce this non-overlapping constraint", there is an answer.
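A minimal sketch of the trigger approach, assuming the Date/BeginTime/EndTime table from the question is named dbo.MyTable with (Date, BeginTime, EndTime) as its key:

```sql
-- Rejects any insert/update that creates an overlapping span on the same Date.
-- Two spans overlap when each begins before the other ends.
CREATE TRIGGER trg_MyTable_NoOverlap
ON dbo.MyTable
AFTER INSERT, UPDATE
AS
BEGIN
    IF EXISTS (
        SELECT 1
        FROM inserted AS i
        JOIN dbo.MyTable AS t
          ON  t.[Date] = i.[Date]
          AND t.BeginTime < i.EndTime
          AND i.BeginTime < t.EndTime
          -- skip the row comparing against itself (key assumed unique)
          AND NOT (t.BeginTime = i.BeginTime AND t.EndTime = i.EndTime)
    )
    BEGIN
        ROLLBACK TRANSACTION;
        THROW 50000, 'Overlapping date/time span.', 1;
    END
END;
```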
Alexander Kuznetsov shows one possible way: Storing intervals of time with no overlaps.
See also this article by Joe Celko: Contiguous Time Periods.
Here is the table and the first interval:
CREATE TABLE dbo.IntegerSettings(SettingID INT NOT NULL,
IntValue INT NOT NULL,
StartedAt DATETIME NOT NULL,
FinishedAt DATETIME NOT NULL,
PreviousFinishedAt DATETIME NULL,
CONSTRAINT PK_IntegerSettings_SettingID_FinishedAt
PRIMARY KEY(SettingID, FinishedAt),
CONSTRAINT UNQ_IntegerSettings_SettingID_PreviousFinishedAt
UNIQUE(SettingID, PreviousFinishedAt),
CONSTRAINT FK_IntegerSettings_SettingID_PreviousFinishedAt
FOREIGN KEY(SettingID, PreviousFinishedAt)
REFERENCES dbo.IntegerSettings(SettingID, FinishedAt),
CONSTRAINT CHK_IntegerSettings_PreviousFinishedAt_NotAfter_StartedAt
CHECK(PreviousFinishedAt <= StartedAt),
CONSTRAINT CHK_IntegerSettings_StartedAt_Before_FinishedAt
CHECK(StartedAt < FinishedAt)
);
INSERT INTO dbo.IntegerSettings
(SettingID, IntValue, StartedAt, FinishedAt, PreviousFinishedAt)
VALUES(1, 1, '20070101', '20070103', NULL);
Constraints enforce these rules:
There can be only one first interval for a setting
Next window must begin after the end of the previous one
Two different windows cannot refer to one and the same window as their previous one
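To illustrate the chaining, here is a hypothetical second interval (values made up); its PreviousFinishedAt must equal the FinishedAt of the first row, so the FOREIGN KEY and CHECK constraints admit it while rejecting overlaps:

```sql
-- PreviousFinishedAt = '20070103' matches the first row's FinishedAt,
-- and StartedAt >= PreviousFinishedAt, so this interval follows on
-- without overlapping
INSERT INTO dbo.IntegerSettings
    (SettingID, IntValue, StartedAt, FinishedAt, PreviousFinishedAt)
VALUES (1, 2, '20070103', '20070105', '20070103');
```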
-- this is a unique key that allows for a null in the EndTime column
-- this unique index could optionally be clustered instead of the traditional primary key being clustered
CREATE UNIQUE NONCLUSTERED INDEX
[UNQ_IDX_Date_BeginTm_EndTm_UniqueIndex_With_Null_EndTime] ON [MyTableName]
(
[Date] ASC,
[BeginTime] ASC,
[EndTime] ASC
)
GO
-- this is a traditional PK constraint that is clustered, but EndTime is
-- NOT NULL
-- it is possible that this table would not have a traditional primary key
ALTER TABLE dbo.MyTable ADD CONSTRAINT
PK_Date_BeginTm_EndTm_EndTimeIsNotNull PRIMARY KEY CLUSTERED
(
Date,
BeginTime,
EndTime
)
GO
-- HINT: control your BeginTime and EndTime seconds and milliseconds at
-- all insert and read points;
-- you want 13:01:42.000 and 13:01:42.333 to evaluate and compare
-- exactly the way you expect from a KEY perspective

JOIN Performance: Composite key versus BigInt Primary Key

We have a table that is going to be, say, 100 million to a billion rows (table name: Archive).
This table will be referenced from another table, Users.
We have 2 options for the primary key on the Archive table:
option 1: dataID (bigint)
option 2: userID + datetime (4 byte version).
Schema:
Users
- userID (int)
Archive
- userID
- datetime
OR
Archive
- dataID (big int)
Which one would be faster?
We are shying away from Option 1 because bigint is 8 bytes, and with 100 million rows that will add up to a lot of storage.
Update
OK, sorry, I forgot to mention: userID and datetime have to be stored regardless, so that was the reason for not adding another column, dataID, to the table.
Some thoughts, but there is probably not a clear-cut solution:
If you have a billion rows, why not use int, which goes from -2.1 billion to +2.1 billion?
userID (int, 4 bytes) + smalldatetime (4 bytes) = 8 bytes, the same as bigint.
If you are thinking of userID + smalldatetime, then surely this is useful anyway.
If so, adding a surrogate "archiveID" column will increase space anyway.
Do you require filtering/sorting by userID + smalldatetime?
Make sure your model is correct and worry about JOINs later...
Concern: using userID/[small]datetime carries a high risk of not being unique.
Here is some real schema. Is this what you're talking about?
-- Users (regardless of Archive choice)
CREATE TABLE dbo.Users (
userID int NOT NULL IDENTITY,
<other columns>
CONSTRAINT <name> PRIMARY KEY CLUSTERED (userID)
)
-- Archive option 1
CREATE TABLE dbo.Archive (
dataID bigint NOT NULL IDENTITY,
userID int NOT NULL,
[datetime] smalldatetime NOT NULL,
<other columns>
CONSTRAINT <name> PRIMARY KEY CLUSTERED (dataID)
)
-- Archive option 2
CREATE TABLE dbo.Archive (
userID int NOT NULL,
[datetime] smalldatetime NOT NULL,
<other columns>
CONSTRAINT <name> PRIMARY KEY CLUSTERED (userID, [datetime] DESC)
)
CREATE NONCLUSTERED INDEX <name> ON dbo.Archive (
userID,
[datetime] DESC
)
If this were my decision, I would definitely go with option 1. Disk is cheap.
If you go with option 2, it's likely that you will have to add some other column to your PK to make it unique, and then your design starts degrading.
What about option 3: making dataID a 4-byte int?
Also, if I understand it right, the Archive table will be referenced from the Users table, so it wouldn't even make much sense to have the userID in the Archive table.
I recommend that you set up a simulation to validate this in your environment, but my guess would be that the single bigint would be faster in general; however, when you query the table, what are you going to be querying on?
If I were building an archive, I might lean toward having an autoincrement identity field and then using a partitioning scheme to partition based on DateTime, and perhaps userID, but that would depend on the circumstances.
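That partitioning idea could be sketched as follows; the boundary dates, function, and scheme names are made up for illustration:

```sql
-- Monthly range partitioning on the datetime column (boundaries assumed)
CREATE PARTITION FUNCTION pf_ArchiveByMonth (smalldatetime)
    AS RANGE RIGHT FOR VALUES ('20150101', '20150201', '20150301');

CREATE PARTITION SCHEME ps_ArchiveByMonth
    AS PARTITION pf_ArchiveByMonth ALL TO ([PRIMARY]);

-- Archive rows land in the partition matching their datetime
CREATE TABLE dbo.Archive (
    dataID     bigint IDENTITY(1,1) NOT NULL,
    userID     int NOT NULL,
    [datetime] smalldatetime NOT NULL
) ON ps_ArchiveByMonth ([datetime]);
```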
