I have a transaction table with 40 million rows and 100 columns.
For simplicity, there are 3 important columns (HeaderID, HeaderLineID, OrderDate), and the unique identifier is (HeaderID, HeaderLineID).
CREATE TABLE [dbo].[T_Table](
[HeaderID] [nvarchar](4) NOT NULL,
[HeaderLineID] [nvarchar](10) NOT NULL,
[OrderDate] [datetime] NOT NULL
) ON [FG_Index]
GO
CREATE CLUSTERED INDEX [OrderDate] ON [dbo].[T_Table]
(
[OrderDate] ASC
)
GO
CREATE NONCLUSTERED INDEX [Key] ON [dbo].[T_Table]
(
[HeaderID] ASC,
[HeaderLineID] ASC
)
GO
In normal usage, we select data by date range:
select * from T_Table
where OrderDate between '2015-01-01' and '2015-12-31'
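A side note on the range predicate, sketched in Python rather than T-SQL so it can be run anywhere. Since OrderDate is a datetime, the literal '2015-12-31' is read as midnight at the start of that day, so if the column carries a time component, BETWEEN leaves out rows stamped later on 31 December. The sample dates below are made up for illustration.

```python
from datetime import datetime

# Hypothetical sample rows; only the OrderDate values matter here.
order_dates = [
    datetime(2015, 1, 1, 0, 0),
    datetime(2015, 6, 15, 9, 30),
    datetime(2015, 12, 31, 0, 0),
    datetime(2015, 12, 31, 14, 45),  # afternoon of the last day
    datetime(2016, 1, 1, 0, 0),
]

# BETWEEN '2015-01-01' AND '2015-12-31' compares against midnight on both ends.
lo = datetime(2015, 1, 1)
hi_inclusive = datetime(2015, 12, 31)
between_rows = [d for d in order_dates if lo <= d <= hi_inclusive]

# Half-open alternative: OrderDate >= '2015-01-01' AND OrderDate < '2016-01-01'
hi_exclusive = datetime(2016, 1, 1)
half_open_rows = [d for d in order_dates if lo <= d < hi_exclusive]
```

The half-open form (>= the first day, < the day after the last) picks up the afternoon row that BETWEEN misses, and it can still be answered by a range seek on an OrderDate index.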
Is it a better approach to drop the current indexes and create a clustered index on Date + Key instead? That is,
CREATE CLUSTERED INDEX [NewKey] ON [dbo].[T_Table]
(
[OrderDate] ASC,
[HeaderID] ASC,
[HeaderLineID] ASC
)
GO
Replies from comments
Explain what HeaderID and HeaderLineID are. Is the combination of HeaderID and HeaderLineID unique?
HeaderID is the Order Number and HeaderLineID is the Order Line Number.
Combination of HeaderID+HeaderLineID is unique.
Which will be used most frequently in searches? Selectivity of OrderDate vs. selectivity of HeaderID and HeaderLineID.
OrderDate appears in filter conditions.
HeaderLineID appears in join conditions to other tables.
HeaderID, HeaderLineID and OrderDate appear in the output.
Your index will not perform well for your queries if OrderDate is not unique and you have more queries like the one below:
select * from T_Table
where OrderDate between '2015-01-01' and '2015-12-31'
I suggest creating a nonclustered index with the definition below:
create index nci_somename on t_table(orderdate)
include(HeaderID, HeaderLineID)
Having a clustered index is good, but I don't recommend it if it won't satisfy your queries.
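To show what the INCLUDE columns buy you, here is a rough Python model (not SQL Server's actual storage engine): a sorted list stands in for the nonclustered index, and the sample rows are invented. A date-range query that only needs the three columns can be answered from the index entries alone.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime

# Hypothetical base rows: (HeaderID, HeaderLineID, OrderDate, other columns...)
rows = [
    ("A001", "1", datetime(2015, 3, 1), "other data 1"),
    ("A001", "2", datetime(2015, 3, 1), "other data 2"),
    ("A002", "1", datetime(2014, 7, 9), "other data 3"),
    ("A003", "1", datetime(2016, 2, 2), "other data 4"),
]

# Toy "index": entries sorted by OrderDate, carrying the included columns.
index = sorted((r[2], r[0], r[1]) for r in rows)
keys = [entry[0] for entry in index]

def seek_range(start, end):
    """Return (HeaderID, HeaderLineID, OrderDate) for start <= OrderDate <= end
    using only the index entries, never touching `rows`."""
    first = bisect_left(keys, start)
    last = bisect_right(keys, end)
    return [(h, l, d) for d, h, l in index[first:last]]

result = seek_range(datetime(2015, 1, 1), datetime(2015, 12, 31))
```

Keep in mind that select * asks for every column, so a query written that way still has to go back to the base table for the columns the index does not carry. The index only covers queries that list HeaderID, HeaderLineID and OrderDate.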
i) What is the volume of transactions per date?
ii) You must have read this example where a table scan was done instead of a clustered index seek because the optimizer judged the table scan to be more cost-effective. The same may apply in your case.
iii) Critical error: 100 columns in a single table is itself wrong. How many columns would you include in a nonclustered covering index? At most 20-25 columns are common and important across all requirements; the rest are area-specific and hence mostly sparse. Putting all columns in a single table is not an example of denormalization.
iv) Is the data really normalized? I mean, do you have repetitive rows? For example, if two items were ordered under a single order ID, how are they stored in this scenario? If two items are stored in the same table, then it is not an example of denormalization.
v) Create the clustered index on a unique sequential column. Create a nonclustered index on OrderDate that includes (*some common important columns).
*Since I have no idea about the rest of the columns and details.
Related
I have SQL Server 2019 where I want to partition one of my tables. Let's say we have a simple table like so:
IF OBJECT_ID('dbo.t') IS NOT NULL
DROP TABLE t;
CREATE TABLE t
(
PKID INT NOT NULL,
PeriodId INT NOT NULL,
ColA VARCHAR(10),
ColB INT
);
Let's also say that I have defined a partition function and scheme. The scheme is called [PS_PartitionKey].
Now I can partition this table by building a clustered index in a couple of ways.
Like this:
CREATE CLUSTERED INDEX IX_1 ON t ([PKId] ASC )
ON [PS_PartitionKey]([PeriodID])
Or like this:
CREATE CLUSTERED INDEX IX_1 ON t ([PKId] ASC, [PeriodId] ASC )
ON [PS_PartitionKey]([PeriodID])
As you can see, in the first case I did not explicitly specify my partitioning column as part of the index key, but in the second case I did. Both of these work, but what's the difference?
A similar question would apply if I were building these as non-clustered indexes. Using the same table as an example. Let's say I start by creating a clustered PK:
ALTER TABLE [dbo].t
ADD CONSTRAINT PK_t
PRIMARY KEY CLUSTERED ([PKId] ASC, [PeriodId]) ON [PS_PartitionKey]([PeriodID])
Now I want to define an additional non-clustered index. Once again, I can do it in two ways:
CREATE NONCLUSTERED INDEX IX_1 ON t ([ColA] ASC)
ON [PS_PartitionKey]([PeriodID])
or:
CREATE NONCLUSTERED INDEX IX_1 ON t ([ColA] ASC, [PeriodId] ASC)
ON [PS_PartitionKey]([PeriodID])
What difference would it make?
The problem is that the index is what it is; I can't alter it or add another one.
Can I do something to get a better query plan?
The index is on 2 columns: pid, Date.
But the select filters only on Date...
The deal table is very big (> 1 000 000 rows).
create table deal
(
Id Int NOT NULL PRIMARY KEY NONCLUSTERED,
pid Int NOT NULL,
Date smalldatetime NOT NULL
)
create clustered index pk ON deal (pid, Date)
select *
from deal
where Date between @d1 and @d2
I would recommend using ID as your clustered index and create a second index on date including ID and PID (Covering Index). If you do batch inserts then drop the date index and recreate it after to improve insert performance.
I would create my clustered index on id and create a nonclustered index on date like this:
CREATE NONCLUSTERED INDEX noncluxidxdate ON deal (Date)
INCLUDE (id, pid);
This post will help you understand:
What do Clustered and Non clustered index actually mean?
I need to add a composite primary key (2 columns) to an already existing table. This key will also be a clustered index, so the order of the columns is important.
I am using the following script:
ALTER TABLE [Table]
ADD CONSTRAINT [PK_Table]
PRIMARY KEY CLUSTERED ([Col1] ASC, [Col2] ASC)
I need Col1 to be the first column of the clustered index, followed by Col2.
My question is whether this script will do it (or do I need to explicitly set the order somehow?).
Appreciate it.
That T-SQL statement is doing exactly what you say you need.
The order of the columns is the order in which you write them down in your T-SQL statement - there's no need nor any way to otherwise specify their order.
Your T-SQL statement will create a clustered index with Col1 first, followed by Col2 - just as you want it to be.
I have this table (TableA):
CREATE TABLE TableA
(
[FieldA] [int] NOT NULL,
[FieldB] [int] NOT NULL,
[Value] [float] NULL,
CONSTRAINT [PK_TableA] PRIMARY KEY CLUSTERED
(
[FieldA] ASC,
[FieldB] ASC
)
)
There are few distinct FieldA values; let's say FieldA can be {1,2,3,4,5,6}.
Why does this query cause a full table scan:
SELECT COUNT(*) FROM TableA WHERE FieldB = 1
While this doesn't:
SELECT COUNT(*) FROM TableA WHERE FieldB = 1 AND FieldA in (1,2,3,4,5,6)
Can't SQL Server optimize this? If I had TableB where FieldA was a PK and I joined TableB and TableA the query would run similarly to the second query.
The clustered index you've created is based on two columns, with FieldA leading. If you're doing a lookup on FieldB alone, without a predicate on the leading column, SQL Server cannot generate a "key" value to use in the lookup process on that index, so it falls back to a table-scan approach.
Even though FieldA has a very small range of values it could contain, the SQL optimizer doesn't look at that range of values to determine whether it could "fudge" a key out of the information you've given it.
If you want to improve the performance of the first query, you will have to create another index on FieldB. If, as you say, there are not many distinct values in FieldA, and you do most of your lookups on FieldB exclusively, you might want to consider moving your clustered index to be built only on FieldB and creating a unique index over FieldA and FieldB.
Apparently, what I was looking for is a skip-scan optimization which is available on Oracle but not on SQL Server. Skip scan can utilize an index if the leading edge column predicate is missing:
http://social.msdn.microsoft.com/Forums/eu/transactsql/thread/48de15ad-f8e9-4930-9f40-ca74946bc401
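For anyone wondering why the IN (1,2,3,4,5,6) rewrite helps, here is a toy Python model of the idea (a sorted list of tuples standing in for the clustered index; the row counts are invented and this is not how the engine is implemented). With every FieldA value supplied, each one becomes its own seek on (FieldA, FieldB), which is roughly what a skip scan would do automatically.

```python
from bisect import bisect_left, bisect_right

# Hypothetical rows of TableA as (FieldA, FieldB), kept in clustered-key order.
index = sorted((a, b) for a in range(1, 7) for b in range(1, 101))

def scan_count(field_b):
    """No predicate on the leading column: every entry has to be examined."""
    touched = len(index)
    matches = sum(1 for _, b in index if b == field_b)
    return matches, touched

def skip_scan_count(field_b, field_a_values):
    """One seek per FieldA value, as in the IN (1,2,3,4,5,6) rewrite.
    `touched` counts only the matching entries read, not the probes
    made by the binary search itself."""
    matches = 0
    touched = 0
    for a in field_a_values:
        first = bisect_left(index, (a, field_b))
        last = bisect_right(index, (a, field_b))
        matches += last - first
        touched += last - first
    return matches, touched

full = scan_count(1)
skipped = skip_scan_count(1, [1, 2, 3, 4, 5, 6])
```

Both calls find the same 6 matching rows, but the scan walks all 600 entries while the per-value seeks read only the 6 matches. The trick only pays off when the leading column has few distinct values, as it does here.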
I am just wondering if there is any way to create an index on an aggregate field?
CREATE TABLE test2 (
id INTEGER,
name VARCHAR(10),
family VARCHAR(10),
amount INTEGER)
CREATE VIEW dbo.test2_v WITH SCHEMABINDING
AS
SELECT id, SUM(amount) as amount, COUNT_BIG(*) as tmp
FROM dbo.test2
GROUP BY id
CREATE UNIQUE CLUSTERED INDEX vIdx ON test2_v(amount)
I get the following error message with this code:
Cannot create the clustered index
"vIdx" on view "test.dbo.test2_v"
because the index key includes columns
that are not in the GROUP BY clause.
Consider eliminating columns that are
not in the GROUP BY clause from the
index key.
One of the restrictions when creating a unique clustered index on a view is:
If the view definition contains a
GROUP BY clause, the key of the unique
clustered index can reference only the
columns specified in the GROUP BY
clause.
http://msdn.microsoft.com/en-us/library/ms188783.aspx
So you first need to create a UNIQUE CLUSTERED index on your key (id); then you can create a NONCLUSTERED index on your amount column.
CREATE UNIQUE CLUSTERED INDEX vIdx ON test2_v(id)
CREATE NONCLUSTERED INDEX ix_test2_v_amount ON test2_v(amount)
Your index most likely needs to include ID. Amount won't be unique enough either.
As per the error message:
...the index key includes columns that are not in the GROUP BY clause....