Indexing View on Aggregate Field - sql-server

I am wondering whether there is any way to create an index on an aggregate field.
CREATE TABLE test2 (
id INTEGER,
name VARCHAR(10),
family VARCHAR(10),
amount INTEGER)
GO
CREATE VIEW dbo.test2_v WITH SCHEMABINDING
AS
SELECT id, SUM(amount) as amount, COUNT_BIG(*) as tmp
FROM dbo.test2
GROUP BY id
GO
CREATE UNIQUE CLUSTERED INDEX vIdx ON test2_v(amount)
Running this code gives the following error message:
Cannot create the clustered index "vIdx" on view "test.dbo.test2_v" because the index key includes columns that are not in the GROUP BY clause. Consider eliminating columns that are not in the GROUP BY clause from the index key.

One of the restrictions on creating a unique clustered index on a view is:
If the view definition contains a GROUP BY clause, the key of the unique clustered index can reference only the columns specified in the GROUP BY clause.
http://msdn.microsoft.com/en-us/library/ms188783.aspx
So you first need to create the UNIQUE CLUSTERED index on your grouping key (id); then you can create a NONCLUSTERED index on the amount column.
CREATE UNIQUE CLUSTERED INDEX vIdx ON test2_v(id)
CREATE NONCLUSTERED INDEX ix_test2_v_amount ON test2_v(amount)
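A complete, minimal version of this fix, as a sketch that assumes SQL Server 2016 or later (for DROP ... IF EXISTS). It declares id and amount NOT NULL, which the question does not: indexed views also reject SUM over a nullable expression, so with the original nullable amount column the view would need SUM(ISNULL(amount, 0)) instead. The sample rows are made up for illustration.

```sql
SET ANSI_NULLS ON;
SET QUOTED_IDENTIFIER ON;  -- both must be ON when creating an indexed view
GO
DROP VIEW IF EXISTS dbo.test2_v;
DROP TABLE IF EXISTS dbo.test2;
GO
CREATE TABLE dbo.test2 (
    id     INT         NOT NULL,
    name   VARCHAR(10),
    family VARCHAR(10),
    amount INT         NOT NULL  -- assumption: an indexed view can't SUM a nullable column
);
GO
CREATE VIEW dbo.test2_v WITH SCHEMABINDING
AS
SELECT id, SUM(amount) AS amount, COUNT_BIG(*) AS tmp
FROM dbo.test2
GROUP BY id;
GO
-- The clustered key may only reference GROUP BY columns, so it goes on id...
CREATE UNIQUE CLUSTERED INDEX vIdx ON dbo.test2_v (id);
-- ...after which the aggregate column can get its own nonclustered index.
CREATE NONCLUSTERED INDEX ix_test2_v_amount ON dbo.test2_v (amount);
GO
-- Made-up rows; the view's indexes are maintained automatically on insert.
INSERT INTO dbo.test2 (id, name, family, amount)
VALUES (1, 'a', 'x', 10), (1, 'b', 'x', 5), (2, 'c', 'y', 7);

-- NOEXPAND reads the view's own indexes (outside Enterprise edition the
-- optimizer otherwise expands the view into its base-table query).
SELECT id, amount
FROM dbo.test2_v WITH (NOEXPAND)
WHERE amount > 10;
```

The final query can seek on ix_test2_v_amount and returns only id 1, whose amounts sum to 15.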

Your clustered index most likely needs to be on ID. Amount can't be part of that index key at all, and it wouldn't be unique enough anyway.
As per the error message
...the index key includes columns that are not in the GROUP BY clause....

Related

Unique constraint and index

I have a table in SQL Server containing some user-related info, where the primary key is id (auto-increment by 1) and there is a column named userId. Each user can only have one record in the table, so I have added a unique constraint on the userId column. As per the SQL Server docs, SQL Server automatically creates an index for the unique constraint column.
The table will see many update and insert operations, as well as select operations, and that's where my questions arise.
I see that the index SQL Server created automatically on the unique constraint column is a nonclustered index, which is good for update and insert operations but, for select operations, is not as fast as a clustered index. (ref. differences-between-a-clustered-and-a-non-clustered-index)
For this table there can be many select-by-userId operations. From a performance perspective, should a clustered index be created on userId, given that a clustered index is the fastest for read operations?
If yes, but a nonclustered index has already been created automatically on the userId column, could a clustered index still be created on that column? (I have found a similar question; from the answers, it seems that if you do so, the search first goes through the nonclustered index, then follows it to the clustered index and continues the search there: non-clustered-index-and-clustered-index-on-the-same-column)
Assuming your table was created in the following manner:
CREATE TABLE dbo.users
(
id int identity(1,1),
userId int,
userName varchar(100),
emailAddress varchar(100),
constraint PK_dbo_users primary key (Id)
);
alter table dbo.users
add constraint UNQ_dbo_users_userId UNIQUE(userId);
... then you already have a clustered index on the "id" column, because a primary key is created as the clustered index by default.
A table can only have one clustered index, as Jonathon Willcock mentioned in the comments, so you cannot add another clustered index on the userId column.
You also cannot simply re-create the clustered index on the userId column, because an index that enforces a constraint must keep matching that constraint's definition. Also, if other tables have foreign keys referencing this one, you would have to drop those foreign keys before you can drop the users table.
Another option is to create a nonclustered covering index with an INCLUDE clause that contains all the columns needed for your query. This will avoid key lookups in the query plan.
For example:
create nonclustered index IX_dbo_users
on dbo.users (userId) include (id, userName, emailAddress);
Whether the PK and/or the clustered index should be on the userId or the id column depends on the queries run against users. If more queries, or more important queries, rely on "id" having the clustered index, then keep it, and so on.
But if your table does not already have a clustered index, then yes, add it on the userId column.
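If you do decide that userId should be the clustering key, a sketch of that layout (a hypothetical variant of the table above, with userId assumed NOT NULL) declares the primary key NONCLUSTERED, which leaves the table's single clustered index free for the unique constraint on userId. It assumes SQL Server 2016+ for DROP TABLE IF EXISTS.

```sql
DROP TABLE IF EXISTS dbo.users;
CREATE TABLE dbo.users
(
    id           INT IDENTITY(1,1) NOT NULL,
    userId       INT NOT NULL,          -- assumed NOT NULL
    userName     VARCHAR(100),
    emailAddress VARCHAR(100),
    -- id stays the primary key, but as a nonclustered index...
    CONSTRAINT PK_dbo_users PRIMARY KEY NONCLUSTERED (id),
    -- ...so the one clustered index can enforce uniqueness on userId.
    CONSTRAINT UNQ_dbo_users_userId UNIQUE CLUSTERED (userId)
);

-- Check which index ended up clustered.
SELECT name, type_desc
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'dbo.users');
```

Lookups by userId then seek the clustered index directly; the trade-off is that queries going through id now need a lookup into the clustered index for any other columns.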

Choosing the right index to prevent duplicates but not affect order

I have a table containing an INT IDENTITY column, and a VARCHAR(10) column. I want to have a UNIQUE CLUSTERED INDEX on the IDENTITY column (so it's a primary key), and I also want to have a way of preventing duplicate values in the VARCHAR(10) column. However, the data should be ordered by the IDENTITY column in ascending order.
For example:
CREATE TABLE ref.currency (
currency_id INT NOT NULL IDENTITY(1,1),
currency_name VARCHAR(10) NOT NULL
);
GO
CREATE UNIQUE CLUSTERED INDEX ix_ref_currency_id ON ref.currency(currency_id);
GO
INSERT INTO ref.currency (
currency_name
)
VALUES
('Pounds'),
('Euros'),
('Dollars');
(I create a UNIQUE CLUSTERED INDEX instead of a PRIMARY KEY because I was taught to, but I didn't fully understand the reasons why one should be chosen over the other. I've stuck with it ever since.)
Question:
What type of index should I add to this table in order to prevent duplicate values being added to the currency_name column, that will not affect the order of the data?
I've tried adding a UNIQUE NONCLUSTERED INDEX; however, this results in the data being ordered by currency_name, which I do not want.
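A minimal sketch, assuming SQL Server 2016+ and a hypothetical constraint name uq_ref_currency_name: a UNIQUE constraint on currency_name, backed by a nonclustered index, rejects duplicates without touching the clustered index, so the rows are still stored in currency_id order. What changes is that a SELECT without ORDER BY may now be answered from the smaller nonclustered index (which holds currency_name plus the clustering key currency_id) and so come back in name order; SQL Server only guarantees an order when the query asks for one.

```sql
IF SCHEMA_ID(N'ref') IS NULL
    EXEC (N'CREATE SCHEMA ref');
GO
DROP TABLE IF EXISTS ref.currency;
CREATE TABLE ref.currency (
    currency_id   INT NOT NULL IDENTITY(1,1),
    currency_name VARCHAR(10) NOT NULL
);
CREATE UNIQUE CLUSTERED INDEX ix_ref_currency_id ON ref.currency (currency_id);
GO
-- Duplicate guard: a unique nonclustered index; the clustered order is unchanged.
ALTER TABLE ref.currency
    ADD CONSTRAINT uq_ref_currency_name UNIQUE NONCLUSTERED (currency_name);

INSERT INTO ref.currency (currency_name)
VALUES ('Pounds'), ('Euros'), ('Dollars');

-- Ask for the order explicitly instead of relying on which index gets scanned.
SELECT currency_id, currency_name
FROM ref.currency
ORDER BY currency_id;
```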

Create Clustered Index on (Date + Key)

There is a transaction table that has 40 million rows and 100 columns.
To simplify, there are 3 important columns (HeaderID, HeaderLineID, OrderDate), and the unique identifier is (HeaderID, HeaderLineID).
CREATE TABLE [dbo].[T_Table](
[HeaderID] [nvarchar](4) NOT NULL,
[HeaderLineID] [nvarchar](10) NOT NULL,
[OrderDate] [datetime] NOT NULL
) ON [FG_Index]
GO
CREATE CLUSTERED INDEX [OrderDate] ON [dbo].[T_Table]
(
[OrderDate] ASC
)
GO
CREATE NONCLUSTERED INDEX [Key] ON [dbo].[T_Table]
(
[HeaderID] ASC,
[HeaderLineID] ASC
)
GO
For normal usage, we select the data based on a date range:
select * from T_Table
where OrderDate between '2015-01-01' and '2015-12-31'
Is it a better approach to drop the current indexes and create a clustered index on Date + Key instead? That is:
CREATE CLUSTERED INDEX [NewKey] ON [dbo].[T_Table]
(
[OrderDate] ASC,
[HeaderID] ASC,
[HeaderLineID] ASC
)
GO
Replies from comments
Explain what HeaderID and HeaderLineID are. Is the combination of HeaderLineID and HeaderID unique?
HeaderID is the Order Number and HeaderLineID is the Order Line Number.
Combination of HeaderID+HeaderLineID is unique.
Which will be used most frequently in searches? Selectivity of OrderDate vs. selectivity of HeaderLineID & HeaderID?
OrderDate can appear in filter conditions.
HeaderLineID can appear in join conditions to other tables.
HeaderID, HeaderLineID and OrderDate can appear in the output.
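Since HeaderID + HeaderLineID is unique, the proposed three-column key is unique as well. A sketch of the change, assuming the question's table and index names (the setup part drops the ON [FG_Index] filegroup clause, and the rows are made up): declaring the new key UNIQUE spares SQL Server the hidden 4-byte uniquifier it adds to duplicate key values in a non-unique clustered index. On a large table, dropping the nonclustered index before replacing the clustered one avoids rebuilding it twice.

```sql
-- Setup: the question's table and indexes (without the filegroup clause).
DROP TABLE IF EXISTS dbo.T_Table;
CREATE TABLE dbo.T_Table (
    HeaderID     NVARCHAR(4)  NOT NULL,
    HeaderLineID NVARCHAR(10) NOT NULL,
    OrderDate    DATETIME     NOT NULL
);
CREATE CLUSTERED INDEX [OrderDate] ON dbo.T_Table (OrderDate);
CREATE NONCLUSTERED INDEX [Key] ON dbo.T_Table (HeaderID, HeaderLineID);
GO
-- Drop the nonclustered index first so it isn't rebuilt when the clustered index changes.
DROP INDEX [Key] ON dbo.T_Table;
DROP INDEX [OrderDate] ON dbo.T_Table;

-- UNIQUE is safe because (HeaderID, HeaderLineID) alone is already unique.
CREATE UNIQUE CLUSTERED INDEX [NewKey] ON dbo.T_Table (OrderDate, HeaderID, HeaderLineID);

-- Re-create the business-key index, now also enforcing uniqueness.
CREATE UNIQUE NONCLUSTERED INDEX [Key] ON dbo.T_Table (HeaderID, HeaderLineID);

INSERT INTO dbo.T_Table (HeaderID, HeaderLineID, OrderDate)
VALUES (N'A001', N'1', '20150105'), (N'A001', N'2', '20150105');
```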
Your index will not perform well for the queries you have if your OrderDate is not unique and you have more queries like the one below:
select * from T_Table
where OrderDate between '2015-01-01' and '2015-12-31'
I suggest creating a nonclustered index with the definition below:
create index nci_somename on t_table(orderdate)
include(HeaderID, HeaderLineID)
Having a clustered index is good, but I don't recommend this one if it won't satisfy your queries.
i) What is the volume of transactions per date?
ii) You must have read examples where a table scan was done instead of a clustered index seek because the optimizer judged the table scan to be more cost-effective. Your case could be similar.
iii) Critical issue: 100 columns in a single table is itself wrong. How many columns would you include in a nonclustered covering index? At most 20-25 columns are common and important across all requirements; the rest are area-specific and hence mostly sparse. Putting all columns in a single table is not an example of denormalization.
iv) Is the data really normalized? I mean, do you have repetitive rows? For example, suppose two items were ordered under a single order ID: how is that stored in this scenario? If both items are stored in the same table, then it is not an example of denormalization.
v) Create the clustered index on a unique, sequential column. Create a nonclustered index on OrderDate and include (*some common important columns).
*Since I have no idea about the rest of the columns and details.
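Suggestion (v) could look roughly like this. RowID is a hypothetical surrogate column (the question's table has none), and the INCLUDE list stands in for whatever the common important columns turn out to be. The range query uses a half-open interval with 'yyyymmdd' literals: with a datetime column, BETWEEN '2015-01-01' AND '2015-12-31' skips rows timestamped after midnight on 31 December, and 'yyyy-mm-dd' strings can be misread as year-day-month under some language settings.

```sql
DROP TABLE IF EXISTS dbo.T_Table;
CREATE TABLE dbo.T_Table (
    RowID        BIGINT IDENTITY(1,1) NOT NULL,  -- hypothetical unique, sequential column
    HeaderID     NVARCHAR(4)  NOT NULL,
    HeaderLineID NVARCHAR(10) NOT NULL,
    OrderDate    DATETIME     NOT NULL
);
-- Clustered index on the narrow, ever-increasing surrogate key.
CREATE UNIQUE CLUSTERED INDEX CIX_T_Table_RowID ON dbo.T_Table (RowID);
-- Date-range searches can seek here; INCLUDE covers the columns the query returns.
CREATE NONCLUSTERED INDEX IX_T_Table_OrderDate ON dbo.T_Table (OrderDate)
    INCLUDE (HeaderID, HeaderLineID);
-- The business key stays unique.
CREATE UNIQUE NONCLUSTERED INDEX UX_T_Table_Header ON dbo.T_Table (HeaderID, HeaderLineID);
GO
INSERT INTO dbo.T_Table (HeaderID, HeaderLineID, OrderDate)
VALUES (N'A001', N'1', '20150101 08:00:00'),
       (N'A002', N'1', '20151231 15:30:00'),
       (N'A003', N'1', '20160101 00:00:00');

-- Half-open range: every moment of 2015, nothing from 2016.
SELECT HeaderID, HeaderLineID, OrderDate
FROM dbo.T_Table
WHERE OrderDate >= '20150101' AND OrderDate < '20160101';
```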

Necessary to create index on multi field primary key in SQL server?

Given the database table:
UserID (PK)
SomeTypeID (PK)
SomeSubTypeID (PK)
Data
And you wish to query:
SELECT Data FROM Table WHERE UserID = {0} AND SomeTypeID = {1} AND SomeSubTypeID = {2}
Would you need to create the index UserID, SomeTypeID, SomeSubTypeID or does the fact they form the primary key mean this is not needed?
If you created your primary key as:
CREATE TABLE TBL (UserID int, SomeTypeID int, SomeSubType int, Data varchar(100), -- column types assumed
CONSTRAINT PK PRIMARY KEY (UserID, SomeTypeID, SomeSubType))
Then the default index that is being created is a CLUSTERED index.
Usually (though not always), when looking for data, you would want your queries to use a NONCLUSTERED index to filter rows, where the columns you filter on form the key of the index and the column you return from those rows is an INCLUDED column, in this case Data, like below:
CREATE NONCLUSTERED INDEX ncl_indx
ON TBL (UserID, SomeTypeID, SomeSubType) INCLUDE (Data);
By doing this, you avoid accessing the table data through the CLUSTERED index.
But you can specify the type of index that you want your PRIMARY KEY to be:
CREATE TABLE TBL (UserID int, SomeTypeID int, SomeSubType int, Data varchar(100), -- column types assumed
CONSTRAINT PK PRIMARY KEY NONCLUSTERED (UserID, SomeTypeID, SomeSubType));
But because you want this to be defined as a PRIMARY KEY, you can't use the INCLUDE functionality, so you can't avoid the lookup that fetches the Data column, which is basically where you already are with the default CLUSTERED index.
There is still a way, though, to keep the uniqueness that the primary key gives you and benefit from INCLUDE, so as to do fewer disk I/Os.
You can specify your NONCLUSTERED INDEX as UNIQUE which will ensure that all of your 3 columns that make up the index key are unique.
CREATE UNIQUE NONCLUSTERED INDEX ncl_indx
ON TBL (UserID, SomeTypeID, SomeSubType) INCLUDE (Data);
If you do all of this, your table is going to be a HEAP (a table without a clustered index), which is usually not a good thing. If you've given your table design good thought and decided that the best clustering key for your CLUSTERED INDEX is (UserID, SomeTypeID, SomeSubType), then it's best to leave everything as you currently have it.
Otherwise, if you have decided on a different clustering key, you can add this unique nonclustered index, provided you're going to query the table as you described.
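To see the heap this answer warns about, here is a sketch that combines the nonclustered primary key with the unique covering index, using the answer's names (Data's type is an assumption, and SQL Server 2016+ is assumed for DROP TABLE IF EXISTS). sys.indexes reports a table with no clustered index as index_id 0 with type HEAP.

```sql
DROP TABLE IF EXISTS dbo.TBL;
CREATE TABLE dbo.TBL (
    UserID      INT NOT NULL,
    SomeTypeID  INT NOT NULL,
    SomeSubType INT NOT NULL,
    Data        VARCHAR(100) NULL,   -- type assumed
    CONSTRAINT PK PRIMARY KEY NONCLUSTERED (UserID, SomeTypeID, SomeSubType)
);
-- Same key, with Data carried in the index leaf level.
CREATE UNIQUE NONCLUSTERED INDEX ncl_indx
    ON dbo.TBL (UserID, SomeTypeID, SomeSubType) INCLUDE (Data);

-- No clustered index anywhere: index_id 0 shows up as HEAP.
SELECT index_id, name, type_desc
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'dbo.TBL');
```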
As long as you use all the columns of your primary key when filtering, you don't need to create separate indexes. Your primary key is fine in your example.
Think about creating a separate index if you plan to filter on some of the columns but not the leading one. For example: SELECT Data FROM Table WHERE SomeTypeID = {1}. A filter on UserID alone can still seek on the primary key, because UserID is its leading column.
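A sketch of such a separate index, following the TBL naming from the first answer (data types and the index name IX_TBL_SomeTypeID are assumptions). The composite primary key can seek on UserID because it is the leading key column; a filter on SomeTypeID alone cannot use that seek, so it gets its own index, with Data included so the query is covered.

```sql
DROP TABLE IF EXISTS dbo.TBL;
CREATE TABLE dbo.TBL (
    UserID      INT NOT NULL,
    SomeTypeID  INT NOT NULL,
    SomeSubType INT NOT NULL,
    Data        VARCHAR(100) NULL,   -- type assumed
    CONSTRAINT PK_TBL PRIMARY KEY (UserID, SomeTypeID, SomeSubType)
);
-- Serves filters on SomeTypeID alone; INCLUDE (Data) avoids a lookup.
CREATE NONCLUSTERED INDEX IX_TBL_SomeTypeID ON dbo.TBL (SomeTypeID) INCLUDE (Data);
GO
INSERT INTO dbo.TBL (UserID, SomeTypeID, SomeSubType, Data)
VALUES (1, 10, 100, 'a'), (1, 20, 200, 'b'), (2, 10, 100, 'c');

SELECT Data FROM dbo.TBL WHERE UserID = 1;      -- can seek on the PK (leading column)
SELECT Data FROM dbo.TBL WHERE SomeTypeID = 10; -- can seek on IX_TBL_SomeTypeID
```

Whether the optimizer actually seeks also depends on row counts and statistics; on a three-row table it may just scan.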

How indexes work for below queries?

I have created the table below with a primary key:
create table test2
(id int primary key,name varchar(20))
insert into test2 values
(1,'mahesh'),(2,'ram'),(3,'sham')
Then I created a nonclustered index on it:
create nonclustered index ind_non_name on test2(name)
When I run the queries below, the execution plan always uses the nonclustered index:
select COUNT(*) from test2
select id from test2
select * from test2
Could you please help me understand why it always uses the nonclustered index even though the table has a clustered index?
Thanks in advance.
Basically, when you create a nonclustered index on name, the index actually contains both name and id (the clustering key), so it effectively contains the whole table.
If you add another field like this:
create table test4
(id int primary key clustered,name varchar(20), name2 varchar(20))
insert into test4 values
(1,'mahesh','mahesh'),(2,'ram','mahesh'),(3,'sham','mahesh')
create nonclustered index ind_non_name on test4(name)
You'll see that some of the queries will start using the clustered index.
In your case the two indexes are pretty much the same thing: the clustered index contains the data, so it holds (id, name), and a nonclustered index always contains the clustering key, so the nonclustered index holds (name, id).
You don't have any search criteria, so no matter which index is used, it must be scanned completely anyhow, so why should it actually use the clustered index?
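If you add a third field to your table, then at least select * will use the clustered index.
Primary keys and clustering keys are not the same thing. In SQL Server a PRIMARY KEY is created as a clustered index by default (unless the table already has a clustered index or you specify NONCLUSTERED), but you can make the clustering explicit.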
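One way to compare the two access paths on the question's table is to force each index with a table hint and compare the plans, or the logical reads reported by SET STATISTICS IO. INDEX(1) always refers to the clustered index; both queries return the same rows because each index holds both id and name. This sketch assumes SQL Server 2016+ for DROP TABLE IF EXISTS.

```sql
DROP TABLE IF EXISTS dbo.test2;
CREATE TABLE dbo.test2 (id INT PRIMARY KEY, name VARCHAR(20));
INSERT INTO dbo.test2 VALUES (1, 'mahesh'), (2, 'ram'), (3, 'sham');
CREATE NONCLUSTERED INDEX ind_non_name ON dbo.test2 (name);
GO
SET STATISTICS IO ON;  -- logical reads per query appear in the Messages output
SELECT id, name FROM dbo.test2 WITH (INDEX(1));            -- clustered index
SELECT id, name FROM dbo.test2 WITH (INDEX(ind_non_name)); -- nonclustered index
SET STATISTICS IO OFF;
```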
To create the clustering key on the primary key in the create statement:
create table test2
(id int, name varchar(20),
constraint PK_ID_test2 primary key clustered(id))
If the table was created without a primary key, you can add one as the clustering key afterwards:
ALTER TABLE test2
ADD CONSTRAINT PK_ID_test2 primary key clustered(id)
