Necessary to create index on multi field primary key in SQL server? - sql-server

Given the database table:
UserID (PK)
SomeTypeID (PK)
SomeSubTypeID (PK)
Data
And you wish to query:
SELECT Data FROM Table WHERE UserID = {0} AND SomeTypeID = {1} AND SomeSubTypeID = {2}
Would you need to create the index UserID, SomeTypeID, SomeSubTypeID or does the fact they form the primary key mean this is not needed?

If you created your primary key as:
CREATE TABLE TBL (UserID, SomeTypeID, SomeSubType, Data
CONSTRAINT PK PRIMARY KEY (UserID, SomeTypeID, SomeSubType))
Then the default index that is being created is a CLUSTERED index.
Usually (so not all times), when looking for data, you would want your queries to use a NON-CLUSTERED index to filter rows, where the columns you use to filter rows will form the key of the index and the information (column) that you return from those rows as an INCLUDED column, in this case DATA, like below:
CREATE NONCLUSTERED INDEX ncl_indx
ON TBL (UserID, SomeTypeID, SomeSubType) INCLUDE (Data);
By doing this, you're avoiding accessing the table data, through the CLUSTERED index.
But, you can specify the type of index that you want your PRIMARY KEY to be, so:
CREATE TABLE TBL (UserID, SomeTypeID, SomeSubType, Data
CONSTRAINT PK PRIMARY KEY NONCLUSTERED (UserID, SomeTypeID, SomeSubType));
Buuut, because you want this to be defined as a PRIMARY KEY then you are not able to use the INCLUDE functionality, so you can't avoid the disk lookup in order to get the information from the DATA column, which is where you basically are with having the default CLUSTERED index.
Buuuuuut, there's still a way to ensure the uniqueness that the Primary Key gives you and benefit from the INCLUDE functionality, so as to do as fewer disk I/O's.
You can specify your NONCLUSTERED INDEX as UNIQUE which will ensure that all of your 3 columns that make up the index key are unique.
CREATE UNIQUE NONCLUSTERED INDEX ncl_indx
ON TBL (UserID, SomeTypeID, SomeSubType) INCLUDE (Data);
By doing all of these then your table is going to be a HEAP, which is not a very good thing. If you've given it a good thought in designing your tables and decided that the best clustering key for your CLUSTERED INDEX is (UserID, SomeTypeID, SomeSubType), then it's best to leave everything as you currently have it.
Otherwise, if you have decided on a different clustering key then you can add this unique nonclustered index, if you're going to query the table as you said you will.

AS long as you use all the columns used in your primary key when filtering you don't need to create seperate indexes. Your primary key is ok in your example.
Think of creating seperate index if you plan to filter on one of the columns and not the others. For example: SELECT Data FROM Table WHERE UserID = {0}

Related

Unique constraint and index

I have a table in SQL Server containing some user related info where the primary key is id (auto increment by 1) and has a column named userId. Each user can only has one record in the table, so I have added a unique constraint on column userId. As per SQL Server docs, SQL Server will automatically create an index for the unique constraint column.
For the usage on the table, there can be many update and insert operations, as well as select operations, and that's where my questions arise.
I see that the index that got created automatically by SQL Server on the unique constraint column is a non-clustered index, where it is good for update and insert operations, but for select operation, it is not as fast as the clustered index. (ref. differences-between-a-clustered-and-a-non-clustered-index)
For this table, there can be many select by userId operations. From the performance perspective, should a clustered index on userId be created, given that clustered index is the fastest for read operations ?
If yes, but a non-clustered index has already been automatically created on column userId, could a clustered index still be created on the userId column? (I have found some similar question, from the answers, it seem like if doing so, it will first search through the non-clustered index, then it will points to the clustered index and continue that search non-clustered-index-and-clustered-index-on-the-same-column)
Assuming your table was created in the following manner:
CREATE TABLE dbo.users
(
id int identity(1,1),
userId int,
userName varchar(100),
emailAddress varchar(100),
constraint PK_dbo_users primary key (Id)
);
alter table dbo.users
add constraint UNQ_dbo_users_userId UNIQUE(userId);
... then you already have a clustered index on "id" column by default.
A table can only have one clustered index, as Jonathon Willcock mentioned in the comments. So you cannot add another clustered index to userId column.
You also cannot recreate the clustered index to switch it to the userId column, as the constraints must much the existing constraint. Also, assuming there are foreign key references involved from other tables, you would have to drop the foreign keys before you can drop the users table.
Another option is to create a nonclustered covering index with an INCLUDE clause that contains all the columns needed for your query. This will avoid key lookups in the query plan.
For example:
create nonclustered index IX_dbo_users
on dbo.users (userId) include (id, userName, emailAddress);
Whether the PK and/or clustered index should be on userId or Id column depends on your users queries. If more queries, or more important queries, rely on "id" having clustered index, then keep it. Etc.
But if your table does not already have a clustered index, then yes, add it on userId column.

How indexes work for below queries?

I have created the below table with primary key:
create table test2
(id int primary key,name varchar(20))
insert into test2 values
(1,'mahesh'),(2,'ram'),(3,'sham')
then created the non clustered index on it.
create nonclustered index ind_non_name on test2(name)
when I write below query it will always you non clustered indexes in query execution plan.
select COUNT(*) from test2
select id from test2
select * from test2
Could you please help me to understand why it always use non clustered index even if we have clustered index on table?
Thanks in advance.
Basically when you create a non-clustered index on name, the index actually contains name and id, so it kind of contains all the table itself.
If you add another field like this:
create table test4
(id int primary key clustered,name varchar(20), name2 varchar(20))
insert into test4 values
(1,'mahesh','mahesh'),(2,'ram','mahesh'),(3,'sham','mahesh')
create nonclustered index ind_non_name on test4(name)
You'll see that some of the queries will start using the clustered index.
In your case the indexes are pretty much the same thing, since clustered index also contains the data, your clustered index is id, name and non clustered indexes contain the clustering key, so the non-clustered index is name, id.
You don't have any search criteria, so no matter which index is used, it must be scanned completely anyhow, so why should it actually use the clustered index?
If you add third field you your table, then at least select * will use clustered index.
You are confusing Primary Keys with clustering keys. They are not the same. You will need to explicitly create the clustering key.
To create the clustering key on the primary key in the create statement:
create table test2
(id int ,name varchar(20)
constraint PK_ID_test2 primary key clustered(id))
To add the clustering key to what you have already:
ALTER TABLE test2
ADD CONSTRAINT PK_ID_test2 primary key clustered(id)

designing new table for daily uploads - use unique constraint

I am using SQL Server 2012 & am creating a table that will have 8 columns, types below
datetime
varchar(12)
varchar(6)
varchar(100)
float
float
int
datetime
Once a day (normally) there will be an upload of approx 10,000 rows of data. Going forward its possible it could be 100,000.
The rows will be unique if I group on the first three columns listed above. I have read I can use the unique constraint on multiple columns which will guarantee the rows are unique.
I think I'm correct in saying that the unique constraint by default sets up non-clustered index. Would a clustered index be better & assuming when the table starts to contain millions of rows this won't cause any issues?
My last question. By applying the unique constraint on my table I am right to say querying the data will be quicker than if the unique constraint wasn't applied (because of the non-clustering or clustering) & uploading the data will be slower (which is fine) with the constraint on the table?
Unique index can be non-clustered.
Primary key is unique and can be clustered
Clustered index is not unique by default
Unique clustered index is unique :)
Mor information you can get from this guide.
So, we should separate uniqueness and index keys.
If you need to kepp data unique by some column - create uniqe contraint (unique index). You'll protect your data.
Also, you can create primary key (PK) on your columns - they will be unique also. But, there is a difference: all other indexies will use PK for referencing, so PK must be as short as possible. So, my advice - create Identity column (int or bigint) and create PK on it. And, create unique index on your unique columns.
Querying data may become faster, if you do queries on your unique columns, if you do query on other columns - you need to create other, specific indexies.
So, unique keys - for data consistency, indexies - for queries.
I think I'm correct in saying that the unique constraint by default
sets up non-clustered index
TRUE
Would a clustered index be better & assuming when the table starts to
contain millions of rows this won't cause any issues?
(1)if u need to make (datetime ,varchar(12), varchar(6)) Unique
(2)if you application or you will access rows using datetime or datetime ,varchar(12) or datetime ,varchar(12), varchar(6) in where condition
ALL the time
then have primary key on (datetime ,varchar(12), varchar(6))
by default it will put Uniqness and clustered index on all above three column.
but as you commented above:
the queries will vary to be honest. I imagine most queries will make
use of the first datetime column
and you will deal with huge data and might join this table with other tables
then its better have a surrogate key( ever-increasing unique identifier ) in the table and to satisfy your Selects
have Non-Clustered INDEXES
Surrogate Key vs Business Key
NON-CLUSTERED INDEX

Primary keys without defaul index (sort) - SQL2005

How do I switch off the default index on primary keys
I dont want all my tables to be indexed (sorted) but they must have a primary key
You can define a primary key index as NONCLUSTERED to prevent the table rows from being ordered according to the primary key, but you cannot define a primary key without some associated index.
Tables are always unsorted - there is no "default" order for a table and the optimiser may or may not choose to use an index if one exists.
In SQL Server an index is effectively the only way to implement a key. You get a choice between clustered or nonclustered indexes - that is all.
The means by which SQL Server implements Primary and Unique keys is by placing an index on those columns. So you cannot have a Primary Key (or Unique constraint) without an index.
You can tell SQL Server to use a nonclustered index to implement these indexes. If there are only nonclustered indexes on a table (or no indexes at all), you have a heap. It's pretty rare that this is what you actually want.
Just because a table has a clustered index, this in no way indicates that the rows of the table will be returned in the "order" defined by such an index - the fact that the rows are usually returned in that order is an implementation quirk.
And the actual code would be:
CREATE TABLE T (
Column1 char(1) not null,
Column2 char(1) not null,
Column3 char(1) not null,
constraint PK_T PRIMARY KEY NONCLUSTERED (Column2,Column3)
)
What does " I dont want all my tables to be sorted" mean ? If it means that you want the rows to appear in the order where they've been entered, there's only one way to garantee it: have a field that stores that order (or the time if you don't have a lot of transactions). And in that case, you will want to have a clustered index on that field for best performance.
You might end up with a non clustered PK (like the productId) AND a clustered unique index on your autonumber_or_timestamp field for max performance.
But that's really depending on the reality your're trying to model, and your question contains too little information about this. DB design is NOT abstract thinking.

Incorrect value for UNIQUE_CONSTRAINT_NAME in REFERENTIAL_CONSTRAINTS

I am listing all FK constraints for a given table using INFORMATION_SCHEMA set of views with the following query:
SELECT X.UNIQUE_CONSTRAINT_NAME,
"C".*, "X".*
FROM "INFORMATION_SCHEMA"."KEY_COLUMN_USAGE" AS "C"
INNER JOIN "INFORMATION_SCHEMA"."REFERENTIAL_CONSTRAINTS" AS "X"
ON "C"."CONSTRAINT_NAME" = "X"."CONSTRAINT_NAME"
AND "C"."TABLE_NAME" = 'MY_TABLE'
AND "C"."TABLE_SCHEMA" = 'MY_SCHEMA'
Everything works perfectly well, but for one particular constraint the value of UNIQUE_CONSTRAINT_NAME column is wrong, and I need it in order to find additional information from the referenced Column. Basically, for most of the rows the UNIQUE_CONSTRAINT_NAME contains the name of the unique constraint (or PK) in the referenced table, but for one particular FK it is the name of some other unique constraint.
I dropped and re-created the FK - did not help.
My assumption is that the meta-data is somehow screwed. Is there a way to rebuild the meta data so that the INFORMATION_SCHEMA views would actually show the correct data?
edit-1: sample db structure
CREATE TABLE MY_PARENT_TABLE (
ID INTEGER,
NAME VARCHAR,
--//...
CONSTRAINT MY_PARENT_TABLE_PK PRIMARY KEY CLUSTERED (ID)
)
CREATE UNIQUE NONCLUSTERED INDEX MY_PARENT_TABLE_u_nci_ID_LongName ON MY_PARENT_TABLE (ID ASC) INCLUDE (SOME_OTHER_COLUMN)
CREATE TABLE MY_CHILD_TABLE (
ID INTEGER,
PID INTEGER,
NAME VARCHAR,
CONSTRAINT MY_CHILD_TABLE_PK PRIMARY KEY CLUSTERED (ID)
,CONSTRAINT MY_CHILD_TABLE__MY_PARENT_TABLE__FK
FOREIGN KEY (PID)
REFERENCES MY_PARENT_TABLE (ID)
ON UPDATE NO ACTION
ON DELETE NO ACTION
)
I expect the UNIQUE_CONSTRAINT_NAME to be MY_PARENT_TABLE_PK, but what I am
getting is MY_PARENT_TABLE_u_nci_ID_LongName.
Having looked at the structure, I see that in fact there are 2 UNIQUE constaints on that column - PK and the MY_PARENT_TABLE_u_nci_ID_LongName. So the real question should probably be: why does it take some other unique index and not the PK?
Since you have both a PK and a UNIQUE constraint on the same column, SQL Server picks one to use. I don't know if it picks the UNIQUE constraint because it is thinner (i.e. fewer columns involved) and might require fewer reads to confirm matches(?)
I don't see any way within SQL to enforce which one it chooses, other than ordering your scripts - create the table with the PK, create the other table and the FK, then create the UNIQUE constraint if you really need it - but is that really the case?

Resources