Are 'Primary Keys' obligatory in SQL Server Design? - sql-server

Observe the following table model:
CREATE TABLE [site].[Permissions] (
[ID] INT REFERENCES [site].[Accounts]( [ID] ) NOT NULL,
[Type] SMALLINT NOT NULL,
[Value] INT NULL
);
The site.Accounts->site.Permissions is a one-to-many relationship so 'ID' cannot be made a primary key due to the uniqueness that a PK imposes.
The rows are selected using a WHERE [ID] = ? clause, so adding a phoney IDENTITY column and making it the PK yields no benefit at the cost of additional disk space.
It's my understanding that the targeted platform, SQL Server (2008), does not support composite PKs. These all add up to my question: if a primary key is not used, is something wrong? Or could something be more right?

Your understanding is not correct: SQL Server does support composite primary keys!
The syntax to add one would be
ALTER TABLE [site].[Permissions]
ADD CONSTRAINT PK_Permissions PRIMARY KEY CLUSTERED (id,[Type])
Regarding the question in the comments "What is the benefit of placing a PK on the entire table?"
I'm not sure from your description though what the PK would need to be on. Is it all 3 columns or just 2 of them? If it's on id,[Type] then presumably you wouldn't want the possibility that the same id,[Type] combo could appear multiple times with conflicting values.
If it is on all 3 columns then to turn the question around why wouldn't you want a primary key?
If you are going to have a clustered index on your table, you could just make that the primary key. If, say, you made a clustered index on the id column only, SQL Server would add a uniquifier to duplicate key values anyway to make them unique, which seems a pointless addition when your columns are so narrow (int, smallint, int).
Additionally the query optimiser can use unique constraints to improve its query plans (though might not apply if the only queries on that table really are WHERE [ID] = ?) and it would be pretty wasteful to allow duplicates that you then have to both store and filter out with DISTINCT.
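As a rough sketch of the composite key in action, the table from the question could be declared with the key inline. This assumes the [site] schema and a [site].[Accounts] table with an account whose ID is 1 already exist; the inserted values are made up for illustration.

```sql
-- Sketch only: assumes [site].[Accounts]([ID]) exists and contains a row with ID = 1.
CREATE TABLE [site].[Permissions] (
    [ID]    INT      NOT NULL REFERENCES [site].[Accounts] ([ID]),
    [Type]  SMALLINT NOT NULL,
    [Value] INT      NULL,
    -- Composite primary key: one row per (account, permission type).
    CONSTRAINT PK_Permissions PRIMARY KEY CLUSTERED ([ID], [Type])
);

-- Several permission types for the same account are allowed.
INSERT INTO [site].[Permissions] ([ID], [Type], [Value]) VALUES (1, 10, 100);
INSERT INTO [site].[Permissions] ([ID], [Type], [Value]) VALUES (1, 20, 200);

-- A second row for the same (ID, Type) pair is rejected with a
-- primary key violation, so conflicting values cannot be stored.
INSERT INTO [site].[Permissions] ([ID], [Type], [Value]) VALUES (1, 10, 999);
```

Because [ID] is the leading column of the clustered key, a WHERE [ID] = ? lookup can still seek on it.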

Related

Azure Synapse Analytics: Can I use non-unique column as hash column in hash distributed tables?

I'm using Dedicated SQL Pools (AKA Azure Synapse Analytics). Trying to optimize a fact table and according to documentation FACT tables should be hash distributed for better performance.
The problem is:
My fact table has a composite primary key.
You can specify only one column as the hash distribution column.
Can I use one of those columns as distribution column? Any one of the columns would have duplicates, though they are all NOT NULL.
CREATE TABLE myTable
(
[ITEM] [varchar](50) NOT NULL,
[LOC] [varchar](50) NOT NULL,
[MEASURE] [varchar](50) NOT NULL,
CONSTRAINT [PK] PRIMARY KEY NONCLUSTERED
(
[LOC] ASC,
[ITEM] ASC
) NOT ENFORCED
)
WITH
(
DISTRIBUTION = HASH([ITEM]),
CLUSTERED COLUMNSTORE INDEX
)
Yes, you can! You can use any column as a hash distribution column, but be aware that this introduces a constraint into your table: you cannot drop the distribution column.
There are two reasons to use a hash distribution column: one is to prevent data movement across distributions for queries, and the other is to spread data evenly across your distributions so that all the workers are used efficiently in queries. Hash-distributing by a non-skewed column, even if not unique, can help with the second case.
However, if you do want to distribute by your primary key, consider creating a single key column by hashing together the columns of your composite primary key. You can hash-distribute by that hashed key, and this will hopefully also reduce data movement if you need to upsert on that hashed key later.
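One way to judge whether a non-unique column is "non-skewed" enough is to look at how many rows its most common values carry, and then at how the rows actually landed on the distributions. The sketch below assumes the table from the question lives in the dbo schema.

```sql
-- Sketch only: assumes the table is dbo.myTable, as in the question.

-- 1. How concentrated is the candidate distribution column?
--    A few values holding most of the rows would suggest skew.
SELECT TOP (10)
       [ITEM],
       COUNT_BIG(*) AS row_count
FROM dbo.myTable
GROUP BY [ITEM]
ORDER BY row_count DESC;

-- 2. After loading, report space and row counts per distribution.
--    Large differences in the ROWS column between distributions indicate skew.
DBCC PDW_SHOWSPACEUSED('dbo.myTable');
```

If the row counts differ widely, another column of the key (here [LOC]) may be worth testing the same way.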

designing new table for daily uploads - use unique constraint

I am using SQL Server 2012 & am creating a table that will have 8 columns, types below
datetime
varchar(12)
varchar(6)
varchar(100)
float
float
int
datetime
Once a day (normally) there will be an upload of approx 10,000 rows of data. Going forward it's possible it could be 100,000.
The rows will be unique if I group on the first three columns listed above. I have read I can use the unique constraint on multiple columns which will guarantee the rows are unique.
I think I'm correct in saying that the unique constraint by default sets up non-clustered index. Would a clustered index be better & assuming when the table starts to contain millions of rows this won't cause any issues?
My last question. By applying the unique constraint on my table I am right to say querying the data will be quicker than if the unique constraint wasn't applied (because of the non-clustering or clustering) & uploading the data will be slower (which is fine) with the constraint on the table?
A unique index can be non-clustered.
A primary key is unique and can be clustered.
A clustered index is not unique by default.
A unique clustered index is unique :)
You can get more information from this guide.
So, we should separate uniqueness and index keys.
If you need to keep data unique by some columns, create a unique constraint (unique index). You'll protect your data.
Also, you can create a primary key (PK) on your columns; they will be unique as well. But there is a difference: when the PK is the clustered index (the default), all other indexes use its key to reference rows, so the PK must be as short as possible. So, my advice: create an identity column (int or bigint) and create the PK on it. And create a unique index on your unique columns.
Querying data may become faster if your queries filter on your unique columns; if you query on other columns, you need to create other, specific indexes.
So, unique keys are for data consistency, indexes are for queries.
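A minimal sketch of that advice follows. The question only lists the column types, so every table and column name here is hypothetical.

```sql
-- Sketch only: all names are invented; the types follow the list in the question.
CREATE TABLE dbo.DailyUpload (
    UploadId    INT IDENTITY(1,1) NOT NULL,  -- short surrogate key
    ReadingTime DATETIME          NOT NULL,
    Code12      VARCHAR(12)       NOT NULL,
    Code6       VARCHAR(6)        NOT NULL,
    Description VARCHAR(100)      NULL,
    Amount1     FLOAT             NULL,
    Amount2     FLOAT             NULL,
    Quantity    INT               NULL,
    LoadedAt    DATETIME          NULL,
    -- Narrow clustered primary key on the identity column.
    CONSTRAINT PK_DailyUpload PRIMARY KEY CLUSTERED (UploadId),
    -- Uniqueness of the first three columns, enforced by a non-clustered index.
    CONSTRAINT UQ_DailyUpload_Key UNIQUE NONCLUSTERED (ReadingTime, Code12, Code6)
);
```

The unique constraint rejects a repeated (ReadingTime, Code12, Code6) combination during the daily upload, and its index can also serve queries that filter on the leading ReadingTime column.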
I think I'm correct in saying that the unique constraint by default
sets up non-clustered index
TRUE
Would a clustered index be better & assuming when the table starts to
contain millions of rows this won't cause any issues?
(1) If you need to make (datetime, varchar(12), varchar(6)) unique, and
(2) if you or your application will access rows using datetime, or datetime, varchar(12), or datetime, varchar(12), varchar(6) in the WHERE condition all the time,
then have a primary key on (datetime, varchar(12), varchar(6)).
By default it will put uniqueness and a clustered index on all three of the above columns.
but as you commented above:
the queries will vary to be honest. I imagine most queries will make
use of the first datetime column
and since you will deal with huge data and might join this table with other tables, it's better to have a surrogate key (an ever-increasing unique identifier) in the table and to have non-clustered indexes to satisfy your SELECTs.
Surrogate Key vs Business Key
NON-CLUSTERED INDEX
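For comparison, the first option (a primary key directly on the three columns) could look roughly like the sketch below. The table and column names are hypothetical, and the catalog query at the end is just one way to confirm which kind of index the constraint produced.

```sql
-- Sketch only: hypothetical names, reduced to a few columns.
CREATE TABLE dbo.DailyUploadByKey (
    ReadingTime DATETIME    NOT NULL,
    Code12      VARCHAR(12) NOT NULL,
    Code6       VARCHAR(6)  NOT NULL,
    Amount      FLOAT       NULL,
    -- No CLUSTERED/NONCLUSTERED keyword: a primary key becomes the
    -- clustered index when the table has no other clustered index.
    CONSTRAINT PK_DailyUploadByKey PRIMARY KEY (ReadingTime, Code12, Code6)
);

-- Inspect the index that backs the constraint.
SELECT name, type_desc, is_unique, is_primary_key
FROM sys.indexes
WHERE object_id = OBJECT_ID('dbo.DailyUploadByKey');
```

With this layout every non-clustered index carries the wide three-column key as its row locator, which is the trade-off the surrogate key approach avoids.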

PRIMARY KEYs vs. UNIQUE Constraints

In an Alexander Kuznetsov article, he presents the following code snippet:
CREATE TABLE dbo.Vehicles(
ID INT NOT NULL,
[Type] VARCHAR(5) NOT NULL,
CONSTRAINT Vehicles_PK PRIMARY KEY(ID),
CONSTRAINT Vehicles_UNQ_ID_Type UNIQUE(ID, [Type]),
CONSTRAINT Vehicles_CHK_ValidTypes CHECK([Type] IN ('Car', 'Truck'))
);
This snippet raises a few questions for me.
Why is it necessary to include both ID and Type in the unique constraint? If just ID is unique, then the combination of the two columns will always be unique as well.
Also, I know how to set a primary key and specify if it is unique in SSMS. But how would I specify a primary key on one column, and make a unique constraint on a combination of columns? Does this create two indexes?
This came up because I'm trying to implement similar code, which does not create a composite primary key, and I get the following error. So I'm trying to understand this code better.
The columns in table 'MyTable' do not match an existing primary key or UNIQUE constraint.
EDIT
I was able to get this working by simply creating a composite primary key in MyTable. The actual table definition is shown below. Again, this works. But it is not the same as the code quoted above. And I'm not sure if it would be better if I did it the other way.
CREATE TABLE [dbo].[MessageThread](
[Id] [int] IDENTITY(1,1) NOT NULL,
[MessageThreadType] [int] NOT NULL,
CONSTRAINT [PK_MessageThread_1] PRIMARY KEY CLUSTERED
(
[Id] ASC,
[MessageThreadType] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[MessageThread] WITH CHECK ADD CONSTRAINT [CK_MessageThread_ValidType] CHECK (([MessageThreadType]=(2) OR [MessageThreadType]=(1)))
GO
ALTER TABLE [dbo].[MessageThread] CHECK CONSTRAINT [CK_MessageThread_ValidType]
GO
1 : I am not sure of the specific purpose of the given schema. But note that a unique constraint can be applied for multiple reasons, most commonly: (a) to enforce uniqueness and (b) to provide the optimizer with more information to base decisions.
2 : A unique constraint does not create two indexes. It creates a single index with one of the columns as the leading key column. It enforces uniqueness on the combination of the two columns. So a unique constraint on a,b could have:
a b
---- ----
1 1
1 2
2 1
2 2
Notice that neither of the columns enforces uniqueness individually. I am not a big fan of using the table designer in SSMS (it has tons of bugs and doesn't support all functionality) but here is how to do it:
a) right-click the grid and choose Indexes/Keys...
b) choose multiple columns using the [...] button in the Columns grid
c) change Type to Unique Key
d) change the Name if desired
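The designer steps above have a T-SQL counterpart. The sketch below uses a hypothetical demo table rather than the real schema, loads the four combinations from the grid shown earlier, and then lists the indexes that back the constraints.

```sql
-- Sketch only: dbo.DemoAB and its constraint names are invented for illustration.
CREATE TABLE dbo.DemoAB (
    id INT NOT NULL,
    a  INT NOT NULL,
    b  INT NOT NULL,
    CONSTRAINT DemoAB_PK PRIMARY KEY (id)
);

-- Primary key on one column, unique constraint on a combination of columns.
ALTER TABLE dbo.DemoAB
    ADD CONSTRAINT DemoAB_UNQ_a_b UNIQUE (a, b);

-- All four combinations are accepted: neither a nor b is unique on its own.
INSERT INTO dbo.DemoAB (id, a, b) VALUES (1, 1, 1);
INSERT INTO dbo.DemoAB (id, a, b) VALUES (2, 1, 2);
INSERT INTO dbo.DemoAB (id, a, b) VALUES (3, 2, 1);
INSERT INTO dbo.DemoAB (id, a, b) VALUES (4, 2, 2);

-- Rejected: the pair (2, 2) already exists.
INSERT INTO dbo.DemoAB (id, a, b) VALUES (5, 2, 2);

-- One index for the primary key and one for the unique constraint.
SELECT name, type_desc, is_primary_key, is_unique_constraint
FROM sys.indexes
WHERE object_id = OBJECT_ID('dbo.DemoAB');
```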
Here's an example of a table that already has a primary key. I could add one or more unique indexes if I wanted to:
In my understanding, the reason for the unique constraint on ID,[Type] is to let detail tables refer to ID,[Type] as a foreign key. Usually the parent table is required to have a unique constraint on the columns used for a foreign key. For instance, the table in the question can have 2 detail tables:
CREATE TABLE dbo.CARS(
....
vehicle_id INT NOT NULL,
[Type] VARCHAR(5) NOT NULL,
CONSTRAINT CAR_CHK_TYPE CHECK ([Type]='Car'),
CONSTRAINT CAR_FK_VEHICLE FOREIGN KEY (vehicle_id,[Type]) REFERENCES dbo.Vehicles(ID,[Type]));
CREATE TABLE dbo.TRUCKS(
....
vehicle_id INT NOT NULL,
[Type] VARCHAR(5) NOT NULL,
CONSTRAINT TRUCK_CHK_TYPE CHECK ([Type]='Truck'),
CONSTRAINT TRUCK_FK_VEHICLE FOREIGN KEY (vehicle_id,[Type]) REFERENCES dbo.Vehicles(ID,[Type]));
This way Cars will have details only about Car type, whereas TRUCKS only about Truck.
Such design is used to avoid polymorphic relationship, for instance
CREATE TABLE dbo.VEHICLE (
...,
ref_id INT NOT NULL,
-- PK of 'master' table
ref_name VARCHAR(20) NOT NULL,
-- here we put 'truck' or 'car', so we virtually have 2 parents;
-- in this case we cannot use FK constraint, the only thing that may
-- somehow enforce the logical constraint is writing a trigger
Update
Your updated table definition looks good to me. I guess the sample table was initially designed for Oracle and then ported to SQL Server. In Oracle, that unique constraint and primary key can use the same index, so there is no penalty for having both a PK and a unique constraint.
Good question. Theoretically you're right; there is no reason, a record can always be uniquely identified by its PK and the unique constraint will always be satisfied as long as this is true. However, if ID and Type have some relationship outside the bounds of the data layer (maybe this table is the data model for an Enum?), then it's unlikely that there would be two different IDs with the same Type because the uniqueness of Type is enforced elsewhere. The constraint also sets up an index that includes both ID and Type, making the table relatively efficient to be queried by that combination of columns.
You set up a unique constraint using the "Manage Indexes and Keys" option. Yes, this will create an index and unique constraint for the primary key, and an index and unique constraint for the combination of PK and Type.
I suspect the reason for having both columns in the UNIQUE constraint is related to the error message you mentioned. SQL Server (in common with other SQL DBMSs) has a limitation that a FOREIGN KEY constraint can only reference exactly the set of columns defined by a uniqueness constraint. So if a FOREIGN KEY constraint references two columns then those two columns must have a uniqueness constraint on them - even if other constraints already guarantee uniqueness. This is a pointless limitation but it is part of standard SQL.
The following example is quite similar and explains why a composite foreign key and nested uniqueness constraints can be useful.
http://consultingblogs.emc.com/davidportas/archive/2007/01/08/Distributed-Keys-and-Disjoint-Subtypes.aspx
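A small sketch of that limitation, using hypothetical table names: the two-column foreign key is only accepted because of the extra UNIQUE constraint added in step 2, even though the primary key on ID alone already guarantees that (ID, [Type]) is unique.

```sql
-- Sketch only: ParentDemo and ChildDemo are invented names.

-- Step 1: parent with a single-column primary key.
CREATE TABLE dbo.ParentDemo (
    ID     INT        NOT NULL,
    [Type] VARCHAR(5) NOT NULL,
    CONSTRAINT ParentDemo_PK PRIMARY KEY (ID)
);

-- Step 2: the logically redundant uniqueness constraint on (ID, [Type]).
ALTER TABLE dbo.ParentDemo
    ADD CONSTRAINT ParentDemo_UNQ_ID_Type UNIQUE (ID, [Type]);

-- Step 3: a composite foreign key that references exactly (ID, [Type]).
-- If step 2 is skipped, this statement is rejected because no primary key
-- or unique constraint on the parent matches the referenced column list.
CREATE TABLE dbo.ChildDemo (
    ID     INT        NOT NULL,
    [Type] VARCHAR(5) NOT NULL,
    CONSTRAINT ChildDemo_PK PRIMARY KEY (ID),
    CONSTRAINT ChildDemo_FK_Parent FOREIGN KEY (ID, [Type])
        REFERENCES dbo.ParentDemo (ID, [Type])
);
```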
Here you go:
Cars and trucks have different attributes, so they do not belong in one table. This is why I have two tables, Cars and Trucks.
Yet cars and trucks share some attributes, such as VIN (vehicle identification number). More to the point, VIN is unique. This is why I need a table Vehicles. A vehicle cannot be both a car and a truck, so I must make sure it is not possible to enter both (VIN=123456789, Type=Car) and (VIN=123456789, Type=Truck). This is why I have a PK on VIN only.
I must ensure that a vehicle cannot have corresponding rows in both Cars and Trucks tables. This is why I have Type column in Cars and Trucks, and this is why I want (VIN, Type) in child tables Cars and Trucks refer to the parent table Vehicles. The only reason why I need an additional unique constraint on (VIN, Type) is this: it is referenced by FK constraints from child tables.
BTW, you could leave a comment on the blog - in that case sqlblog would send me a message. It is a coincidence that I noticed your question here; I was supposed to go skiing, only there is no snow.
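To see the effect of these constraints, here is a compact sketch with hypothetical table names. It uses a CHECK constraint on the child tables' Type column; the original article may implement the fixed Type differently.

```sql
-- Sketch only: the *Demo names are invented; the VIN is modelled as an INT for brevity.
CREATE TABLE dbo.VehiclesDemo (
    VIN    INT        NOT NULL,
    [Type] VARCHAR(5) NOT NULL,
    CONSTRAINT VehiclesDemo_PK PRIMARY KEY (VIN),
    CONSTRAINT VehiclesDemo_UNQ_VIN_Type UNIQUE (VIN, [Type]),
    CONSTRAINT VehiclesDemo_CHK_Type CHECK ([Type] IN ('Car', 'Truck'))
);

CREATE TABLE dbo.CarsDemo (
    VIN    INT        NOT NULL,
    [Type] VARCHAR(5) NOT NULL,
    CONSTRAINT CarsDemo_PK PRIMARY KEY (VIN),
    CONSTRAINT CarsDemo_CHK_Type CHECK ([Type] = 'Car'),
    CONSTRAINT CarsDemo_FK_Vehicles FOREIGN KEY (VIN, [Type])
        REFERENCES dbo.VehiclesDemo (VIN, [Type])
);

CREATE TABLE dbo.TrucksDemo (
    VIN    INT        NOT NULL,
    [Type] VARCHAR(5) NOT NULL,
    CONSTRAINT TrucksDemo_PK PRIMARY KEY (VIN),
    CONSTRAINT TrucksDemo_CHK_Type CHECK ([Type] = 'Truck'),
    CONSTRAINT TrucksDemo_FK_Vehicles FOREIGN KEY (VIN, [Type])
        REFERENCES dbo.VehiclesDemo (VIN, [Type])
);

-- Vehicle 123456789 is registered as a car and gets its Cars row.
INSERT INTO dbo.VehiclesDemo (VIN, [Type]) VALUES (123456789, 'Car');
INSERT INTO dbo.CarsDemo (VIN, [Type]) VALUES (123456789, 'Car');

-- Rejected by the primary key on VIN: the same vehicle cannot also be a truck.
INSERT INTO dbo.VehiclesDemo (VIN, [Type]) VALUES (123456789, 'Truck');

-- Rejected by the foreign key: no (123456789, 'Truck') row exists in the parent.
INSERT INTO dbo.TrucksDemo (VIN, [Type]) VALUES (123456789, 'Truck');
```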

Missing FK Relationship in Entity Framework Model

I had a lot of trouble implementing the technique described in an Alexander Kuznetsov article. Basically, the article describes a way to create a FK between one table and alternate tables, and still maintain full constraints on those relationships.
Here's part of Alexander's code:
CREATE TABLE dbo.Vehicles(
ID INT NOT NULL,
[Type] VARCHAR(5) NOT NULL,
CONSTRAINT Vehicles_PK PRIMARY KEY(ID),
CONSTRAINT Vehicles_UNQ_ID_Type UNIQUE(ID, [Type]),
CONSTRAINT Vehicles_CHK_ValidTypes CHECK([Type] IN ('Car', 'Truck'))
)
CREATE TABLE dbo.Cars(ID INT NOT NULL,
[Type] AS CAST('Car' AS VARCHAR(5)) PERSISTED,
OtherData VARCHAR(10) NULL,
CONSTRAINT Cars_PK PRIMARY KEY(ID),
CONSTRAINT Cars_FK_Vehicles FOREIGN KEY(ID, [Type])
REFERENCES dbo.Vehicles(ID, [Type])
)
I finally got it working after errors and confirmed bugs. But when I generate my EF models from the new schema, it is missing a relationship between two of my tables.
The problem is that, in order to have a FK on two columns, there must be an index or unique constraint on both those columns. However, in my case, I also have another table with a FK to a single column in the base table (Vehicles, in Alexander's code).
Since you cannot have more than one PK in a table, this means I cannot have a FK to a PK on both sides. The PK can be for one or two columns, and the other FK will need to reference the non-PK unique constraint.
Unfortunately, Entity Framework will only create relationships for you when there is a FK to a PK. That's the problem. Can someone who understands DB design better than I do spot any other alternatives here?
Note: I realize some will see the obvious fix as simply modifying the model to manually add the additional relationship. Unfortunately, we are using a database project and are constantly using automated systems to regenerate the project and model from an updated database. So manual steps are really not practical.
You can't have more than one PK, but you can have more than one unique constraint, and in SQL Server you can create a foreign key constraint that references a unique constraint (one or multiple columns). Here is an example of two tables that roughly look like your model.
CREATE TABLE dbo.Vehicles
(
VehicleID INT PRIMARY KEY,
[Type] VARCHAR(5) NOT NULL UNIQUE,
CONSTRAINT u1 UNIQUE(VehicleID, [Type])
);
CREATE TABLE dbo.Cars
(
CarID INT PRIMARY KEY,
VehicleID INT NOT NULL
FOREIGN KEY REFERENCES dbo.Vehicles(VehicleID),
[Type] VARCHAR(5) NOT NULL
FOREIGN KEY REFERENCES dbo.Vehicles([Type]),
CONSTRAINT fk1 FOREIGN KEY (VehicleID, [Type])
REFERENCES dbo.Vehicles(VehicleID, [Type])
);
Note that Cars has three foreign keys: one points to the PK of Vehicles (VehicleID), one points to the unique constraint on Vehicles([Type]), and one points to the multi-column unique constraint on Vehicles(VehicleID, [Type]). I realize this is not equivalent to what you are trying to do, but it should demonstrate that SQL Server, at least, is capable of doing everything you seem to want to do (I'm having a hard time concluding what you're actually trying to do because you keep swapping concepts between what Alex did, what you're trying to do but failing, and what you've done successfully).
Are you saying that EF will not recognize a foreign key that references a unique constraint? If so, does that affect constraints that have more than one column, or all unique constraints? If this is the case, that's a shame, because it is certainly supported in SQL Server. Seems like this would either be a bug or an intentional omission (given that the standard doesn't strictly allow FKs against unique constraints). I wonder if there are any bugs reported on Connect?
I have no idea how to force EF to recognize it, but I do know that just about all the people I know who use database projects end up performing pre- or post-deployment modifications and these can be relatively automated.
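If the model generator really does skip foreign keys that point at a unique constraint rather than a primary key, a catalog query along the following lines could list the affected relationships so that an automated post-deployment step knows which ones to handle. This is only a sketch against the standard catalog views; it says nothing about how EF itself behaves.

```sql
-- Lists foreign keys whose referenced key is NOT the primary key
-- (that is, they reference a unique constraint or unique index).
SELECT fk.name                              AS foreign_key_name,
       OBJECT_NAME(fk.parent_object_id)     AS referencing_table,
       OBJECT_NAME(fk.referenced_object_id) AS referenced_table,
       i.name                               AS referenced_key_name
FROM sys.foreign_keys AS fk
JOIN sys.indexes AS i
    ON  i.object_id = fk.referenced_object_id
    AND i.index_id  = fk.key_index_id
WHERE i.is_primary_key = 0
ORDER BY referencing_table, foreign_key_name;
```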

Primary keys without default index (sort) - SQL2005

How do I switch off the default index on primary keys?
I don't want all my tables to be indexed (sorted), but they must have a primary key.
You can define a primary key index as NONCLUSTERED to prevent the table rows from being ordered according to the primary key, but you cannot define a primary key without some associated index.
Tables are always unsorted - there is no "default" order for a table and the optimiser may or may not choose to use an index if one exists.
In SQL Server an index is effectively the only way to implement a key. You get a choice between clustered or nonclustered indexes - that is all.
The means by which SQL Server implements Primary and Unique keys is by placing an index on those columns. So you cannot have a Primary Key (or Unique constraint) without an index.
You can tell SQL Server to use a nonclustered index to implement these constraints. If there are only nonclustered indexes on a table (or no indexes at all), you have a heap. It's pretty rare that this is what you actually want.
Just because a table has a clustered index, this in no way indicates that the rows of the table will be returned in the "order" defined by such an index - the fact that the rows are usually returned in that order is an implementation quirk.
And the actual code would be:
CREATE TABLE T (
Column1 char(1) not null,
Column2 char(1) not null,
Column3 char(1) not null,
constraint PK_T PRIMARY KEY NONCLUSTERED (Column2,Column3)
)
What does "I don't want all my tables to be sorted" mean? If it means that you want the rows to appear in the order in which they were entered, there's only one way to guarantee it: have a field that stores that order (or the time, if you don't have a lot of transactions). And in that case, you will want to have a clustered index on that field for best performance.
You might end up with a non clustered PK (like the productId) AND a clustered unique index on your autonumber_or_timestamp field for max performance.
But that really depends on the reality you're trying to model, and your question contains too little information about this. DB design is NOT abstract thinking.
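The arrangement with a non-clustered PK and a clustered unique index on an autonumber could be sketched as follows. The table and column names are hypothetical, and an IDENTITY column stands in for the autonumber_or_timestamp field.

```sql
-- Sketch only: dbo.ProductDemo and its columns are invented for illustration.
CREATE TABLE dbo.ProductDemo (
    EntrySeq    INT IDENTITY(1,1) NOT NULL,  -- records insertion order
    ProductId   INT               NOT NULL,
    ProductName VARCHAR(50)       NOT NULL,
    -- The key is enforced by a non-clustered index, so it does not
    -- determine how the rows are physically organised.
    CONSTRAINT PK_ProductDemo PRIMARY KEY NONCLUSTERED (ProductId)
);

-- Rows are organised by insertion order instead.
CREATE UNIQUE CLUSTERED INDEX CIX_ProductDemo_EntrySeq
    ON dbo.ProductDemo (EntrySeq);

INSERT INTO dbo.ProductDemo (ProductId, ProductName) VALUES (42, 'Widget');
INSERT INTO dbo.ProductDemo (ProductId, ProductName) VALUES (7,  'Gadget');

-- Only ORDER BY guarantees the order of the result set,
-- whatever indexes the table has.
SELECT ProductId, ProductName
FROM dbo.ProductDemo
ORDER BY EntrySeq;
```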

Resources