I have a SQL Server database that contains a table recording employee salaries.
It has 3 columns declared as foreign keys, all referencing the employee table's employee_id column:
employee_id
submitted_by
confirmed_by
Is it best practice to declare all of them as FKs, or do I only need one on employee_id?
In my application, submitted_by and confirmed_by will be selected from a drop-down list, so I can assume the values exist in the employee table.
Thank you for any advice.
Yes, since all users of your system are also Employees modelled by your system, if you wish to have Referential Integrity (RI) enforced in the database, all three columns should have foreign keys back to the referenced employee table. Note that since confirmed by sounds like part of a workflow process, where the user confirming may not be available at the time the record is inserted, you can make the field confirmed_by in table EmployeeSalary nullable (confirmed_by INT NULL), in which case RI will only be enforced at the later time when the field is actually populated.
You should name each of the foreign keys appropriately by expressing the role in the foreign key, e.g.
FK_EmployeeSalary_SalariedEmployee
FK_EmployeeSalary_EmployeeSubmittedBy
FK_EmployeeSalary_EmployeeConfirmedBy
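As a rough sketch, the three role-named constraints could be declared as follows. The column types, the salary_id and amount columns, and the dbo.Employee definition are assumptions made for illustration; only the three FK columns and the constraint names come from the discussion above.

```sql
-- Hypothetical schema: types, salary_id and amount are assumptions.
CREATE TABLE dbo.Employee (
    employee_id int          NOT NULL CONSTRAINT PK_Employee PRIMARY KEY,
    name        varchar(100) NOT NULL
);

CREATE TABLE dbo.EmployeeSalary (
    salary_id    int IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_EmployeeSalary PRIMARY KEY,
    employee_id  int NOT NULL
        CONSTRAINT FK_EmployeeSalary_SalariedEmployee
        FOREIGN KEY REFERENCES dbo.Employee (employee_id),
    submitted_by int NOT NULL
        CONSTRAINT FK_EmployeeSalary_EmployeeSubmittedBy
        FOREIGN KEY REFERENCES dbo.Employee (employee_id),
    -- Nullable: the FK is only checked once a non-NULL value is stored.
    confirmed_by int NULL
        CONSTRAINT FK_EmployeeSalary_EmployeeConfirmedBy
        FOREIGN KEY REFERENCES dbo.Employee (employee_id),
    amount       decimal(12,2) NOT NULL
);
```

Because confirmed_by is nullable, a row can be inserted before anyone has confirmed it, and the constraint is enforced when the column is later updated.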
Although the front end may restrict choices via the drop down, referential integrity is still beneficial:
Protect against bugs, e.g. where the submitted by employee is omitted (in the case of a non-nullable FK) or the employee provided doesn't exist in the employees table.
Prevent accidental deletion of an employee to which foreign key data is linked.
There is a (very) minor performance penalty on RI whereby the DB will need to check the existence of the PK in the employee table - in most instances this will be negligible.
Any column that references a key in another table should be declared as a foreign key. This way, if you mistakenly try to put a nonexistent value there, the database will report an error.
I was a developer on a project built with SQL Server and .NET. The team doesn't use physical relationships between its tables; instead it uses logical ones ("logical foreign keys").
When I asked why, they said "it is more optimal".
What I really want to know is: is it really more optimal, or is it just a myth?
When it comes to reads from a database, whether foreign keys are defined or not doesn't come into it. There is no relationship between having foreign keys and the performance of reads.
Things that will affect performance are how the tables are stored, what indexes are defined on them and the stored statistics (just to name a few).
This is a bad justification for not having referential integrity in the database (in particular as it can be trivial to test).
Assuming that "logical foreign keys" are just values that reference a key in another table, without a physical link between them in the form of a constraint, I can tell you what the benefits of the physical link are.
First of all, a "physical" foreign key is a constraint, and it enforces referential integrity between the two values. If, for example, you try to use a foreign key value that doesn't exist in the other table, you will receive an error. The same thing happens if you try to delete a key that is referenced through a foreign key constraint in another table.
Secondly, it is arguable that it is more optimal, since you can index the foreign key columns and benefit from that, for example when you use joins.
More on this: http://msdn.microsoft.com/en-us/library/ff647793.aspx
There is actually no physical difference between a "real" foreign key and a "logical" foreign key. They're both just columns in a table and don't affect the way that a table is stored on disk. This actually surprised me too when I first learned it.
The only difference is that when you have a "real" foreign key, whenever an insert, update, or delete statement is run against the table, the database server has to check that the referenced value is legitimate. If you look at the execution plan for an insert, update, delete, or merge statement, you'll see that it has to scan or seek on every table related through a foreign key.
This can be quite a performance overhead if there are a lot of foreign keys or there aren't helpful indexes.
Picture you have a table for Companies, and then another table for Employees. Your employees table will likely have a column called companyId.
When you run:
delete from Companies where companyId = 123;
The database server needs to make sure that there aren't any employees for that companyId. The same applies when you run:
insert into Employees (companyId, name) values (123, 'John');
The database server needs to search the companies table to make sure that the companyId 123 exists.
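A minimal sketch of that check in action, assuming a hypothetical schema for the two tables (the column types, the employeeId column and the constraint name FK_Employees_Companies are not from the original):

```sql
-- Hypothetical definitions of the two tables discussed above.
CREATE TABLE dbo.Companies (
    companyId int          NOT NULL CONSTRAINT PK_Companies PRIMARY KEY,
    name      varchar(100) NOT NULL
);

CREATE TABLE dbo.Employees (
    employeeId int IDENTITY(1,1) NOT NULL CONSTRAINT PK_Employees PRIMARY KEY,
    companyId  int NOT NULL
        CONSTRAINT FK_Employees_Companies
        FOREIGN KEY REFERENCES dbo.Companies (companyId),
    name       varchar(100) NOT NULL
);

-- No company 123 exists yet, so the lookup against Companies fails
-- and the insert is rejected with a constraint violation (error 547).
BEGIN TRY
    INSERT INTO dbo.Employees (companyId, name) VALUES (123, 'John');
END TRY
BEGIN CATCH
    PRINT CONCAT('Insert rejected, error ', ERROR_NUMBER(), ': ', ERROR_MESSAGE());
END CATCH;
```

With only a "logical" foreign key the same insert would succeed silently and leave an orphaned row behind.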
Yes it is faster to have only "logical" foreign keys. However, it comes at the cost of possible data corruption and might cost more time finding bugs and other sources of data corruption. Whether it's worth it is up to you. One thing to consider is that it doesn't affect read-only queries.
Edit: As Martin Smith pointed out, and as I had left out, there are some cases where a foreign key makes a query faster. If there is an inner join to a table referenced by a foreign key, and no columns from that second table are used, then the query doesn't have to touch the second table at all, since the optimizer can trust the foreign key.
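A sketch of the kind of query where this can apply, assuming the Employees and Companies tables from the example above and a foreign key with the hypothetical name FK_Employees_Companies. Whether the join is actually removed depends on the optimizer and on the constraint being trusted, so compare the execution plans rather than taking this for granted.

```sql
-- Only columns from Employees are selected, so with a trusted FK
-- the optimizer may skip reading Companies entirely.
SELECT e.name
FROM dbo.Employees AS e
INNER JOIN dbo.Companies AS c
    ON c.companyId = e.companyId;

-- Check whether the foreign keys on Employees are trusted and enabled.
SELECT name, is_not_trusted, is_disabled
FROM sys.foreign_keys
WHERE parent_object_id = OBJECT_ID(N'dbo.Employees');

-- Re-validate existing rows so an untrusted constraint becomes trusted again.
ALTER TABLE dbo.Employees WITH CHECK CHECK CONSTRAINT FK_Employees_Companies;
```

A constraint typically becomes untrusted after it has been disabled and re-enabled without WITH CHECK, for example around a bulk load.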
Let's consider the following scenario: I have a "master" table with a "detail" table. The detail table has just a foreign key pointing to the master table (not a primary key). This is the schema that NHibernate generates for me when I map a simple bag. My question is: does the FK on the detail table suffice to run queries without a full scan? In other words, does having or not having the FK defined on the detail table change the performance? I guess yes, but I don't know if I'm right, or where to find a source that explains it.
In both cases (FK or no FK) you'll have to create an index on the FK field in the detail table to prevent a table scan. In SQL Server, creating an FK constraint does not automatically create an index.
See MSDN, "Indexing FOREIGN KEY Constraints".
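For illustration, with hypothetical table and column names (dbo.Detail, MasterID; the question does not give the real ones), the supporting index would be created separately from the constraint:

```sql
-- Hypothetical names: dbo.Detail is the detail table, MasterID its FK column.
-- The FK constraint alone does not create this index in SQL Server.
CREATE NONCLUSTERED INDEX IX_Detail_MasterID
    ON dbo.Detail (MasterID);

-- A lookup of all detail rows for one master row can now seek on the index
-- instead of scanning the whole table.
SELECT d.*
FROM dbo.Detail AS d
WHERE d.MasterID = 42;
```

The same index also helps the integrity check that runs when a row is deleted from the master table.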
My question is specifically about SQL Server, but it can probably be answered by anyone with a database background.
If I want table A to have a 1:1 relationship with table B on a certain column, should I somehow modify the CREATE TABLE statement to identify this relationship or is this something that is not done at all (and rather it is handled by logic)?
EDIT
The second part of my question is: what is the point of embedding this into the code? why not just handle it logically on selects/updates?
All you need to do is have the column in Table A be a foreign key to the primary key of Table B:
create table TableB (
Id int primary key identity(1,1),
Name varchar(255))
create table TableA (
Id int primary key identity(1,1),
Name varchar(255),
TableBRelation int unique,
foreign key (TableBRelation) references TableB (Id))
The SQL may not be perfect but you should be able to get the idea.
As for why you would want to do this in the database rather than just application logic:
Other databases or developers may try to access your database. Do you want them to be able to create invalid data that may break your application? No. That's one of the points of referential integrity.
At some point, somebody is going to have to maintain your application. Defining your keys at the database level will clearly identify relationships between your data rather than requiring the developer to dig through your application code.
To create a 1:1 relationship, make the column in table B both a foreign key and unique. This ensures that there can be only one row in table B that matches the PK value in table A, and that way you effectively get a 1:1 relationship...
You can set up a foreign key and add a constraint for it to be unique. This would set up a 1:1 relationship between your tables.
I have three basic types of entities: People, Businesses, and Assets. Each Asset can be owned by one and only one Person or Business. Each Person and Business can own from 0 to many Assets. What would be the best practice for storing this type of conditional relationship in Microsoft SQL Server?
My initial plan is to have two nullable foreign keys in the Assets table, one for People and one for Businesses. One of these values will be null, while the other will point to the owner. The problem I see with this setup is that it requires application logic in order to be interpreted and enforced. Is this really the best possible solution or are there other options?
Introducing SuperTypes and SubTypes
I suggest that you use supertypes and subtypes. First, create PartyType and Party tables:
CREATE TABLE dbo.PartyType (
PartyTypeID int NOT NULL identity(1,1) CONSTRAINT PK_PartyType PRIMARY KEY CLUSTERED,
Name varchar(32) CONSTRAINT UQ_PartyType_Name UNIQUE
);
INSERT dbo.PartyType VALUES ('Person'), ('Business');
SuperType
CREATE TABLE dbo.Party (
PartyID int identity(1,1) NOT NULL CONSTRAINT PK_Party PRIMARY KEY CLUSTERED,
FullName varchar(64) NOT NULL,
BeginDate smalldatetime, -- DOB for people or creation date for business
PartyTypeID int NOT NULL
CONSTRAINT FK_Party_PartyTypeID FOREIGN KEY REFERENCES dbo.PartyType (PartyTypeID)
);
SubTypes
Then, if there are columns that are unique to a Person, create a Person table with just those:
CREATE TABLE dbo.Person (
PersonPartyID int NOT NULL
CONSTRAINT PK_Person PRIMARY KEY CLUSTERED
CONSTRAINT FK_Person_PersonPartyID FOREIGN KEY REFERENCES dbo.Party (PartyID)
ON DELETE CASCADE,
-- add columns unique to people
);
And if there are columns that are unique to Businesses, create a Business table with just those:
CREATE TABLE dbo.Business (
BusinessPartyID int NOT NULL
CONSTRAINT PK_Business PRIMARY KEY CLUSTERED
CONSTRAINT FK_Business_BusinessPartyID FOREIGN KEY REFERENCES dbo.Party (PartyID)
ON DELETE CASCADE,
-- add columns unique to businesses
);
Usage and Notes
Finally, your Asset table will look something like this:
CREATE TABLE dbo.Asset (
AssetID int NOT NULL identity(1,1) CONSTRAINT PK_Asset PRIMARY KEY CLUSTERED,
PartyID int NOT NULL
CONSTRAINT FK_Asset_PartyID FOREIGN KEY REFERENCES dbo.Party (PartyID),
AssetTag varchar(64) CONSTRAINT UQ_Asset_AssetTag UNIQUE
);
The relationship the supertype Party table shares with the subtype tables Business and Person is "one to zero-or-one". While a party will generally have a row in only one of the subtype tables, this design makes it possible for a Party to end up in both. You may actually like this: sometimes a person and a business are nearly interchangeable. If it is not useful, a trigger to enforce exclusivity is fairly easy to write, but the best solution is probably to add the PartyTypeID column to the subtype tables, making it part of the PK & FK, and put a CHECK constraint on the PartyTypeID.
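A minimal sketch of that last option for the Person subtype, written as an alternative to the earlier dbo.Person definition rather than an addition to it. It assumes that 'Person' received PartyTypeID 1 when dbo.PartyType was populated (verify this in your data), and the constraint names are made up for the example.

```sql
-- The composite FK below needs a matching unique key on the supertype.
ALTER TABLE dbo.Party
    ADD CONSTRAINT UQ_Party_PartyID_PartyTypeID UNIQUE (PartyID, PartyTypeID);

-- Alternative Person table: the type travels with the key, and the CHECK
-- pins it to the Person type (assumed here to be PartyTypeID = 1).
CREATE TABLE dbo.Person (
    PersonPartyID int NOT NULL,
    PartyTypeID   int NOT NULL
        CONSTRAINT CK_Person_PartyTypeID CHECK (PartyTypeID = 1),
    CONSTRAINT PK_Person PRIMARY KEY CLUSTERED (PersonPartyID, PartyTypeID),
    CONSTRAINT FK_Person_Party FOREIGN KEY (PersonPartyID, PartyTypeID)
        REFERENCES dbo.Party (PartyID, PartyTypeID)
        ON DELETE CASCADE
    -- add columns unique to people
);
```

A Business table built the same way, with its CHECK pinned to the Business type, cannot hold a row for a party whose Party.PartyTypeID says Person, so a party can no longer appear in both subtype tables.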
The beauty of this model is that when you want to create a column that has a constraint to a business or a person, then you make the constraint to the appropriate table instead of the party table.
Also, if desired, turning on cascade delete on the constraints can be useful, as well as an INSTEAD OF DELETE trigger on the subtype tables that instead delete the corresponding IDs from the supertype table (this guarantees no supertype rows that have no subtype rows present). These queries are very simple and work at the entire-row-exists-or-doesn't-exist level, which in my opinion is a gigantic improvement over any design that requires checking column value consistency.
Also, please notice that in many cases columns that you would think should go in one of the subtype tables really can be combined in the supertype table, such as social security number. Call it TIN (taxpayer identification number) and it works for both businesses and people.
ID Column Naming
The question of whether or not to call the column in the Person table PartyID, PersonID, or PersonPartyID is your own preference, but I think it's best to call these PersonPartyID or BusinessPartyID—tolerating the cost of the longer name, this avoids two types of confusion. E.g., someone unfamiliar with the database sees BusinessID and doesn't know this is a PartyID, or sees PartyID and doesn't know it is restricted by foreign key to just those in the Business table.
If you want to create views for the Party and Business tables, they can even be materialized views since it's a simple inner join, and there you could rename the PersonPartyID column to PersonID if you were truly so inclined (though I wouldn't). If it's of great value to you, you can even make INSTEAD OF INSERT and INSTEAD OF UPDATE triggers on these views to handle the inserts to the two tables for you, making the views appear completely like their own tables to many application programs.
Making Your Proposed Design Work As-Is
Also, I hate to mention it, but if you want to have a constraint in your proposed design that enforces only one column being filled in, here is code for that:
ALTER TABLE dbo.Assets
ADD CONSTRAINT CK_Asset_PersonOrBusiness CHECK (
CASE WHEN PersonID IS NULL THEN 0 ELSE 1 END
+ CASE WHEN BusinessID IS NULL THEN 0 ELSE 1 END = 1
);
However, I don't recommend this solution.
Final Thoughts
A natural third subtype to add is organization, in the sense of something that people and businesses can have membership in. Supertype and subtype also elegantly solve customer/employee, customer/vendor, and other problems similar to the one you presented.
Be careful not to confuse "Is-A" with "Acts-As-A". You can tell a party is a customer by looking in your order table or viewing the order count, and may not need a Customer table at all. Also don't confuse identity with life cycle: a rental car may eventually be sold, but this is a progression in life cycle and should be handled with column data, not table presence--the car doesn't start out as a RentalCar and get turned into a ForSaleCar later, it's a Car the whole time. Or perhaps a RentalItem, maybe the business will rent other things too. You get the idea.
It may not even be necessary to have a PartyType table. The party type can be determined by the presence of a row in the corresponding subtype table. This would also avoid the potential problem of the PartyTypeID not matching the subtype table presence. One possible implementation is to keep the PartyType table, but remove PartyTypeID from the Party table, then in a view on the Party table return the correct PartyTypeID based on which subtype table has the corresponding row. This won't work if you choose to allow parties to be both subtypes. Then you would just stick with the subtype views and know that the same value of BusinessID and PersonID refer to the same party.
Further Reading On This Pattern
Please see A Universal Person and Organization Data Model for a more complete and theoretical treatment.
I recently found the following articles to be useful for describing some alternate approaches for modeling inheritance in a database. Though specific to Microsoft's Entity Framework ORM tool, there's no reason you couldn't implement these yourself in any DB development:
Table Per Hierarchy
Table Per Type (this is what I advocate above as the only fully normalized method of implementing inheritance in a database)
Table Per Concrete Class
Or a more brief overview of these three ways: How to choose an inheritance strategy
P.S. I have switched, more than once, my opinion on column naming of IDs in subtype tables, due to having more experience under my belt.
You don't need application logic to enforce this. The easiest way is with a check constraint:
(PeopleID is null and BusinessID is not null) or (PeopleID is not null and BusinessID is null)
You can have another entity from which Person and Business "extend". We call this entity Party in our current project. Both Person and Business have a FK to Party (is-a relationship). And Asset may have also a FK to Party (belongs to relationship).
That said, if an Asset may be shared by multiple owners in the future, it is better to create an m:n relationship. It gives flexibility but complicates the application logic and the queries a bit more.
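One way such an m:n relationship might be modelled is a junction table between assets and parties. This is a sketch only: it assumes dbo.Asset (AssetID) and dbo.Party (PartyID) tables like those in the supertype answer above, and the AssetOwner name is hypothetical.

```sql
-- Hypothetical junction table: one row per (asset, owning party) pair.
CREATE TABLE dbo.AssetOwner (
    AssetID int NOT NULL
        CONSTRAINT FK_AssetOwner_Asset
        FOREIGN KEY REFERENCES dbo.Asset (AssetID),
    PartyID int NOT NULL
        CONSTRAINT FK_AssetOwner_Party
        FOREIGN KEY REFERENCES dbo.Party (PartyID),
    -- The composite key prevents listing the same owner twice for an asset.
    CONSTRAINT PK_AssetOwner PRIMARY KEY CLUSTERED (AssetID, PartyID)
);
```

With this table in place the owner column on the asset itself becomes redundant, and every ownership query needs one extra join.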
ErikE's answer gives a good explanation on how to go about the supertype / subtype relationship in tables and is likely what I'd go for in your situation, however, it doesn't really address the question(s) you've posed which are also interesting, namely:
What would be the best practice for storing this type of conditional relationship in Microsoft SQL Server?
...are there other options?
For those I recommend this blog entry on TechTarget, which has an excerpt from "A Developer's Guide to Data Modeling for SQL Server, Covering SQL Server 2005 and 2008" by Eric Johnson and Joshua Jones that addresses 3 possible options.
In summary they are:
Supertype Table - Almost matches what you've proposed, have a table with some fields that will always be null when others are filled. Good when only a couple of fields aren't shared. So depending on how different Business and People are you could possibly combine them into one table, Owners perhaps, and then just have OwnerID in your Asset table.
Subtype Tables - Basically the opposite of what Supertype tables are and is what you have just now. Here we have lots of unique fields and maybe one or two the same so we just have the repeated fields appear in each table. As you are finding this isn't really suitable for your situation.
Supertype and Subtype Tables - A combination of both of the above where the matching fields are placed in a single table and the unique ones in separate tables and matching IDs are used to join the record from one table to the other. This matches ErikE's proposed solution and, as mentioned, is the one I would favour as well.
Sadly it doesn't go on to explain which, if any, are best practice but it is certainly a good read to get an idea of the options that are out there.
You can enforce the logic with a trigger instead. Then no matter how the record is changed, only one of the fields will be filled in.
You could also have a PeopleAsset table and a BusinessAsset table, but you would still have the problem of enforcing that only one of them has a record.
An asset would have a foreign key to the owning person, and you should set up an association table to link assets and businesses. As said in other comments, you can use triggers and/or constraints to ensure that the data stays in a consistent state, i.e. when you delete a business, delete the lines in your association table.
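A sketch of the trigger approach, assuming the dbo.Assets table from the question has nullable PersonID and BusinessID columns (the column and trigger names are assumptions) and a SQL Server version that supports THROW:

```sql
-- Rejects any insert or update that leaves an asset with no owner
-- or with both a person and a business as owner.
CREATE TRIGGER dbo.TR_Assets_ExactlyOneOwner
ON dbo.Assets
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    IF EXISTS (
        SELECT 1
        FROM inserted
        WHERE (PersonID IS NULL AND BusinessID IS NULL)
           OR (PersonID IS NOT NULL AND BusinessID IS NOT NULL)
    )
    BEGIN
        ROLLBACK TRANSACTION;
        THROW 50000, 'Exactly one of PersonID or BusinessID must be set.', 1;
    END;
END;
```

The trigger checks the whole inserted set, so multi-row statements are covered as well as single-row ones.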
The People and Businesses tables can both use a UUID as the primary key, and the two can be combined with a UNION into a view for joining.
That way you can use a single foreign key column in Assets that relates to both People and Businesses, because UUIDs are unique across tables for all practical purposes. You can then query like this:
select * from Assets
join view_People_Businesses as v on v.id = Assets.fk
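The view itself is not shown above; a minimal sketch might look like this, assuming both tables have a uniqueidentifier id column and a name column (both assumptions). Note that SQL Server cannot declare a FOREIGN KEY constraint against a view, so Assets.fk remains a logical key that the database does not enforce.

```sql
-- Hypothetical view combining both owner tables under one id column.
CREATE VIEW dbo.view_People_Businesses
AS
SELECT p.id, 'Person' AS owner_type, p.name
FROM dbo.People AS p
UNION ALL
SELECT b.id, 'Business' AS owner_type, b.name
FROM dbo.Businesses AS b;
```

The owner_type column is added so that a query can tell which table a matched owner came from.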