I have a SQL Server database that contains a table recording employee salaries.
It has three columns declared as foreign keys, all referencing the employee_id column of the employee table:
employee_id
submitted_by
confirmed_by
Is it best practice to make all of them foreign keys, or do I only need one on employee_id?
I ask because in my application, submitted_by and confirmed_by are selected from a drop-down list, so I can assume the values exist in the employee table.
Thank you for your advice.
Yes, since all users of your system are also Employees modelled by your system, if you wish to have Referential Integrity (RI) enforced in the database, all three columns should have foreign keys back to the referenced employee table. Note that since confirmed by sounds like part of a workflow process, where the user confirming may not be available at the time the record is inserted, you can make the field confirmed_by in table EmployeeSalary nullable (confirmed_by INT NULL), in which case RI will only be enforced at the later time when the field is actually populated.
You should name each of the foreign keys appropriately by expressing the role in the foreign key, e.g.
FK_EmployeeSalary_SalariedEmployee
FK_EmployeeSalary_EmployeeSubmittedBy
FK_EmployeeSalary_EmployeeConfirmedBy
Although the front end may restrict choices via the drop down, referential integrity is still beneficial:
Protect against bugs, e.g. where the submitted by employee is omitted (in the case of a non-nullable FK) or the employee provided doesn't exist in the employees table.
Prevent accidental deletion of an employee to which foreign key data is linked.
There is a (very) minor performance penalty on RI whereby the DB will need to check the existence of the PK in the employee table - in most instances this will be negligible.
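As a rough illustration of the points above, here is a minimal sketch of the three foreign keys. It assumes SQLite (through Python's sqlite3 module) as a stand-in for SQL Server; the salary_id and amount columns and the sample employees are hypothetical, while the constraint names are the ones suggested above.

```python
import sqlite3

# SQLite stands in for SQL Server here (an assumption); it only enforces
# foreign keys once the pragma below is switched on.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE employee_salary (
    salary_id    INTEGER PRIMARY KEY,   -- hypothetical surrogate key
    employee_id  INTEGER NOT NULL,
    submitted_by INTEGER NOT NULL,
    confirmed_by INTEGER NULL,          -- filled in later in the workflow
    amount       NUMERIC NOT NULL,      -- hypothetical column
    CONSTRAINT FK_EmployeeSalary_SalariedEmployee
        FOREIGN KEY (employee_id) REFERENCES employee (employee_id),
    CONSTRAINT FK_EmployeeSalary_EmployeeSubmittedBy
        FOREIGN KEY (submitted_by) REFERENCES employee (employee_id),
    CONSTRAINT FK_EmployeeSalary_EmployeeConfirmedBy
        FOREIGN KEY (confirmed_by) REFERENCES employee (employee_id)
);
INSERT INTO employee (employee_id, name) VALUES (1, 'Alice'), (2, 'Bob');
""")


def is_rejected(sql):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# A salary row may be inserted before anyone has confirmed it.
conn.execute(
    "INSERT INTO employee_salary (employee_id, submitted_by, confirmed_by, amount) "
    "VALUES (1, 2, NULL, 5000)"
)

# A submitter that does not exist in employee is rejected by the database.
unknown_submitter_rejected = is_rejected(
    "INSERT INTO employee_salary (employee_id, submitted_by, amount) VALUES (1, 99, 5000)"
)

# An employee who is still referenced cannot be deleted by accident.
delete_rejected = is_rejected("DELETE FROM employee WHERE employee_id = 2")

# The nullable column is only checked once it is actually populated.
conn.execute("UPDATE employee_salary SET confirmed_by = 2 WHERE salary_id = 1")
confirmed_by = conn.execute(
    "SELECT confirmed_by FROM employee_salary WHERE salary_id = 1"
).fetchone()[0]
```

The same DDL shape should carry over to SQL Server with its own types (for example INT and DECIMAL), where foreign keys are enforced without any pragma.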
Any column that references a key in another table should be declared as a foreign key. This way, if you mistakenly try to put a nonexistent value there, the database will report an error.
Related
I found a case in ER where for the life of me I can't figure out how to achieve referential integrity. The classical Employee, Manager, Department relationship can illustrate this problem.
With the following constraints:
Employee can work in only one Department.
Department can have many Employees.
Employee can have one Manager working in the same Department.
Manager can have many Employees working in the same Department.
Employee that doesn't have a Manager is a Manager.
This diagram illustrates the concept.
Before normalisation I end up with the following table.
After normalisation I end up with these tables.
However, there is still nothing stopping me from accidentally assigning a manager working in one department to an employee working in a different department in the EmployeeManager table.
One possible solution that I found was to put Department into the EmployeeManager table and define a referential integrity constraint so that {Manager, Department} references {Employee, Department} in the EmployeeDepartment table.
However, for this to work doesn't {Manager, Department} have to be a candidate key? Is there a different design that can solve this?
Update
Ok to answer my first question, doesn't {Manager, Department} have to be a candidate key? It turns out that the {Manager, Department} in the EmployeeManager table doesn't have to be a candidate key or a unique key. It simply has to be a foreign key referencing the {Employee, Department} in the EmployeeDepartment table. The uniqueness of {Employee, Department} key isn't well defined and may differ between different engines. MySQL for example advises that the foreign keys reference only unique keys.
Additionally, MySQL requires that the referenced columns be indexed for performance reasons. However, the system does not enforce a requirement that the referenced columns be UNIQUE or be declared NOT NULL. The handling of foreign key references to nonunique keys or keys that contain NULL values is not well defined for operations such as UPDATE or DELETE CASCADE. You are advised to use foreign keys that reference only UNIQUE (including PRIMARY) and NOT NULL keys.
In my case it will work because an Employee can only work in one Department. However, if the constraint changes to allow Employees to work in many Departments, it won't work because {Employee, Department} will no longer be unique.
It should work in all cases, including if the constraint changes to allow Employees to work in many Departments.
Is there a different design that can solve this? I also thought about replacing EmployeeDepartment with ManagerDepartment table with {Manager} as a primary key and going back to a previous EmployeeManager table with (Employee, Manager) columns. So now to find out which Department an Employee works you need to join EmployeeManager with ManagerDepartment table.
Do you see any bad practises or anomalies with this design?
Assuming all these columns are declared NOT NULL . . .
One possible solution that I found was to put Department into the
EmployeeManager table and define a reference integrity constraint so
that {Manager, Department} refers {Employee, Department} in the
EmployeeDepartment table.
Yes, add a column for "department" to the "EmployeeManager" table. But you need two foreign key constraints that overlap. (But see below . . .)
(manager, department) references EmployeeDepartment (Employee, Department)
(employee, department) references EmployeeDepartment (Employee, Department)
Since EmployeeDepartment.Employee is unique, the pair of columns EmployeeDepartment.Employee and EmployeeDepartment.Department is also unique. So you can declare "Employee" as a primary key, and also declare a unique constraint on the pair of columns (Employee, Department). Should the requirements change and allow employees to work in multiple departments, you can drop the single-column primary key. I would probably drop both the primary key and unique constraints, and create a new primary key constraint that included both columns, but all that's strictly necessary is to drop the primary key constraint.
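The two overlapping foreign keys can be sketched as follows. This is an illustration only, assuming SQLite via Python's sqlite3 rather than the asker's engine; the employee 'Ann' and the department names are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs

conn.executescript("""
CREATE TABLE EmployeeDepartment (
    Employee   TEXT NOT NULL PRIMARY KEY,
    Department TEXT NOT NULL,
    UNIQUE (Employee, Department)      -- target of both composite FKs
);
CREATE TABLE EmployeeManager (
    Employee   TEXT NOT NULL PRIMARY KEY,
    Manager    TEXT NOT NULL,
    Department TEXT NOT NULL,
    FOREIGN KEY (Employee, Department)
        REFERENCES EmployeeDepartment (Employee, Department),
    FOREIGN KEY (Manager, Department)
        REFERENCES EmployeeDepartment (Employee, Department)
);
INSERT INTO EmployeeDepartment (Employee, Department) VALUES
    ('Steve', 'Sales'), ('Will', 'Sales'), ('Ann', 'IT');
""")

# Will and Steve both work in Sales, so this assignment is accepted.
conn.execute(
    "INSERT INTO EmployeeManager (Employee, Manager, Department) "
    "VALUES ('Will', 'Steve', 'Sales')"
)

# Ann works in IT but Steve does not: ('Steve', 'IT') has no match in
# EmployeeDepartment, so the second foreign key rejects the row.
try:
    conn.execute(
        "INSERT INTO EmployeeManager (Employee, Manager, Department) "
        "VALUES ('Ann', 'Steve', 'IT')"
    )
    cross_department_rejected = False
except sqlite3.IntegrityError:
    cross_department_rejected = True
```

Because the Department column is shared by both constraints, an employee and their manager are forced to agree on it.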
In systems like yours, it's usually a good idea to have a table of managers, with the obvious foreign key references. Right now, if you delete the employee Will, you lose the fact that Steve is a manager.
I have got two questions when designing a database for a sales system.
Is it possible to have an isolated table, that is, a table that has no relationship with any other table?
How to solve the following issue:
Table: SalesOrderDetail, Table: InventoryTrans
Every record in SalesOrderDetail will be inserted into InventoryTrans, but not all records in InventoryTrans come from SalesOrderDetail, because other tables may also insert records into InventoryTrans.
Therefore, I want to add a reference column SalesOrderDetailID to the InventoryTrans table without specifying an FK constraint, because if the record is not from the SalesOrderDetail table, SalesOrderDetailID should be null.
Is this the right design?
Yes, you can have a table that has no foreign key references to other tables. A table that stores various configuration settings is probably the most common, but there are others.
The column InventoryTrans.SalesOrderDetailID can be a nullable foreign key reference. But you haven't provided enough detail to tell whether that's a good design decision. Making an educated guess, I'd say probably not. (Other kinds of transactions would probably benefit from a foreign key reference.)
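To show that a nullable column and a foreign key constraint are compatible, here is a small sketch. It assumes SQLite via Python's sqlite3; the InventoryTransID and Quantity columns are hypothetical, since the question does not list the tables' columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs

conn.executescript("""
CREATE TABLE SalesOrderDetail (
    SalesOrderDetailID INTEGER PRIMARY KEY
);
CREATE TABLE InventoryTrans (
    InventoryTransID   INTEGER PRIMARY KEY,   -- hypothetical column
    SalesOrderDetailID INTEGER NULL
        REFERENCES SalesOrderDetail (SalesOrderDetailID),
    Quantity           INTEGER NOT NULL       -- hypothetical column
);
INSERT INTO SalesOrderDetail (SalesOrderDetailID) VALUES (10);
""")

# A transaction that came from a sales order line.
conn.execute("INSERT INTO InventoryTrans (SalesOrderDetailID, Quantity) VALUES (10, -3)")

# A transaction from some other source: NULL is allowed and is not checked.
conn.execute("INSERT INTO InventoryTrans (SalesOrderDetailID, Quantity) VALUES (NULL, 50)")

# A non-NULL value must still point at an existing sales order line.
try:
    conn.execute("INSERT INTO InventoryTrans (SalesOrderDetailID, Quantity) VALUES (999, -1)")
    dangling_reference_rejected = False
except sqlite3.IntegrityError:
    dangling_reference_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM InventoryTrans").fetchone()[0]
```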
I have two tables:
User (username, password)
Profile (profileId, gender, dateofbirth, ...)
Currently I'm using this approach: each Profile record has a field named "userId" as foreign key which links to the User table. When a user registers, his Profile record is automatically created.
I'm confused by my friend's suggestion: to make the "userId" field both the foreign key and the primary key, and to delete the "profileId" field. Which approach is better?
Foreign keys are almost always "Allow Duplicates," which would make them unsuitable as Primary Keys.
Instead, find a field that uniquely identifies each record in the table, or add a new field (either an auto-incrementing integer or a GUID) to act as the primary key.
The only exception to this are tables with a one-to-one relationship, where the foreign key and primary key of the linked table are one and the same.
Primary keys always need to be unique, foreign keys need to allow non-unique values if the table is a one-to-many relationship. It is perfectly fine to use a foreign key as the primary key if the table is connected by a one-to-one relationship, not a one-to-many relationship. If you want the same user record to have the possibility of having more than 1 related profile record, go with a separate primary key, otherwise stick with what you have.
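A one-to-one design where the foreign key doubles as the primary key might look like the sketch below. It assumes SQLite via Python's sqlite3; the column types and sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs

conn.executescript("""
CREATE TABLE User (
    userId   INTEGER PRIMARY KEY,
    username TEXT NOT NULL UNIQUE,
    password TEXT NOT NULL
);
CREATE TABLE Profile (
    -- userId is both the primary key and the foreign key, so each user
    -- can have at most one profile.
    userId      INTEGER PRIMARY KEY REFERENCES User (userId),
    gender      TEXT,
    dateofbirth TEXT
);
INSERT INTO User (userId, username, password) VALUES (1, 'alice', 'not-a-real-hash');
INSERT INTO Profile (userId, gender, dateofbirth) VALUES (1, 'F', '1990-01-01');
""")


def is_rejected(sql):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# A second profile for the same user violates the primary key.
second_profile_rejected = is_rejected(
    "INSERT INTO Profile (userId, gender, dateofbirth) VALUES (1, 'F', '1991-02-02')"
)

# A profile for a user that does not exist violates the foreign key.
orphan_profile_rejected = is_rejected(
    "INSERT INTO Profile (userId, gender, dateofbirth) VALUES (42, 'M', '1985-05-05')"
)
```

If a user may later need several profiles, this design no longer fits and a separate profileId primary key is the better choice.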
Yes, it is legal to have a primary key being a foreign key. This is a rare construct, but it applies for:
a 1:1 relation. The two tables cannot be merged in one because of different permissions and privileges only apply at table level (as of 2017, such a database would be odd).
a 1:0..1 relation. Profile may or may not exist, depending on the user type.
performance is an issue, and the design acts as a partition: the profile table is rarely accessed, hosted on a separate disk or has a different sharding policy as compared to the users table. Would not make sense if the underlying storage is columnar.
Yes, a foreign key can be a primary key in the case of a one-to-one relationship between those tables.
I would not do that. I would keep the profileID as primary key of the table Profile
A foreign key is just a referential constraint between two tables
One could argue that a primary key is necessary as the target of any foreign keys which refer to it from other tables. A foreign key is a set of one or more columns in any table (not necessarily a candidate key, let alone the primary key, of that table) which may hold the value(s) found in the primary key column(s) of some other table. So we must have a primary key to match the foreign key.
Or must we? The only purpose of the primary key in the primary key/foreign key pair is to provide an unambiguous join - to maintain referential integrity with respect to the "foreign" table which holds the referenced primary key. This ensures that the value to which the foreign key refers will always be valid (or null, if allowed).
http://www.aisintl.com/case/primary_and_foreign_key.html
It is generally considered bad practise to have a one to one relationship. This is because you could just have the data represented in one table and achieve the same result.
However, there are instances where you may not be able to make these changes to the table you are referencing. In this instance there is no problem using the Foreign key as the primary key. It might help to have a composite key consisting of an auto incrementing unique primary key and the foreign key.
I am currently working on a system where users can log in and generate a registration code to use with an app. For reasons I won't go into I am unable to simply add the columns required to the users table. So I am going down a one to one route with the codes table.
It depends on the business and system.
If your userId is unique and will be unique all the time, you can use userId as your primary key. But if you ever want to expand your system, it will make things difficult. I advise you to add a foreign key in table user to make a relationship with table profile instead of adding a foreign key in table profile.
Short answer: DEPENDS.... In this particular case, it might be fine. However, experts will recommend against it just about every time; including your case.
Why?
Keys are seldom unique in a table when they are foreign (originating in another table) to the table in question. For example, an item ID might be unique in an ITEMS table, but not in an ORDERS table, since the same type of item will most likely appear in another order. Likewise, order IDs might be unique (might) in the ORDERS table, but not in some other table like ORDER_DETAILS, where an order can have multiple line items; to query a particular item in a particular order, you need the combination of two FKs (order_id and item_id) as the PK for this table.
I am not a DB expert, but if you can logically justify having an auto-generated value as your PK, I would do that. If this is not practical, then a combination of two (or maybe more) FKs could serve as your PK. BUT, I cannot think of any case where a single FK value can be justified as the PK.
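The ORDER_DETAILS case described above, where two foreign keys together form the primary key, can be sketched like this. It assumes SQLite via Python's sqlite3; the quantity column and the sample IDs are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs

conn.executescript("""
CREATE TABLE ITEMS  (item_id  INTEGER PRIMARY KEY);
CREATE TABLE ORDERS (order_id INTEGER PRIMARY KEY);
CREATE TABLE ORDER_DETAILS (
    order_id INTEGER NOT NULL REFERENCES ORDERS (order_id),
    item_id  INTEGER NOT NULL REFERENCES ITEMS (item_id),
    quantity INTEGER NOT NULL,             -- hypothetical column
    PRIMARY KEY (order_id, item_id)        -- neither FK is unique on its own
);
INSERT INTO ITEMS  (item_id)  VALUES (1), (2);
INSERT INTO ORDERS (order_id) VALUES (100), (101);
""")

# The same item may appear in different orders, and one order may hold
# several items: only the pair has to be unique.
conn.execute("INSERT INTO ORDER_DETAILS VALUES (100, 1, 2)")
conn.execute("INSERT INTO ORDER_DETAILS VALUES (100, 2, 1)")
conn.execute("INSERT INTO ORDER_DETAILS VALUES (101, 1, 5)")

# Repeating an (order_id, item_id) pair violates the composite primary key.
try:
    conn.execute("INSERT INTO ORDER_DETAILS VALUES (100, 1, 7)")
    duplicate_pair_rejected = False
except sqlite3.IntegrityError:
    duplicate_pair_rejected = True

detail_rows = conn.execute("SELECT COUNT(*) FROM ORDER_DETAILS").fetchone()[0]
```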
This does not fully apply to the question's case, but since I ended up on this question while searching for other information, and after reading some comments, I can say it is possible to have only an FK in a table and still get unique values.
You can use a column of classes, each of which can be assigned only once. It works almost like an ID, and it can be useful when you want a unique categorical value that distinguishes each record.
I have a problem that can be summarized as follow:
Assume that I am implementing an employee database. Depending on each person's position, different fields should be filled in. For example, if the employee is a software engineer, I have the following columns:
Name
Family
Language
Technology
CanDevelopWeb
And if the employee is a business manager I have the following columns:
Name
Family
FieldOfExpertise
MaximumContractValue
BonusRate
And if the employee is a salesperson then some other columns and so on.
How can I implement this in database schema?
One way that I thought is to have some related tables:
CoreTable:
Name
Family
Type
And if type is one then the employee is a software developer and hence the remaining information should be in table SoftwareDeveloper:
Language
Technology
CanDevelopWeb
For business Managers I have another table with columns:
FieldOfExpertise
MaximumContractValue
BonusRate
The problem with this structure is that I am not sure how to make relationship between tables, as one table has relationship with several tables on one column.
How to enforce relational integrity?
There are a few schools of thought here.
(1) store nullable columns in a single table and only populate the relevant ones (check constraints can enforce integrity here). Some people don't like this because they are afraid of NULLs.
(2) your multi-table design where each type gets its own table. Tougher to enforce with DRI but probably trivial with application or trigger logic.
The only problem with either of those is that as soon as you add a new property (like CanReadUpsideDown), you have to make schema changes to accommodate it - in (1) you need to add a new column and a new constraint, in (2) you need to add a new table if that represents a new "type" of employee.
(3) EAV, where you have a single table that stores property name and value pairs. You have less control over data integrity here, but you can certainly constrain the property names to certain strings. I wrote about this here:
What is so bad about EAV, anyway?
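Option (1) above, a single table with nullable columns guarded by a check constraint, could be sketched roughly as follows. It assumes SQLite via Python's sqlite3; the EmployeeId column, the type values 'developer' and 'manager', the constraint name, and the rule about which columns are mandatory are all hypothetical choices for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

conn.executescript("""
CREATE TABLE Employee (
    EmployeeId           INTEGER PRIMARY KEY,
    Name                 TEXT NOT NULL,
    Family               TEXT NOT NULL,
    Type                 TEXT NOT NULL,
    Language             TEXT,
    Technology           TEXT,
    CanDevelopWeb        INTEGER,
    FieldOfExpertise     TEXT,
    MaximumContractValue NUMERIC,
    BonusRate            NUMERIC,
    -- Each type may only populate its own columns.
    CONSTRAINT CK_Employee_TypeColumns CHECK (
        (Type = 'developer'
            AND Language IS NOT NULL
            AND FieldOfExpertise IS NULL
            AND MaximumContractValue IS NULL
            AND BonusRate IS NULL)
        OR
        (Type = 'manager'
            AND FieldOfExpertise IS NOT NULL
            AND Language IS NULL
            AND Technology IS NULL
            AND CanDevelopWeb IS NULL)
    )
);
""")

# A developer row that fills in only developer columns is accepted.
conn.execute(
    "INSERT INTO Employee (Name, Family, Type, Language, Technology, CanDevelopWeb) "
    "VALUES ('Ada', 'Lovelace', 'developer', 'Python', 'SQLite', 1)"
)

# A developer row that also carries a manager column is rejected.
try:
    conn.execute(
        "INSERT INTO Employee (Name, Family, Type, Language, BonusRate) "
        "VALUES ('Bob', 'Smith', 'developer', 'Go', 0.1)"
    )
    mixed_row_rejected = False
except sqlite3.IntegrityError:
    mixed_row_rejected = True
```

As noted above, every new employee type or property means altering both the table and this constraint.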
You are describing one ("class per table") of the 3 possible strategies for implementing the category (aka. inheritance, generalization, subclass) hierarchy.
The correct "propagation" of PK from the parent to child tables is naturally enforced by straightforward foreign keys between them, but ensuring both presence and the exclusivity of the child rows is another matter. It can be done (as noted in the link above), but the added complexity is probably not worth it and I'd generally recommend handling it at the application level.
I would add a field called EmployeeId in the EmployeeTable
I'd get rid of Type
For BusinessManager table and SoftwareDeveloper for example, I'll add EmployeeId
From here, you can then proceed to create Foreign Keys from BusinessManager, SoftwareDeveloper table to Employee
To further expand on your one way with the core table is to create a surrogate key based off an identity column. This will create a unique employee id for each employee (this will help you distinguish between employees with the same name as well).
The foreign keys preserve your referential integrity. You wouldn't necessarily need EmployeeTypeId as someone else mentioned as you could filter on existence in the SoftwareDeveloper or BusinessManagers tables. The column would instead act as a cached data point for easier querying.
The column types and foreign key names in the sample code below are assumptions; adjust them to fit your schema. Each constraint needs its own name.
create table EmployeeType(
EmployeeTypeId int not null
, EmployeeTypeName varchar(50) not null
, constraint PK_EmployeeType primary key (EmployeeTypeId)
);
create table Employees(
EmployeeId int identity(1,1) not null
, Name varchar(100) not null
, Family varchar(100) not null
, EmployeeTypeId int not null
, constraint PK_Employees primary key (EmployeeId)
, constraint FK_Employees_EmployeeType foreign key (EmployeeTypeId) references EmployeeType(EmployeeTypeId)
);
create table SoftwareDeveloper(
EmployeeId int not null
, Language varchar(50)
, Technology varchar(50)
, CanDevelopWeb bit
, constraint FK_SoftwareDeveloper_Employees foreign key (EmployeeId) references Employees(EmployeeId)
);
create table BusinessManagers(
EmployeeId int not null
, FieldOfExpertise varchar(100)
, MaximumContractValue decimal(18,2)
, BonusRate decimal(5,2)
, constraint FK_BusinessManagers_Employees foreign key (EmployeeId) references Employees(EmployeeId)
);
No existing SQL engine has solutions that make life easy on you in this situation.
Your problem is discussed at some length in "Practical Issues in Database Management", in the chapter on "entity subtyping". It is commendable reading, and not only for this particular chapter.
The proper solution, from a logical design perspective, would be similar to yours, but for the "type" column in the core table. You don't need that, since you can derive the 'type' from which non-core table the employee appears in.
What you need to look at is the business rules, aka data constraints, that will ensure the overall integrity (aka consistency) of the data (of course whether any of these actually apply is something your business users, not me, should tell you) :
Each named employee must have exactly one job, and thus some job detail somewhere. iow : (1) no named employees without any job detail whatsoever and (2) no named employees with >1 job detail.
(3) All job details must be for a named employee.
Of these, (3) is the only one you can implement declaratively if you are using an SQL engine. It's just a regular FK from the non-core tables to the core table.
(1) and (2) could be defined declaratively in standard SQL, using either CREATE ASSERTION or a CHECK constraint that references tables other than the one it is defined on, but neither of those constructs is supported by any SQL engine I know.
One more thing about why [including] the 'type' column is a rather poor choice to make : it changes how constraint (3) must be formulated. For example, you can no longer say "all business managers must be named employees", but instead you'd have to say "all business managers are named employees whose type is <type here>". Iow, the "regular FK" to your core table has now become a reference to a VIEW on your core table, something you might want to declare as, say,
CREATE TABLE BUSMANS ... REFERENCES (SELECT ... FROM CORE WHERE TYPE='BM');
or
CREATE VIEW BM AS (SELECT ... FROM CORE WHERE TYPE='BM');
CREATE TABLE BUSMANS ... REFERENCES BM;
Once again something SQL doesn't allow you to do.
You can keep all fields in the same table, but you'll need an extra table named Employee_Type (for example) holding Developer, Business Manager, and so on, each with a unique ID. The relation will then be an employee_type_id column in the Employee table.
Using PHP or ASP, you can control which fields to show depending on the employee_type_id (or its text) chosen in a drop-down menu.
You are on the right track. You can set up PK/FK relationships from the general person table to each of the specialized tables. You should add a personID to all the tables to use for the relationship, because you do not want to build a relationship on name: it cannot be a PK since it is not unique. Names also change, which makes them a very poor choice for an FK relationship, as a name change could force many records to change. It is important to use separate tables rather than one because some of those things are in a one-to-many relationship. A developer, for instance, may have many different technologies, and that sort of thing should NEVER be stored in a comma-delimited list.
You could also set up a trigger to enforce that records can only be added to a specialty table if the main record has a particular personType. However, be wary of doing this, as you will have people who change roles over time. Do you want to lose the history of what a person knew as a developer when he gets promoted to manager? If he then decides to step back down to development (a frequent occurrence), you would have to recreate his old record.
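A trigger of the kind described above might look like the following sketch. It assumes SQLite via Python's sqlite3 (SQL Server trigger syntax differs); the DeveloperTechnology table, the trigger name, and the personType values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs

conn.executescript("""
CREATE TABLE Person (
    personID   INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    personType TEXT NOT NULL
);
-- One row per technology, instead of a comma-delimited list.
CREATE TABLE DeveloperTechnology (
    personID   INTEGER NOT NULL REFERENCES Person (personID),
    technology TEXT NOT NULL,
    PRIMARY KEY (personID, technology)
);
-- Reject specialty rows for people whose main record is not a developer.
CREATE TRIGGER trg_DeveloperTechnology_type
BEFORE INSERT ON DeveloperTechnology
FOR EACH ROW
WHEN (SELECT personType FROM Person WHERE personID = NEW.personID) IS NOT 'developer'
BEGIN
    SELECT RAISE(ABORT, 'person is not a developer');
END;
INSERT INTO Person (personID, name, personType) VALUES
    (1, 'Dana', 'developer'),
    (2, 'Mark', 'manager');
""")

# A developer can have many technologies.
conn.execute("INSERT INTO DeveloperTechnology VALUES (1, 'SQL Server')")
conn.execute("INSERT INTO DeveloperTechnology VALUES (1, '.NET')")

# The trigger blocks a technology row for someone who is a manager.
try:
    conn.execute("INSERT INTO DeveloperTechnology VALUES (2, 'SQL Server')")
    manager_row_rejected = False
except sqlite3.DatabaseError:
    manager_row_rejected = True

technology_rows = conn.execute("SELECT COUNT(*) FROM DeveloperTechnology").fetchone()[0]
```

Note that this trigger only guards inserts; it does nothing about the role-change history concern raised above.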
I was a developer on a project built with SQL Server and .NET. The team doesn't use physical relationships between tables; instead they use logical ones ("logical foreign keys").
I asked them why they do that, and they said "it is more optimal".
What I really want to know is whether it is really more optimal or just a myth.
When it comes to reads from a database, whether foreign keys are defined or not doesn't come into it. There is no relationship between having foreign keys and the performance of reads.
Things that will affect performance are how the tables are stored, what indexes are defined on them and the stored statistics (just to name a few).
This is a bad justification for not having referential integrity in the database (in particular as it can be trivial to test).
Assuming that "logical foreign keys" are just values that reference a key in another table, without a physical link between them in the form of constraints, I can tell you what the benefits of the physical link are.
First of all a "physical" foreign key is a constraint and it enforces referential integrity between the two values. So that, if you want for example to use a foreign key that doesn't exist in the other table you will receive an error. The same thing will also happen if you try to delete a key that is a foreign key by constraint in another table.
Secondly, it is arguably more optimal, since you can index the foreign key columns and benefit from that, for example when you use joins.
More on this: http://msdn.microsoft.com/en-us/library/ff647793.aspx
There is actually no physical difference between a "real" foreign key and a "logical" foreign key. They're both just columns in a table and don't affect the way that a table is stored on disk. This actually surprised me too when I first learned.
The only difference is that when you have a "real" foreign key, whenever a delete, update, or insert statement is run on a table, the database server has to check that the value is being updated to a legitimate value. If you look at the execution plan for a statement that's an update, insert, delete, or merge, you'll actually see it has to scan or seek on all tables that have a foreign key.
This can be quite a performance overhead if there are a lot of foreign keys or there aren't helpful indexes.
Picture you have a table for Companies, and then another table for Employees. Your employees table will likely have a column called companyId.
When you run:
delete from Companies where companyId = 123;
The database server needs to make sure that there aren't any employees for that companyId. The same applies when you run:
insert into Employees (companyId, name) values (123, 'John');
The database server needs to search the companies table to make sure that the companyId 123 exists.
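The two checks described above can be observed with a small sketch. It assumes SQLite via Python's sqlite3 instead of SQL Server; the employeeId column, the company name, the index name, and the missing companyId 456 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs

conn.executescript("""
CREATE TABLE Companies (
    companyId INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE Employees (
    employeeId INTEGER PRIMARY KEY,
    companyId  INTEGER NOT NULL REFERENCES Companies (companyId),
    name       TEXT NOT NULL
);
-- An index on the referencing column lets the delete check seek
-- instead of scanning every employee.
CREATE INDEX IX_Employees_companyId ON Employees (companyId);
INSERT INTO Companies (companyId, name) VALUES (123, 'Acme');
""")


def is_rejected(sql):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# Insert check: company 123 exists, so the row is accepted.
conn.execute("insert into Employees (companyId, name) values (123, 'John')")

# Insert check: company 456 does not exist, so the row is rejected.
insert_rejected = is_rejected(
    "insert into Employees (companyId, name) values (456, 'Jane')"
)

# Delete check: company 123 still has an employee, so it cannot be removed.
delete_rejected = is_rejected("delete from Companies where companyId = 123")
```

With "logical" foreign keys, both rejected statements would succeed silently and leave orphaned rows behind.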
Yes it is faster to have only "logical" foreign keys. However, it comes at the cost of possible data corruption and might cost more time finding bugs and other sources of data corruption. Whether it's worth it is up to you. One thing to consider is that it doesn't affect read-only queries.
Edit: As Martin Smith pointed out, and as I had left out, there are some cases where the foreign key makes a query faster. If there is an inner join to a table through a foreign key, and no columns from the second table are referenced, then the query doesn't have to hit the second table, since it can trust the foreign key.