I found a case in ER where for the life of me I can't figure out how to achieve referential integrity. The classical Employee, Manager, Department relationship can illustrate this problem.
With the following constraints:
Employee can work in only one Department.
Department can have many Employees.
Employee can have one Manager working in the same Department.
Manager can have many Employees working in the same Department.
Employee that doesn't have a Manager is a Manager.
This diagram illustrates the concept.
Before normalisation I end up with the following table.
After normalisation I end up with these tables.
However, there is still nothing stopping me from accidentally assigning a manager working in one department to an employee working in a different department in the EmployeeManager table.
One possible solution that I found was to put Department into the EmployeeManager table and define a referential integrity constraint so that {Manager, Department} references {Employee, Department} in the EmployeeDepartment table.
However, for this to work doesn't {Manager, Department} have to be a candidate key? Is there a different design that can solve this?
Update
Ok to answer my first question, doesn't {Manager, Department} have to be a candidate key? It turns out that the {Manager, Department} in the EmployeeManager table doesn't have to be a candidate key or a unique key. It simply has to be a foreign key referencing the {Employee, Department} in the EmployeeDepartment table. The uniqueness of {Employee, Department} key isn't well defined and may differ between different engines. MySQL for example advises that the foreign keys reference only unique keys.
Additionally, MySQL requires that the referenced columns be indexed for performance reasons. However, the system does not enforce a requirement that the referenced columns be UNIQUE or be declared NOT NULL. The handling of foreign key references to nonunique keys or keys that contain NULL values is not well defined for operations such as UPDATE or DELETE CASCADE. You are advised to use foreign keys that reference only UNIQUE (including PRIMARY) and NOT NULL keys.
In my case it will work because an Employee can only work in one Department. However, if the constraint changes to allow Employees to work in many Departments, it won't work, because {Employee, Department} will no longer be unique.
It should work in all cases, including if the constraint changes to allow Employees to work in many Departments.
Is there a different design that can solve this? I also thought about replacing EmployeeDepartment with ManagerDepartment table with {Manager} as a primary key and going back to a previous EmployeeManager table with (Employee, Manager) columns. So now to find out which Department an Employee works you need to join EmployeeManager with ManagerDepartment table.
Do you see any bad practices or anomalies with this design?
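As a rough sketch of the alternative design described above (ManagerDepartment keyed on Manager, EmployeeManager holding only Employee and Manager), the following uses Python's sqlite3 module with an in-memory SQLite database. SQLite is only a stand-in engine here, the department name 'Sales' is made up, and department_of is a hypothetical helper. Because an employee's department is derived through the manager, a cross-department assignment cannot be stored at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE ManagerDepartment (
    Manager    TEXT NOT NULL PRIMARY KEY,
    Department TEXT NOT NULL
);
CREATE TABLE EmployeeManager (
    Employee TEXT NOT NULL PRIMARY KEY,
    Manager  TEXT NOT NULL REFERENCES ManagerDepartment (Manager)
);
INSERT INTO ManagerDepartment VALUES ('Steve', 'Sales');
INSERT INTO EmployeeManager VALUES ('Will', 'Steve');
""")


def department_of(employee):
    """Derive an employee's department by joining through their manager."""
    row = conn.execute(
        """SELECT md.Department
           FROM EmployeeManager AS em
           JOIN ManagerDepartment AS md ON md.Manager = em.Manager
           WHERE em.Employee = ?""",
        (employee,),
    ).fetchone()
    return row[0] if row else None


will_department = department_of("Will")
```

The trade-off in this sketch is that every department lookup for a non-manager needs the join.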
Assuming all these columns are declared NOT NULL . . .
One possible solution that I found was to put Department into the
EmployeeManager table and define a reference integrity constraint so
that {Manager, Department} refers {Employee, Department} in the
EmployeeDepartment table.
Yes, add a column for "department" to the "EmployeeManager" table. But you need two foreign key constraints that overlap. (But see below . . .)
(manager, department) references EmployeeDepartment (Employee, Department)
(employee, department) references EmployeeDepartment (Employee, Department)
Since EmployeeDepartment.Employee is unique, the pair of columns EmployeeDepartment.Employee and EmployeeDepartment.Department is also unique. So you can declare "Employee" as a primary key, and also declare a unique constraint on the pair of columns (Employee, Department). Should the requirements change and allow employees to work in multiple departments, you can drop the single-column primary key. I would probably drop both the primary key and unique constraints, and create a new primary key constraint that included both columns, but all that's strictly necessary is to drop the primary key constraint.
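The two overlapping foreign keys can be sketched as follows. This is a minimal illustration using Python's sqlite3 module with an in-memory SQLite database, not the engine from the question; the department names and the employee 'Ann' are invented, and assign is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE EmployeeDepartment (
    Employee   TEXT NOT NULL PRIMARY KEY,
    Department TEXT NOT NULL,
    UNIQUE (Employee, Department)   -- target of both composite foreign keys
);
CREATE TABLE EmployeeManager (
    Employee   TEXT NOT NULL PRIMARY KEY,
    Manager    TEXT NOT NULL,
    Department TEXT NOT NULL,
    FOREIGN KEY (Employee, Department)
        REFERENCES EmployeeDepartment (Employee, Department),
    FOREIGN KEY (Manager, Department)
        REFERENCES EmployeeDepartment (Employee, Department)
);
INSERT INTO EmployeeDepartment VALUES
    ('Steve', 'Sales'), ('Will', 'Sales'), ('Ann', 'IT');
""")


def assign(employee, manager, department):
    """Try to record a manager; return True if the database accepts the row."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO EmployeeManager VALUES (?, ?, ?)",
                (employee, manager, department),
            )
        return True
    except sqlite3.IntegrityError:
        return False


same_department = assign("Will", "Steve", "Sales")   # accepted
cross_department = assign("Ann", "Steve", "IT")      # rejected: no (Steve, IT) row
```

Both foreign keys share the Department column, so the employee and the manager are forced to match the same department value.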
In systems like yours, it's usually a good idea to have a table of managers, with the obvious foreign key references. Right now, if you delete the employee Will, you lose the fact that Steve is a manager.
Related
I have a SQL Server database that contains a table to record an employee's salary.
It has 3 columns declared as foreign keys, all of which reference the employee table's employee_id column:
employee_id
submitted_by
confirmed_by
But is it best practice to make all of them foreign keys, or do I only need one on employee_id?
In my application, submitted_by and confirmed_by will be selected from a drop-down list, so I assume the values exist in the employee table.
Thank you for your advice.
Yes, since all users of your system are also Employees modelled by your system, if you wish to have Referential Integrity (RI) enforced in the database, all three columns should have foreign keys back to the referenced employee table. Note that since confirmed by sounds like part of a workflow process, where the user confirming may not be available at the time the record is inserted, you can make the field confirmed_by in table EmployeeSalary nullable (confirmed_by INT NULL), in which case RI will only be enforced at the later time when the field is actually populated.
You should name each of the foreign keys appropriately by expressing the role in the foreign key, e.g.
FK_EmployeeSalary_SalariedEmployee
FK_EmployeeSalary_EmployeeSubmittedBy
FK_EmployeeSalary_EmployeeConfirmedBy
Although the front end may restrict choices via the drop down, referential integrity is still beneficial:
Protect against bugs, e.g. where the submitted by employee is omitted (in the case of a non-nullable FK) or the employee provided doesn't exist in the employees table.
Prevent accidental deletion of an employee to which foreign key data is linked.
There is a (very) minor performance penalty on RI whereby the DB will need to check the existence of the PK in the employee table - in most instances this will be negligible.
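A minimal sketch of the three named foreign keys, with confirmed_by left nullable. It uses Python's sqlite3 module with in-memory SQLite purely as a stand-in for SQL Server; the salary_id column, the sample employee ids, and the try_sql helper are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE employee (employee_id INTEGER PRIMARY KEY);
CREATE TABLE EmployeeSalary (
    salary_id    INTEGER PRIMARY KEY,   -- hypothetical surrogate key
    employee_id  INTEGER NOT NULL,
    submitted_by INTEGER NOT NULL,
    confirmed_by INTEGER NULL,          -- filled in later by the workflow
    CONSTRAINT FK_EmployeeSalary_SalariedEmployee
        FOREIGN KEY (employee_id) REFERENCES employee (employee_id),
    CONSTRAINT FK_EmployeeSalary_EmployeeSubmittedBy
        FOREIGN KEY (submitted_by) REFERENCES employee (employee_id),
    CONSTRAINT FK_EmployeeSalary_EmployeeConfirmedBy
        FOREIGN KEY (confirmed_by) REFERENCES employee (employee_id)
);
INSERT INTO employee VALUES (1), (2), (3);
""")


def try_sql(sql):
    """Run one statement; return True if it succeeds, False on a constraint error."""
    try:
        with conn:
            conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


unconfirmed_ok = try_sql(
    "INSERT INTO EmployeeSalary (employee_id, submitted_by) VALUES (1, 2)")
confirm_ok = try_sql(
    "UPDATE EmployeeSalary SET confirmed_by = 3 WHERE employee_id = 1")
bad_confirmer_ok = try_sql(
    "UPDATE EmployeeSalary SET confirmed_by = 99 WHERE employee_id = 1")
delete_submitter_ok = try_sql("DELETE FROM employee WHERE employee_id = 2")
```

In this sketch the NULL confirmed_by is accepted at insert time, and the check applies only once a value is written.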
Any column that references a key in another table should be declared as a foreign key. This way, if you mistakenly try to put a nonexistent value there, the database will report an error.
I have got two questions when designing a database for a sales system.
Is it possible to have an isolated table, that is, a table that has no relationship with any other table?
How to solve the following issue:
Table: SalesOrderDetail, Table: InventoryTrans
Every record in SalesOrderDetail will be inserted into InventoryTrans, but not all records in InventoryTrans come from SalesOrderDetail, because other tables may also insert records into InventoryTrans.
Therefore, I want to add a reference column SalesOrderDetailID to the InventoryTrans table, but without specifying an FK constraint, because if the record is not from the SalesOrderDetail table, then SalesOrderDetailID should be null.
Is this the right design?
Yes, you can have a table that has no foreign key references to other tables. A table that stores various configuration settings is probably the most common, but there are others.
The column InventoryTrans.SalesOrderDetailID can be a nullable foreign key reference. But you haven't provided enough detail to tell whether that's a good design decision. Making an educated guess, I'd say probably not. (Other kinds of transactions would probably benefit from a foreign key reference.)
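A nullable foreign key reference of this kind can be sketched as below, using Python's sqlite3 module with in-memory SQLite as a stand-in engine. The InventoryTransID column, the sample ids, and the add_trans helper are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE SalesOrderDetail (SalesOrderDetailID INTEGER PRIMARY KEY);
CREATE TABLE InventoryTrans (
    InventoryTransID   INTEGER PRIMARY KEY,   -- hypothetical surrogate key
    SalesOrderDetailID INTEGER NULL
        REFERENCES SalesOrderDetail (SalesOrderDetailID)
);
INSERT INTO SalesOrderDetail VALUES (10);
""")


def add_trans(sales_order_detail_id):
    """Insert a transaction row; return True if the database accepts it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO InventoryTrans (SalesOrderDetailID) VALUES (?)",
                (sales_order_detail_id,),
            )
        return True
    except sqlite3.IntegrityError:
        return False


from_other_source = add_trans(None)   # NULL: not tied to any sales order line
from_sales_order = add_trans(10)      # matches an existing SalesOrderDetail row
dangling_reference = add_trans(999)   # no such SalesOrderDetail row
```

The constraint is skipped for NULL and enforced for every non-NULL value, so the column does not have to give up the foreign key to allow rows from other sources.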
I have two tables:
User (username, password)
Profile (profileId, gender, dateofbirth, ...)
Currently I'm using this approach: each Profile record has a field named "userId" as foreign key which links to the User table. When a user registers, his Profile record is automatically created.
I'm confused by my friend's suggestion: make the "userId" field both the foreign key and the primary key, and delete the "profileId" field. Which approach is better?
Foreign keys are almost always "Allow Duplicates," which would make them unsuitable as Primary Keys.
Instead, find a field that uniquely identifies each record in the table, or add a new field (either an auto-incrementing integer or a GUID) to act as the primary key.
The only exception to this are tables with a one-to-one relationship, where the foreign key and primary key of the linked table are one and the same.
Primary keys always need to be unique, foreign keys need to allow non-unique values if the table is a one-to-many relationship. It is perfectly fine to use a foreign key as the primary key if the table is connected by a one-to-one relationship, not a one-to-many relationship. If you want the same user record to have the possibility of having more than 1 related profile record, go with a separate primary key, otherwise stick with what you have.
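For the one-to-one case, a primary key that is also a foreign key might look like the sketch below. It uses Python's sqlite3 module with in-memory SQLite; the userId column on User, the sample values, and the add_profile helper are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE "User" (
    userId   INTEGER PRIMARY KEY,
    username TEXT NOT NULL UNIQUE,
    password TEXT NOT NULL
);
CREATE TABLE Profile (
    userId      INTEGER PRIMARY KEY REFERENCES "User" (userId),  -- PK and FK at once
    gender      TEXT,
    dateofbirth TEXT
);
INSERT INTO "User" VALUES (1, 'alice', 'placeholder-hash');
""")


def add_profile(user_id, gender):
    """Insert a profile row; return True if the database accepts it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO Profile (userId, gender) VALUES (?, ?)",
                (user_id, gender),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first_profile = add_profile(1, "F")     # accepted
second_profile = add_profile(1, "F")    # rejected: primary key allows one per user
orphan_profile = add_profile(2, "M")    # rejected: no user with userId 2
```

If a user may ever need more than one profile, this shape no longer fits and a separate profileId key is the safer choice.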
Yes, it is legal to have a primary key being a foreign key. This is a rare construct, but it applies for:
a 1:1 relation. The two tables cannot be merged into one because different permissions and privileges apply only at table level (as of 2017, such a database would be odd).
a 1:0..1 relation. Profile may or may not exist, depending on the user type.
performance is an issue, and the design acts as a partition: the profile table is rarely accessed, is hosted on a separate disk, or has a different sharding policy from the users table. This would not make sense if the underlying storage is columnar.
Yes, a foreign key can be a primary key in the case of one to one relationship between those tables
I would not do that. I would keep the profileID as primary key of the table Profile
A foreign key is just a referential constraint between two tables
One could argue that a primary key is necessary as the target of any foreign keys which refer to it from other tables. A foreign key is a set of one or more columns in any table (not necessarily a candidate key, let alone the primary key, of that table) which may hold the value(s) found in the primary key column(s) of some other table. So we must have a primary key to match the foreign key.
Or must we? The only purpose of the primary key in the primary key/foreign key pair is to provide an unambiguous join - to maintain referential integrity with respect to the "foreign" table which holds the referenced primary key. This ensures that the value to which the foreign key refers will always be valid (or null, if allowed).
http://www.aisintl.com/case/primary_and_foreign_key.html
It is generally considered bad practice to have a one-to-one relationship, because you could just represent the data in one table and achieve the same result.
However, there are instances where you may not be able to make these changes to the table you are referencing. In this instance there is no problem using the Foreign key as the primary key. It might help to have a composite key consisting of an auto incrementing unique primary key and the foreign key.
I am currently working on a system where users can log in and generate a registration code to use with an app. For reasons I won't go into I am unable to simply add the columns required to the users table. So I am going down a one to one route with the codes table.
It depends on the business and system.
If your userId is unique and will be unique all the time, you can use userId as your primary key. But if you ever want to expand your system, it will make things difficult. I advise you to add a foreign key in table user to make a relationship with table profile instead of adding a foreign key in table profile.
Short answer: DEPENDS.... In this particular case, it might be fine. However, experts will recommend against it just about every time; including your case.
Why?
Keys are seldom unique in a table when they are foreign to it (that is, they originated in another table). For example, an item ID might be unique in an ITEMS table, but not in an ORDERS table, since the same type of item will most likely appear in another order. Likewise, order IDs might be unique in the ORDERS table, but not in some other table like ORDER_DETAILS, where an order can have multiple line items; to query a particular item in a particular order, you need the concatenation of two FKs (order_id and item_id) as the PK for that table.
I am not a DB expert, but if you can logically justify an auto-generated value as your PK, I would do that. If this is not practical, then a concatenation of two (or maybe more) FKs could serve as your PK. But I cannot think of any case where a single FK value can be justified as the PK.
This does not fully apply to the question's case, but since I ended up on this question while searching for other information and reading some comments, I can say it is possible to have only an FK in a table and still get unique values.
You can use a column of classes, each of which can be assigned only once. It works almost like an ID, and it can be used when you want a unique categorical value to distinguish each record.
Making a primary key in a database table is fine. Making a composite primary key is also fine. But why can't I have 2 primary keys in a table? What kind of problems may occur if we have 2 primary keys?
Suppose I have a Students table. I want both the Roll No. and the Name of each student to be unique. Then why can't I create 2 primary keys in a table? I don't see any logical problem with it now, but I am surely missing a serious issue, which is the reason it does not exist.
I am new in databases, so don't have much idea. It may also create a technical issue rather. Will be happy if someone can educate me on this.
Thanks.
You can create a UNIQUE constraint for both columns UNIQUE(roll,name).
The PK is unique by definition, because it is used to distinguish one row from the others. For example, when a foreign key references that table, it is referencing the PK.
If you need another column to 'act' like a PK, give it the attributes unique and not null.
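A second column that "acts" like a PK can be sketched with UNIQUE and NOT NULL as below. This uses Python's sqlite3 module with in-memory SQLite; the column names roll and name follow the UNIQUE(roll,name) notation above, while the student names and the try_insert helper are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students (
    roll INTEGER PRIMARY KEY,        -- the one declared primary key
    name TEXT NOT NULL UNIQUE        -- a second key: unique and not null
);
INSERT INTO Students VALUES (1, 'Asha');
""")


def try_insert(roll, name):
    """Insert a student; return True if the database accepts the row."""
    try:
        with conn:
            conn.execute("INSERT INTO Students VALUES (?, ?)", (roll, name))
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_name = try_insert(2, "Asha")   # rejected by UNIQUE on name
duplicate_roll = try_insert(1, "Ben")    # rejected by the primary key
missing_name = try_insert(3, None)       # rejected by NOT NULL
new_student = try_insert(2, "Ben")       # accepted
```

Note that this declares each column unique on its own, which is stricter than a single UNIQUE constraint over the pair of columns.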
Well, this is simply by definition. There can not be two "primary" conditions, just like there can not be two "latest" versions.
Every table can contain more than one unique key, but if you decide to have a primary key, it is just one of these unique keys: the "one" you deem the "most important", which identifies every record uniquely.
If you have a table and come to the conclusion that your primary key does not uniquely identify each record (also meaning that there can't be two records with the same values for the primary key), you have chosen the wrong primary key, as by definition, the fields of the primary key must uniquely define each record.
That, however, does not mean there can be no other combination of fields uniquely identifying the record! This is where a second feature kicks in: referential integrity.
You can "link" tables using their primary key. For example: If you have a Customer table and an Orders table, where the Customers table has a primary key on the customer number and the Orders table has a primary key on the order number and the customer number, that means:
Every customer can be identified uniquely by his customer number
Every order is uniquely identified by the order number and the customer number
You can then link the two tables on the customer number. The DB system then ensures several things, among which is the fact that you can not remove a customer who has orders in your database without first removing the orders. Otherwise, you would have orders without being able to find out the customer data, which would violate your database's referential integrity.
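The Customers and Orders example can be sketched as follows, using Python's sqlite3 module with in-memory SQLite. The column names CustomerNo and OrderNo, the sample values, and the try_sql helper are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE Customers (CustomerNo INTEGER PRIMARY KEY);
CREATE TABLE Orders (
    OrderNo    INTEGER NOT NULL,
    CustomerNo INTEGER NOT NULL REFERENCES Customers (CustomerNo),
    PRIMARY KEY (OrderNo, CustomerNo)
);
INSERT INTO Customers VALUES (1);
INSERT INTO Orders VALUES (500, 1);
""")


def try_sql(sql):
    """Run one statement; return True if it succeeds, False on a constraint error."""
    try:
        with conn:
            conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


delete_with_orders = try_sql("DELETE FROM Customers WHERE CustomerNo = 1")
orders_removed = try_sql("DELETE FROM Orders WHERE CustomerNo = 1")
delete_without_orders = try_sql("DELETE FROM Customers WHERE CustomerNo = 1")
```

The first delete is refused while an order still points at the customer; once the orders are gone, the same delete goes through.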
If you had two primary keys, the system would not know on which to ensure referential integrity, so you'd have to tell the system which key to use - which would make one of the primary keys more important, which would make it the "primary key" (!) of the primary keys.
You can have multiple candidate keys in a table but by convention only one key per table is called "primary". That's just a convention though and it doesn't make any real difference to the function of the keys. A primary key is no different to any other candidate key. If you find it convenient to call more than one key "primary" then I suggest you do so. In my opinion (I'm not the only one) the idea of designating a "primary" key at all is essentially an outdated concept of very little importance in database design.
You might be interested to know that early papers on the relational database model (e.g. by E.F.Codd, the relational model's inventor) actually used the term "primary key" to describe all the keys of a relation and not just one. So there is a perfectly good precedent for multiple primary keys per table. The idea of designating exactly one primary key is more recent and probably came into common use through the popularity of ER modelling techniques.
Create a unique index on the 2nd attribute (Names); it's almost the same as a primary key under another name.
From Wikipedia (http://en.wikipedia.org/wiki/Unique_key):
A table can have at most one primary key, but more than one unique
key. A primary key is a combination of columns which uniquely specify
a row. It is a special case of unique keys. One difference is that
primary keys have an implicit NOT NULL constraint while unique keys do
not. Thus, the values in unique key columns may or may not be NULL,
and in fact such a column may contain at most one NULL fields.
Another difference is that primary keys must be defined using another
syntax.
I have a purchase order table and another table to contain the items within a particular purchase order for drugs.
Example:
PO_Table (POId, MainPharmacyID, SupplierID, PreparedBy)
PO_Items_Table (POItemID, ...)
I have two options for which table to link to which, and they both seem valid. I have done this a number of times and have done it either way.
I would love to know if there are any rules about where to attach a foreign key.
In my situation, where do I attach my foreign key?
Update:
My two options are putting POItemID in the PO_Table or putting POId in the PO_Items_Table.
Update 2:
Assuming the relationship between the two tables is a one-to-one relationship
Just make it point to the PRIMARY KEY of the referenced table:
PO_Table (POId PRIMARY KEY, MainPharmacyID, SupplierID, PreparedBy)
PO_Items_Table (POItemID, POId FOREIGN KEY REFERENCES PO_Table (POId), ...)
Actually, in your PO_Table I don't see any other candidate key except POId, so as for now this seems to be the only available solution to me.
What are the "two options" you are considering?
Update:
Putting POItemID in the PO_Table is not an option unless you want your orders to have no more than one item in them.
Just look into it: if you have but a single column which stores the id of the ordered item in the order table, where are you going to store the other items?
Update 2:
If there is a one-to-one relationship, normally you just merge the tables: combine all fields from both tables into a single record.
However, there are cases when you need to split the tables. Say, one of the entities is rarely defined but has too many fields.
In this case, you make a separate relation for the second entity and make its PRIMARY KEY column also a FOREIGN KEY.
Let's imagine a model which describes the locks and the keys, and the keys cannot be duplicated (so one lock matches at most one key and vice versa):
Pairs (PairID PRIMARY KEY, LockID UNIQUE, LockProductionDate, KeyId UNIQUE, KeyProductionDate)
If there is no key for a lock or no lock for a key, we just put NULLs into the corresponding fields.
However, if all keys have a lock but only few locks have keys, we can split the table:
Locks (LockID PRIMARY KEY, LockProductionDate, KeyID UNIQUE)
Keys (KeyID PRIMARY KEY, KeyProductionDate, FOREIGN KEY (KeyID) REFERENCES Locks (KeyID))
As you can see, the KeyID is both a PRIMARY KEY and a FOREIGN KEY in the Keys table.
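The split Locks and Keys tables can be tried out with the sketch below, which uses Python's sqlite3 module with in-memory SQLite. The sample ids and the try_sql helper are assumptions; production dates are omitted from the inserts for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE Locks (
    LockID             INTEGER PRIMARY KEY,
    LockProductionDate TEXT,
    KeyID              INTEGER UNIQUE      -- NULL when the lock has no key
);
CREATE TABLE Keys (
    KeyID             INTEGER PRIMARY KEY,
    KeyProductionDate TEXT,
    FOREIGN KEY (KeyID) REFERENCES Locks (KeyID)
);
INSERT INTO Locks (LockID, KeyID) VALUES (1, 100), (2, NULL);
""")


def try_sql(sql):
    """Run one statement; return True if it succeeds, False on a constraint error."""
    try:
        with conn:
            conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


key_with_lock = try_sql("INSERT INTO Keys (KeyID) VALUES (100)")
key_without_lock = try_sql("INSERT INTO Keys (KeyID) VALUES (200)")
second_lock_same_key = try_sql("INSERT INTO Locks (LockID, KeyID) VALUES (3, 100)")
```

Here the UNIQUE constraint on Locks.KeyID is what lets the foreign key reference a column that is not the primary key of Locks.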
You may want to read this article in my blog, "What is entity-relationship model?", which describes some ways to map an ER model (entities and relationships) into the relational model (tables and foreign keys).
You don't have two options. A foreign key constraint must be attached to the table (and to the column) that has the foreign key in it, and it must reference (or point to) the primary key, or another unique key, in the other table. I don't quite understand what you mean when you say you have done this a number of times either way. What other way?
It looks like your PO_Table is the logical parent of the PO_Items_Table, which means the primary key of the PO_Table should be used as the Foreign Key in the items table
If PO stands for "Purchase Orders" and PO Item stands for a single line item of a purchase order, then you only have one choice about how to set up foreign keys. There may be many items for each purchase order, but there will only be one purchase order for each item. In this case, Quassnoi gave the correct design.
As a sidelight, every time I have designed a purchase order database, I have given the Items table a compound primary key made up of POID and ItemID. But ItemID is not unique among all items, just among the items that belong to a single PO. Each time I start a new PO, I begin all over again with ItemID equal to one. This permits me to reconstruct a purchase order later on and get the items in the same order as they were in when the order was first created. This is a trivial matter for most data processing purposes, but it can drive customers nuts if they look at a PO later on and the items are out of sequence, as they perceive sequence.
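One way the compound key with a per-PO ItemID might be sketched is shown below, using Python's sqlite3 module with in-memory SQLite. The Description column, the sample item names, and the add_item helper are assumptions for the example, and the numbering query is only one possible approach (it does not guard against concurrent writers).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE PO_Table (POId INTEGER PRIMARY KEY);
CREATE TABLE PO_Items_Table (
    POId        INTEGER NOT NULL REFERENCES PO_Table (POId),
    ItemID      INTEGER NOT NULL,     -- restarts at 1 for every purchase order
    Description TEXT NOT NULL,        -- hypothetical column
    PRIMARY KEY (POId, ItemID)
);
INSERT INTO PO_Table VALUES (1), (2);
""")


def add_item(po_id, description):
    """Append a line item to a PO and return the ItemID it was given."""
    with conn:
        (next_id,) = conn.execute(
            "SELECT COALESCE(MAX(ItemID), 0) + 1 FROM PO_Items_Table WHERE POId = ?",
            (po_id,),
        ).fetchone()
        conn.execute(
            "INSERT INTO PO_Items_Table (POId, ItemID, Description) VALUES (?, ?, ?)",
            (po_id, next_id, description),
        )
    return next_id


first_id = add_item(1, "aspirin")
second_id = add_item(1, "ibuprofen")
other_po_id = add_item(2, "gauze")

# Items of PO 1 come back in the order they were entered.
po1_items = [row[0] for row in conn.execute(
    "SELECT Description FROM PO_Items_Table WHERE POId = 1 ORDER BY ItemID")]
```

Ordering by ItemID within a POId reproduces the sequence in which the lines were originally entered.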