Related
I have two tables:
User (username, password)
Profile (profileId, gender, dateofbirth, ...)
Currently I'm using this approach: each Profile record has a field named "userId" as foreign key which links to the User table. When a user registers, his Profile record is automatically created.
I'm confused with my friend suggestion: to have the "userId" field as the foreign and primary key and delete the "profileId" field. Which approach is better?
Foreign keys are almost always "Allow Duplicates," which would make them unsuitable as Primary Keys.
Instead, find a field that uniquely identifies each record in the table, or add a new field (either an auto-incrementing integer or a GUID) to act as the primary key.
The only exception to this are tables with a one-to-one relationship, where the foreign key and primary key of the linked table are one and the same.
Primary keys always need to be unique, foreign keys need to allow non-unique values if the table is a one-to-many relationship. It is perfectly fine to use a foreign key as the primary key if the table is connected by a one-to-one relationship, not a one-to-many relationship. If you want the same user record to have the possibility of having more than 1 related profile record, go with a separate primary key, otherwise stick with what you have.
Yes, it is legal to have a primary key being a foreign key. This is a rare construct, but it applies for:
a 1:1 relation. The two tables cannot be merged in one because of different permissions and privileges only apply at table level (as of 2017, such a database would be odd).
a 1:0..1 relation. Profile may or may not exist, depending on the user type.
performance is an issue, and the design acts as a partition: the profile table is rarely accessed, hosted on a separate disk or has a different sharding policy as compared to the users table. Would not make sense if the underlining storage is columnar.
Yes, a foreign key can be a primary key in the case of one to one relationship between those tables
I would not do that. I would keep the profileID as primary key of the table Profile
A foreign key is just a referential constraint between two tables
One could argue that a primary key is necessary as the target of any foreign keys which refer to it from other tables. A foreign key is a set of one or more columns in any table (not necessarily a candidate key, let alone the primary key, of that table) which may hold the value(s) found in the primary key column(s) of some other table. So we must have a primary key to match the foreign key.
Or must we? The only purpose of the primary key in the primary key/foreign key pair is to provide an unambiguous join - to maintain referential integrity with respect to the "foreign" table which holds the referenced primary key. This insures that the value to which the foreign key refers will always be valid (or null, if allowed).
http://www.aisintl.com/case/primary_and_foreign_key.html
It is generally considered bad practise to have a one to one relationship. This is because you could just have the data represented in one table and achieve the same result.
However, there are instances where you may not be able to make these changes to the table you are referencing. In this instance there is no problem using the Foreign key as the primary key. It might help to have a composite key consisting of an auto incrementing unique primary key and the foreign key.
I am currently working on a system where users can log in and generate a registration code to use with an app. For reasons I won't go into I am unable to simply add the columns required to the users table. So I am going down a one to one route with the codes table.
It depends on the business and system.
If your userId is unique and will be unique all the time, you can use userId as your primary key. But if you ever want to expand your system, it will make things difficult. I advise you to add a foreign key in table user to make a relationship with table profile instead of adding a foreign key in table profile.
Short answer: DEPENDS.... In this particular case, it might be fine. However, experts will recommend against it just about every time; including your case.
Why?
Keys are seldomly unique in tables when they are foreign (originated in another table) to the table in question. For example, an item ID might be unique in an ITEMS table, but not in an ORDERS table, since the same type of item will most likely exist in another order. Likewise, order IDs might be unique (might) in the ORDERS table, but not in some other table like ORDER_DETAILS where an order with multiple line items can exist and to query against a particular item in a particular order, you need the concatenation of two FK (order_id and item_id) as the PK for this table.
I am not DB expert, but if you can justify logically to have an auto-generated value as your PK, I would do that. If this is not practical, then a concatenation of two (or maybe more) FK could serve as your PK. BUT, I cannot think of any case where a single FK value can be justified as the PK.
It is not totally applied for the question's case, but since I ended up on this question serching for other info and by reading some comments, I can say it is possible to only have a FK in a table and get unique values.
You can use a column that have classes, which can only be assigned 1 time, it works almost like and ID, however it could be done in the case you want to use a unique categorical value that distinguish each record.
I designed a database to use GUIDs for the UserID but I added UserId as a foreign key to many other tables, and as I plan to have a very large number of database inputs I must design this well.
I also use ASP Membership tables (not profiles only membership, users and roles).
So currently I use a GUID as PK and as FK in every other table, this is maybe bad design?
What I think is maybe better and is the source of my question, should I add in Users table UserId(int) as a primary key and use this field as a foreign key to other tables and user GUID UserId only to reference aspnet_membership?
aspnet_membership{UserID(uniqueIdentifier)}
Users{
UserID_FK(uniqueIdentifier) // FK to aspnet_membership table
UserID(int) // primary key in this table --> Should I add this
...
}
In other tables user can create items in tables and I always must add UserId for example:
TableA{TableA_PK(int)..., CreatedBy(uniqueIdentifier)}
TableB{TableB_PK(int)..., CreatedBy(uniqueIdentifier)}
TableC{TableC_PK(int)..., CreatedBy(uniqueIdentifier)}
...
Ultimately the answer is that it really depends.
Microsoft have documented the performance differences of each here. While the article differs slightly to your situation, as you have to use a UNIQUEIDENTIFIER to link back to asp membership, many of the discussion points still apply.
If you have to create your own users table anyway it would make more sense to have your own int primary key, and use the GUID as a foreign key. It keeps separate entities separate. What if at some point in the future you wanted to add a different membership to a user? You would then need to update a primary key, which would have to cascade to any tables referencing this and could be quite a performance hit. If it is just a unique column in your users table it is a simple update.
All binary datatypes (uniqueidetifier is binary(16)) are fine as foreign keys. But GUID might cause another small problem as primary key. By default primary key is clustered, but GUID is generated randomly not sequentially as IDENTITY, so it will frequently split IO pages, that cause performance degradation a little.
I'd go ahead with your current design.
If you're worried because of inner joins performance in your queries because of comparing strings instead of ints, nowadays there is no difference at all.
I'm creating a database table and I don't have a logical primary key assigned to it. Should each and every table have a primary key?
Short answer: yes.
Long answer:
You need your table to be joinable on something
If you want your table to be clustered, you need some kind of a primary key.
If your table design does not need a primary key, rethink your design: most probably, you are missing something. Why keep identical records?
In MySQL, the InnoDB storage engine always creates a primary key if you didn't specify it explicitly, thus making an extra column you don't have access to.
Note that a primary key can be composite.
If you have a many-to-many link table, you create the primary key on all fields involved in the link. Thus you ensure that you don't have two or more records describing one link.
Besides the logical consistency issues, most RDBMS engines will benefit from including these fields in a unique index.
And since any primary key involves creating a unique index, you should declare it and get both logical consistency and performance.
See this article in my blog for why you should always create a unique index on unique data:
Making an index UNIQUE
P.S. There are some very, very special cases where you don't need a primary key.
Mostly they include log tables which don't have any indexes for performance reasons.
Always best to have a primary key. This way it meets first normal form and allows you to continue along the database normalization path.
As stated by others, there are some reasons not to have a primary key, but most will not be harmed if there is a primary key
Disagree with the suggested answer. The short answer is: NO.
The purpose of the primary key is to uniquely identify a row on the table in order to form a relationship with another table. Traditionally, an auto-incremented integer value is used for this purpose, but there are variations to this.
There are cases though, for example logging time-series data, where the existence of a such key is simply not needed and just takes up memory. Making a row unique is simply ...not required!
A small example:
Table A: LogData
Columns: DateAndTime, UserId, AttribA, AttribB, AttribC etc...
No Primary Key needed.
Table B: User
Columns: Id, FirstName, LastName etc.
Primary Key (Id) needed in order to be used as a "foreign key" to LogData table.
Pretty much any time I've created a table without a primary key, thinking I wouldn't need one, I've ended up going back and adding one. I now create even my join tables with an auto-generated identity field that I use as the primary key.
Except for a few very rare cases (possibly a many-to-many relationship table, or a table you temporarily use for bulk-loading huge amounts of data), I would go with the saying:
If it doesn't have a primary key, it's not a table!
Marc
Just add it, you will be sorry later when you didn't (selecting, deleting. linking, etc)
Will you ever need to join this table to other tables? Do you need a way to uniquely identify a record? If the answer is yes, you need a primary key. Assume your data is something like a customer table that has the names of the people who are customers. There may be no natural key because you need the addresses, emails, phone numbers, etc. to determine if this Sally Smith is different from that Sally Smith and you will be storing that information in related tables as the person can have mulitple phones, addesses, emails, etc. Suppose Sally Smith marries John Jones and becomes Sally Jones. If you don't have an artifical key onthe table, when you update the name, you just changed 7 Sally Smiths to Sally Jones even though only one of them got married and changed her name. And of course in this case withouth an artificial key how do you know which Sally Smith lives in Chicago and which one lives in LA?
You say you have no natural key, therefore you don't have any combinations of field to make unique either, this makes the artficial key critical.
I have found anytime I don't have a natural key, an artifical key is an absolute must for maintaining data integrity. If you do have a natural key, you can use that as the key field instead. But personally unless the natural key is one field, I still prefer an artifical key and unique index on the natural key. You will regret it later if you don't put one in.
It is a good practice to have a PK on every table, but it's not a MUST. Most probably you will need a unique index, and/or a clustered index (which is PK or not) depending on your need.
Check out the Primary Keys and Clustered Indexes sections on Books Online (for SQL Server)
"PRIMARY KEY constraints identify the column or set of columns that have values that uniquely identify a row in a table. No two rows in a table can have the same primary key value. You cannot enter NULL for any column in a primary key. We recommend using a small, integer column as a primary key. Each table should have a primary key. A column or combination of columns that qualify as a primary key value is referred to as a candidate key."
But then check this out also: http://www.aisintl.com/case/primary_and_foreign_key.html
To make it future proof you really should. If you want to replicate it you'll need one. If you want to join it to another table your life (and that of the poor fools who have to maintain it next year) will be so much easier.
I am in the role of maintaining application created by offshore development team. Now I am having all kinds of issues in the application because original database schema did not contain PRIMARY KEYS on some tables. So please dont let other people suffer because of your poor design. It is always good idea to have primary keys on tables.
Late to the party but I wanted to add my two cents:
Should each and every table have a primary key?
If you are talking about "Relational Albegra", the answer is Yes. Modelling data this way requires the entities and tables to have a primary key. The problem with relational algebra (apart from the fact there are like 20 different, mismatching flavors of it), is that it only exists on paper. You can't build real world applications using relational algebra.
Now, if you are talking about databases from real world apps, they partially/mostly adhere to the relational algebra, by taking the best of it and by overlooking other parts of it. Also, database engines offer massive non-relational functionality nowadays (it's 2020 now). So in this case the answer is No. In any case, 99.9% of my real world tables have a primary key, but there are justifiable exceptions. Case in point: event/log tables (multiple indexes, but not a single key in sight).
Bottom line, in transactional applications that follow the entity/relationship model it makes a lot of sense to have primary keys for almost (if not) all of the tables. If you ever decide to skip the primary key of a table, make sure you have a good reason for it, and you are prepared to defend your decision.
I know that in order to use certain features of the gridview in .NET, you need a primary key in order for the gridview to know which row needs updating/deleting. General practice should be to have a primary key or primary key cluster. I personally prefer the former.
I'd like to find something official like this - 15.6.2.1 Clustered and Secondary Indexes - MySQL.
If the table has no PRIMARY KEY or suitable UNIQUE index, InnoDB internally generates a hidden clustered index named GEN_CLUST_INDEX on a synthetic column containing row ID values. The rows are ordered by the ID that InnoDB assigns to the rows in such a table. The row ID is a 6-byte field that increases monotonically as new rows are inserted. Thus, the rows ordered by the row ID are physically in insertion order.
So, why not create primary key or something like it by yourself? Besides, ORM cannot identify this hidden ID, meaning that you cannot use ID in your code.
I always have a primary key, even if in the beginning I don't have a purpose in mind yet for it. There have been a few times when I eventually need a PK in a table that doesn't have one and it's always more trouble to put it in later. I think there is more of an upside to always including one.
If you are using Hibernate its not possible to create an Entity without a primary key. This issues can create problem if you are working with an existing database which was created with plain sql/ddl scripts, and no primary key was added
In short, no. However, you need to keep in mind that certain client access CRUD operations require it. For future proofing, I tend to always utilize primary keys.
Although I'm guilty of this crime, it seems to me there can't be any good reason for a table to not have an identity field primary key.
Pros:
- whether you want to or not, you can now uniquely identify every row in your table which previously you could not do
- you can't do sql replication without a primary key on your table
Cons:
- an extra 32 bits for each row of your table
Consider for example the case where you need to store user settings in a table in your database. You have a column for the setting name and a column for the setting value. No primary key is necessary, but having an integer identity column and using it as your primary key seems like a best practice for any table you ever create.
Are there other reasons besides size that every table shouldn't just have an integer identity field?
Sure, an example in a single-database solution is if you have a table of countries, it probably makes more sense to use the ISO 3166-1-alpha-2 country code as the primary key as this is an international standard, and makes queries much more readable (e.g. CountryCode = 'GB' as opposed to CountryCode = 28). A similar argument could be applied to ISO 4217 currency codes.
In a SQL Server database solution using replication, a UNIQUEIDENTIFIER key would make more sense as GUIDs are required for some types of replication (and also make it much easier to avoid key conflicts if there are multiple source databases!).
The most clear example of a table that doesn't need a surrogate key is a many-to-many relation:
CREATE TABLE Authorship (
author_id INT NOT NULL,
book_id INT NOT NULL,
PRIMARY KEY (author_id, book_id),
FOREIGN KEY (author_id) REFERENCES Authors (author_id),
FOREIGN KEY (book_id) REFERENCES Books (book_id)
);
I also prefer a natural key when I design a tagging system:
CREATE TABLE Tags (
tag VARCHAR(20) PRIMARY KEY
);
CREATE TABLE ArticlesTagged (
article_id INT NOT NULL,
tag VARCHAR(20) NOT NULL,
PRIMARY KEY (article_id, tag),
FOREIGN KEY (article_id) REFERENCES Articles (article_id),
FOREIGN KEY (tag) REFERENCES Tags (tag)
);
This has some advantages over using a surrogate "tag_id" key:
You can ensure tags are unique, without adding a superfluous UNIQUE constraint.
You prevent two distinct tags from having the exact same spelling.
Dependent tables that reference the tag already have the tag text; they don't need to join to Tags to get the text.
Every table should have a primary key. It doesn't matter if it's an integer, GUID, or the "setting name" column. The type depends on the requirements of the application. Ideally, if you are going to join the table to another, it would be best to use a GUID or integer as your primary key.
Yes, there are good reasons. You can have semantically meaningful true keys, rather than articificial identity keys. Also, it is not a good idea to have a seperate autoincrementing primary key for a Many-Many table. There are some reasons you might want to choose a GUID.
That being said, I typically use autoincrementing 64bit integers for primary keys.
Every table should have a primary key. But it doesn't need to be a single field identifier. Take for example in a finance system, you may have the primary key on a journal table being the Journal ID and Line No. This will produce a unique combination for each row (and the Journal ID will be a primary key in its own table)
Your primary key needs to be defined on how you are going to link the table to other tables.
I don't think every table needs a primary key. Sometimes you only want to "connect" the contents of two tables - via their primary key.
So you have a table like users and one table like groups (each with primary keys) and you have a third table called users_groups with only two colums (user and group) where users and groups are connected with each other.
For example a row with user = 3 and group = 6 would link the user with primary key 3 to the group with primary key 6.
One reason not to have primary key defined as identity is having primary key defined as GUIDs or populated with externally generated values.
In general, every table that is semantically meaningful by itself should have primary key and such key should have no semantic meaning. A join table that realizes many-to-many relationship is not meaningful by itself and so it doesn't need such primary key (it already has one via its values).
To be a properly normalised table, each row should only have a single identifiable key. Many tables will already have natural keys, such a unique invoice number. I agree, especially with storage being so cheap, there is little overhead in having an autonumber/identity key on all tables, but in this instance which is the real key.
Another area where I personally don't use this approach if for reference data, where typically we have a Description and a Value
Code, Description
'L', 'Live'
'O', 'Old'
'P', 'Pending'
In this situation making code a primary key ensures no duplicates, and is more human readable.
The key difference (sorry) between a natural primary key and a surrogate primary key is that the value of the natural key contains information whereas the value of a surrogate key doesn't.
Why is this important? Well a natural primary key is by definition guaranteed to be unique, but its value is not usually guaranteed to stay the same. When it changes, you have to update it in multiple places.
A surrogate key's value has no real meaning and simply serves to identify that row, so it never needs to be changed. It is a feature of the model rather than the domain itself.
So the only place I would say a surrogate key isn't appropriate is in an association table which only contains columns referring to rows in other tables (most many-to-many relations). The only information this table carries is the association between two (or more) rows, and it already consists solely of surrogate key values. In this case I would choose a composite primary key.
If such a table had bag semantics, or carried additional information about the association, I would add a surrogate key.
A primary key is ALWAYS a good idea. It allows for very fast and easy joining of tables. It aides external tools that can read system tables to make join allowing less skilled people to create their own queries by drag-and-drop. It also makes the implementation of referential integrity a breeze and that is a good idea from the get go.
I know for sure that some very smart people working for web giants do this. While I don't know why their own reasons, I know 2 cases where PK-less tables make sense:
Importing data. The table is temporary. Insertions and whole table scans need to be as fast as possible. Also, we need to accept duplicate records. Later we will clean the data, but the import process needs to work.
Analytics in a DBMS. Identifying a row is not useful - if we need to do it, it is not analytics. We just need a non-relational, redundant, horrible blob that looks like a table. We will build summary tables or materialized views by writing proper SQL queries.
Note that these cases have good reasons to be non-relational. But normally your tables should be relational, so... yes, they need a primary key.
I have the following tables in my database that have a many-to-many relationship, which is expressed by a connecting table that has foreign keys to the primary keys of each of the main tables:
Widget: WidgetID (PK), Title, Price
User: UserID (PK), FirstName, LastName
Assume that each User-Widget combination is unique. I can see two options for how to structure the connecting table that defines the data relationship:
UserWidgets1: UserWidgetID (PK), WidgetID (FK), UserID (FK)
UserWidgets2: WidgetID (PK, FK), UserID (PK, FK)
Option 1 has a single column for the Primary Key. However, this seems unnecessary since the only data being stored in the table is the relationship between the two primary tables, and this relationship itself can form a unique key. Thus leading to option 2, which has a two-column primary key, but loses the one-column unique identifier that option 1 has. I could also optionally add a two-column unique index (WidgetID, UserID) to the first table.
Is there any real difference between the two performance-wise, or any reason to prefer one approach over the other for structuring the UserWidgets many-to-many table?
You only have one primary key in either case. The second one is what's called a compound key. There's no good reason for introducing a new column. In practise, you will have to keep a unique index on all candidate keys. Adding a new column buys you nothing but maintenance overhead.
Go with option 2.
Personally, I would have the synthetic/surrogate key column in many-to-many tables for the following reasons:
If you've used numeric synthetic keys in your entity tables then having the same on the relationship tables maintains consistency in design and naming convention.
It may be the case in the future that the many-to-many table itself becomes a parent entity to a subordinate entity that needs a unique reference to an individual row.
It's not really going to use that much additional disk space.
The synthetic key is not a replacement to the natural/compound key nor becomes the PRIMARY KEY for that table just because it's the first column in the table, so I partially agree with the Josh Berkus article. However, I don't agree that natural keys are always good candidates for PRIMARY KEY's and certainly should not be used if they are to be used as foreign keys in other tables.
Option 2 uses a simple compund key, option 1 uses a surrogate key. Option 2 is preferred in most scenarios and is close to the relational model in that it is a good candidate key.
There are situations where you may want to use a surrogate key (Option 1)
You are not certain that the compound key is a good candidate key over time. Particularly with temporal data (data that changes over time). What if you wanted to add another row to the UserWidget table with the same UserId and WidgetId? Think of Employment(EmployeeId,EmployeeId) - it would work in most cases except if someone went back to work for the same employer at a later date
If you are creating messages/business transactions or something similar that requires an easier key to use for integration. Replication maybe?
If you want to create your own auditing mechanisms (or similar) and don't want keys to get too long.
As a rule of thumb, when modeling data you will find that most associative entities (many to many) are the result of an event. Person takes up employment, item is added to basket etc. Most events have a temporal dependency on the event, where the date or time is relevant - in which case a surrogate key may be the best alternative.
So, take option 2, but make sure that you have the complete model.
I agree with the previous answers but I have one remark to add.
If you want to add more information to the relation and allow more relations between the same two entities you need option one.
For example if you want to track all the times user 1 has used widget 664 in the userwidget table the userid and widgetid isn't unique anymore.
What is the benefit of a primary key in this scenario? Consider the option of no primary key:
UserWidgets3: WidgetID (FK), UserID (FK)
If you want uniqueness then use either the compound key (UserWidgets2) or a uniqueness constraint.
The usual performance advantage of having a primary key is that you often query the table by the primary key, which is fast. In the case of many-to-many tables you don't usually query by the primary key so there is no performance benefit. Many-to-many tables are queried by their foreign keys, so you should consider adding indexes on WidgetID and UserID.
Option 2 is the correct answer, unless you have a really good reason to add a surrogate numeric key (which you have done in option 1).
Surrogate numeric key columns are not 'primary keys'. Primary keys are technically one of the combination of columns that uniquely identify a record within a table.
Anyone building a database should read this article http://it.toolbox.com/blogs/database-soup/primary-keyvil-part-i-7327 by Josh Berkus to understand the difference between surrogate numeric key columns and primary keys.
In my experience the only real reason to add a surrogate numeric key to your table is if your primary key is a compound key and needs to be used as a foreign key reference in another table. Only then should you even think to add an extra column to the table.
Whenever I see a database structure where every table has an 'id' column the chances are it has been designed by someone who doesn't appreciate the relational model and it will invariably display one or more of the problems identified in Josh's article.
I would go with both.
Hear me out:
The compound key is obviously the nice, correct way to go in so far as reflecting the meaning of your data goes. No question.
However: I have had all sorts of trouble making hibernate work properly unless you use a single generated primary key - a surrogate key.
So I would use a logical and physical data model. The logical one has the compound key. The physical model - which implements the logical model - has the surrogate key and foreign keys.
Since each User-Widget combination is unique, you should represent that in your table by making the combination unique. In other words, go with option 2. Otherwise you may have two entries with the same widget and user IDs but different user-widget IDs.
The userwidgetid in the first table is not needed, as like you said the uniqueness comes from the combination of the widgetid and the userid.
I would use the second table, keep the foriegn keys and add a unique index on widgetid and userid.
So:
userwidgets( widgetid(fk), userid(fk),
unique_index(widgetid, userid)
)
There is some preformance gain in not having the extra primary key, as the database would not need to calculate the index for the key. In the above model though this index (through the unique_index) is still calculated, but I believe that this is easier to understand.