I am trying to implement some localization in my database.
It looks something like this (prefixes only for clarification)
tbl-Categories
ID
Language
Name
tbl-Articles
ID
CategoryID
Now, in my tbl-Categories, I want to have primary keys spanning ID and language, so that every combination of ID and language is unique. In tbl-Articles I would like a foreign key to reference ID in categories, but not Language, since I do not want to bind an article to a certain language, only category.
Of course, I cannot add a foreign key to part of the primary key. I also cannot have the primary key only on the ID of categories, since then there can only be one language. Having no primary keys disables foreign keys altogether, and that is also not a great solution.
Do you have any ideas how I can solve this in an elegant fashion?
Thanks.
Given this scenario, you need a one-to-many relationship between Category and Language. Create three tables:
Category with CategoryID and Name as columns
Language with LanguageID and Name as Columns
CategoryLanguage with CategoryLanguageId, CategoryID and LanguageID (create a composite primary key on CategoryId and LanguageId which establishes uniqueness)
You don't have to change anything in the Articles table, since ID and CategoryID establish that an article belongs to one category, independent of language.
HTH
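A minimal sketch of the three-table layout above, using Python's built-in sqlite3 module so it can be run as-is. The LocalizedName column and the sample rows are assumptions added for illustration (the answer does not say where the translated text lives), and the CategoryLanguageId column is left out because the composite key already identifies each row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE Category (CategoryID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Language (LanguageID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE CategoryLanguage (
    CategoryID    INTEGER NOT NULL REFERENCES Category (CategoryID),
    LanguageID    INTEGER NOT NULL REFERENCES Language (LanguageID),
    LocalizedName TEXT NOT NULL,           -- assumed column, not in the original answer
    PRIMARY KEY (CategoryID, LanguageID)   -- one row per category and language
);
CREATE TABLE Articles (
    ID         INTEGER PRIMARY KEY,
    CategoryID INTEGER NOT NULL REFERENCES Category (CategoryID)
);
INSERT INTO Category VALUES (1, 'News');
INSERT INTO Language VALUES (1, 'English'), (2, 'German');
INSERT INTO CategoryLanguage VALUES (1, 1, 'News'), (1, 2, 'Nachrichten');
INSERT INTO Articles VALUES (10, 1);
""")


def localized_names(article_id):
    """Return (language, localized category name) pairs for one article."""
    return conn.execute(
        """SELECT l.Name, cl.LocalizedName
           FROM Articles a
           JOIN CategoryLanguage cl ON cl.CategoryID = a.CategoryID
           JOIN Language l ON l.LanguageID = cl.LanguageID
           WHERE a.ID = ?
           ORDER BY l.LanguageID""",
        (article_id,),
    ).fetchall()


# A second German row for the same category violates the composite key.
try:
    conn.execute("INSERT INTO CategoryLanguage VALUES (1, 2, 'Neuigkeiten')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

The article row references only Category, so it stays language-neutral; the language is chosen at query time through the join.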
I am trying to build a movie database for educational purpose using Cassandra in the backend. The querying on the database will be principally made by movie title. So currently the data I have fits in the following model.
movie title | imdb rating | year of release | actors
Reading the CQL documentation I found the music playlist example where the following structure was used
CREATE TABLE playlists (
id uuid,
song_order int,
song_id uuid,
title text,
album text,
artist text,
PRIMARY KEY (id, song_order ) );
My question is: why is a separate id column necessary? Can't the title column be used as the primary key? What are the advantages and disadvantages of not using a separate uuid field?
The command which I am designing for my model is
CREATE TABLE movies (
title text,
imdb_rating double,
year int,
actors text,
PRIMARY KEY (title, imdb_rating ) );
Here I believe in my model title is the PRIMARY KEY and the PARTITION KEY and imdb_rating is the CLUSTERING KEY(for arranging output in ascending order). Is there anything wrong in my model and how will it affect distribution of the data and why should I/should not use uuid? I am planning to keep a replication_factor of 2 because the number of nodes I am using is just 3.
Also according to the documentation
Do not use an index in these situations:
......
•On a frequently updated or deleted column
In my database the most updated column is imdb_rating so I am not building any secondary index on it.
Can't the title column be used as a primary key?
If the movie title is unique (which is not necessarily true) you could use title as primary key.
what are the advantages and disadvantages of not using a separate uuid field?
A UUID is good if you need an identifier that is globally unique and you don't want to check for its uniqueness. If you can find a set of columns whose combination is guaranteed to be unique, you don't have to use a UUID (assuming you don't need an id to refer to the row).
But it all depends on your query pattern. If you are going to look up a movie by its id (probably coming from another table), use a UUID as the primary key. If you want to find movies with a specific title, use title as the primary key.
In your case, since title is not unique, use a combination of title and UUID as a composite key, given that you would search by title.
Here I believe in my model title is the PRIMARY KEY and the PARTITION KEY and imdb_rating is the CLUSTERING KEY(for arranging output in ascending order). Is there anything wrong in my model and how will it affect distribution of the data and why should I/should not use uuid?
In this case you have to use the rating and a UUID for the primary key, but when you query you need to allow filtering (ALLOW FILTERING).
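Why a non-unique title is risky as the whole primary key can be illustrated with a toy model. This is plain Python, not Cassandra or its driver: it only mimics the CQL behaviour that an INSERT with an already existing primary key overwrites the stored row. The function names and the sample films are hypothetical.

```python
import uuid


def upsert(table, primary_key, row):
    """Mimic a CQL INSERT: a row with an existing primary key is overwritten."""
    table[primary_key] = row


def find_by_title(table, title):
    """Return the release years of all rows whose first key component is title."""
    return sorted(row["year"] for key, row in table.items() if key[0] == title)


# Primary key = (title): the second insert silently replaces the first.
by_title = {}
upsert(by_title, ("The Thing",), {"year": 1982})
upsert(by_title, ("The Thing",), {"year": 2011})

# Primary key = (title, id): both films survive under the same title.
by_title_and_id = {}
upsert(by_title_and_id, ("The Thing", uuid.uuid4()), {"year": 1982})
upsert(by_title_and_id, ("The Thing", uuid.uuid4()), {"year": 2011})
```

With the composite key, title can still serve as the lookup value while the UUID keeps same-titled movies apart.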
I have two tables:
User (username, password)
Profile (profileId, gender, dateofbirth, ...)
Currently I'm using this approach: each Profile record has a field named "userId" as foreign key which links to the User table. When a user registers, his Profile record is automatically created.
I'm confused by my friend's suggestion: to make the "userId" field both the foreign key and the primary key, and to delete the "profileId" field. Which approach is better?
Foreign keys are almost always "Allow Duplicates," which would make them unsuitable as Primary Keys.
Instead, find a field that uniquely identifies each record in the table, or add a new field (either an auto-incrementing integer or a GUID) to act as the primary key.
The only exception to this are tables with a one-to-one relationship, where the foreign key and primary key of the linked table are one and the same.
Primary keys always need to be unique, foreign keys need to allow non-unique values if the table is a one-to-many relationship. It is perfectly fine to use a foreign key as the primary key if the table is connected by a one-to-one relationship, not a one-to-many relationship. If you want the same user record to have the possibility of having more than 1 related profile record, go with a separate primary key, otherwise stick with what you have.
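A minimal sketch of the one-to-one variant, run through Python's sqlite3 module. The userId column on User and the sample values are assumptions for illustration; the point is that making Profile.userId both primary key and foreign key allows at most one profile per user and no profile without a user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce REFERENCES
conn.executescript("""
CREATE TABLE User (
    userId   INTEGER PRIMARY KEY,
    username TEXT NOT NULL UNIQUE,
    password TEXT NOT NULL
);
CREATE TABLE Profile (
    userId      INTEGER PRIMARY KEY REFERENCES User (userId),  -- PK and FK at once
    gender      TEXT,
    dateofbirth TEXT
);
INSERT INTO User VALUES (1, 'alice', 'placeholder-hash');
INSERT INTO Profile VALUES (1, 'F', '1990-01-01');
""")


def try_insert(sql):
    """Return True if the statement succeeds, False on a constraint violation."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# A second profile for user 1 breaks the primary key.
second_profile_ok = try_insert("INSERT INTO Profile VALUES (1, 'F', '1991-02-02')")
# A profile for a user that does not exist breaks the foreign key.
orphan_profile_ok = try_insert("INSERT INTO Profile VALUES (99, 'M', '1985-05-05')")
```

If a user should ever be able to own several profiles, this constraint is exactly what gets in the way, and a separate profileId becomes the better fit.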
Yes, it is legal to have a primary key being a foreign key. This is a rare construct, but it applies for:
a 1:1 relation. The two tables cannot be merged in one because of different permissions and privileges only apply at table level (as of 2017, such a database would be odd).
a 1:0..1 relation. Profile may or may not exist, depending on the user type.
Performance is an issue, and the design acts as a partition: the profile table is rarely accessed, is hosted on a separate disk, or has a different sharding policy than the users table. This would not make sense if the underlying storage is columnar.
Yes, a foreign key can be a primary key in the case of one to one relationship between those tables
I would not do that. I would keep the profileID as primary key of the table Profile
A foreign key is just a referential constraint between two tables
One could argue that a primary key is necessary as the target of any foreign keys which refer to it from other tables. A foreign key is a set of one or more columns in any table (not necessarily a candidate key, let alone the primary key, of that table) which may hold the value(s) found in the primary key column(s) of some other table. So we must have a primary key to match the foreign key.
Or must we? The only purpose of the primary key in the primary key/foreign key pair is to provide an unambiguous join - to maintain referential integrity with respect to the "foreign" table which holds the referenced primary key. This ensures that the value to which the foreign key refers will always be valid (or null, if allowed).
http://www.aisintl.com/case/primary_and_foreign_key.html
It is generally considered bad practise to have a one to one relationship. This is because you could just have the data represented in one table and achieve the same result.
However, there are instances where you may not be able to make these changes to the table you are referencing. In this instance there is no problem using the Foreign key as the primary key. It might help to have a composite key consisting of an auto incrementing unique primary key and the foreign key.
I am currently working on a system where users can log in and generate a registration code to use with an app. For reasons I won't go into I am unable to simply add the columns required to the users table. So I am going down a one to one route with the codes table.
It depends on the business and system.
If your userId is unique and will be unique all the time, you can use userId as your primary key. But if you ever want to expand your system, it will make things difficult. I advise you to add a foreign key in table user to make a relationship with table profile instead of adding a foreign key in table profile.
Short answer: DEPENDS.... In this particular case, it might be fine. However, experts will recommend against it just about every time; including your case.
Why?
Keys are seldom unique in tables where they are foreign (originating in another table). For example, an item ID might be unique in an ITEMS table, but not in an ORDERS table, since the same type of item will most likely appear in another order. Likewise, order IDs might be unique (might) in the ORDERS table, but not in some other table like ORDER_DETAILS, where an order can have multiple line items; to query a particular item in a particular order, you need the concatenation of two FKs (order_id and item_id) as the PK for that table.
I am not DB expert, but if you can justify logically to have an auto-generated value as your PK, I would do that. If this is not practical, then a concatenation of two (or maybe more) FK could serve as your PK. BUT, I cannot think of any case where a single FK value can be justified as the PK.
This does not apply fully to the question's case, but since I ended up on this question while searching for other information and reading some comments, I can say it is possible for a table to have only a FK and still hold unique values.
You can use a column holding classes, each of which can be assigned only once. It works almost like an ID, and it fits when you want a unique categorical value that distinguishes each record.
What is the naming convention for Primary Key column in Db Tables?
For instance:
PK_Country or CountryId or ID or PrimaryKey or.. ?
I like the Ruby on Rails conventions:
primary key in any table is auto-increment integer column called id
foreign keys are named as foreign table name plus _id. Example:
country_id is a foreign key that corresponds to a record from the countries table
Columns should be named based on the data elements that they represent, not based on what constraints apply to them. A primary key column should be named in the same way you name any other column. The ISO 11179 standard has some useful guidelines for naming data elements.
If the table is called Test I would call the PK column TestID.
That is pretty much up to the designer of the db. When I do it I make the field name of the primary key: id and field name of foreign keys: tablenameId
I always try to start primary keys with PK and foreign keys with FK. Also, when naming foreign keys, I try to include the table names and column names in the name of the foreign key: FK_Questions-test-id_Tests-seq-num would be a foreign key from table questions to table tests where questions.test_id = tests.seq_num. I think naming them that way helps when you are glancing at your keys.
There really is not a defined standard; rather, you should develop your own convention and make sure that every primary key you define is consistent with that convention.
I have a purchase order table and another table to contain the items within a particular purchase order for drugs.
Example:
PO_Table (POId, MainPharmacyID, SupplierID, PreparedBy)
PO_Items_Table (POItemID, ...)
I have two options for which table to link to which, and both seem valid. I have done this a number of times and have done it either way.
I would love to know if there are any rules about where to attach a foreign key.
In my situation, where do I attach my foreign key?
Update:
My two options are putting POItemID in the PO_Table or putting POId in the PO_Items_Table.
Update 2:
Assuming the relationship between the two tables is a one-to-one relationship
Just make it point to the PRIMARY KEY of the referenced table:
PO_Table (POId PRIMARY KEY, MainPharmacyID, SupplierID, PreparedBy)
PO_Items_Table (POItemID, POId FOREIGN KEY REFERENCES PO_Table (POId), ...)
Actually, in your PO_Table I don't see any other candidate key except POId, so as for now this seems to be the only available solution to me.
What are the "two options" you are considering?
Update:
Putting POItemID in the PO_Table is not an option unless you want your orders to have no more than one item in them.
Just look into it: if you have but a single column which stores the id of the ordered item in the order table, where are you going to store the other items?
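The layout described above, with the foreign key on the items table, can be sketched with Python's sqlite3 module. The DrugName column and the sample rows are hypothetical, since the question elides the item columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE PO_Table (
    POId           INTEGER PRIMARY KEY,
    MainPharmacyID INTEGER,
    SupplierID     INTEGER,
    PreparedBy     TEXT
);
CREATE TABLE PO_Items_Table (
    POItemID INTEGER PRIMARY KEY,
    POId     INTEGER NOT NULL REFERENCES PO_Table (POId),  -- many items per order
    DrugName TEXT                                          -- assumed column
);
INSERT INTO PO_Table VALUES (1, 7, 3, 'pharmacist');
INSERT INTO PO_Items_Table VALUES
    (1, 1, 'aspirin'), (2, 1, 'ibuprofen'), (3, 1, 'paracetamol');
""")

# All items of purchase order 1, found through the foreign key column.
items_in_po_1 = [
    name
    for (name,) in conn.execute(
        "SELECT DrugName FROM PO_Items_Table WHERE POId = 1 ORDER BY POItemID"
    )
]
```

Storing POItemID on PO_Table instead would leave room for only one of these three rows per order.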
Update 2:
If there is a one-to-one relationship, normally you just merge the tables: combine all fields from both tables into a single record.
However, there are cases when you need to split the tables. Say, one of the entities is rarely defined but has too many fields.
In this case, you make a separate relation for the second entity and make its PRIMARY KEY column also a FOREIGN KEY.
Let's imagine a model which describes the locks and the keys, and the keys cannot be duplicated (so one lock matches at most one key and vice versa):
Pairs (PairID PRIMARY KEY, LockID UNIQUE, LockProductionDate, KeyId UNIQUE, KeyProductionDate)
If there is no key for a lock or no lock for a key, we just put NULLS into the corresponding fields.
However, if all keys have a lock but only few locks have keys, we can split the table:
Locks (LockID PRIMARY KEY, LockProductionDate, KeyID UNIQUE)
Keys (KeyID PRIMARY KEY, KeyProductionDate, FOREIGN KEY (KeyID) REFERENCES Locks (KeyID))
As you can see, the KeyID is both a PRIMARY KEY and a FOREIGN KEY in the Keys table.
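A runnable sketch of the split Locks/Keys model, using Python's sqlite3 module. The column types and sample dates are assumptions; the constraint layout follows the two table definitions above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to enforce REFERENCES
conn.executescript("""
CREATE TABLE Locks (
    LockID             INTEGER PRIMARY KEY,
    LockProductionDate TEXT,
    KeyID              INTEGER UNIQUE      -- NULL when the lock has no key
);
CREATE TABLE Keys (
    KeyID             INTEGER PRIMARY KEY REFERENCES Locks (KeyID),  -- PK and FK
    KeyProductionDate TEXT
);
INSERT INTO Locks VALUES (1, '2020-01-01', 100), (2, '2020-02-01', NULL);
INSERT INTO Keys VALUES (100, '2020-01-02');
""")

# Every lock, with its key's production date where a key exists.
locks_with_keys = conn.execute(
    """SELECT l.LockID, k.KeyProductionDate
       FROM Locks l LEFT JOIN Keys k ON k.KeyID = l.KeyID
       ORDER BY l.LockID"""
).fetchall()

# A key that no lock refers to is rejected by the foreign key.
try:
    conn.execute("INSERT INTO Keys VALUES (200, '2020-03-03')")
    orphan_key_rejected = False
except sqlite3.IntegrityError:
    orphan_key_rejected = True
```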
You may want to read this article in my blog, "What is entity-relationship model?", which describes some ways to map an ER model (entities and relationships) into the relational model (tables and foreign keys).
You don't have two options. A foreign key constraint must be attached to the table (and to the column) that has the foreign key in it, and it must reference (or point to) the primary key in the other table. I don't quite understand what you mean when you say you have done this a number of times either way. What other way?
It looks like your PO_Table is the logical parent of the PO_Items_Table, which means the primary key of the PO_Table should be used as the Foreign Key in the items table
If PO stands for "Purchase Orders" and PO Item stands for a single line item of a purchase order, then you only have one choice about how to set up foreign keys. There may be many items for each purchase order, but there will only be one purchase order for each item. In this case, Quassnoi gave the correct design.
As a sidelight, every time I have designed a purchase order database, I have made the Items table have a compound primary key made up of POID and ItemID. But ItemID is not unique among all Items, just the items that belong to a single PO. Each time I start a new PO, I begin all over again with ItemID equal to one. This permits me to reconstruct a purchase order later on, and get the items in the same order as they were in when the order was first created. This is a trivial matter for most data processing purposes, but it can drive customers nuts if they look at a PO later on, and the items are out of sequence, as they perceive sequence.
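The per-order numbering described above might look like this in Python's sqlite3 module. The table name PO_Items, the Description column, and the add_item helper are hypothetical, and the MAX + 1 lookup is only a single-connection sketch; concurrent writers would need a transaction or lock around it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE PO_Items (
    POID        INTEGER NOT NULL,
    ItemID      INTEGER NOT NULL,   -- restarts at 1 for every purchase order
    Description TEXT,
    PRIMARY KEY (POID, ItemID)
)
""")


def add_item(po_id, description):
    """Append an item to a purchase order and return its per-order ItemID."""
    next_id = conn.execute(
        "SELECT COALESCE(MAX(ItemID), 0) + 1 FROM PO_Items WHERE POID = ?",
        (po_id,),
    ).fetchone()[0]
    conn.execute("INSERT INTO PO_Items VALUES (?, ?, ?)", (po_id, next_id, description))
    return next_id


assigned_ids = [
    add_item(1, "gauze"),
    add_item(1, "syringes"),
    add_item(2, "gloves"),  # a new PO starts again at ItemID 1
]

# Reading back by ItemID reproduces the original entry order of PO 1.
po_1_in_order = [
    d
    for (d,) in conn.execute(
        "SELECT Description FROM PO_Items WHERE POID = 1 ORDER BY ItemID"
    )
]
```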
I have the following tables in my database that have a many-to-many relationship, which is expressed by a connecting table that has foreign keys to the primary keys of each of the main tables:
Widget: WidgetID (PK), Title, Price
User: UserID (PK), FirstName, LastName
Assume that each User-Widget combination is unique. I can see two options for how to structure the connecting table that defines the data relationship:
UserWidgets1: UserWidgetID (PK), WidgetID (FK), UserID (FK)
UserWidgets2: WidgetID (PK, FK), UserID (PK, FK)
Option 1 has a single column for the Primary Key. However, this seems unnecessary since the only data being stored in the table is the relationship between the two primary tables, and this relationship itself can form a unique key. Thus leading to option 2, which has a two-column primary key, but loses the one-column unique identifier that option 1 has. I could also optionally add a two-column unique index (WidgetID, UserID) to the first table.
Is there any real difference between the two performance-wise, or any reason to prefer one approach over the other for structuring the UserWidgets many-to-many table?
You only have one primary key in either case. The second one is what's called a compound key. There's no good reason for introducing a new column. In practise, you will have to keep a unique index on all candidate keys. Adding a new column buys you nothing but maintenance overhead.
Go with option 2.
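Option 2 can be sketched with Python's sqlite3 module as follows. The column types and sample rows are assumptions; the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE Widget (WidgetID INTEGER PRIMARY KEY, Title TEXT, Price REAL);
CREATE TABLE User (UserID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE UserWidgets2 (
    WidgetID INTEGER NOT NULL REFERENCES Widget (WidgetID),
    UserID   INTEGER NOT NULL REFERENCES User (UserID),
    PRIMARY KEY (WidgetID, UserID)   -- the relationship itself is the key
);
INSERT INTO Widget VALUES (664, 'Sprocket', 9.5);
INSERT INTO User VALUES (1, 'Ada', 'Lovelace');
INSERT INTO UserWidgets2 VALUES (664, 1);
""")

# Linking the same user to the same widget twice is rejected by the compound key.
try:
    conn.execute("INSERT INTO UserWidgets2 VALUES (664, 1)")
    duplicate_link_rejected = False
except sqlite3.IntegrityError:
    duplicate_link_rejected = True

link_count = conn.execute("SELECT COUNT(*) FROM UserWidgets2").fetchone()[0]
```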
Personally, I would have the synthetic/surrogate key column in many-to-many tables for the following reasons:
If you've used numeric synthetic keys in your entity tables then having the same on the relationship tables maintains consistency in design and naming convention.
It may be the case in the future that the many-to-many table itself becomes a parent entity to a subordinate entity that needs a unique reference to an individual row.
It's not really going to use that much additional disk space.
The synthetic key is not a replacement for the natural/compound key, nor does it become the PRIMARY KEY for that table just because it's the first column in the table, so I partially agree with the Josh Berkus article. However, I don't agree that natural keys are always good candidates for PRIMARY KEYs, and they certainly should not be used if they are to be used as foreign keys in other tables.
Option 2 uses a simple compound key, option 1 uses a surrogate key. Option 2 is preferred in most scenarios and is close to the relational model in that it is a good candidate key.
There are situations where you may want to use a surrogate key (Option 1)
You are not certain that the compound key is a good candidate key over time, particularly with temporal data (data that changes over time). What if you wanted to add another row to the UserWidget table with the same UserId and WidgetId? Think of Employment(EmployeeId, EmployerId): it would work in most cases, except if someone went back to work for the same employer at a later date.
If you are creating messages/business transactions or something similar that requires an easier key to use for integration. Replication maybe?
If you want to create your own auditing mechanisms (or similar) and don't want keys to get too long.
As a rule of thumb, when modeling data you will find that most associative entities (many to many) are the result of an event. Person takes up employment, item is added to basket etc. Most events have a temporal dependency on the event, where the date or time is relevant - in which case a surrogate key may be the best alternative.
So, take option 2, but make sure that you have the complete model.
I agree with the previous answers but I have one remark to add.
If you want to add more information to the relation and allow more relations between the same two entities you need option one.
For example if you want to track all the times user 1 has used widget 664 in the userwidget table the userid and widgetid isn't unique anymore.
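A small sketch of that case, using Python's sqlite3 module: once the same user/widget pair may occur many times, the pair can no longer be the key, and a surrogate key takes over. The table name UserWidgetUsage and the UsedAt column are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE UserWidgetUsage (
    UsageID  INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
    UserID   INTEGER NOT NULL,
    WidgetID INTEGER NOT NULL,
    UsedAt   TEXT NOT NULL                       -- assumed timestamp column
)
""")

# User 1 uses widget 664 three times; the pair repeats, the surrogate key does not.
conn.executemany(
    "INSERT INTO UserWidgetUsage (UserID, WidgetID, UsedAt) VALUES (?, ?, ?)",
    [
        (1, 664, "2024-01-01T09:00"),
        (1, 664, "2024-01-02T09:00"),
        (1, 664, "2024-01-03T09:00"),
    ],
)

usage_count = conn.execute(
    "SELECT COUNT(*) FROM UserWidgetUsage WHERE UserID = 1 AND WidgetID = 664"
).fetchone()[0]
distinct_ids = conn.execute(
    "SELECT COUNT(DISTINCT UsageID) FROM UserWidgetUsage"
).fetchone()[0]
```

A compound key of (UserID, WidgetID, UsedAt) would be an alternative if the timestamp is guaranteed to differ between uses.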
What is the benefit of a primary key in this scenario? Consider the option of no primary key:
UserWidgets3: WidgetID (FK), UserID (FK)
If you want uniqueness then use either the compound key (UserWidgets2) or a uniqueness constraint.
The usual performance advantage of having a primary key is that you often query the table by the primary key, which is fast. In the case of many-to-many tables you don't usually query by the primary key so there is no performance benefit. Many-to-many tables are queried by their foreign keys, so you should consider adding indexes on WidgetID and UserID.
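The indexing suggestion can be sketched with Python's sqlite3 module. The index names and sample rows are assumptions, and whether a given database actually uses these indexes depends on its query planner, so the sketch only shows how they are declared and what lookups they are meant to serve.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UserWidgets3 (
    WidgetID INTEGER NOT NULL,
    UserID   INTEGER NOT NULL
);
-- One index per foreign key column, matching the two lookup directions.
CREATE INDEX idx_userwidgets3_widget ON UserWidgets3 (WidgetID);
CREATE INDEX idx_userwidgets3_user   ON UserWidgets3 (UserID);
INSERT INTO UserWidgets3 VALUES (664, 1), (665, 1), (664, 2);
""")


def widgets_for_user(user_id):
    """Lookup by UserID, the access path the user index is meant to serve."""
    return [
        w
        for (w,) in conn.execute(
            "SELECT WidgetID FROM UserWidgets3 WHERE UserID = ? ORDER BY WidgetID",
            (user_id,),
        )
    ]


def users_for_widget(widget_id):
    """Lookup by WidgetID, the access path the widget index is meant to serve."""
    return [
        u
        for (u,) in conn.execute(
            "SELECT UserID FROM UserWidgets3 WHERE WidgetID = ? ORDER BY UserID",
            (widget_id,),
        )
    ]


index_names = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = 'UserWidgets3' ORDER BY name"
    )
]
```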
Option 2 is the correct answer, unless you have a really good reason to add a surrogate numeric key (which you have done in option 1).
Surrogate numeric key columns are not 'primary keys'. Primary keys are technically one of the combination of columns that uniquely identify a record within a table.
Anyone building a database should read this article http://it.toolbox.com/blogs/database-soup/primary-keyvil-part-i-7327 by Josh Berkus to understand the difference between surrogate numeric key columns and primary keys.
In my experience the only real reason to add a surrogate numeric key to your table is if your primary key is a compound key and needs to be used as a foreign key reference in another table. Only then should you even think to add an extra column to the table.
Whenever I see a database structure where every table has an 'id' column the chances are it has been designed by someone who doesn't appreciate the relational model and it will invariably display one or more of the problems identified in Josh's article.
I would go with both.
Hear me out:
The compound key is obviously the nice, correct way to go in so far as reflecting the meaning of your data goes. No question.
However: I have had all sorts of trouble making Hibernate work properly unless I use a single generated primary key - a surrogate key.
So I would use a logical and physical data model. The logical one has the compound key. The physical model - which implements the logical model - has the surrogate key and foreign keys.
Since each User-Widget combination is unique, you should represent that in your table by making the combination unique. In other words, go with option 2. Otherwise you may have two entries with the same widget and user IDs but different user-widget IDs.
The userwidgetid in the first table is not needed, as like you said the uniqueness comes from the combination of the widgetid and the userid.
I would use the second table, keep the foreign keys and add a unique index on widgetid and userid.
So:
userwidgets( widgetid(fk), userid(fk),
unique_index(widgetid, userid)
)
There is some performance gain in not having the extra primary key, as the database would not need to maintain the index for that key. In the above model, though, an index (the unique_index) is still maintained, but I believe that this is easier to understand.