Relational Database Inheritance foreign keys and primary keys - database

I'm working on a Database in which I'm trying to deduce the best ways to apply inheritance.
So far I have 2 subclasses of an Entity, and I asked in "Extended Entity-Relationship Model to tables (subclasses)" how to implement them as relational tables.
I decided to go with Concrete Table so I created 2 tables, one for each subclass of the Entity. I encountered 2 problems:
My primary keys were id int primary key autoincrement, which means the first row of each table is going to have id = 1. So the key isn't actually unique across the two tables, and when referencing it from another table there is no way to know which of the 2 subclass tables we are referencing (unless I add an unnecessary(?) extra column).
When adding a foreign key that references said id, the foreign key is supposed to reference both subclass tables, but I don't know if that is even possible.
Any ideas or opinions about how this could be done would help a lot. Thanks.

It would probably make sense to have the child class tables reference the parent class, instead of the other way around. Then you can have an id column on the Entity table which is unique and foreign keys from the children to their parent instances. Presumably this will help when you want to use the data to instantiate an object in your code as well, since you should know which class you are instantiating and only care about its ancestors, not its children.
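As a rough sketch of that layout (not necessarily what the answerer had in mind in detail), the snippet below uses Python's built-in sqlite3 module. The table names sub_a, sub_b and note, and their columns, are hypothetical placeholders for the two subclasses and for a table that needs to reference "any entity".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked to
conn.executescript("""
CREATE TABLE entity (
    id INTEGER PRIMARY KEY AUTOINCREMENT      -- one id sequence shared by all subclasses
);
CREATE TABLE sub_a (                          -- hypothetical subclass table
    id      INTEGER PRIMARY KEY REFERENCES entity(id),
    a_field TEXT
);
CREATE TABLE sub_b (                          -- hypothetical subclass table
    id      INTEGER PRIMARY KEY REFERENCES entity(id),
    b_field TEXT
);
CREATE TABLE note (                           -- hypothetical table referencing any entity
    id        INTEGER PRIMARY KEY,
    entity_id INTEGER NOT NULL REFERENCES entity(id),
    body      TEXT
);
""")

def new_entity(connection):
    """Insert a parent row and return its freshly generated id."""
    return connection.execute("INSERT INTO entity DEFAULT VALUES").lastrowid

a_id = new_entity(conn)
conn.execute("INSERT INTO sub_a (id, a_field) VALUES (?, ?)", (a_id, "first A"))
b_id = new_entity(conn)
conn.execute("INSERT INTO sub_b (id, b_field) VALUES (?, ?)", (b_id, "first B"))

# The note points at the shared parent table, so one FK covers both subclasses.
conn.execute("INSERT INTO note (entity_id, body) VALUES (?, ?)", (b_id, "about a B row"))
```

Because both subclass rows take their id from entity, the two ids can never collide, and a single foreign key to entity(id) is enough for any referencing table.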

Related

Is this notation correct in Database Design ERD?

I'm creating an ERD and in this m:n relationship I'm trying to indicate that there is a composite key in the LOCATION entity (by combining Location_ID and Department_ID). I realise that this will involve a joining table when it comes to creating a table relationship diagram, but in the ERD, is this notation correct to indicate a composite key?
Your PK/FK notation is technically not wrong, but in an ERD you ultimately want to remove all many-to-many relationships, otherwise they will cause you more problems down the line. You especially want to remove relationships like this if you have a composite key.
Here's a quick example of roughly how I would do it. (I could do a better one if I had more information about your scenario and the other tables.)
You ideally want to create another entity which holds both primary keys from the other tables and therefore creates a composite key. Notice that this also removes the many-to-many relationship.
I hope this gives you more of an understanding :)

Implementing tree in a database with child having more than one parent

I have a table with fields Id, Name, ParentId(Id) and Leaf. I want to model a tree-like structure where a child element with Leaf=1 can have more than one parent. How can I model this in this table, or do I need an extra table to handle it? I want this for implementing tags like on Stack Overflow.
You'll need another table, unless "more than one parent" has a smallish upper limit, in which case you can add ParentID fields for the number of possible parents, but this is not recommended.
You appear to have a many-to-many relationship. This can be modelled as below:
Entity table:
    ID (Primary Key)
    ... (other entity fields)
Parents table:
    ChildID (Foreign Key - Entity.ID)
    ParentID (Foreign Key - Entity.ID)
The Leaf=1 entities being the only ones allowed to have multiple parents is a constraint that is best handled on a code level, or possibly with database triggers.
It doesn't seem possible to enforce this directly without creating a third table that contains all entities with Leaf=1 (either linking to an Entity entry or holding the row itself). I would not advise either variant: it's messy, and it's not the type of constraint you design your database around.
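A minimal sketch of the two-table layout, using Python's sqlite3 module. The trigger is only one possible way to express the "only Leaf=1 rows may have several parents" rule; its name, the error message and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE entity (
    id   INTEGER PRIMARY KEY,
    name TEXT    NOT NULL,
    leaf INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE parents (
    child_id  INTEGER NOT NULL REFERENCES entity(id),
    parent_id INTEGER NOT NULL REFERENCES entity(id),
    PRIMARY KEY (child_id, parent_id)         -- same link cannot be stored twice
);
-- Assumed rule: a non-leaf child may be linked to at most one parent.
CREATE TRIGGER single_parent_unless_leaf
BEFORE INSERT ON parents
WHEN (SELECT leaf FROM entity WHERE id = NEW.child_id) = 0
 AND EXISTS (SELECT 1 FROM parents WHERE child_id = NEW.child_id)
BEGIN
    SELECT RAISE(ABORT, 'non-leaf entity already has a parent');
END;
""")

conn.executemany(
    "INSERT INTO entity (id, name, leaf) VALUES (?, ?, ?)",
    [(1, "programming", 0), (2, "databases", 0), (3, "sql", 1), (4, "orm", 0)],
)
# The leaf row 3 gets two parents; the non-leaf row 4 gets one.
conn.executemany(
    "INSERT INTO parents (child_id, parent_id) VALUES (?, ?)",
    [(3, 1), (3, 2), (4, 1)],
)

def parents_of(child_id):
    """Return the parent ids of one entity, in ascending order."""
    rows = conn.execute(
        "SELECT parent_id FROM parents WHERE child_id = ? ORDER BY parent_id",
        (child_id,),
    )
    return [row[0] for row in rows]

try:
    conn.execute("INSERT INTO parents (child_id, parent_id) VALUES (4, 2)")
    second_parent_rejected = False
except sqlite3.DatabaseError:
    second_parent_rejected = True  # the trigger aborted the insert
```

Whether such a rule belongs in a trigger or in application code is a judgment call; the table layout itself works either way.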

Database design - defining a basic many-to-one relationship

This is a basic database design question. I want a table (or multiple tables) defining relationships between customers. I want it so PrimaryCustomer can be linked to multiple SecondaryCustomers, and can have many SecondaryCustomers with the same relationship.
PrimaryCustomerID | RelationshipID | SecondaryCustomerID
1) If the primary key is {PrimaryCustomerID} then I can only have one linked customer of any kind.
2) If the primary key is {PrimaryCustomerID, RelationshipID}, then I can only have one linked customer for each relationship type.
3) If the primary key is {PrimaryCustomerID, RelationshipID, SecondaryCustomerID}, then I can have whatever I like, but having all columns as the primary key seems completely wrong.
What's the right way to set things up?
Another alternative might be for the key to be (PrimaryCustomerId, SecondaryCustomerId), which would make sense if only one type of relationship is permitted per pair of customers. Which keys to implement should be determined by the dependencies you need to represent, so that the table accurately reflects the reality you are modelling. There's nothing wrong in principle with compound keys or all-key tables.
Number 3 is the right way to go for this data model. Linking tables often have all of their columns in the primary key, since all they do is link other tables.
If a customer can only be linked to one primary customer then you can use a simple recursive relationship in the customer table itself.
CustomerID as PK
PrimaryCustomerID as FK to CustomerID
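For illustration, the recursive variant could look roughly like this in Python's sqlite3 module. The snake_case column names and the sample customers are assumptions; only the self-referencing foreign key comes from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer (
    customer_id         INTEGER PRIMARY KEY,
    name                TEXT NOT NULL,
    -- NULL for a primary customer, otherwise the id of the customer it hangs off
    primary_customer_id INTEGER REFERENCES customer(customer_id)
);
""")
conn.executemany(
    "INSERT INTO customer (customer_id, name, primary_customer_id) VALUES (?, ?, ?)",
    [(1, "Alice", None), (2, "Bob", 1), (3, "Carol", 1)],
)

# All secondary customers attached to customer 1.
secondaries = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM customer WHERE primary_customer_id = ? ORDER BY customer_id",
        (1,),
    )
]
```

This only fits when each customer has at most one primary customer; the relationship type would need its own column if it matters.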
Nothing wrong with No 3.
If you need to prevent reverse-relationship duplicates, you can use
ALTER TABLE CustomerRelationship
ADD CONSTRAINT chk_id CHECK (PrimaryCustomerId < SecondaryCustomerId);
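A small sketch of option 3 combined with the check constraint above, run through Python's sqlite3 module. The constraint is declared inline here because SQLite cannot add a CHECK with ALTER TABLE; foreign keys to the customer and relationship tables are left out for brevity, and the sample ids are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_relationship (
    primary_customer_id   INTEGER NOT NULL,
    relationship_id       INTEGER NOT NULL,
    secondary_customer_id INTEGER NOT NULL,
    -- all-key table: the three columns together identify a row
    PRIMARY KEY (primary_customer_id, relationship_id, secondary_customer_id),
    -- forces one orientation per pair, so (2, 1) cannot duplicate (1, 2)
    CONSTRAINT chk_id CHECK (primary_customer_id < secondary_customer_id)
);
""")

def try_insert(primary, relationship, secondary):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute(
            "INSERT INTO customer_relationship VALUES (?, ?, ?)",
            (primary, relationship, secondary),
        )
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_insert(1, 10, 2),  # first link
    try_insert(1, 10, 3),  # same primary, same relationship, other customer
    try_insert(1, 10, 2),  # exact duplicate
    try_insert(2, 10, 1),  # reversed pair
]
```

Note that the check also fixes which customer must be stored as the primary one (the lower id), which may or may not suit the data being modelled.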

Is there an answer matrix I can use to decide if I need a foreign key or not?

For example, I have a table that stores classes, and a table that stores class_attributes. class_attributes has a class_attribute_id and a class_id, while classes has a class_id.
I'd guess that if a dataset is "solely a child of", "belongs solely to" or "is solely owned by" another, then I need an FK to identify the parent. Without class_id in the class_attributes table I could never find out which class an attribute belongs to.
Maybe there's a helpful answer matrix for this?
Wikipedia is helpful.
In the context of relational databases, a foreign key is a referential constraint between two tables. The foreign key identifies a column or a set of columns in one (referencing) table that refers to a column or set of columns in another (referenced) table. The columns in the referencing table must be the primary key or other candidate key in the referenced table.
(and it goes on into more and more detail)
If you want to enforce the constraint that each row in class_attributes applies to exactly one row of classes, you need a foreign key. If you don't care about enforcing this (ie, you're fine to have attributes for non-existent classes), you don't need an FK.
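To make the difference concrete, here is a minimal sketch with Python's sqlite3 module using the classes and class_attributes tables from the question. The name columns and the sample values are assumptions added for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # without this, SQLite ignores the FK
conn.executescript("""
CREATE TABLE classes (
    class_id INTEGER PRIMARY KEY,
    name     TEXT
);
CREATE TABLE class_attributes (
    class_attribute_id INTEGER PRIMARY KEY,
    class_id           INTEGER NOT NULL REFERENCES classes(class_id),
    name               TEXT
);
""")
conn.execute("INSERT INTO classes (class_id, name) VALUES (1, 'Vehicle')")
conn.execute("INSERT INTO class_attributes (class_id, name) VALUES (1, 'wheels')")

try:
    # Class 99 does not exist, so the foreign key rejects this attribute.
    conn.execute("INSERT INTO class_attributes (class_id, name) VALUES (99, 'wings')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

attribute_count = conn.execute("SELECT COUNT(*) FROM class_attributes").fetchone()[0]
```

Dropping the REFERENCES clause would let the second insert through, which is exactly the "attributes for non-existent classes" situation described above.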
I don't have an answer matrix, but just for clarification purposes, we're talking about Database Normalization:
http://en.wikipedia.org/wiki/Database_normalization
And to a certain extent Denormalization:
http://en.wikipedia.org/wiki/Denormalization
I would say it's the other way around. First, you design what kinds of objects you need to have. For each of those you create a table.
Part of this phase is designing the keys, that is, the combinations of attributes (columns) that uniquely identify the object. You may or may not add an artificial key or surrogate key for convenience or performance reasons. From these keys, you typically elect one canonical key, the primary key, which you try to use consistently to identify objects in that table. (You keep the other keys too; they serve to ensure uniqueness as a business rule, not so much for identification purposes.)
Then, you think what relationships exist between the objects. An object that is 'owned' by another object, or an object that refers to another object needs some way to identify its related object. In the corresponding table (child table) you add columns to make a foreign key to point to the primary key of the referenced table.
This takes care of all one to many relationships.
Sometimes, an object can be related multiple times to another object. For example, an order can be used to order multiple products, but a product can appear on multiple orders as well. For those relationships, you design a separate table (intersection table - in this example, order_items). This table will have a unique key created from two foreign keys: one pointing to the one parent (orders), one to the other parent (products). And again, you add the columns to the intersection table that you need to create those foreign keys.
So in short, you first design keys and foreign keys, only then you start adding columns to implement them.
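The orders/products example could be sketched as follows with Python's sqlite3 module. The quantity, customer and title columns and the sample rows are assumptions added to make the demo readable; the intersection table with a key built from two foreign keys is the part taken from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE orders   (order_id   INTEGER PRIMARY KEY, customer TEXT);
CREATE TABLE products (product_id INTEGER PRIMARY KEY, title    TEXT);
CREATE TABLE order_items (                    -- intersection table
    order_id   INTEGER NOT NULL REFERENCES orders(order_id),
    product_id INTEGER NOT NULL REFERENCES products(product_id),
    quantity   INTEGER NOT NULL DEFAULT 1,
    PRIMARY KEY (order_id, product_id)        -- unique key made of the two FKs
);
""")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "Ann"), (2, "Ben")])
conn.executemany("INSERT INTO products VALUES (?, ?)", [(1, "keyboard"), (2, "mouse")])
conn.executemany(
    "INSERT INTO order_items (order_id, product_id, quantity) VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 2, 2), (2, 2, 1)],
)

def products_on_order(order_id):
    """Titles of all products on one order."""
    rows = conn.execute(
        "SELECT p.title FROM order_items oi "
        "JOIN products p ON p.product_id = oi.product_id "
        "WHERE oi.order_id = ? ORDER BY p.product_id",
        (order_id,),
    )
    return [row[0] for row in rows]

def orders_containing(product_id):
    """Ids of all orders that include one product."""
    rows = conn.execute(
        "SELECT order_id FROM order_items WHERE product_id = ? ORDER BY order_id",
        (product_id,),
    )
    return [row[0] for row in rows]
```

Both directions of the many-to-many relationship are answered from the same intersection table.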
Don't be concerned with the type of relationship -- it has more to do with the cardinality of the relationship.
If you have a one-to-many relationship, then you'd want to assign a Primary Key to the table on the "one" side, and store it as a Foreign Key in the table on the "many" side.
You'd also do it with one-to-one relationships, but some people argue that you should avoid them.
In the case of a many-to-many relationship, you'd want to make a join table that holds a foreign key to each of the original tables.

One or Two Primary Keys in Many-to-Many Table?

I have the following tables in my database that have a many-to-many relationship, which is expressed by a connecting table that has foreign keys to the primary keys of each of the main tables:
Widget: WidgetID (PK), Title, Price
User: UserID (PK), FirstName, LastName
Assume that each User-Widget combination is unique. I can see two options for how to structure the connecting table that defines the data relationship:
UserWidgets1: UserWidgetID (PK), WidgetID (FK), UserID (FK)
UserWidgets2: WidgetID (PK, FK), UserID (PK, FK)
Option 1 has a single column for the Primary Key. However, this seems unnecessary since the only data being stored in the table is the relationship between the two primary tables, and this relationship itself can form a unique key. This leads to option 2, which has a two-column primary key but loses the one-column unique identifier that option 1 has. I could also optionally add a two-column unique index (WidgetID, UserID) to the first table.
Is there any real difference between the two performance-wise, or any reason to prefer one approach over the other for structuring the UserWidgets many-to-many table?
You only have one primary key in either case. The second one is what's called a compound key. There's no good reason for introducing a new column. In practise, you will have to keep a unique index on all candidate keys. Adding a new column buys you nothing but maintenance overhead.
Go with option 2.
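A quick sketch of option 2 using Python's sqlite3 module. The table is called app_user rather than User purely to avoid a name that is reserved in some databases, and the sample row values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE widget   (widget_id INTEGER PRIMARY KEY, title TEXT, price REAL);
CREATE TABLE app_user (user_id   INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE user_widgets (
    widget_id INTEGER NOT NULL REFERENCES widget(widget_id),
    user_id   INTEGER NOT NULL REFERENCES app_user(user_id),
    PRIMARY KEY (widget_id, user_id)          -- compound key, no extra column
);
""")
conn.execute("INSERT INTO widget VALUES (1, 'Sprocket', 9.99)")
conn.execute("INSERT INTO app_user VALUES (1, 'Ada', 'Lovelace')")
conn.execute("INSERT INTO user_widgets (widget_id, user_id) VALUES (1, 1)")

try:
    # The same user-widget pair a second time violates the compound key.
    conn.execute("INSERT INTO user_widgets (widget_id, user_id) VALUES (1, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

The compound primary key both identifies the row and enforces the "each User-Widget combination is unique" rule, so no separate unique index is needed.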
Personally, I would have the synthetic/surrogate key column in many-to-many tables for the following reasons:
If you've used numeric synthetic keys in your entity tables then having the same on the relationship tables maintains consistency in design and naming convention.
It may be the case in the future that the many-to-many table itself becomes a parent entity to a subordinate entity that needs a unique reference to an individual row.
It's not really going to use that much additional disk space.
The synthetic key is not a replacement for the natural/compound key, nor does it become the PRIMARY KEY for that table just because it's the first column in the table, so I partially agree with the Josh Berkus article. However, I don't agree that natural keys are always good candidates for PRIMARY KEYs, and they certainly should not be used if they are to be used as foreign keys in other tables.
Option 2 uses a simple compound key, option 1 uses a surrogate key. Option 2 is preferred in most scenarios and is close to the relational model in that it is a good candidate key.
There are situations where you may want to use a surrogate key (Option 1):
You are not certain that the compound key is a good candidate key over time. Particularly with temporal data (data that changes over time). What if you wanted to add another row to the UserWidget table with the same UserId and WidgetId? Think of Employment(EmployeeId, EmployerId): it would work in most cases, except if someone went back to work for the same employer at a later date.
If you are creating messages/business transactions or something similar that requires an easier key to use for integration. Replication maybe?
If you want to create your own auditing mechanisms (or similar) and don't want keys to get too long.
As a rule of thumb, when modeling data you will find that most associative entities (many to many) are the result of an event. Person takes up employment, item is added to basket etc. Most events have a temporal dependency on the event, where the date or time is relevant - in which case a surrogate key may be the best alternative.
So, take option 2, but make sure that you have the complete model.
I agree with the previous answers but I have one remark to add.
If you want to add more information to the relation and allow more relations between the same two entities you need option one.
For example, if you want to track all the times user 1 has used widget 664 in the userwidget table, the combination of userid and widgetid isn't unique anymore.
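A sketch of that situation with Python's sqlite3 module. The table name user_widget_usage, the used_at column and the timestamps are hypothetical; foreign keys to the user and widget tables are omitted to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_widget_usage (
    usage_id  INTEGER PRIMARY KEY,            -- surrogate key, one per usage event
    user_id   INTEGER NOT NULL,
    widget_id INTEGER NOT NULL,
    used_at   TEXT    NOT NULL                -- hypothetical timestamp column
);
""")
# The same (user, widget) pair appears several times, once per use.
conn.executemany(
    "INSERT INTO user_widget_usage (user_id, widget_id, used_at) VALUES (?, ?, ?)",
    [
        (1, 664, "2024-01-01T09:00:00"),
        (1, 664, "2024-01-02T09:30:00"),
        (1, 664, "2024-01-05T14:15:00"),
    ],
)

uses = conn.execute(
    "SELECT COUNT(*) FROM user_widget_usage WHERE user_id = ? AND widget_id = ?",
    (1, 664),
).fetchone()[0]
```

An alternative without a surrogate would be a compound key that includes the timestamp, which only works if two uses can never share the same timestamp value.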
What is the benefit of a primary key in this scenario? Consider the option of no primary key:
UserWidgets3: WidgetID (FK), UserID (FK)
If you want uniqueness then use either the compound key (UserWidgets2) or a uniqueness constraint.
The usual performance advantage of having a primary key is that you often query the table by the primary key, which is fast. In the case of many-to-many tables you don't usually query by the primary key so there is no performance benefit. Many-to-many tables are queried by their foreign keys, so you should consider adding indexes on WidgetID and UserID.
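As an illustration of the key-less variant with an index on each foreign key column, here is a minimal sketch in Python's sqlite3 module. The index names and the sample rows are made up, and the foreign key declarations themselves are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_widgets (                   -- UserWidgets3: no primary key
    widget_id INTEGER NOT NULL,
    user_id   INTEGER NOT NULL
);
-- One index per lookup direction.
CREATE INDEX idx_user_widgets_widget ON user_widgets (widget_id);
CREATE INDEX idx_user_widgets_user   ON user_widgets (user_id);
""")
conn.executemany(
    "INSERT INTO user_widgets (widget_id, user_id) VALUES (?, ?)",
    [(10, 1), (11, 1), (10, 2)],
)

# "Which widgets does user 1 have?" can be served by idx_user_widgets_user.
widgets_of_user_1 = sorted(
    row[0]
    for row in conn.execute("SELECT widget_id FROM user_widgets WHERE user_id = ?", (1,))
)
# "Which users have widget 10?" can be served by idx_user_widgets_widget.
users_of_widget_10 = sorted(
    row[0]
    for row in conn.execute("SELECT user_id FROM user_widgets WHERE widget_id = ?", (10,))
)
index_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = 'user_widgets' ORDER BY name"
    )
]
```

Without a key or uniqueness constraint this table would accept the same pair twice, so the indexes address lookup speed only.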
Option 2 is the correct answer, unless you have a really good reason to add a surrogate numeric key (which you have done in option 1).
Surrogate numeric key columns are not 'primary keys'. A primary key is technically one of the combinations of columns that uniquely identify a record within a table.
Anyone building a database should read this article http://it.toolbox.com/blogs/database-soup/primary-keyvil-part-i-7327 by Josh Berkus to understand the difference between surrogate numeric key columns and primary keys.
In my experience the only real reason to add a surrogate numeric key to your table is if your primary key is a compound key and needs to be used as a foreign key reference in another table. Only then should you even think to add an extra column to the table.
Whenever I see a database structure where every table has an 'id' column the chances are it has been designed by someone who doesn't appreciate the relational model and it will invariably display one or more of the problems identified in Josh's article.
I would go with both.
Hear me out:
The compound key is obviously the nice, correct way to go in so far as reflecting the meaning of your data goes. No question.
However, I have had all sorts of trouble making Hibernate work properly unless I use a single generated primary key, that is, a surrogate key.
So I would use a logical and physical data model. The logical one has the compound key. The physical model - which implements the logical model - has the surrogate key and foreign keys.
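One way such a physical model might keep the logical compound key enforced is a surrogate primary key plus a unique constraint on the pair, as the question itself suggests for option 1. This sketch uses Python's sqlite3 module; the foreign key declarations are omitted for brevity and the sample ids are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_widgets (
    user_widget_id INTEGER PRIMARY KEY,       -- surrogate key for the ORM
    widget_id      INTEGER NOT NULL,
    user_id        INTEGER NOT NULL,
    UNIQUE (widget_id, user_id)               -- logical compound key still enforced
);
""")
first_id = conn.execute(
    "INSERT INTO user_widgets (widget_id, user_id) VALUES (?, ?)", (10, 1)
).lastrowid

try:
    # A second row for the same pair would get a new surrogate id,
    # but the unique constraint stops it.
    conn.execute("INSERT INTO user_widgets (widget_id, user_id) VALUES (?, ?)", (10, 1))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM user_widgets").fetchone()[0]
```

The cost is a second index to maintain, which is the overhead some of the other answers object to.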
Since each User-Widget combination is unique, you should represent that in your table by making the combination unique. In other words, go with option 2. Otherwise you may have two entries with the same widget and user IDs but different user-widget IDs.
The userwidgetid in the first table is not needed since, as you said, the uniqueness comes from the combination of widgetid and userid.
I would use the second table, keep the foreign keys and add a unique index on widgetid and userid.
So:
userwidgets( widgetid(fk), userid(fk),
unique_index(widgetid, userid)
)
There is some performance gain in not having the extra primary key, as the database does not need to maintain an index for that key. In the above model an index is still maintained (through the unique_index), but I believe that this is easier to understand.
