DB Design for Choosing One of Multiple Possible Foreign Tables - database

Say if I have two or more vastly different objects that are each represented by a table in the DB. Call these Article, Book, and so on. Now say I want to add a commentening feature to each of these objects. The comments will behave exactly the same in each object, so ideally I would like to represent them in one table.
However, I don't know a good way to do this. The ways I know how to do this are:
Create a comment table per object. So have Article_comments, Book_comments, and so on. Each will have a foreign key column to the appropriate object.
Create one global comment table. Have a comment_type that references "Book" or "Article". Have a foreign key column per object that is nullable, and use the comment_type to determine which foreign key to use.
Either of the above ways will require a model/db update every time a new object is added. Is there a better way?

There is one other strategy: inherit1 different kinds of "commentable" objects from one common table then connect comments to that table:
All 3 strategies are valid and have their pros and cons:
Separate comment tables are clean but require repetition in DML and possibly client code. Also, it's impossible to enforce a common key on them, unless you employ some form of inheritance, which begs the question: why not go straight for (3) in the first place?
One comment table with multiple FKs will have a lot of NULLs (which may or may not be a problem storage and cache-wise) and requires adding a new column to the comments table whenever a new kind of "commentable" object is added to the database. BTW, you don't necessarily need the comment_type - it can be inferred from what field is non-NULL.
Inheritance is not directly supported by current relational DBMSes, which brings its own set of engineering tradeoffs. On the up side, it could enable easy addition of new kinds of commentable objects without changing the rest of the model.
1 Aka. category, subclassing, generalization hierarchy... For more on inheritance, take a look at "Subtype Relationships" section of ERwin Methods Guide.

I personally think your first option is best, but I'll throw this option in for style points:
Comments have a natural structure to them. You have a first comment, maybe comments about a comment. It's a tree of comments really.
What if you added one field to each object that points to the root of the comment tree. Then you can say, "Retrieve the comment tree for article 123.", and you could take the root and then construct the tree based off the one comment table.
Note: I still like option 1 best. =)

Related

Are Comments and Posts the same

I'm currently in the process of building another generic Blog style website, and I got to thinking. Where I usually use a separate table for Posts and another for Comments and then join them using FK's. I began to wonder, are they really worthy of separate tables?
For example properties they both share include:
ID (int)
Title (string)
Body (text)
Poster (FK)
Created At (Datetime)
Updated At (DateTime)
Likes/Dislikes(ints)
Etc..
One Post Has(optional) Many Comments,Many Comments have one Post, but also One Comment, may also have many Comments.
Now would it make more sense. For a table to contain both, Comments and Posts, and self reference them from within? Having a separate lookup table for containing what each type of entity is.
Then however, if a Post is a Comment, and a comment is no different to a post. Except for in a view context, should posted Images also be contained within the same table? As these to, can have likes/comments/name etc.
Question in short: Do blog Posts and Comments belong in the same table?
I'm going to offer you a practical answer.
As a rule of thumb, when you have two apparent sub-types with only one or two different predicates then it isn't necessarily helpful to store them in separate physical tables.
Your logical model should make a distinction between posts and comments, because they have different relationships.
For your physical model, you really only have that one different predicate, according to your description. The basic difference between a post and a comment seems to be the foreign key which references a parent post/comment. I'm assuming you would say a post cannot have a child post, whereas you have said that a comment can have a child comment.
With this being the only difference, I would say that for practical purposes your physical tables should combine posts and comments.
How can you decide in general?
In general it's not cut and dry when to physically sub-type your tables. All design is trade-offs. What I look for are the number of different columns between two sub-types, but also I look at what those columns are and how much they might impact my application logic.
Having more than a few different predicates is usually a pretty good sign that you should be sub-typing physically. However, if these columns are just coming along for the ride, as it were, and don't impact your application logic too much, then maybe they should just be nullable columns on a combined table.
On the other hand, maybe there is only one different column between two sub-types, but that column completely changes the way your application behaves. In that case, maybe for the sake of keeping your code cleaner you should physically sub-type for that column alone.

Parent child design to easily identify child type

In our database design we have a couple of tables that describe different objects but which are of the same basic type. As describing the actual tables and what each column is doing would take a long time I'm going to try to simplify it by using a similar structured example based on a job database.
So say we have following tables:
These tables have no connections between each other but share identical columns. So the first step was to unify the identical columns and introduce a unique personId:
Now we have the "header" columns in person that are then linked to the more specific job tables using a 1 to 1 relation using the personId PK as the FK. In our use case a person can only ever have one job so the personId is also unique across the Taxi driver, Programmer and Construction worker tables.
While this structure works we now have the use case where in our application we get the personId and want to get the data of the respective job table. This gets us to the problem that we can't immediately know what kind of job the person with this personId is doing.
A few options we came up with to solve this issue:
Deal with it in the backend
This means just leaving the architecture as it is and look for the right table in the backend code. This could mean looking through every table present and/or construct a semi-complicated join select in which we have to sift through all columns to find the ones which are filled.
All in all: Possible but means a lot of unecessary selects. We also would like to keep such database oriented logic in the actual database.
Using a Type Field
This means adding a field column in the Person table filled for example with numbers to determine the correct child table like:
So you could add a 0 in Type if it's a taxi driver, a 1 if it's a programmer and so on...
While this greatly reduced the amount of backend logic we then have to make sure that the numbers we use in the Type field are known in the backend and don't ever change.
Use separate IDs for each table
That means every job gets its own ID (has to be nullable) in Person like:
Now it's easy to find out which job each person has due to the others having an empty ID.
So my question is: Which one of these designs is the best practice? Am i missing an obvious solution here?
Bill Karwin made a good explanation on a problem similar to this one. https://stackoverflow.com/a/695860/7451039
We've now decided to go with the second option because it seem to come with the least drawbacks as described by the other commenters and posters. As there was no actual answer portraying the second option as a solution i will try to summarize our reasoning:
Against Option 1:
There is no way to distinguish the type from looking at the parent table. As a result the backend would have to include all logic which includes scanning all tables for the that contains the id. While you can compress most of the logic into a single big Join select it would still be a lot more logic as opposed to the other options.
Against Option 3:
As #yuri-g said this one is technically not possible as the separate IDs could not setup as primary keys. They would have to be nullable and as a result can't be indexed, essentially rendering the parent table useless as one of the reasons for it was to have a unique personID across the tables.
Against a single table containing all columns:
For smaller use cases as the one i described in the question this might me viable but we are talking about a bunch of tables with each having roughly 2-6 columns. This would make this option turn into a column-mess really quickly.
Against a flat design with a key-value table:
Our properties have completly different data types, different constraints and foreign key relations. All of this would not be possible/difficult in this design.
Against custom database objects containt the child specific properties:
While this option that #Matthew McPeak suggested might be a viable option for a lot of people our database design never really used objects so introducing them to the mix would likely cause confusion more than it would help us.
In favor of the second option:
This option is easy to use in our table oriented database structure, makes it easy to distinguish the proper child table and does not need a lot of reworking to introduce. Especially since we already have something similar to a Type table that we can easily use for this purpose.
Third option, as you describe it, is impossible: no RDBMS (at least, of I personally know about) would allow you to use NULLs in PK (even composite).
Second is realistic.
And yes, first would take up to N queries to poll relatives in order to determine the actual type (where N is the number of types).
Although you won't escape with one query in second case either: there would always be two of them, because you cant JOIN unless you know what exactly you should be joining.
So basically there are flaws in your design, and you should consider other options there.
Like, denormalization: line non-shared attributes into the parent table anyway, then fields become nulls for non-correpondent types.
Or flexible, flat list of attribute-value pairs related through primary key (yes, schema enforcement is a trade-off).
Or switch to column-oriented DB: that's a case for it.

How to model a database structure with repeating fields in every table

I'm in the process of structuring a databasemodel for my new project. For all the entities in my model (which is a cms, and the entities as such f.ex: page, content, menu, template and a bunch of others) they all have in common the same attributes on dates and names.
More specifically each entity contains the following for the dates: IsCreated, IsValidFrom, IsPublished, IsDeleted, IsEdited and IsExpired, and for names: CreatedByNameId, ValidFromByNameId, PublishedByNameId and so on...
I'm going to use EF5 for mapping to objects.
The question is as simple: What is the best way to structure this: Having all the fields in every table (which I am not obliged to...) or to have two separate tables which the other can relate to...?
Thanks in advance /Finn.
First of all - give this a read - http://www.agiledata.org/essays/mappingObjects.html
You really need to think about your queries/access paths. There are many tradeoffs between different implementations.
In reply to your example though,
Given the following setup:
COMMON
ValidFromByNameId
SPECIFIC1
FieldA
SPECIFIC2
FieldB
Querying by the COMMON attributes is easy but you'll have to work some magic when pulling up the subclasses (unless EF5 does it for you)
If the primary questions you're asking are about specific1 and specific2 then perhaps this isn't the right model. having the COMMON table doesn't really buy you much necessary as it will introduce a join to load any Specific1 object. In this case, i'd probably just have duplicate columns.
This answer is intentionally partial as a full answer is better handled by the numerous articles and blogs already out there. Search for "mapping object hierarchies to databases"

Naming database table fields to designate relationships?

Lets say I have tables Student and Mentor
Does anyone use naming convention for relationship as below? I think this way is good to see the relationships quickly. Would anyone suggest a better way?
Student
StudentID
StudentName
Student2MentorID
To start from scratch, - you probably know this already - there are several ways to represent your database schema, I mean, by using diagrams, for example ER-diagrams that helps you (and your team) stay up to date with your database's design and thus making it simpler to understand.
Now, personally when it comes to implementation, I do use some kind of naming-convention. For example:
For large projects, I use double underscores to split between table categories, (ie. hr__personnel, hr__clocks, hr__timetable, vehicles__cars, vehicles__trips) and so on.
Now, having a relationship between two tables, I do Include both (or all) of the involved table names. (ie. hr__personnel_timetable, vehicles__cars_trips, etc)
Sometimes, (as we all know), we cannot follow strictly a standard, so in those cases I use my own criteria when choosing large relationships' names.
As a rule, I also name table attributes by a three-letter preffix. For example, in my table trips, my fields will be tri_id,tri_distance, tri_elapsed
Note also, that in the above item, I didn't include a Foreign Key. So here I go then. When it comes to FK's, It's easy for me (and my team) to realize that the field IS a FK.
If we follow the previous example, I would like to know who drives in each trip (to make it easier, we assume that only one person drives one trip). So my table now is something like this: tri_id, per_id, tri_distance, tri_elapsed. Now you can easily realize that per_id is just a foreign field of the table. Just, another hint to help.
Just by following these simple steps, you will save hours, and probably some headaches too.
Hope this helps.
I think: you can add prefix (3 letters) to table depending that module represents (scholar,sales,store)
module: scholar ->sc
table: scStudent ( IdStudent,nameStudent..)
table: scMentor(IdMentor,nameMentor...)
relationship
scMentorStudent (IdMentorStudent pk..)
You can use Microsoft's EF notation :
http://weblogs.asp.net/jamauss/pages/DatabaseNamingConventions.aspx
It is better to use underscores...
I suggest to simply use existing naming convention rules such as this one:
http://www.oracle-base.com/articles/misc/naming-conventions.php

Database Child Table with Two Possible Parents

First off, I'm not sure how exactly to search for this, so if it's a duplicate please excuse me. And I'm not even sure if it'd be better suited to one of the other StackExchange sites; if so, please let me know and I'll ask over there instead. Anyways...
Quick Overview of the Project
I'm working on a hobby project -- a writer's notebook of sorts -- to practice programming and database design. The basic structure is fairly simple: the user can create notebooks, and under each notebook they can create projects associated with that notebook. Maybe the notebook is for a series of short stories, and each project is for an individual story.
They can then add items (scenes, characters, etc.) to either a specific project within the notebook, or to the notebook itself so that it's not associated with a particular project. This way, they can have scenes or locations that span multiple projects, as well as having some that are specific to a particular project.
The Problem
I'm trying to keep a good amount of the logic within the database -- especially within the table structure and constraints if at all possible. The basic structure I have for a lot of the items is basically like this (I'm using MySql, but this is a pretty generic problem -- just mentioning it for the syntax):
CREATE TABLE SCENES(
ID BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY NOT NULL,
NOTEBOOK BIGINT UNSIGNED NULL,
PROJECT BIGINT UNSIGNED NULL,
....
);
The problem is that I need to ensure that at least one of the two references, NOTEBOOK and/or PROJECT, are set. They don't have to both be set -- PROJECT has a reference to the NOTEBOOK it's in. I know I could just have a generic "Parent Id" field, but I don't believe it'd be possible to have a foreign key to two tables, right? There's also the possibility of adding additional cross-reference tables -- i.e. SCENES_X_NOTEBOOKS and SCENES_X_PROJECTS -- but that'd get out of hand pretty quickly, since I'd have to add similar tables for each of the different item types I'm working with. That would also introduce the problem of ensuring each item has an entry in the cross reference tables.
It'd be easy to put this kind of logic in a stored procedure or the application logic, but I'd really like to keep it in a constraint of some kind if at all possible, just to eliminate any possibility that the logic got bypassed some how.
Any thoughts? I'm interested in pretty much anything -- even if it involves a redesign of the tables or something.
The thing about scenes and characters is that a writer might drop them from their current project. When that happens, you don't want to lose the scenes and characters, because the writer might decide to use them years later.
I think the simplest approach is to redefine this:
They can then add items (scenes, characters, etc.) to either a
specific project within the notebook, or to the notebook itself so
that it's not associated with a particular project.
Instead of that, think about saying this.
They can then add items (scenes, characters, etc.) to either a
user-defined project within the notebook, or to the system-defined
project named "Unassigned". The project "Unassigned" is for things not
currently assigned to a user-defined project.
If you do that, then scenes and characters will always be assigned to a project--either to a user-defined project, or to the system-defined project named "Unassigned".
I'm unclear as to what exactly are you requirements, but let me at least try to answer some of your individual questions...
The problem is that I need to ensure that at least one of the two references, NOTEBOOK and/or PROJECT, are set.
CHECK (NOTEBOOK IS NOT NULL OR PROJECT IS NOT NULL)
I don't believe it'd be possible to have a foreign key to two tables, right?
Theoretically, you can reference two tables from the same field, but this would mean both of these tables would have to contain the matching row. This is probably not what you want.
You are on the right track here - let the NOTEBOOK be the child endpoint of a FK towards one table and the PROJECT towards the other. A NULL foreign key will not be enforced, so you don't have to set both of them.
There's also the possibility of adding additional cross-reference tables -- i.e. SCENES_X_NOTEBOOKS and SCENES_X_PROJECTS -- but that'd get out of hand pretty quickly, since I'd have to add similar tables for each of the different item types I'm working with.
If you are talking about junction (aka. link) tables that model many-to-many relationships, then yes - you'd have to add them for each pair of tables engaged in such a relationship.
You could, however, minimize the number of such table pairs by using inheritance (aka. category, subclassing, subtype, generalization hierarchy...). Imagine you have a set of M tables that have to be connected to a second set of N tables. Normally, you'd have create M*N junction tables. But if you inherit all tables in the first set from a common parent table, and do the same for the second set, you can now connect them all through just one junction table between these two parent tables.
The full discussion on inheritance is beyond the scope here, but you might want to look at "ERwin
Methods Guide", chapter "Subtype Relationships", as well as this post.
It'd be easy to put this kind of logic in a stored procedure or the application logic, but I'd really like to keep it in a constraint of some kind if at all possible, just to eliminate any possibility that the logic got bypassed some how.
Your instincts are correct - make database "defend" itself from the bad data as much as possible. Here is the order of preference for ensuring the correctness of your data:
The structure of tables themselves.
The declarative database constraints (integrity of domain, integrity of key and referential integrity).
Triggers and stored procedures.
Middle tier.
Client.
For example, if you can ensure a certain logic must be followed just by using declarative database constraints, don't put it in triggers.

Resources