Are Comments and Posts the same - database

I'm currently in the process of building another generic Blog style website, and I got to thinking. Where I usually use a separate table for Posts and another for Comments and then join them using FK's. I began to wonder, are they really worthy of separate tables?
For example properties they both share include:
ID (int)
Title (string)
Body (text)
Poster (FK)
Created At (Datetime)
Updated At (DateTime)
Likes/Dislikes(ints)
Etc..
One Post Has(optional) Many Comments,Many Comments have one Post, but also One Comment, may also have many Comments.
Now would it make more sense. For a table to contain both, Comments and Posts, and self reference them from within? Having a separate lookup table for containing what each type of entity is.
Then however, if a Post is a Comment, and a comment is no different to a post. Except for in a view context, should posted Images also be contained within the same table? As these to, can have likes/comments/name etc.
Question in short: Do blog Posts and Comments belong in the same table?

I'm going to offer you a practical answer.
As a rule of thumb, when you have two apparent sub-types with only one or two different predicates then it isn't necessarily helpful to store them in separate physical tables.
Your logical model should make a distinction between posts and comments, because they have different relationships.
For your physical model, you really only have that one different predicate, according to your description. The basic difference between a post and a comment seems to be the foreign key which references a parent post/comment. I'm assuming you would say a post cannot have a child post, whereas you have said that a comment can have a child comment.
With this being the only difference, I would say that for practical purposes your physical tables should combine posts and comments.
How can you decide in general?
In general it's not cut and dry when to physically sub-type your tables. All design is trade-offs. What I look for are the number of different columns between two sub-types, but also I look at what those columns are and how much they might impact my application logic.
Having more than a few different predicates is usually a pretty good sign that you should be sub-typing physically. However, if these columns are just coming along for the ride, as it were, and don't impact your application logic too much, then maybe they should just be nullable columns on a combined table.
On the other hand, maybe there is only one different column between two sub-types, but that column completely changes the way your application behaves. In that case, maybe for the sake of keeping your code cleaner you should physically sub-type for that column alone.

Related

How to model a database structure with repeating fields in every table

I'm in the process of structuring a databasemodel for my new project. For all the entities in my model (which is a cms, and the entities as such f.ex: page, content, menu, template and a bunch of others) they all have in common the same attributes on dates and names.
More specifically each entity contains the following for the dates: IsCreated, IsValidFrom, IsPublished, IsDeleted, IsEdited and IsExpired, and for names: CreatedByNameId, ValidFromByNameId, PublishedByNameId and so on...
I'm going to use EF5 for mapping to objects.
The question is as simple: What is the best way to structure this: Having all the fields in every table (which I am not obliged to...) or to have two separate tables which the other can relate to...?
Thanks in advance /Finn.
First of all - give this a read - http://www.agiledata.org/essays/mappingObjects.html
You really need to think about your queries/access paths. There are many tradeoffs between different implementations.
In reply to your example though,
Given the following setup:
COMMON
ValidFromByNameId
SPECIFIC1
FieldA
SPECIFIC2
FieldB
Querying by the COMMON attributes is easy but you'll have to work some magic when pulling up the subclasses (unless EF5 does it for you)
If the primary questions you're asking are about specific1 and specific2 then perhaps this isn't the right model. having the COMMON table doesn't really buy you much necessary as it will introduce a join to load any Specific1 object. In this case, i'd probably just have duplicate columns.
This answer is intentionally partial as a full answer is better handled by the numerous articles and blogs already out there. Search for "mapping object hierarchies to databases"

DB Design for Choosing One of Multiple Possible Foreign Tables

Say if I have two or more vastly different objects that are each represented by a table in the DB. Call these Article, Book, and so on. Now say I want to add a commentening feature to each of these objects. The comments will behave exactly the same in each object, so ideally I would like to represent them in one table.
However, I don't know a good way to do this. The ways I know how to do this are:
Create a comment table per object. So have Article_comments, Book_comments, and so on. Each will have a foreign key column to the appropriate object.
Create one global comment table. Have a comment_type that references "Book" or "Article". Have a foreign key column per object that is nullable, and use the comment_type to determine which foreign key to use.
Either of the above ways will require a model/db update every time a new object is added. Is there a better way?
There is one other strategy: inherit1 different kinds of "commentable" objects from one common table then connect comments to that table:
All 3 strategies are valid and have their pros and cons:
Separate comment tables are clean but require repetition in DML and possibly client code. Also, it's impossible to enforce a common key on them, unless you employ some form of inheritance, which begs the question: why not go straight for (3) in the first place?
One comment table with multiple FKs will have a lot of NULLs (which may or may not be a problem storage and cache-wise) and requires adding a new column to the comments table whenever a new kind of "commentable" object is added to the database. BTW, you don't necessarily need the comment_type - it can be inferred from what field is non-NULL.
Inheritance is not directly supported by current relational DBMSes, which brings its own set of engineering tradeoffs. On the up side, it could enable easy addition of new kinds of commentable objects without changing the rest of the model.
1 Aka. category, subclassing, generalization hierarchy... For more on inheritance, take a look at "Subtype Relationships" section of ERwin Methods Guide.
I personally think your first option is best, but I'll throw this option in for style points:
Comments have a natural structure to them. You have a first comment, maybe comments about a comment. It's a tree of comments really.
What if you added one field to each object that points to the root of the comment tree. Then you can say, "Retrieve the comment tree for article 123.", and you could take the root and then construct the tree based off the one comment table.
Note: I still like option 1 best. =)

representing topics and tables in one table vs. two tables?

I'm currently building a forum component for a larger application and I'm considering different approaches to certain parts of the database schema. In particular, I am considering representing topics and posts in a single table. While I view topics and posts as practically the same, I feel a bit apprehensive as this may make things less flexible in the future.
When topics of a particular forum are queried, the title and first post will be shown as well as some of the user information (basically the name and avatar). In this application, there are various attributes that are used by both topics and posts except for views and replies; and perhaps title, and forum_id(forum_id because that would mean potentially hundreds of records need to be affected if a topic is changed to another forum as opposed to changing the forum_id attribute in the topic relation).
The tables look something like what I have below here:
TOPIC POST
topic_id poster_id
forum_id topic_id
poster_id content
title upvote
views dnvote
replies closed
post_id deleted
last_edited
last_editor
parent_id
content
post_id
Doing it this way, using table inheritance, generating the posts in the topic would require a 4-table join via TOPIC, POST, USER, and TOPIC_TYPE.
On the other hand, if I decide to take the single table approach, should I simply leave the views, replies, title, and forum_id attributes as null if the topic_type is a regular post? (topic_type references an appropriate icon for the type of topic displayed, and will be used for statistics and etc.)
If you are definitely committed to using relational technology (I would consider NoSQL db for this like Mongo, etc. as well) I would separate into two tables as you proposed.
Your case here is a fundamental of relational master-detail design or whole-parts and I think that two tables are appropriate.
I think in this scenario simple normalization is preferred. It will also be useful to generate different types of reports as well. Although single table may be used but as you have designed the table in this case if you use two tables that would be more manageable to avoid entry of same value multiple times.
It might be worthwhile to distinguish between a "topic" and a "topic starter" as such. A "topic starter" is a comment that is not a reply. Every topic has exactly one topic starter, which could be referenced by a foreign key in the topic table.
Other than that, I agree with both your analysis and your design.

Person name structure in separate database table

I am wondering when and when not to pull a data structure into a separate database table when it appears in several tables.
I have pulled the 12 attribute address structure into a separate table because I have a couple of different entities containing a single address in this format.
But how about my 3 attribute person name structure (given, middle, surname)?
Should this be put into its own table referenced with a foreign key for all the entities containing a name... e.g. the company table has a contact person name, the citizen table has a person name etc.
Are these best left as attributes in the main tables or should they be extracted?
I would usually keep the address on the Person table, unless there was an unusual need for absolutely uniform addresses on each entity, or if an entity could have an arbitrary number of addresses, or if addresses need to be shared between entities, or if it was a large enterprise product where I know I have to invest in infrastructure all over the place or I will end up gutting everything down the road.
Having your addresses in a seperate table is interesting because it's flexible, but in the context of a small project lacking a special need like the ones mentioned above, it's probably a slight waste. Always be aware of the balance between complexity and flexibility. Flexibility is important, but be discriminating... It's easy to invest way too much there!
In concrete terms, the times that I experimented with (for instance) one-to-one relationships for things like addresses, I ended up refactoring them back into the table because it introduced a bunch of headaches including more complex queries, dealing with situations where the address does not exist, etc. More entities also increases your cognitive load -- it makes the project harder to think about. In my case, it was an unecessary cost because there was no concrete need and, in truth, not even a gain in flexibility.
So, based on my experiences, I would "try" to keep the addresses in the same table, and I would definitely keep the names on them - again, unless there was a special need.
So to paraphrase Einstein, make it as simple as possible and no simpler. But in the short term, experiment. It's the best way to learn these lessons.
It's about not repeating information, so you don't want to store the same information in two places when one will do.
Another useful rule of thumb is one entity per table. If you find that one table contains, say, "person" AND "order" then you probably should split those into two tables.
And (putting myself at risk of repeating information...) you might find it helpful to review some database design basics, there are plenty of related questions here on stackoverflow.
Start with these...
What is normalisation?
What is important to keep in mind when designing a database
How many fields is 'too many'?
More tables or more columns?
Creating a person entity across your data model will give you this present and future advantages -
The same person occurring as a contact, or individual in different contexts. Saves redundancy.
Info can be maintained and kept current with far-less effort.
Easier to search for a person and identify them - i.e. is it the same John Smith?
You can expand the information - i.e. maintain addresses for this person far more easily.
Programming will be more consistent and debugging will be easier as well.
Moves you closer to a 'self-documenting' system.
As a counterpoint to the other (entirely valid) replies: within your application's current structure, how likely will it be for a given individual (not just name, the actual "person" -- multiple people could be "John Smith") to appear in more than one table? The less likely this is to happen, the less likely you are to get benefits from normalization.
Another way to think of it is entities. Outside of labels (names), is their any overlap between "customer" entity and an "employee" entity?
Extract them. Your aim should be to have no repeating data in your database.
Read about Normalization
It really depends on the problem you are trying to solve. In general it is probably a good idea to have some sort of 'person' table which holds details of people. However, there are occasions where that is potentially a very bad idea.
One example would be if you are holding details of prescriptions written out to people by a doctor. In some countries it is a legal requirment that the prescription details are held with the name in which they were prescribed NOT the name the person is going under currently. For instance a woman might be prescribed a drug as miss X, but then she gets married and becomes Mrs Y. If you had a person table that was linked to the prescriptions table you would now have the wrong details and would possibly face legal consequences. In that case you would need to probably copy the relevant details of the person into the prescription table, even though this would be duplicating data.
So again - it depends on the problem you are trying to solve. Don't just blindly follow what people consider to be best practices. Understand your data and any issues surrounding it, then try to follow best practices that fit.
Depends on what you're using the database for.
If you want fast queries on your tables you should de-normalize your tables. Having to run multiple JOIN's will take longer and make your queries more complex.
On the other hand if your intention is to have a flexible storage database which is not meant to be hit with a ton of fast-response queries, then normalizing the tables by splitting them out into multiple xref'ed tables will provide more flexibility in your design and reduce the need for submitting duplicated data.
Since de-normalization is "optimization", I would suggest you normalize the tables first, index them properly and see if you're getting any bottlenecks on your queries. If so, flatten the affected tables where needed.
You should really consider your whole database structure and do a ER diagram (entity relationship diagram) first. OF COURSE there should be another table called "Person" where the concept of a person is stored...

Writing Comments table in database

I'm working on a social networking system that will have comments coming from several different locations. One could be friends, one could be events, one could be groups--much like Facebook. What I'm wondering is, from a practical standpoint, what would be the simplest way to write a comments table? Should I do it all in one table and allow foreign keys to all sorts of different tables, or should each distinct table have its own comment table? Thanks for the help!
A single comments table is the more elegant design, I think. Rather than multiple FKs though, consider an intermediate table - CommentedItem. So Friend, Event, Group, etc all have FKs to CommentedItem, and you create a CommentedItem row for each new row in each of those tables. Now Comments only needs one FK, to CommentedItem. For example, to get all Comments for a given Friend:
SELECT * FROM Comment c
JOIN CommentedItem ci on c.CommentedItemId = ci.CommentedItemId
JOIN Friend f on f.CommentedItemId = ci.CommentedItemId
WHERE f.FriendId = #FriendId
I've done both and the answer depends on the situation. For what you are trying to do, I would do a SINGLE "Comments" table, and then seperate "linker" tables. This will give you the best performance as you can achieve the "Perfect Index".
I would also recommend putting a "CommentTypeID" field in the Comments table to give a 'clue' as to which linker table you will pull from for the aditional detail.
EDIT: The CommentTypeID field should not be used in the indexes, but rather it's only for use in code.
one thing to be careful about is if you don't do a highly normalized database it can sometimes cause IO row chaining and table scans.
I believe oracle suggests performing a normalization model of about 3rd Normal form.
This is an equivalent question to this one.
EDIT: Based on a comment, it isn't clear that this is an equivalent question, so I spell it out below.
Both questions ask about projects (both happen to be Social Networks, but that's just coincidence) where there is a question about the performance of the database. Both have a diverse set of objects that share a common collection of attributes (in one it is Events, that occur on each object, in the other it is Comments that occur on each object).
Both questions effectively ask whether it is more efficient to create a UNION query that combines the disparate common features, or to factor them out into a common table, with appropriate foreign keys.
I see them as equivalent; the best answer to one will apply equally to the other.
(If you disagree, I am happy to hear why; please leave a comment.)
I would go for polymorphic associations. Many modern web development frameworks support it out of the box, which makes it really the simplest and most painless way to handle these kind of relationships.
Actually you can probably go to http://www.zazazine.com and look through their articles. You may find an answer there

Resources