What is the best database design when multiple tables share a common data model? [closed]

What is the best database design when multiple tables share a common data model? [closed] - database

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 2 years ago.
Improve this question
It is very common coming up with the situation where you have multiple tables, let's say Posts, Entries and News and the users can comment in any of them.
The natural way to think this would look like would be something like this:
However, using the parent_type attribute to set up the primary key seems ambiguous to me, because it doesn't really describe clearly to which parent the comment belongs to.
It would seem to me like the clearest way to understand the relationship would be to create an intermediate table for each relationship: posts_comments, posts_entries, posts_news. But then again this doesn't have much sense in terms of database design because you don't really need an intermediate table for a one to many relationship, they are only needed for many to many relationships.
Then maybe a third approach would be to make a different Comment model depending on what it belongs to: PostComment, EntryComment. This would help in understanding better what that comment is for, but I think it is a terrible idea because a comment could just be represented by a single table and you could end up with a bunch of different tables with repeated column while it could just be represented by one table.
So, what is the usual approach for this situation?

I agree that you don't want to create three tables that are almost identical if you can help it. I've seen plenty of databases that do this and it's a pain, because any change is likely to affect all the variations, so you have to make the change three times instead of once, and there are three times as many queries to change. And sooner or later someone will make a change and not know he should make the same change to the other two tables or be in a hurry or forget, and then six months later someone else comes along and wonders if there's a reason why the three tables are subtly different or if this is just carelessness.
But you also don't want to have a column with an ambiguous definition. You especially don't want a foreign key that can refer to different tables depending on the content of a type field. That would be really ugly.
A solution to a similar problem that I used once was to create an intermediate table. In this case, it would be -- I don't know if you have a word that encompasses news, posts, and events, so let me call them all collectively "articles". So we create an article_comments table. It may have no data except an ID. Then news, posts, and events all have a pointer to article_comments, and comments has a pointer to article_comments.
So if you want all the comments for a given news record, it's:
select whatever
from news n
join article_comments ac on ac.iarticle_comments_id=n.article_comments_id
join comments c on c.article_comments_id=ac.article_comments_id
where n.news_id=#nid
Note that with this structure, all FKs are true FKs to a single table. You don't make article_comments point to news, posts, and events; you make news, posts and events point to article_comments, so all the FKs are clean.
Yes, it's an extra table to read, which would slow down queries a bit. But I think that's a price worth paying to keep the FKs clean.
One admittedly clumsy query with this structure would be if you want to know which article a given comment is for if you don't know the type of article. That would have to be:
select whatever
from comment c
join article_comment ac on ac.article_comment_id=c.article_comment_id
left join news n on n.article_comment_id=ac.article_comment_id
left join post p on p.article_comment_id=ac.article_comment_id
left join event e on e.article_comment_id=ac.article_comment_id
where c.comment_id=#cid
and then see which of news, post, and event turns up non-null. (You could also do it with a join to a subquery that's a union, but I think that would be uglier.)

So later. However, you can design model as same below:
#Entity
uid: string
type: string/ (post, new,...)
...
#Comment:
uid: string
entity_id: string
body: String
Other way (only one table)
Entity:
uid: string
parent: string
type: string /post, new, comment, revison,...
metadata: jsonb
Hope useful for anyone!

Related

Are Comments and Posts the same

I'm currently in the process of building another generic Blog style website, and I got to thinking. Where I usually use a separate table for Posts and another for Comments and then join them using FK's. I began to wonder, are they really worthy of separate tables?
For example properties they both share include:
ID (int)
Title (string)
Body (text)
Poster (FK)
Created At (Datetime)
Updated At (DateTime)
Likes/Dislikes(ints)
Etc..
One Post Has(optional) Many Comments,Many Comments have one Post, but also One Comment, may also have many Comments.
Now would it make more sense. For a table to contain both, Comments and Posts, and self reference them from within? Having a separate lookup table for containing what each type of entity is.
Then however, if a Post is a Comment, and a comment is no different to a post. Except for in a view context, should posted Images also be contained within the same table? As these to, can have likes/comments/name etc.
Question in short: Do blog Posts and Comments belong in the same table?

I'm going to offer you a practical answer.
As a rule of thumb, when you have two apparent sub-types with only one or two different predicates then it isn't necessarily helpful to store them in separate physical tables.
Your logical model should make a distinction between posts and comments, because they have different relationships.
For your physical model, you really only have that one different predicate, according to your description. The basic difference between a post and a comment seems to be the foreign key which references a parent post/comment. I'm assuming you would say a post cannot have a child post, whereas you have said that a comment can have a child comment.
With this being the only difference, I would say that for practical purposes your physical tables should combine posts and comments.
How can you decide in general?
In general it's not cut and dry when to physically sub-type your tables. All design is trade-offs. What I look for are the number of different columns between two sub-types, but also I look at what those columns are and how much they might impact my application logic.
Having more than a few different predicates is usually a pretty good sign that you should be sub-typing physically. However, if these columns are just coming along for the ride, as it were, and don't impact your application logic too much, then maybe they should just be nullable columns on a combined table.
On the other hand, maybe there is only one different column between two sub-types, but that column completely changes the way your application behaves. In that case, maybe for the sake of keeping your code cleaner you should physically sub-type for that column alone.

How to design a database table from a form? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 6 years ago.
Improve this question
I'm learning how to design databases, and i've been ask to create the table that will hold this form: Medical History I'm learning to use Django/Python i've already made the markup in HTML and CSS, but I don't think that making each question on the form an column would be the best approach. For example in the family history i've thought of making it a separate table, while in the review of systems i want to make each to be a set.

A pragmatic approach is to define tables based on the following criteria:
1) easy to select data from them (not to obtain many JOINs or convoluted queries that require ORs or strings splitting)
2) easy to understand (each concept maps to one table)
=> usually, normalized structures do the trick here
Of course, above are challenged in high transactional environments (INSERTs, UPDATEs, DELETEs).
I would assume then your case has moderate INSERTs, but more SELECTs (reports).
For Family history section I would normalize everything:
DiseaseType
DiseaseTypeId
Code -- use to separate from a name that can change in time
Name -- breast cancer, colon cancer etc.
CollateralOption
CollateralOptionId
Code -- I would put UNIQUE constraints on Codes and Names
Name -- no, yes, father
FamilyHistory
FamilyHistoryId INT PK IDENTITY -- this may be missing, but I prefer if I use an ORM
PatientId -> FK -> Patient
DiseaseTypeId -> FK -> DiseaseType
CollateralOptionId FK -> CollateralOption
Checked BIT -- you may not define this and have records for Checked ones.
-- having this may put some storage pressure
-- but prevent some "stuffing" in the queries
These structures allow to easily COUNT number of patients with colon cancer cases in their family, for example.
Shortly put: if there is not serious reason against it, go for normalized structures.

I don't see any advantage to perform any design tricks on this data structure. Yes, making a boolean attribute of each of your checkboxes, and a string attribute of each of your free texts, will lead to a high number of attributes in one table. But this is just the logical structure of your data. All these attributes are dependent on the key, some person id, (or at least that's what I assume, as a medical layman). Also, I assume that they are independent of each other, i.e. not determined by some other combination of attributes. So they go to the same table. Putting them on several tables won't gain anything, but will force you to do lots of joins if you query on different types of attributes (like all patients whose mother had breast cancer and who now have breast lumps).
I don't know exactly what you mean by making sets of some attributes. Do you mean to have just one attribute, and encode the sequence of boolean values e.g. in one integer, like 5 for yes-no-yes? Again that's not worth the trouble, as it won't save any space or whatever, but will make queries more complicated.
If you are still in doubt, try to formulate the most frequent use cases for those data, which will probably be typical queries on combinations of these attributes. Then we might see whether a different structure would make your life easier.

Database design: Likes table [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 6 months ago.
Improve this question
I got two tables (for now):
Topic
Post (post is a comment for a topic)
I want to add the option to Like those objects.
so I thought about creating one table of Likes and using enum to indicate which object was liked (including the object's id of course).
by the way, if I choose this option, should it be an enum or another table represent all the objects:
id object_name
1 Topic
2 Post
another option is to create likes table for every object .
what is the best practice to take?

I think creating a separate table for each object is better.
I don't see what you gain if you use only one table. You can't use foreign keys properly also in one table.
I mean you can't add a column object_id to your table, because you do not know the table to which it will point to. In this case you have two add two columns, topic_id and post_id. Always one of the two will be NULL.

Just create another table for the likes:
tbl_posts_likes (likeID, userID, postID, like = 1, unlike = -1)
then you could write a subquery like:
SELECT SUM(like) as likeCount, SUM(unlike)
FROM tbl_posts_likes
GROUP BY postID
WHERE postID= posts.postID

Depending on how you are tracking 'likes', I would recommend adding another table called likes to the following effect:
likes (like_id, like_type)
From this point, you would simply COUNT() the number of 'likes' for each like_type (topic/post) as each time someone likes either a topic or a post, a record would be inserted. However, if you plan to track 'likes' by user, you would need to add another column for user.
If you wanted to track the individual posts or topics, you would set up a table for each object and create a foreign key contraint for the topic or post ID.
topic_likes (tl_id, topic_id)
post_likes (pl_id, post_id)
The design above would create an entry for each like. If you are only concerned with the total number of likes for each object, you could set up something like so:
likes (like_id, like_type, likes)

A table for likes is a overkill. When does anyone need to check who liked who's post. Very rarely. Its better to just maintain a count of likes and unlikes in the post table, and maintain a likes table only for archive and audit purpose. Basically for all regular operations use the counts, for audit needs.
This will prevent doing joins to calculate # of likes. Frankly accuracy of likes and unlikes is not that crucial. 99% accuracy is good enough. And consistency between API calls may cause a issue, that is very rare. Only on high load.
I also find doing operations like SUM on a database server is very costly. JOIN and then SUM, too painful and time consuming. Instead move the operation to the compute. On the API do the ++ or --. That will take the load off the DB for useless operations like this.

What is data normalization? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
What exactly does database normalization do?
Can someone please clarify data normalization? What are the different levels? When should I "de-normalize"? Can I over normalize? I have a table with millions of records, and I believe I over-normalized it, but I'm not sure.

If you have million columns you probably under-normalized it.
What normalizing means is that
every non-key attribute "must provide
a fact about the key, the whole key,
and nothing but the key."
If you have a column that depends on anything but the key, you should normalize your table.
see here.
Added to reply to comment:
If you have ProductID | ProductType | ProductTypeID, where ProdcutTypeID depends only on ProductType, you should make a new table for that:
ProductID | ProductTypeID and on the other table: ProductTypeID | ProductTypeName .
So to answer your question, pertaining to Product isn't accurate enough, in my example at the first case, I was pertaining to the Product as well. All columns should pertain only to ProductID (you may say you only describe product, but not describing anything else, even if it's related to product - that's accurate).
Number of rows, generally speaking isn't relevent.

Normalization is about reducing data duplication in a relational database. The most popular level is third normal form (it's the one described by "the key, the whole key, and nothing but the key"), but there are a lot of different levels, see the Wikipedia entry for a list of the main ones. (In practice people seem to think they're doing well to achieve third normal form.) Denormalizing means accepting more data duplication, typically in exchange for better performance.

As others said Database normalization is about reduction of data duplication and more generic data models (that can easily answer to queries unexpected at design time). Normalisation of a database is allow a formal enough process. When you are experimented you mostly follow data analysis methods and get a normalized database at the end.
Normalizing database is usually a good idea, but there is a catch. In many case it involve creation of new tables and JOIN relationships between tables. JOIN is known to have a (very) high performance cost at runtime, henceforth for big volumes of data you may want to denormalize.
Another cost may also be the need to write more complex requests to access to the data you need and that can be a problem for SQL beginners. The best idea is probably to stick with normalization anyway (Third Normal Form is usually enough, as there is several levels of normalization as others said) and to become more skilled with SQL.

When to split up models into multiple database tables? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 2 years ago.
Improve this question
I'm working with Ruby on Rails, but this question I think is broader than that and applies to database design generally.
When is it a good idea to split a single model up into multiple tables? For example, assume I have a User model, and the number of fields in the model is really starting to add up. For example, the User can enter his website, his birthday, his time zone, his etc etc.
Is there any advantage or disadvantage to splitting up the model, such that maybe the User table only has basic info like login and email, and then there is another table that every User has that is something like UserInfo, and another that is UserPermissions, and another that is UserPrivacySettings or something like that?
Edit: To add additional gloss on this, most of the fields are rarely accessed, except on pages specific to them. For example, things like birthday are only ever accessed if someone clicks through to a User's profile. Furthermore, some of the fields (which are rarely accessed) have the potential to be extremely large. Most of the fields have the potential to be either set to blank or nil.

Generally it is a good idea to put things which have a one-to-one relationship in the same table. Unless your userbase includes the Queen or Paddington Bear, a user has just one birthday, so that should be an attribute of the USERS table. Things which have a one-to-many relationship should be in separate tables. So, if a user can have multiple privacy settings by all means split them out.
Splitting one table into several tables can make queries more complicated or slower, if we want to retrieve all the user's information at once. On the other hand if we have a set of attributes which is only ever queried or updated in a discrete fashion then having a separate table to hold that data is a sound idea.

This would be a situation for analysis.
When you find that a lot of the fields in such a table are NULLs, and can be grouped together (eg. UserContactInfo), it is time to look at extracting the information to its own table.
You want to avoid having a table with tens/hundreds of fields with only sparsely entered data.
Rather try to group the data logically, and crete the main table containging the fields that are mostly all populated. Then you can create subsets of data, almost as you would represent them on the UI, (Contact Info, Personal Interest, Work Related Info, etc) into seperate tables.

Retrieving a row is more expensive if it has many columns, especially if you usually need just some of the fields. Also, hosting stuff such as the components of an address in a separate class is a case of DRY. On the other hand, if you do need all fields of an object, it takes longer to execute a compound query.
I would normally not bother to distribute classes over several tables just to make the code more readable (i.e. without actually reusable parts like addresses).

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight