I have a system with two types of users (companies and individuals). Both types share a set of properties but differ in others. Which is the better design: merging everything into one table that allows NULLs for the unmatched properties, or separating them into two tables related to a base table by a one-to-one relationship?
Thanks.
Performance-wise, this is a matter of tradeoff.
Selecting the properties from another table will require an additional JOIN (which is bad for performance) but keeps the main table smaller (which is good for performance).
A JOIN is quite a costly operation, so unless you have, say, 200 of these properties (which would bloat the table considerably), you are better off keeping them in one table.
Separating the tables, however, will make your CHECK and NOT NULL constraints simpler.
For better performance, use one table for both, allow NULLs for the differing properties, and add a type attribute.
The one-to-one relationship (equivalent to subclassing in the OO world) will make your schema more maintainable and easier to understand, but involves a performance hit. Choose your pill.
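To make the trade-off concrete, here is a minimal sketch of both designs (all table and column names are illustrative, not taken from your schema):

-- Option 1: one table, a type attribute, NULLable type-specific columns
CREATE TABLE user_account (
    user_id      INT PRIMARY KEY,
    user_type    CHAR(1) NOT NULL CHECK (user_type IN ('C', 'I')),
    email        VARCHAR(200) NOT NULL,  -- shared property
    company_name VARCHAR(200),           -- company-only, NULL for individuals
    first_name   VARCHAR(100)            -- individual-only, NULL for companies
);

-- Option 2: a base table plus one-to-one subtype tables
CREATE TABLE user_base (
    user_id INT PRIMARY KEY,
    email   VARCHAR(200) NOT NULL        -- shared properties live here
);
CREATE TABLE company_user (
    user_id      INT PRIMARY KEY REFERENCES user_base (user_id),
    company_name VARCHAR(200) NOT NULL   -- NOT NULL comes naturally here
);
CREATE TABLE individual_user (
    user_id    INT PRIMARY KEY REFERENCES user_base (user_id),
    first_name VARCHAR(100) NOT NULL
);

With option 2, reading a full company record costs a JOIN; with option 1 it is a single-table read, at the price of NULLs and weaker constraints.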
First of all, a little bit of context:
TableA as ta
TableB as tb
One 'ta' has many 'tb', but one 'tb' can only be owned by one 'ta'.
I'd often just add an FK to 'tb' pointing to 'ta' and be done. Now I'm willing to model it differently (to improve its readability): I want to use a join table, say 'ta_tb', with a PK on 'tb_id' to enforce the one-to-many rule.
Are there any performance issues with approach B compared to approach A?
If this is a clear 1:n relation (and always will be!) there is no need for (and no advantage to) a new table in between.
You would use such a join table to build an m:n relation.
There is only one reason to use a join table with a 1:n relation: if you want to manage additional data specifying details of the relation itself.
HTH
Whenever you normalize your database, there is always a performance hit. If you use a join table (sometimes referred to as a cross-reference table), the DBMS will need to do work to join the right records.
DBMSs these days do pretty well at creating indexes and reducing these performance hits. It just depends on your situation.
Is it more important to have readability and normalization? Then use a join/xref table.
Is this for a small application that you want to perform well? Just make Table B have a FK to its parent.
If you index correctly, there should be very little performance impact although there will be a very slight extra cost that is likely not noticeable unless your database is maxed out already.
However, if you do this and want to maintain the rule that each 'tb' id must have one and only one 'ta', then you need to make sure to put a unique index on the 'tb' id in the join table. Later, if you need to change that to a many-to-many relationship, you can remove the index, which is certainly easier than changing the database structure. Otherwise this design puts your data integrity at risk.
So the join table might be the structure I would recommend if I knew that a many-to-many relationship was eventually possible. Otherwise, I would probably stick with the simpler structure of the FK in table B.
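To illustrate, a sketch of both approaches using the names from the question (the column definitions are assumptions):

-- Approach A: a plain FK on 'tb' is all a 1:n needs
CREATE TABLE tb (
    tb_id INT PRIMARY KEY,
    ta_id INT NOT NULL REFERENCES ta (ta_id)
);

-- Approach B: a join table; the PK on tb_id preserves the 1:n rule
-- (here 'tb' itself would have no ta_id column)
CREATE TABLE ta_tb (
    ta_id INT NOT NULL REFERENCES ta (ta_id),
    tb_id INT NOT NULL REFERENCES tb (tb_id),
    PRIMARY KEY (tb_id)  -- each tb may appear at most once
);

Dropping that PK (or unique index) later turns the relation into m:n without touching 'tb' itself.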
Yes, you will need at least one more join to access TableB's fields, and this will impact performance. (There is a question regarding this here: When and why are database joins expensive?)
Also, relations that use a join table are known as many-to-many; in your case, you have a PK on the middle table "making" this relation one-to-many, and this is less readable than approach A.
Keep it as simple as possible: approach A is perfect for one-to-many relationships.
I'm wanting advice as to the best way to design my database - it is storing medical data. I have a number of different entities (tables) that may have one or more medications associated with them. These are always 1 to many relationships, and the medications are only ever related to a single entity (ie. they are not shared). The columns for the Medication data are common.
My question is, should I have a single Medication table (and use numerous many-to-many mapping tables) OR should I use multiple Medication tables?
Option 1 - single Medication table:
[table1]1---*[table1_has_medication]*---1[medication]
[table2]1---*[table2_has_medication]*---1[medication]
[table3]1---*[table3_has_medication]*---1[medication]
Option 2 - multiple Medication tables:
[table1]1---*[table1Medication]
[table2]1---*[table2Medication]
[table3]1---*[table3Medication]
Option 1 seems neater as all Medication data is in a single table. However, a Medication is in fact only ever related to a single table so it's not a true many-to-many relationship. Also, I assume I can't support cascaded deletes for many-to-many relationships so I need to be careful of "orphaned" Medication records.
I'm interested in the opinions of experienced database designers. Thank you.
In addition to not representing your requirements accurately, a single many-to-many (aka. "junction" or "link") table has another problem: one FK can only reference one table, so either you'll have to use multiple exclusive FKs, or you'll have to enforce referential integrity yourself, which is harder to do properly than it looks.
All in all, looks like separate medication tables are what you need.
NOTE: That could potentially become a problem if your requirements evolve and you suddenly have to reference all medications from another table. If that happens, consider "inheriting" all medication tables from the common table. Here is an example you can extrapolate from.
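A rough sketch of that inheritance pattern, assuming the common columns move into a shared table (all names here are made up):

-- Common medication data lives once; each child table "inherits" by
-- sharing the parent's primary key
CREATE TABLE medication (
    medication_id INT PRIMARY KEY,
    name          VARCHAR(200) NOT NULL,
    dose          VARCHAR(50)
);
CREATE TABLE table1_medication (
    medication_id INT PRIMARY KEY REFERENCES medication (medication_id),
    table1_id     INT NOT NULL REFERENCES table1 (table1_id)
);
-- table2_medication and table3_medication follow the same pattern; any
-- future table can then reference medication as a whole.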
Found a suitable answer on DBA stackexchange.
Repeated below:
Relational databases are not built to handle this situation perfectly. You have to decide what is most important to you and then make your trade-offs. You have several goals:
Maintain third normal form
Maintain referential integrity
Maintain the constraint that each account belongs to either a corporation or a natural person.
Preserve the ability to retrieve data simply and directly
The problem is that some of these goals compete with one another.
Sub-Typing Solution
You could choose a sub-typing solution where you create a super-type that incorporates both corporations and persons. This super-type would probably have a compound key of the natural key of the sub-type plus a partitioning attribute (e.g. customer_type). This is fine as far as normalization goes and it allows you to enforce referential integrity as well as the constraint that corporations and persons are mutually exclusive. The problem is that this makes data retrieval more difficult, because you always have to branch based on customer_type when you join account to the account holder. This probably means using UNION and having a lot of repetitive SQL in your query.
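A sketch of what that super-type might look like (the DDL is illustrative; the natural key and type codes are assumptions):

CREATE TABLE customer (
    customer_no   INT NOT NULL,
    customer_type CHAR(1) NOT NULL CHECK (customer_type IN ('C', 'P')),
    PRIMARY KEY (customer_no, customer_type)
);
CREATE TABLE corporation (
    customer_no   INT NOT NULL,
    customer_type CHAR(1) NOT NULL CHECK (customer_type = 'C'),
    PRIMARY KEY (customer_no, customer_type),
    FOREIGN KEY (customer_no, customer_type)
        REFERENCES customer (customer_no, customer_type)
);
-- person mirrors corporation with customer_type = 'P'; account then carries
-- (customer_no, customer_type), so mutual exclusivity is enforced by keys alone.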
Two Foreign Keys Solution
You could choose a solution where you keep two foreign keys in your account table, one to corporation and one to person. This solution also allows you to maintain referential integrity, normalization and mutual exclusivity. It also has the same data retrieval drawback as the sub-typing solution. In fact, this solution is just like the sub-typing solution except that you get to the problem of branching your joining logic "sooner".
Nevertheless, a lot of data modellers would consider this solution inferior to the sub-typing solution because of the way that the mutual exclusivity constraint is enforced. In the sub-typing solution you use keys to enforce the mutual exclusivity. In the two foreign key solution you use a CHECK constraint. I know some people who have an unjustified bias against check constraints. These people would prefer the solution that keeps the constraints in the keys.
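A sketch of the two-foreign-keys variant with its CHECK constraint (names are illustrative):

CREATE TABLE account (
    account_no     INT PRIMARY KEY,
    corporation_id INT REFERENCES corporation (corporation_id),
    person_id      INT REFERENCES person (person_id),
    -- exactly one of the two parents must be set
    CHECK (   (corporation_id IS NOT NULL AND person_id IS NULL)
           OR (corporation_id IS NULL     AND person_id IS NOT NULL))
);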
"Denormalized" Partitioning Attribute Solution
There is another option where you keep a single foreign key column on the chequing account table and use another column to tell you how to interpret the foreign key column (RoKa's OwnerTypeID column). This essentially eliminates the super-type table in the sub-typing solution by denormalizing the partitioning attribute to the child table. (Note that this is not strictly "denormalization" according to the formal definition, because the partitioning attribute is part of a primary key.) This solution seems quite simple since it avoids having an extra table to do more or less the same thing and it cuts the number of foreign key columns down to one. The problem with this solution is that it doesn't avoid the branching of retrieval logic and what's more, it doesn't allow you to maintain declarative referential integrity. SQL databases don't have the ability to manage a single foreign key column being for one of multiple parent tables.
Shared Primary Key Domain Solution
One way that people sometimes deal with this issue is to use a single pool of IDs so that there is no confusion for any given ID whether it belongs to one sub-type or another. This would probably work pretty naturally in a banking scenario, since you aren't going to issue the same bank account number to both a corporation and a natural person. This has the advantage of avoiding the need for a partitioning attribute. You could do this with or without a super-type table. Using a super-type table allows you to use declarative constraints to enforce uniqueness. Otherwise this would have to be enforced procedurally. This solution is normalized but it won't allow you to maintain declarative referential integrity unless you keep the super-type table. It still does nothing to avoid complex retrieval logic.
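A sketch of the shared pool with the optional super-type table (illustrative names):

-- One pool of account numbers; the super-type table makes the pool's
-- uniqueness a declarative constraint
CREATE TABLE account_holder (
    account_no INT PRIMARY KEY
);
CREATE TABLE corporation (
    account_no INT PRIMARY KEY REFERENCES account_holder (account_no),
    legal_name VARCHAR(200) NOT NULL
);
CREATE TABLE natural_person (
    account_no INT PRIMARY KEY REFERENCES account_holder (account_no),
    full_name  VARCHAR(200) NOT NULL
);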
You can see therefore that it isn't really possible to have a clean design that follows all of the rules, while at the same time keeping your data retrieval simple. You have to decide where your trade-offs are going to be.
I have a scenario wherein users upload their custom preferences for the conversion done by our utility. So, the problem we are facing is:
should we keep these custom mappings in one table with an extra column for a UID (or maybe a GUID), or
should we create tables on the fly for every user who registers with us, giving each of them a separate table for their custom mappings altogether?
We're more inclined toward having just an extra column and one table for the custom mappings, but does the redundant data in that one column have any effect on performance? On the other hand, logic tells us to normalize our tables and have separate ones. I'm rather confused as to what to do.
Common practice is to use a single table with a user id. If there are commonly reused preferences, you can instead refactor these out into a separate table and reference them in one go, but that is up to you.
Why is creating one table per user a bad idea? Well, I don't think you want one million tables in your database if you end up with one million users. Doing it this way will also guarantee that your queries are unwieldy at best, often slower than they ought to be, and often dynamically generated and EXECed - usually a last-resort option and not necessarily safe.
I inherited a system where the original developers had chosen to go with the one table per client approach. We are now rewriting the system! From a dev perspective, you would end up with a tangled mess of dynamic SQL and looping. The DBAs will not be happy because they will wake up to an entirely different database every morning, which makes performance tuning and maintenance impossible.
You could have an additional BLOB or XML column in the users table, storing the preferences. This is called the property bag approach, but I would not recommend this either, as it will hamper query performance in many cases.
The best approach in this scenario is to solve the problem with normalisation: a separate table for the properties, joined to the users table with a PK/FK relationship. I would not recommend using a GUID. The GUID would likely end up as the PK, meaning you will have a 16-byte clustered index key that is also carried through to all non-clustered indexes on the table. You would probably also be building a non-clustered index on it in the Users table. You also run the risk of falling into the pitfall of generating these GUIDs with NEWID() rather than NEWSEQUENTIALID(), which leads to massive fragmentation.
If you take the normalisation approach and you have concerns about returning the results across the two tables in a tabular format, then you can simply use a PIVOT operator.
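For example, given a hypothetical user_preference(user_id, pref_name, pref_value) table, a T-SQL PIVOT turns the preference rows back into columns:

SELECT user_id, [theme], [language], [timezone]
FROM (
    SELECT user_id, pref_name, pref_value
    FROM user_preference
) AS src
PIVOT (
    MAX(pref_value) FOR pref_name IN ([theme], [language], [timezone])
) AS p;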
Suppose I have the following tables in my database:
Now all my queries depend on the Company table. Is it bad practice to give every other table a (redundant) relationship to the Company table to simplify my SQL queries?
Edit 1: Background is a usage problem with a framework. See Django: limiting model data.
Edit 2: No tuple would ever change its company.
Edit 3: I don't write the MySQL queries myself; I use an abstraction layer (Django).
It is bad practice because your redundant data has to be updated independently, and therefore redundantly - a process that is fraught with potential for error. (Even automatic cascading has to be assigned and maintained separately.)
By introducing this relation you effectively denormalize your database. Denormalization is sometimes necessary for the sake of performance but from your question it sounds like you're just simplifying your SQL.
Use other mechanisms to abstract the complexity of your database: Views, Stored Procs, UDFs
What you are asking is whether to violate Third Normal Form in your design. Doing so is not something to be done without good reason because by creating redundancy you create the possibility for errors and inconsistencies in your data. Also, "simplifying" the model with redundant data to support some operations is likely to complicate other operations. Also, constraints and other data access logic will likely need to be duplicated unnecessarily.
Is it bad practice to give every other table a (redundant) relation to the Company table to simplify my SQL queries?
Yes, absolutely, as it would mean updating every redundant relation whenever you update the customer-to-company or section-to-company relations -- and if you miss any such update, you end up with a database full of inconsistent data. It's a bad denormalization.
If your point is to just simplify your SQL, consider using views to "bring along" parent data. Here's a view that pulls company_id into contract, by join through customer:
create view contract_customer as
select
    a.*,          -- every contract column, including contract_id
    b.company_id  -- the parent company, pulled in through customer
from
    contract a
    join customer b on (a.customer_id = b.customer_id);
This join is simple, but why repeat it over and over? Write it once, and then use the view in other queries.
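For example (42 is just a placeholder value):

-- filter contracts by company without re-stating the join
SELECT * FROM contract_customer WHERE company_id = 42;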
Many (but not all) RDBMSes can even optimize out the join if you don't put any columns from customer in the select list or where clause of the query based on the view, as long as you make contract.customer_id have a foreign key referential integrity constraint on customer.customer_id. (In the absence of such a constraint, the join can't be omitted, because it would then be possible for a contract.customer_id to exist which did not exist in customer. Since you'll never want that, you'll add the foreign key constraint.)
Using the view achieves what you want, without the time overhead of having to update the child tables, without the space overhead of making child rows wider by adding the redundant column (and this really begins to matter when you have many rows, as the wider the row, the fewer rows can fit into memory at once), and most importantly, without the possibility of inconsistent data when the parent is updated but the children are not.
If you really need to simplify things, this is where a View (or multiple views) would come in handy.
Having a column for the company in your employee view would not be poorly normalized, provided it is derived from a join on section.
If you mean add a Company column to every table, it's a bad idea. It'll increase the potential for data integrity issues (i.e. it gets changed in one table but not the other 6 where it should).
I'd say not in the OP's case, but sometimes it's useful (just like goto ;).
An anecdote:
I'm working with a database where most tables have a foreign key pointing to a root table for the accounts. The account numbers are external to the database and aren't allowed to be changed once issued. So there is no danger of changing the account numbers and failing to update all references in the DB. I also find that it is considerably easier to grab data from tables keyed by account number instead of having to do complex and costly joins up the hierarchy to get to the root account table. But in my case, we don't have so much a foreign key as an external (i.e., real-world) identifier, so it's not quite the same as the OP's situation and seems suitable for an exception.
That depends on your functional requirements for performance. Is your application going to handle heavy demand? Simplifying JOINs boosts performance. Besides, hardware is cheap and turnaround time is important.
The deeper you go into database normal forms, the more space you save, but at a heavier computational cost.
At my job, we have a pseudo-standard of creating one table to hold the "standard" information for an entity, and a second table, named like 'TableNameDetails', which holds optional data elements. On average, every row in the main table will have about 8-10 detail rows.
My question is: What kind of performance impacts does this have over adding these details as additional nullable columns on the main table?
8-10 detail rows or 8-10 detail columns?
If it's rows, then you're mixing apples and oranges, as a one-to-many relationship cannot be flattened out into columns.
If it's columns, then you're talking about vertical partitioning. For large and very large tables, moving seldom-referenced columns into Extra or Details tables (i.e. partitioning the columns vertically into 'hot' and 'cold' tables) can have significant, even huge, performance benefits. A narrower table means a higher density of data per page, which in turn means fewer pages needed for frequent queries, less IO, and better cache efficiency: all goodness.
Mileage may vary, depending on the average width of the 'details' columns and how 'seldom' the columns are accessed.
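A minimal sketch of the hot/cold split (table and column names are hypothetical):

-- 'Hot' table: the narrow columns most queries touch
CREATE TABLE product (
    product_id INT PRIMARY KEY,
    name       VARCHAR(100) NOT NULL,
    price      DECIMAL(10, 2) NOT NULL
);
-- 'Cold' table: wide, seldom-referenced columns, sharing the same key
CREATE TABLE product_details (
    product_id       INT PRIMARY KEY REFERENCES product (product_id),
    long_description VARCHAR(4000),
    manual_url       VARCHAR(400)
);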
I'm with Remus on all the "depends", but would just add that after choosing this design for a table/entity, you must also have a good process for determining what is "standard" and what is "details" for an entity.
Misplacing something as a detail which should be standard is probably the worst mistake, because you can't require a row to exist as easily as you can require a column to exist (big complex trigger code). Setting a default on a type of row is a lot harder (big complex constraint code). And indexing is not easy either (a sparse index, maybe?).
Misplacing something as a standard which should be a detail is less of a mistake, just taking up extra row space and potentially not being able to have a meaningful default.
If your details are very weakly structured, you could consider using an XML column for the "details" and still be able to query them using XPath/XQuery.
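For instance, in SQL Server (assuming a 'details' XML column on a hypothetical user_details table):

SELECT user_id,
       details.value('(/prefs/theme)[1]', 'varchar(50)') AS theme
FROM   user_details
WHERE  details.exist('/prefs/theme') = 1;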
As a general rule, I would not use this pattern for every entity table, but only entity tables which have certain requirements and usage patterns which fit this solution's benefits well.
Is your details table an entity-attribute-value table? In that case, yes, you are asking for performance problems.
What you are describing is an Entity-Attribute-Value design. They have their place in the world, but they should be avoided like the plague unless absolutely necessary. The analogy I always give is that they are like drugs: in small quantities and in select circumstances they can be beneficial. Too much will kill you. Their performance will be awful and will not scale and you will not get any sort of data integrity on the values since they are all stored as strings.
So, the short answer to your question: if you never need to query for specific values, never need to make a columnar report of a given entity's attributes, don't care about data integrity, and never do anything other than spit the entire wad of data for an entity out as a list, they're fine. If you actually need to use them, however, whatever query you write will not be efficient.
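To see why, here is what filtering on just two attributes looks like against a hypothetical detail(entity_id, attribute, value) table: one self-join per attribute, with every comparison done on strings:

SELECT v1.entity_id
FROM   detail v1
JOIN   detail v2 ON v2.entity_id = v1.entity_id
WHERE  v1.attribute = 'color' AND v1.value = 'red'
  AND  v2.attribute = 'size'  AND v2.value = '42';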
You are mixing two different data models - a domain specific one for the "standard" and a key/value one for the "extended" information.
I dislike key/value tables except when absolutely required. They run counter to the concept of an SQL database and generally represent an attempt to shoehorn object data into a data store that can't conveniently handle it.
If some of the extended information is very often NULL you can split that column off into a separate table. But if you do this to two different columns, put them in separate tables, not the same table.