data model, many2many with many2one relationship - database

I have two types of accounts (customer and provider), and I chose the single-table strategy for persistence. A customer creates Orders (one2many), and a provider bids on the orders auction-style (a many2many relationship, because a provider can bid on many orders and an order can receive bids from many providers). My question is: is it possible to have these relationships simultaneously? Logically it can work, but MDA code generators don't like it. If it is possible, what drawbacks could I come across with this data model?
Thanks in advance.

The disadvantage is that you can't enforce referential integrity in the database between the accountID in the accounts table and the accountID in the bids table (which I assume represents the accountID of the provider bidding on the order) because not all accountID values are allowable.
But don't give up on the single-table solution for accounts, which may well be the correct one for your problem (I can't say for sure, since I don't completely understand the relation between providers and customers). Here's what you need to do to both use the single-table solution and allow referential integrity:
Remove isProvider and isCustomer from Accounts.
Add two new tables Providers and Customers. Each table will have an accountID column which is both the primary key in that table and a foreign key back to the original account table.
Migrate any additional columns from Accounts that are unique to either Providers or Customers into the appropriate table.
Now, the accountID in the Orders table should be a foreign key into Customers, not Accounts. Similarly, the accountID column in Bids becomes a foreign key into Providers rather than Accounts.
With these changes, both referential integrity and single-table storage for accounts are provided for.
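To make the steps above concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names follow the answer; the name and amount columns, the composite primary key on Bids (one bid per provider per order) and the sample rows are assumptions made for illustration only.

```python
import sqlite3

def build_schema(conn):
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE Accounts  (accountID INTEGER PRIMARY KEY, name TEXT NOT NULL);
        -- accountID is both the primary key and a foreign key back to Accounts.
        CREATE TABLE Customers (accountID INTEGER PRIMARY KEY REFERENCES Accounts(accountID));
        CREATE TABLE Providers (accountID INTEGER PRIMARY KEY REFERENCES Accounts(accountID));
        CREATE TABLE Orders (
            orderID   INTEGER PRIMARY KEY,
            accountID INTEGER NOT NULL REFERENCES Customers(accountID)
        );
        CREATE TABLE Bids (
            orderID   INTEGER NOT NULL REFERENCES Orders(orderID),
            accountID INTEGER NOT NULL REFERENCES Providers(accountID),
            amount    REAL NOT NULL,              -- hypothetical column
            PRIMARY KEY (orderID, accountID)      -- assumption: one bid per provider per order
        );
    """)

def try_bid(conn, order_id, account_id, amount):
    """Return True if the bid was stored, False if a constraint rejected it."""
    try:
        with conn:
            conn.execute("INSERT INTO Bids VALUES (?, ?, ?)", (order_id, account_id, amount))
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
build_schema(conn)
with conn:
    conn.executemany("INSERT INTO Accounts VALUES (?, ?)", [(1, "alice"), (2, "bob")])
    conn.execute("INSERT INTO Customers VALUES (1)")   # alice is a customer
    conn.execute("INSERT INTO Providers VALUES (2)")   # bob is a provider
    conn.execute("INSERT INTO Orders VALUES (10, 1)")  # alice's order

provider_bid_ok = try_bid(conn, 10, 2, 99.0)   # bob may bid
customer_bid_ok = try_bid(conn, 10, 1, 50.0)   # alice is not in Providers
print(provider_bid_ok, customer_bid_ok)
```

In this sketch the database itself rejects a bid from an account that is only a customer, which is the check that a plain foreign key to Accounts cannot express.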

"I chose the single-table strategy for persistence" - that's actually not that good a reason for combining them, in my opinion. Customers and providers are fundamentally different beasts.
The fact that you're having troubles is a clear indication that you're most likely doing it the wrong way - that's true of most things in the IT industry (and probably life itself but you don't need me proselytising on that).
I would separate them out into different tables to resolve this particular problem.
If you really want part of the data to be shared, you could put the common things in yet another table and reference it from the customers and providers tables.
You may want this if a single entity can be both a customer and provider - in that case, you would want the two different table entries to share the same information (such as balance, reputation and so on).


How to handle data vault hubs with no business key?

We have a project for loading data from an external source into a Data Vault data warehouse. The data are salary statements between an employer and an employee.
When we started modelling this, we found two business keys: the company id of the employer and the social security number (SSN) of the employee. Based on these we get two hubs, one for the employer and one for the employee. When adding a link between these two hubs, we noticed that there may (will) be more than one salary statement for each combination of employer and employee. This means we can't model this relationship with two hubs and one link.
Logically this could be handled by adding a third hub for the salary statement. Then we could have a link for all three hubs. Our problem is that we don't have any business key for the salary statement!
My only thought as a workaround is to generate an artificial business key for the salary statement using company id, SSN and period of the salary statement. It doesn't really feel right to generate a business key in the data warehouse, but do we have any other options? Could this maybe be modeled differently with Data Vault?
Any thoughts and ideas highly appreciated.
What you've noticed here is a situation where Data Vault gets really difficult.
You have a situation where the data objects don't have a business key.
The Data Vault architecture needs business keys.
You generally have 4 options:
1. Having a business object (in this case, a salary statement) without a business key is an anti-pattern. Convince the developer of the salary system to deliver a business key or unique transaction number for each salary statement.
2. Create a composite key, like you mentioned.
   The biggest issue with this approach is: can you be sure that the composite key is always unique?
   Let's say you use company id, SSN and period. What if a mistake was made in the salary system and they had to make an extra payment in the same period?
   In this situation you would have 2 rows for the same composite key (company id, SSN and period).
3. Create your own business key.
   Write a small program that takes the data from the salary system and adds its own business key.
   This could be as simple as a database table with a primary key, and then use that primary key as a business key.
4. Don't use Data Vault for this object. If an object doesn't fit in Data Vault, or if there is another structure that fits the data better, then use that.
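The difference between the composite-key and own-key options can be sketched in a few lines of Python. Everything here is hypothetical: the sample rows, the staged_statement table and the statement_key column are illustrative names, not part of any Data Vault tooling.

```python
import sqlite3
from collections import Counter

# Hypothetical extract from the salary system: (company_id, ssn, period, amount).
statements = [
    ("C1", "111-11-1111", "2024-01", 3000.0),
    ("C1", "111-11-1111", "2024-01", 150.0),   # extra payment in the same period
    ("C1", "222-22-2222", "2024-01", 2800.0),
]

def duplicate_composite_keys(rows):
    """Return every (company_id, ssn, period) that occurs more than once."""
    counts = Counter((company, ssn, period) for company, ssn, period, _ in rows)
    return [key for key, n in counts.items() if n > 1]

def assign_statement_keys(rows):
    """Stage the rows in a table whose primary key serves as the generated business key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE staged_statement (
            statement_key INTEGER PRIMARY KEY AUTOINCREMENT,
            company_id TEXT, ssn TEXT, period TEXT, amount REAL
        )""")
    with conn:
        conn.executemany(
            "INSERT INTO staged_statement (company_id, ssn, period, amount) "
            "VALUES (?, ?, ?, ?)", rows)
    return conn.execute(
        "SELECT statement_key, company_id, ssn, period, amount "
        "FROM staged_statement ORDER BY statement_key").fetchall()

collisions = duplicate_composite_keys(statements)
keyed = assign_statement_keys(statements)
print(collisions)
print([row[0] for row in keyed])
```

A key generated this way is only as stable as the staging step: if the same source row is loaded twice it receives two different keys, so the staging table has to be loaded exactly once per source row.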

What's the best practice for a table that refers n tables with a match in one of them?

I'm working on a database design, and I face a situation where notifications will be sent according to logs in three tables, each log contains different data. NOTIFICATIONS table should then refer these three tables, and I thought of three possible designs, each seems to have flaws in it:
1. Each log table will have a unique incremented id, and the NOTIFICATIONS table will have three different columns as FK's. The main flaw in this design is that I can't create real FK's since two of the three fields will be NULL for each row, and the query will have to "figure out" what kind of data is actually logged in this row.
2. The log tables will have one unique incremented id for all of them. Then I can make three OUTER JOINS with these tables when I query NOTIFICATIONS, and each row will have exactly one match. This seems at first like a better design, but I will have gaps in each log table and the flaws in option 1 still exist.
3. Option 1/2 + creating three notifications tables instead of one. This option will require the app to query notifications using UNION ALL.
Which option makes a better practice? Is there another way I didn't think of? Any advice will be appreciated.
I have one solution that sacrifices the referential integrity to help you achieve what you want.
You can keep a GUID data type as the primary key in all three log tables. In the Notification table you just need to add one foreign key column which won't point to any particular table. So only you know that it is a foreign key, SQL Server doesn't and it doesn't enforce referential integrity. In this column you store the GUID of notification. The notification can be in any of the three logs but since the primary key of all three logs is GUID, you can store the key in your Notification table.
Also you add another column in the Notification table to tell which of the three logs this GUID belongs to. Now you can uniquely know which row in the required log table you have to go to in order to find this notification info.
The problem is that you have three separate log tables. Instead you should have had only one log table, with an extra column specifying what kind of logging it is. That way you'd have only one table, referential integrity would be preserved and the design would be simple.
Use one table holding notification ids. Each of the three original tables hold subtypes of notification ids with FKs on their own ids to that table. Search re subtyping/subtables in databases. This is a standard design pattern/idiom.
(There are entities. We group them conceptually. We call the groups kinds or types. We say of a particular entity that it is a whatever kind or type of entity, or even that it "is a" whatever. We can have groups that contain all the entities of another group. Since the larger is a superset of the smaller we say that the larger type is a supertype of the smaller type, and the smaller is a subtype of the larger.)
There are idioms you can use to help constrain your tables declaratively. The main one is to have a subtype tag in the supertype table, and even also in the subtype tables (where each table has only one tag value).
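As a rough illustration of the subtype-tag idiom, the sketch below keeps a kind tag in the supertype table and repeats it in each subtype table, where a CHECK pins it to a single value and a composite foreign key ties it back to the supertype. The three kinds and the table and column names are made up for the example; SQLite is used only because it ships with Python.

```python
import sqlite3

KINDS = ("login", "payment", "error")  # hypothetical log kinds

def build_schema(conn):
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
    conn.execute("""
        CREATE TABLE Notifications (
            id   INTEGER PRIMARY KEY,
            kind TEXT NOT NULL CHECK (kind IN ('login', 'payment', 'error')),
            UNIQUE (id, kind)              -- target for the composite foreign keys
        )""")
    for kind in KINDS:
        # Each subtype table allows exactly one tag value.
        conn.execute(f"""
            CREATE TABLE {kind}_log (
                id      INTEGER PRIMARY KEY,
                kind    TEXT NOT NULL DEFAULT '{kind}' CHECK (kind = '{kind}'),
                details TEXT,
                FOREIGN KEY (id, kind) REFERENCES Notifications (id, kind)
            )""")

def add_log(conn, table, notification_id, details):
    """Return True if the subtype row was stored, False if a constraint rejected it."""
    try:
        with conn:
            conn.execute(f"INSERT INTO {table} (id, details) VALUES (?, ?)",
                         (notification_id, details))
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
build_schema(conn)
with conn:
    conn.execute("INSERT INTO Notifications VALUES (1, 'login')")
    conn.execute("INSERT INTO Notifications VALUES (2, 'payment')")

right_subtype = add_log(conn, "login_log", 1, "user signed in")
wrong_subtype = add_log(conn, "login_log", 2, "belongs in payment_log")
print(right_subtype, wrong_subtype)
```

Because the tag is part of the foreign key, a notification tagged as one kind cannot acquire a row in another kind's table, and all of this is declared rather than coded in the application.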
I eventually faced two main options:
1. Following the last suggestion in this answer.
2. Choosing a less normalized structure for the database, AKA fake/no FK's. To be precise, in my case it would be my second option above with fake FK's.
I chose option #2, as a DBA whom I consulted enlightened me on the idea that database normalization should be done according to possible structure breakage. In my case, although notifications are created based on logs, these FK's are not necessary for querying the notifications nor for querying the logs, and the app does not have to ensure this relationship in order to function properly. Thus, following option #1 may be "over-normalization".
Thanks all for your answers and comments.

SQL Server - database design - one to many OR many to many

I'm looking for advice on the best way to design my database, which stores medical data. I have a number of different entities (tables) that may have one or more medications associated with them. These are always 1 to many relationships, and the medications are only ever related to a single entity (i.e. they are not shared). The columns for the Medication data are common.
My question is, should I have a single Medication table (and use numerous many-to-many mapping tables) OR should I use multiple Medication tables?
Option 1 - single Medication table:
[table1]1---*[table1_has_medication]*---1[medication]
[table2]1---*[table2_has_medication]*---1[medication]
[table3]1---*[table3_has_medication]*---1[medication]
Option 2 - multiple Medication tables:
[table1]1---*[table1Medication]
[table2]1---*[table2Medication]
[table3]1---*[table3Medication]
Option 1 seems neater as all Medication data is in a single table. However, a Medication is in fact only ever related to a single table so it's not a true many-to-many relationship. Also, I assume I can't support cascaded deletes for many-to-many relationships so I need to be careful of "orphaned" Medication records.
I'm interested in the opinions of experienced database designers. Thank you.
In addition to not representing your requirements accurately, a single many-to-many (aka. "junction" or "link") table has another problem: one FK can only reference one table, so either you'll have to use multiple exclusive FKs, or you'll have to enforce referential integrity yourself, which is harder to do properly than it looks.
All in all, looks like separate medication tables are what you need.
NOTE: That could potentially become a problem if your requirements evolve and you suddenly have to reference all medications from another table. If that happens, consider "inheriting" all medication tables from the common table. Here is an example you can extrapolate from.
Found a suitable answer on DBA stackexchange.
Repeated below:
Relational databases are not built to handle this situation perfectly. You have to decide what is most important to you and then make your trade-offs. You have several goals:
Maintain third normal form
Maintain referential integrity
Maintain the constraint that each account belongs to either a corporation or a natural person.
Preserve the ability to retrieve data simply and directly
The problem is that some of these goals compete with one another.
Sub-Typing Solution
You could choose a sub-typing solution where you create a super-type that incorporates both corporations and persons. This super-type would probably have a compound key of the natural key of the sub-type plus a partitioning attribute (e.g. customer_type). This is fine as far as normalization goes and it allows you to enforce referential integrity as well as the constraint that corporations and persons are mutually exclusive. The problem is that this makes data retrieval more difficult, because you always have to branch based on customer_type when you join account to the account holder. This probably means using UNION and having a lot of repetitive SQL in your query.
Two Foreign Keys Solution
You could choose a solution where you keep two foreign keys in your account table, one to corporation and one to person. This solution also allows you to maintain referential integrity, normalization and mutual exclusivity. It also has the same data retrieval drawback as the sub-typing solution. In fact, this solution is just like the sub-typing solution except that you get to the problem of branching your joining logic "sooner".
Nevertheless, a lot of data modellers would consider this solution inferior to the sub-typing solution because of the way that the mutual exclusivity constraint is enforced. In the sub-typing solution you use keys to enforce the mutual exclusivity. In the two foreign key solution you use a CHECK constraint. I know some people who have an unjustified bias against check constraints. These people would prefer the solution that keeps the constraints in the keys.
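A minimal sketch of the two foreign keys solution, run through Python's sqlite3 module, might look like the following. The table and column names are assumptions, and the retrieval query uses two LEFT JOINs as one possible way to do the branching described above.

```python
import sqlite3

OWNER_QUERY = """
SELECT a.account_id,
       COALESCE(c.name, p.name) AS owner_name,
       CASE WHEN a.corporation_id IS NOT NULL THEN 'corporation' ELSE 'person' END
FROM account a
LEFT JOIN corporation c ON c.corporation_id = a.corporation_id
LEFT JOIN person      p ON p.person_id      = a.person_id
ORDER BY a.account_id
"""

def build_schema(conn):
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
    conn.executescript("""
        CREATE TABLE corporation (corporation_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE person      (person_id      INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE account (
            account_id     INTEGER PRIMARY KEY,
            corporation_id INTEGER REFERENCES corporation(corporation_id),
            person_id      INTEGER REFERENCES person(person_id),
            -- mutual exclusivity: exactly one of the two owner columns is filled in
            CHECK ((corporation_id IS NULL) <> (person_id IS NULL))
        );
        INSERT INTO corporation VALUES (1, 'Acme');
        INSERT INTO person      VALUES (1, 'Ann');
    """)

def try_account(conn, account_id, corporation_id, person_id):
    """Return True if the account was stored, False if a constraint rejected it."""
    try:
        with conn:
            conn.execute("INSERT INTO account VALUES (?, ?, ?)",
                         (account_id, corporation_id, person_id))
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
build_schema(conn)
results = [
    try_account(conn, 100, 1, None),     # owned by the corporation
    try_account(conn, 101, None, 1),     # owned by the person
    try_account(conn, 102, 1, 1),        # both owners: rejected by the CHECK
    try_account(conn, 103, None, None),  # no owner: rejected by the CHECK
]
owners = conn.execute(OWNER_QUERY).fetchall()
print(results, owners)
```

The CHECK constraint carries the mutual exclusivity here, which is exactly the part some modellers would rather see expressed through keys.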
"Denormalized" Partitioning Attribute Solution
There is another option where you keep a single foreign key column on the chequing account table and use another column to tell you how to interpret the foreign key column (RoKa's OwnerTypeID column). This essentially eliminates the super-type table in the sub-typing solution by denormalizing the partitioning attribute to the child table. (Note that this is not strictly "denormalization" according to the formal definition, because the partitioning attribute is part of a primary key.) This solution seems quite simple since it avoids having an extra table to do more or less the same thing and it cuts the number of foreign key columns down to one. The problem with this solution is that it doesn't avoid the branching of retrieval logic and what's more, it doesn't allow you to maintain declarative referential integrity. SQL databases don't have the ability to manage a single foreign key column being for one of multiple parent tables.
Shared Primary Key Domain Solution
One way that people sometimes deal with this issue is to use a single pool of IDs so that there is no confusion for any given ID whether it belongs to one sub-type or another. This would probably work pretty naturally in a banking scenario, since you aren't going to issue the same bank account number to both a corporation and a natural person. This has the advantage of avoiding the need for a partitioning attribute. You could do this with or without a super-type table. Using a super-type table allows you to use declarative constraints to enforce uniqueness. Otherwise this would have to be enforced procedurally. This solution is normalized but it won't allow you to maintain declarative referential integrity unless you keep the super-type table. It still does nothing to avoid complex retrieval logic.
You can see therefore that it isn't really possible to have a clean design that follows all of the rules, while at the same time keeping your data retrieval simple. You have to decide where your trade-offs are going to be.

Database normalization, 2nd normal form: should a field also depend on the FK?

I am struggling to model a scenario and came across a question: while normalizing a table, should we consider an FK also as a key to determine whether a field should be in the same table or another table?
For example, I have Users and Teams tables (one user may have zero or more teams, considering different sports).
Owner                  Teams
-----                  --------
OwnerID    ---PK       TeamID       ---PK
OwnerName              OwnerID      ---FK
                       TeamManager
                       TeamLogo
If we observe this relation, TeamManager and TeamLogo are completely (functionally) dependent on TeamID only, and not at all dependent on UserID (am I correct in understanding this?). Should we have another table for UserID and TeamID to establish relationship?
Any suggestions would be really helpful.
This is not homework. I am modeling for a website and want to improve my knowledge of normal forms to get the most scalable database design.
Thank you,
... should we consider an FK also as a key to determine whether a field should be in the same table or another table?
Being the child endpoint of a referential integrity constraint is orthogonal to being a key (i.e. the FK child may or may not be a key). The name "foreign key" only refers to the parent endpoint, which is required to be a key (in most DBMSes).
So, in your example, Teams.OwnerID does not have to be a key (and actually isn't, judging on your description).
If we observe this relation, TeamManager and TeamLogo are completely (functionally) dependent on TeamID only, and not at all dependent on UserID (am I correct in understanding this?).
Yes, you are correct.
The Teams table is in 3NF because all attributes functionally depend on the key, the whole key and nothing but the key (so help me Codd ;) ).
Here is why:
Nothing depends on a key subset, so this is 2NF (in fact, there is no "key subset" since key is just one attribute).
As you already noted, TeamManager and TeamLogo do not functionally depend on OwnerID, so you do not have a transitive dependency, so this is 3NF.
Should we have another table for UserID and TeamID to establish relationship?
For modeling a simple 1:N relationship like this: no.
Modeling M:N would be a different matter.
So unless there are some additional details you didn't mention, this model looks nicely normalized to me.
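If it helps to see the dependency argument on data, here is a small Python sketch that tests whether one set of columns determines another in a handful of sample rows. The depends_on helper and the rows are hypothetical. Note that sample data can only refute a functional dependency; whether it really holds is a statement about the business rules, not about the rows you happen to have.

```python
def depends_on(rows, determinant, dependent):
    """True if equal `determinant` values always come with equal `dependent` values."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in determinant)
        value = tuple(row[col] for col in dependent)
        # setdefault stores the first value seen for this key and returns it afterwards.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical Teams rows: owner 7 has two teams with different managers and logos.
teams = [
    {"TeamID": 1, "OwnerID": 7, "TeamManager": "Ann", "TeamLogo": "a.png"},
    {"TeamID": 2, "OwnerID": 7, "TeamManager": "Bob", "TeamLogo": "b.png"},
    {"TeamID": 3, "OwnerID": 8, "TeamManager": "Cy",  "TeamLogo": "c.png"},
]

by_team = depends_on(teams, ["TeamID"], ["TeamManager", "TeamLogo"])
by_owner = depends_on(teams, ["OwnerID"], ["TeamManager", "TeamLogo"])
print(by_team, by_owner)
```

With these rows the check by TeamID passes and the check by OwnerID fails, matching the reasoning above: the non-key attributes hang off TeamID alone.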
I do not see a reason to have a third table for UserId and TeamId. Now, if you have more information for TeamManager, I would create a manager table. Is it one user per team? Can a user be a Manager?

what are the advantages of defining a foreign key

What is the advantage of defining a foreign key when working with an MVC framework that handles the relation?
I'm using a relational database with a framework that allows model definitions with relations. Because the foreign keys are defined through the models, it seems like foreign keys are redundant. When it comes to managing the database of an application in development, editing/deleting tables that are using foreign keys is a hassle.
Is there any advantage to using foreign keys that I'm forgoing by dropping the use of them altogether?
Foreign keys with constraints (in some DB engines) give you data integrity at a low level (the level of the database).
It means you can't physically create a record that doesn't fulfill relation.
It's just a way to be more safe.
It gives you data integrity that's enforced at the database level. This helps guard against possible errors in application logic that might cause invalid data.
If any data manipulation is ever done directly in SQL that bypasses your application logic, it also guards against bad data that breaks those constraints.
An additional side-benefit is that it allows tools to automatically generate database diagrams with relationships inferred from the schema itself. Now in theory all the diagramming should be done before the database is created, but as the database evolves beyond its initial incarnation these diagrams often aren't kept up to date, and the ability to generate a diagram from an existing database is helpful both for reviewing and for explaining the structure to new developers joining a project.
It might be helpful to disable FKs while the database structure is still in flux, but they're a good safeguard to have once the schema has stabilized.
A foreign key guarantees a matching record exists in a foreign table. Imagine a table called Books that has a FK constraint on a table called Authors. Every book is guaranteed to have an Author.
Now, you can do a query such as:
SELECT B.Title, A.Name FROM Books B
INNER JOIN Authors A ON B.AuthorId = A.AuthorId;
Without the FK constraint, a missing Author row would cause the entire Book row to be dropped from the result of the INNER JOIN, resulting in missing books in your dataset.
Also, with the FK constraint, attempting to delete an author that was referred to by at least one Book would result in an error, rather than corrupting your database.
Whilst they may be a pain when manipulating development/test data, they have saved me a lot of hassle in production.
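Both effects can be reproduced with a short Python script against an in-memory SQLite database. The Books and Authors tables and the join come from the answer above; the sample rows, the ORDER BY clause and the helper names are assumptions added for the demonstration.

```python
import sqlite3

QUERY = """
SELECT B.Title, A.Name FROM Books B
INNER JOIN Authors A ON B.AuthorId = A.AuthorId
ORDER BY B.Title
"""

def make_db(with_fk):
    """Build the two tables, with or without the FK constraint on Books.AuthorId."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
    fk = "REFERENCES Authors(AuthorId)" if with_fk else ""
    conn.execute("CREATE TABLE Authors (AuthorId INTEGER PRIMARY KEY, Name TEXT NOT NULL)")
    conn.execute(f"CREATE TABLE Books (Title TEXT NOT NULL, AuthorId INTEGER NOT NULL {fk})")
    with conn:
        conn.execute("INSERT INTO Authors VALUES (1, 'Author A')")
        conn.execute("INSERT INTO Books VALUES ('Book One', 1)")
    return conn

def attempt(conn, sql, params):
    """Run one statement; return True on success, False on a constraint violation."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

loose = make_db(with_fk=False)
strict = make_db(with_fk=True)

# Author 99 does not exist.
loose_insert = attempt(loose, "INSERT INTO Books VALUES (?, ?)", ("Book Two", 99))
strict_insert = attempt(strict, "INSERT INTO Books VALUES (?, ?)", ("Book Two", 99))

# Without the FK the orphaned book is stored but silently missing from the join.
loose_titles = [title for title, _ in loose.execute(QUERY)]
loose_book_count = loose.execute("SELECT COUNT(*) FROM Books").fetchone()[0]

# With the FK, deleting an author who still has books is refused.
strict_delete = attempt(strict, "DELETE FROM Authors WHERE AuthorId = ?", (1,))
print(loose_insert, strict_insert, loose_titles, loose_book_count, strict_delete)
```

In the unconstrained database two books are stored but only one comes back from the join, while the constrained database refuses both the orphaned insert and the delete.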
Think of them as a way to maintain data integrity, especially as a safeguard against orphaned records.
For example, if you had a database relating many PhoneNumber records to a Person, what happens to PhoneNumber records when the Person record is deleted for whatever reason?
They will still exist in the database, but the ID of the Person they relate to will no longer exist in the relevant Person table and you have orphaned records.
Yes, you could write a trigger to delete the PhoneNumber whenever a Person gets removed, but this could get messy if you accidentally delete a Person and need to rollback.
Yes, you may remember to get rid of the PhoneNumber records manually, but what about other developers or methods you write 9 months down the line?
By creating a Foreign Key that ensures any PhoneNumber is related to an existing Person, you both insure against destroying this relationship and also add 'clues' as to the intended data structure.
The main benefits are data integrity and cascading deletes. You can also get a performance gain when they're defined and those fields are properly indexed. For example, you wouldn't be able to create a phone number that didn't belong to a contact, or when you delete the contact you can set it to automatically delete all of their phone numbers. Yes, you can make those connections in your UI or middle tier, but you'll still end up with orphans if someone runs an update directly against the server using SQL rather than your UI. The "hassle" part is just forcing you to consider those connections before you make a bulk change. FKs have saved my bacon many times.
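As a small illustration of cascading deletes, the sketch below declares the phone number foreign key with ON DELETE CASCADE in SQLite via Python. The Person and PhoneNumber tables echo the examples above; the column names, the index and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE Person (PersonId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE PhoneNumber (
        PhoneNumberId INTEGER PRIMARY KEY,
        PersonId      INTEGER NOT NULL REFERENCES Person(PersonId) ON DELETE CASCADE,
        Number        TEXT NOT NULL
    );
    -- Indexing the child column keeps the cascade and joins from scanning the table.
    CREATE INDEX idx_phone_person ON PhoneNumber(PersonId);
    INSERT INTO Person VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO PhoneNumber (PersonId, Number)
    VALUES (1, '555-0100'), (1, '555-0101'), (2, '555-0102');
""")

def phone_count(connection):
    return connection.execute("SELECT COUNT(*) FROM PhoneNumber").fetchone()[0]

# An accidental delete can be rolled back, and the cascaded rows come back with it.
conn.execute("DELETE FROM Person WHERE PersonId = 1")
conn.rollback()
after_rollback = phone_count(conn)

# A committed delete removes the person and their phone numbers together.
conn.execute("DELETE FROM Person WHERE PersonId = 1")
conn.commit()
after_commit = phone_count(conn)
remaining = conn.execute("SELECT Number FROM PhoneNumber ORDER BY Number").fetchall()
print(after_rollback, after_commit, remaining)
```

Since the cascade runs inside the same transaction as the delete, no orphaned phone numbers are left behind whichever way the transaction ends.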
