I am setting up a database domain model in which normalization is a challenge because of a transitive dependency. For this particular model the transitive dependency is a deliberate choice, and I am wondering how you would deal with such cases with respect to normalization.
Let me show what I mean:
I have a table called UserSubscription that has the following attributes:
id {dbgenerated}
The values for:
Depend on the subscriptionid, which points to a second table, Subscription (the subscriptionid is an FK reference to that table's PK). One might ask why I would even consider duplicating values from the Subscription table in the UserSubscription table. The reason is that a Subscription might change at any point in time, and for reference we want to store the original values in UserSubscription, so that even if the Subscription changes we still have the values that the user originally signed up for.
I know that, from the perspective of normalization, the transitive dependency I create should be removed. Ideally I would move the values back into the Subscription table, not allow them to be modified, and create a new subscription whenever a change is necessary.
But I do not want to create a new subscription every time something needs to change in an existing one, simply because these are expected to change often, for example to follow market competition. At the same time, every new subscription created gives users more to choose from.
This also means that if we no longer want to use a subscription, we would need to remove it and create a new one. With the values copied into UserSubscription, we can simply update the existing subscription, since we no longer need the old values anyway.
The above is a school project. I just wonder whether it would ever be "ok" in terms of normalization to take such an approach, when I do so by choice and in order to reduce the work of removing and creating subscriptions that I expect to change frequently.

Why don't you instead create an M:N table (mapping table) USER_SUBSCRIPTION that holds the relationships between USER and SUBSCRIPTION? You can store all values there historically, and you don't have to remove or create anything when something changes. If the user decides to opt out, you only update flag_active, flag_deleted, dtime_end, or whatever works for you.
Here is a simple model for demonstration:
USER:
id_user PK
... other details
SUBSCRIPTION:
id_subscription PK
flag_active (TRUE|FALSE or 1|0 values)
... other details
USER_SUBSCRIPTION:
id_user FK
id_subscription FK
dtime_start -- when the subscription started
dtime_end -- when the subscription ended
flag_valid (T|F or 1|0) -- optional; gives you a quick heads-up about active subscriptions, but it is somewhat redundant because you can derive it from dtime_start vs dtime_end, so it is up to you
This will give you a very generic (and therefore flexible and scalable) model for working with users' subscriptions: no duplication, and everything is handled by FK/PK referential constraints.
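A minimal sketch of the mapping-table model above, run against an in-memory SQLite database through Python's sqlite3 module. The `name` and `title` columns, the sample dates, and the choice of (id_user, id_subscription, dtime_start) as the key are assumptions made for illustration; they are not part of the model as posted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE user (
    id_user INTEGER PRIMARY KEY,
    name    TEXT NOT NULL                 -- hypothetical detail column
);
CREATE TABLE subscription (
    id_subscription INTEGER PRIMARY KEY,
    flag_active     INTEGER NOT NULL DEFAULT 1 CHECK (flag_active IN (0, 1)),
    title           TEXT NOT NULL         -- hypothetical detail column
);
CREATE TABLE user_subscription (
    id_user         INTEGER NOT NULL REFERENCES user (id_user),
    id_subscription INTEGER NOT NULL REFERENCES subscription (id_subscription),
    dtime_start     TEXT NOT NULL,        -- when the subscription started
    dtime_end       TEXT,                 -- NULL while the subscription runs
    PRIMARY KEY (id_user, id_subscription, dtime_start)  -- assumed key
);
""")

conn.execute("INSERT INTO user VALUES (1, 'alice')")
conn.execute("INSERT INTO subscription VALUES (10, 1, 'basic')")
conn.execute("INSERT INTO user_subscription VALUES (1, 10, '2024-01-01', NULL)")

# Opt-out: close the open row instead of deleting it, so history is kept.
conn.execute(
    "UPDATE user_subscription SET dtime_end = '2024-06-30' "
    "WHERE id_user = 1 AND id_subscription = 10 AND dtime_end IS NULL"
)
# Signing up again later simply adds another row.
conn.execute("INSERT INTO user_subscription VALUES (1, 10, '2024-09-01', NULL)")

active = conn.execute(
    "SELECT id_user, id_subscription, dtime_start "
    "FROM user_subscription WHERE dtime_end IS NULL"
).fetchall()
history_count = conn.execute("SELECT COUNT(*) FROM user_subscription").fetchone()[0]
```

Here the active subscriptions are derived from `dtime_end IS NULL`, which is why flag_valid is left out of the sketch.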


How do I define a one-to-one relationship over a one-to-many relationship in a relational database?

I'm creating a database schema with the following tables (sorry for the bad pseudocode):
User:
user_id, PK
Collection:
collection_id, PK
user_id, FK(User->user_id)
Issue:
issue_id, PK
collection_id, FK(Collection->collection_id)
There is a one-to-many relationship from User to Collection, and also from Collection to Issue. So, a single user may maintain multiple collections, each with many issues.
The problem: I would like to designate a "default" collection to be displayed when the user first logs in to the application. For the record, I'm doing this in the Django framework, but I'm more interested in the elegant platform-independent solution. When I try to make a column in User that is a Foreign Key to Collection, it complains that Collection does not exist yet (I suppose because User is created first). I could add a "default" boolean column to Collection and enforce through my application that only one record per User be "true", but that seems inelegant. I could also have a separate table, say, User_Default_Collection, which has user_id as a Foreign, Unique Key, and a collection_id column which is a Foreign Key to Collection. But I'm certain this is also less than 3rd normal form. Any suggestions?
If you want to enforce that every user must and will always have his "default" collection, then because of the obvious cycle in the inclusion dependencies you are forced into either deferred constraint checking (if your DBMS allows the FK cycle to be declared in the first place) or application-enforced integrity.
If you can tolerate users not having any default collection at all, then create a separate table DFT_COLL(userid, dft_coll_id) with key userid and FK's to both USER and COLLECTION.
If it gives you trouble in cases when a user has no default collection, maybe this can still be addressed by having the system just pick one (e.g. the one with the lowest [or highest] id) and implement this with a UNION view (so that if you need the default then you read the UNION view and you're guaranteed (*) to get some result).
(*) If the user has a collection at all, that is. Note that requiring a default collection and requiring that to exist, implies requiring at least one collection per user. (And the corollary of this is that if it must be allowed for a user to have no collection at all, it is nonsensical and a contradiction to require him to have a default one.)
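The separate DFT_COLL table and the UNION view described above could be sketched as follows, using SQLite through Python's sqlite3 module. The view name `effective_default`, the "lowest id" fallback rule, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE user (user_id INTEGER PRIMARY KEY);
CREATE TABLE collection (
    collection_id INTEGER PRIMARY KEY,
    user_id       INTEGER NOT NULL REFERENCES user (user_id)
);
-- At most one row per user, because userid is the key.
CREATE TABLE dft_coll (
    userid      INTEGER PRIMARY KEY REFERENCES user (user_id),
    dft_coll_id INTEGER NOT NULL REFERENCES collection (collection_id)
);
-- Explicit default if there is one, otherwise the lowest collection id.
CREATE VIEW effective_default AS
    SELECT userid, dft_coll_id FROM dft_coll
    UNION
    SELECT user_id, MIN(collection_id)
    FROM collection
    WHERE user_id NOT IN (SELECT userid FROM dft_coll)
    GROUP BY user_id;

INSERT INTO user VALUES (1), (2), (3);
INSERT INTO collection VALUES (10, 1), (11, 1), (20, 2), (21, 2);
INSERT INTO dft_coll VALUES (1, 11);   -- only user 1 picked a default
""")

defaults = conn.execute(
    "SELECT userid, dft_coll_id FROM effective_default ORDER BY userid"
).fetchall()
```

User 1 gets the explicitly chosen collection, user 2 falls back to the lowest id, and user 3 (who owns no collection) does not appear in the view at all, which matches the footnote above. Note that this sketch does not check that the chosen collection actually belongs to the same user.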
The most plausible solution, I think, would be:
Add a nullable "default" field to the Collection table
Create a UNIQUE constraint on user_id and default
Keep only "true" values and NULLs (no "false" values) in the "default" column.
This will not allow for multiple Collections associated with the same user_id to have the same "default" value other than null. You don't need to develop any application logic. However, this design would not force you to always have a default collection for a user.
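A small sketch of this nullable-flag idea in SQLite via Python's sqlite3 module. The column is called `is_default` here because DEFAULT is an SQL keyword, and the CHECK that rules out 0 is an added assumption. The trick relies on the engine treating NULLs as distinct inside a UNIQUE constraint, which SQLite does; some engines (SQL Server, for instance) treat them as equal, so this would need testing on the target DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE collection (
    collection_id INTEGER PRIMARY KEY,
    user_id       INTEGER NOT NULL,
    is_default    INTEGER CHECK (is_default = 1),  -- only 1 or NULL, never 0
    UNIQUE (user_id, is_default)                   -- NULLs do not collide
);
INSERT INTO collection VALUES (10, 1, 1);      -- the default collection
INSERT INTO collection VALUES (11, 1, NULL);   -- ordinary collections
INSERT INTO collection VALUES (12, 1, NULL);
""")

try:
    # A second default for the same user violates the UNIQUE constraint.
    conn.execute("INSERT INTO collection VALUES (13, 1, 1)")
    second_default_rejected = False
except sqlite3.IntegrityError:
    second_default_rejected = True
```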

Database design - system default items and custom user items

This question applies to any database table design, where you would have system default items and custom user defaults of the same type (ie user can add his own custom items/settings).
Here is an example with invoicing and payment terms. By default an invoice can have payment terms of DueOnReceipt, NET10, NET15, or NET30 (this is the default for all users!), so you would have two tables, "INVOICE" and "PAYMENT_TERM".
PAYMENT_TERM (System default)
Now what is the best way to allow a user to store their own custom "PaymentTerms" and why? (ie user can use system default payment terms OR user's own custom payment terms that he created/added)
Option 1) Add UserId to PaymentTerm, set userid for the user that has added the custom item and system default userid set to null.
UserId (System Default, UserId=null)
Option 2) Add a flag to Invoice "IsPaymentTermCustom" and Create a custom table "PAYMENT_TERM_CUSTOM"
IsPaymentTermCustom (True for custom, otherwise false for system default)
Now check via an SQL query whether the user is using a custom payment term: if IsPaymentTermCustom=True, the invoice uses a custom payment term; otherwise it uses a system default.
Option 3) ????
As a general rule:
Prefer adding columns to adding tables
Prefer adding rows to adding columns
Generally speaking, the considerations are:
Effects of adding a table
Requires the most changes to the app: You're supporting a new kind of "thing"
Requires more complicated SQL: You'll have to join to it somehow
May require changes to other tables to add a foreign key column referencing the new table
Impacts performance because more I/O is needed to join to and read from the new table
Note that I am not saying "never add tables". Just know the costs.
Effects of adding a column
Can be expensive to add a column if the table is large (it can take hours for the ALTER TABLE ADD COLUMN to complete, and during this time the table will be locked, effectively bringing your site "down"), but this is a one-time thing
The cost to the project is low: Easy to code/maintain
Usually requires minimal changes to the app - it's a new aspect of a thing, rather than a new thing
Will perform with negligible performance difference. Will not be measurably worse, but may be a lot faster depending on the situation (if having the new column avoids joining or expensive calculations).
Effects of adding rows
Zero: If your data model can handle your new business idea by just adding more rows, that's the best option
(Pedants kindly refrain from making comments such as "there is no such thing as 'zero' impact", or "but there will still be more disk used for more rows" etc - I'm talking about material impact to the DB/project/code)
To answer the question: Option 1 is best (i.e. add a column to the payment option table).
The reasoning is based on the guidelines above and this situation is a good fit for those guidelines.
I would also store "standard" payment options in the same table, but with a NULL userid; that way you only have to add new payment options when you really have one, rather than for every customer even if they use a standard one.
It also means your invoice table does not need changing, which is a good thing - it means minimal impact to that part of your app.
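As a rough illustration of Option 1, here is a sketch in SQLite via Python's sqlite3 module. The column names, the custom terms NET45 and NET60, and the helper `terms_for` are hypothetical; only the idea of a nullable user id that marks system defaults comes from the discussion above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payment_term (
    payment_term_id INTEGER PRIMARY KEY,
    name            TEXT NOT NULL,
    user_id         INTEGER            -- NULL marks a system default
);
INSERT INTO payment_term VALUES (1, 'DueOnReceipt', NULL);
INSERT INTO payment_term VALUES (2, 'NET10', NULL);
INSERT INTO payment_term VALUES (3, 'NET15', NULL);
INSERT INTO payment_term VALUES (4, 'NET30', NULL);
INSERT INTO payment_term VALUES (5, 'NET45', 7);   -- custom term of user 7
INSERT INTO payment_term VALUES (6, 'NET60', 8);   -- custom term of user 8
""")

def terms_for(user_id):
    """Return the system defaults plus the given user's own terms."""
    rows = conn.execute(
        "SELECT name FROM payment_term "
        "WHERE user_id IS NULL OR user_id = ? "
        "ORDER BY payment_term_id",
        (user_id,),
    )
    return [name for (name,) in rows]
```

An invoice would keep a single foreign key to payment_term, whether the term is a system default or a custom one.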
It seems to me that there are merely "Payment Terms" and "Users". The decision of what are the "Default" payment terms is a business rule, and therefore would be best represented in the business layer of your application.
Assuming that you would like to have a set of pre-defined "default" payment terms present in your application from the start, these would already be present in the payment terms table. However, I would put a reference table in between USERS and PAYMENT TERMS:
Your business layer should offer up to the user (or more likely, the administrator) through a GUI the ability to:
Assign 0 to many payment term options to a particular user (some
users may not want one of the defaults to even be available, for
Add custom payment terms, which then become available for assignment to one or more users (but which avoids the creation of duplicate payment terms by different users)
Assign a single custom payment term to more than one user. (Say the user's company has a unique payment process that requires all of its users to use a payment term other than one of the defaults: create the custom term once, and assign it to all of those users.)
Your application business layer would establish rules governing access to payment terms, which could then be accessed by your user interface.
Your UI would then (again, likely through an administrator function) allow the set up of one or more payment terms in addition to the standards you describe, and then make them available to one or more users through something like a checked list box (for example).
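One way the reference table between USERS and PAYMENT TERMS might look, sketched in SQLite via Python's sqlite3 module. The table name `user_payment_terms`, the column names, and the sample assignments are assumptions; the answer above does not spell them out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY);
CREATE TABLE payment_terms (
    payment_term_id INTEGER PRIMARY KEY,
    name            TEXT NOT NULL UNIQUE   -- no duplicate terms across users
);
-- Which terms are available to which user (0..n per user).
CREATE TABLE user_payment_terms (
    user_id         INTEGER NOT NULL REFERENCES users (user_id),
    payment_term_id INTEGER NOT NULL REFERENCES payment_terms (payment_term_id),
    PRIMARY KEY (user_id, payment_term_id)
);
INSERT INTO users VALUES (1), (2);
INSERT INTO payment_terms VALUES (1, 'NET10'), (2, 'NET30'), (3, '2% 10, Net 30');
-- User 1 may use everything; user 2 does not get NET10.
INSERT INTO user_payment_terms VALUES (1, 1), (1, 2), (1, 3), (2, 2), (2, 3);
""")

terms_user_2 = [
    name
    for (name,) in conn.execute(
        "SELECT t.name FROM payment_terms t "
        "JOIN user_payment_terms ut ON ut.payment_term_id = t.payment_term_id "
        "WHERE ut.user_id = ? ORDER BY t.payment_term_id",
        (2,),
    )
]
```

The custom term with id 3 is created once and assigned to both users, which is the duplication-avoiding case described above.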
Option 1 is definitely better, for the following reasons:
You can implement a database constraint for uniqueness of the payment term name
You can implement a foreign key constraint from Invoice to PaymentTerm
Ease of Use
Queries will be much simpler because you will always join from Invoice to PaymentTerm rather than requiring a more complex join. Most of the time when you select you will not care whether it is a built-in or a custom payment term. The optimizer will have an easier time with a normal join than with one that depends on another column to decide which table to join.
Easier to display a list of PaymentTerms coming from one table
We use Option 1 in our data model quite a lot.
Part of the problem, as I see it, is that different payment terms lead to different calculations, too. If I were still in the welding supply business, I'd want to add "2% 10 NET 30", which would mean a 2% discount if the payment is made in full within 10 days; otherwise, net 30.
Setting that issue aside, I think ownership of the payment terms makes sense. Assume that the table of users (not shown) includes the user "system" as, say, user_id 0.
create table payment_terms (
payment_term_id integer primary key,
payment_term_owner_id integer not null references users (user_id),
payment_term_desc varchar(30) not null unique
);
insert into payment_terms values (1, 0, 'Net 10');
insert into payment_terms values (2, 0, 'Net 15');
insert into payment_terms values (5, 1, '2% 10, Net 30');
This keeps foreign keys simple, and it makes it easy to select payment terms at run time for presentation in the user interface.
Be very careful here. You probably want to store the description, not the ID number, with your invoices. (It's unique; you can set a foreign key reference to it.) If you store only the ID number, updating a user's custom description might subtly corrupt all the data that references it.
For example, let's say that the user created a custom payment term number 5, '2% 10, Net 30'. You store the ID number 5 in your table of invoices. Then the user decides that things will be different starting today, and updates that description to '2% 10, Net 20'. Now on all your past invoices, the arithmetic no longer matches the payment terms.
Your auditor will kill you. Twice.
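To make the point concrete, here is a hedged sketch in SQLite via Python's sqlite3 module in which invoices reference the unique description. The `invoices` table and its columns are hypothetical, and the owner column is left out for brevity. With a plain foreign key (no cascading update), the engine refuses to rewrite a description that past invoices already use, so a changed policy has to become a new row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE payment_terms (
    payment_term_id   INTEGER PRIMARY KEY,
    payment_term_desc VARCHAR(30) NOT NULL UNIQUE
);
CREATE TABLE invoices (                       -- hypothetical invoice table
    invoice_id        INTEGER PRIMARY KEY,
    payment_term_desc VARCHAR(30) NOT NULL
        REFERENCES payment_terms (payment_term_desc)
);
INSERT INTO payment_terms VALUES (5, '2% 10, Net 30');
INSERT INTO invoices VALUES (1001, '2% 10, Net 30');
""")

try:
    # Editing a description that invoices already reference is refused.
    conn.execute(
        "UPDATE payment_terms SET payment_term_desc = '2% 10, Net 20' "
        "WHERE payment_term_id = 5"
    )
    update_blocked = False
except sqlite3.IntegrityError:
    update_blocked = True

# The new policy goes in as a new row; old invoices keep their terms.
conn.execute("INSERT INTO payment_terms VALUES (6, '2% 10, Net 20')")
stored = conn.execute(
    "SELECT payment_term_desc FROM invoices WHERE invoice_id = 1001"
).fetchone()[0]
```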
You'll want to prevent ordinary users from deleting rows owned by the system user. There are several ways to do that.
Use a BEFORE DELETE trigger.
Add another table with foreign key references to the rows owned by the system user.
Restrict all access through stored procedures that prevent deleting system rows.
(And flags are almost never the best idea.)
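The BEFORE DELETE trigger option could look roughly like this in SQLite, driven from Python's sqlite3 module. The trigger name and error message are made up for the example, and trigger syntax differs between database engines. This sketch blocks the deletion for everyone, because SQLite has no notion of the current database user; an engine with real user accounts would check who is deleting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payment_terms (
    payment_term_id       INTEGER PRIMARY KEY,
    payment_term_owner_id INTEGER NOT NULL,       -- 0 is the "system" user
    payment_term_desc     VARCHAR(30) NOT NULL UNIQUE
);
-- Abort any DELETE that touches a row owned by the system user.
CREATE TRIGGER protect_system_terms
BEFORE DELETE ON payment_terms
FOR EACH ROW WHEN OLD.payment_term_owner_id = 0
BEGIN
    SELECT RAISE(ABORT, 'system payment terms cannot be deleted');
END;
INSERT INTO payment_terms VALUES (1, 0, 'Net 10');
INSERT INTO payment_terms VALUES (5, 1, '2% 10, Net 30');
""")

try:
    conn.execute("DELETE FROM payment_terms WHERE payment_term_id = 1")
    system_delete_blocked = False
except sqlite3.DatabaseError:
    system_delete_blocked = True

# Rows owned by ordinary users can still be deleted.
conn.execute("DELETE FROM payment_terms WHERE payment_term_id = 5")
remaining_ids = [
    row[0]
    for row in conn.execute(
        "SELECT payment_term_id FROM payment_terms ORDER BY payment_term_id"
    )
]
```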
Applying general rules of database design to the problem at hand:
one table for system payment terms
one table for user payment terms
a view of join of the two above
Now you can join invoice on the view of payment terms.
No flag columns
No nulls
You separate system defaults from user data
Things become straightforward for the db
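One possible reading of this two-tables-plus-view layout, sketched in SQLite via Python's sqlite3 module. All table, view, and column names are assumptions, and the view is built by pairing every user with every system term and then adding that user's own terms; the answer itself does not say how the view should be defined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY);
CREATE TABLE system_payment_term (term_name TEXT PRIMARY KEY);
CREATE TABLE user_payment_term (
    user_id   INTEGER NOT NULL REFERENCES users (user_id),
    term_name TEXT NOT NULL,
    PRIMARY KEY (user_id, term_name)
);
-- Every user sees all system terms plus their own; no flags, no NULLs.
CREATE VIEW available_payment_term AS
    SELECT u.user_id, s.term_name
    FROM users u CROSS JOIN system_payment_term s
    UNION
    SELECT user_id, term_name FROM user_payment_term;

INSERT INTO users VALUES (1), (2);
INSERT INTO system_payment_term VALUES ('NET10'), ('NET30');
INSERT INTO user_payment_term VALUES (1, 'NET45');
""")

def available(user_id):
    """List the term names the given user may put on an invoice."""
    rows = conn.execute(
        "SELECT term_name FROM available_payment_term "
        "WHERE user_id = ? ORDER BY term_name",
        (user_id,),
    )
    return [name for (name,) in rows]
```

An invoice would then join to the view on (user_id, term_name). A view cannot be the target of a foreign key, so that link would not be enforced by the database in this layout.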

What would you do to avoid conflicting data in this database schema?

I'm working on a multi-user internet database-driven website with SQL Server 2008 / LinqToSQL / custom-made repositories as the DAL. I have run across a normalization problem which can lead to an inconsistent database state if exploited correctly and I am wondering how to deal with the problem.
The problem: Several different companies have access to my website. They should be able to track their Projects and Clients at my website. Some (but not all) of the projects should be assignable to clients.
This results in the following database schema:
Clients:
CompanyID (not nullable)
Projects:
CompanyID (not nullable)
ClientID (nullable)
This leads to the following relationships:
Companies-Clients (1:n)
Companies-Projects (1:n)
Now, if a user is malicious, he might for example insert a Project with his own CompanyID, but with a ClientID belonging to another user, leaving the database in an inconsistent state.
The problem occurs in a similar fashion all over my database schema, so I'd like to solve this in a generic way if any possible. I had the following two ideas:
Check for database writes that might lead to inconsistencies in the DAL. This would be generic, but requires some additional database queries before an update and create queries are performed, so it will result in less performance.
Create an additional table for the clients-Projects relationship and make sure the relationships created this way are consistent. This also requires some additional select queries, but far less than in the first case. On the other hand it is not generic, so it is easier to miss something in the long run, especially when adding more tables / dependencies to the database.
What would you do? Is there any better solution I missed?
Edit: You might wonder why the Projects table has a CompanyID. This is because I want users to be able to add projects with and without clients. I need to keep track of which company (and therefore which website user) a clientless project belongs to, which is why a project needs a CompanyID.
I'd go with the latter, having one or more tables that define the allowable relationships between entities.
Note, there's no circularity in the references you have, so the title is misleading.
What you have is the possibility of conflicting data, that's different.
Why do you have "CompanyID" in the project table? The ID of the company involved is implicitly given by the client you link to. You don't need it.
Remove that column and you've removed your problem.
Additionally, what is the purpose of the "name" column in the client table? Can you have a client with one name, differing from the name of the company?
Or is "client" the person at that company?
Edit: Ok with the clarification about projects without companies, I would separate out the references, but you're not going to get rid of the problem you're describing without constraints that prevent multiple references being made.
A simple constraint for your existing tables would be that the CompanyID and ClientID fields of a project row cannot both be non-null at the same time.
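A minimal sketch of that constraint as a CHECK, in SQLite via Python's sqlite3 module. It assumes CompanyID is made nullable and is filled in only for clientless projects, which differs from the schema in the question; the helper `try_insert` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Projects (
    ProjectID INTEGER PRIMARY KEY,
    CompanyID INTEGER,   -- assumed nullable: set only for clientless projects
    ClientID  INTEGER,   -- set when the project belongs to a client
    CHECK (CompanyID IS NULL OR ClientID IS NULL)  -- never both at once
);
""")

def try_insert(project_id, company_id, client_id):
    """Insert a project row; return False if the CHECK rejects it."""
    try:
        conn.execute(
            "INSERT INTO Projects VALUES (?, ?, ?)",
            (project_id, company_id, client_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False

clientless_ok = try_insert(1, 1, None)    # company only
with_client_ok = try_insert(2, None, 100) # client only; company is implied
both_rejected = not try_insert(3, 1, 200) # two paths to a company: refused
```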
If you want to use the table like this and avoid all the new queries, just put triggers on the table; when a user tries to insert a row with wrong data, the trigger will stop him.
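A sketch of such a trigger, written for SQLite and run through Python's sqlite3 module; SQL Server trigger syntax is different, so treat this only as an outline of the idea. The trigger name and message are made up, and only INSERT is covered here; an UPDATE would need a matching trigger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Clients (
    ClientID  INTEGER PRIMARY KEY,
    CompanyID INTEGER NOT NULL
);
CREATE TABLE Projects (
    ProjectID INTEGER PRIMARY KEY,
    CompanyID INTEGER NOT NULL,
    ClientID  INTEGER
);
-- Reject a project whose client belongs to a different company.
CREATE TRIGGER projects_client_same_company
BEFORE INSERT ON Projects
FOR EACH ROW
BEGIN
    SELECT RAISE(ABORT, 'client belongs to another company')
    WHERE NEW.ClientID IS NOT NULL
      AND NOT EXISTS (
          SELECT 1 FROM Clients c
          WHERE c.ClientID = NEW.ClientID
            AND c.CompanyID = NEW.CompanyID
      );
END;
INSERT INTO Clients VALUES (100, 1);   -- client of company 1
INSERT INTO Clients VALUES (200, 2);   -- client of company 2
""")

def try_insert(project_id, company_id, client_id):
    """Insert a project row; return False if the trigger aborts it."""
    try:
        conn.execute(
            "INSERT INTO Projects VALUES (?, ?, ?)",
            (project_id, company_id, client_id),
        )
        return True
    except sqlite3.DatabaseError:
        return False

own_client_ok = try_insert(1, 1, 100)
foreign_client_rejected = not try_insert(2, 1, 200)
clientless_ok = try_insert(3, 1, None)
```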
My first thought would be to create a special client record for each company with name "No client". Then eliminate the CompanyId from the Project table, and if a project has no client, use the "No client" record rather than a "normal" client record. If processing of such no-client's is special, add a flag to the no-client record to explicitly identify it. (I'd hate to rely on the name being "No Client" or something like that -- too fuzzy.)
Then there would be no way to store inconsistent data so the problem would go away.
In the end I implemented a completely generic solution which solves my problem without much runtime overhead and without requiring any changes to the database. I'll describe it here in case someone else has the same problem.
First off, the approach only works because the only table that other tables are referencing through multiple paths is the Companies table. Since this is the case in my database, I only have to check whether all n:1 referenced entities of each entity that is to be created / updated / deleted are referencing the same company (or no company at all).
I am enforcing this by deriving all of my Linq entities from one of the following types:
SingleReferenceEntityBase - The norm. Only checks (via reflection) if there really is only one reference (no matter if transitive or intransitive) to the Companies table. If this is the case, the references to the companies table cannot become inconsistent.
MultiReferenceEntityBase - For special cases such as the Projects table above. Asks all directly referenced entities what company ID they are referencing. Raises an exception if there is an inconsistency. This costs me a few select queries per CRUD operation, but since MultiReferenceEntities are much rarer than SingleReferenceEntities, this is negligible.
Both of these types implement a "CheckReferences" method, and I call it whenever the Linq entity is written to the database by partially implementing the OnValidate(System.Data.Linq.ChangeAction action) method, which is automatically generated for all Linq entities.
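The original is C# on top of LinqToSQL and is not shown, so the following is only a language-neutral sketch of the multi-reference check, written in Python. The class and method names loosely mirror the description above, but every name here is hypothetical and none of it is the author's actual code.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


class InconsistentCompanyError(Exception):
    """Raised when an entity reaches different companies via different paths."""


class MultiReferenceEntity:
    """Base for entities that reference the company through several paths."""

    def company_ids(self) -> Iterable[Optional[int]]:
        raise NotImplementedError

    def check_references(self) -> None:
        # Ignore missing references, then require a single distinct company.
        distinct = {cid for cid in self.company_ids() if cid is not None}
        if len(distinct) > 1:
            raise InconsistentCompanyError(
                f"conflicting company ids: {sorted(distinct)}"
            )


@dataclass
class Client:
    client_id: int
    company_id: int


@dataclass
class Project(MultiReferenceEntity):
    project_id: int
    company_id: int
    client: Optional[Client] = None

    def company_ids(self) -> Iterable[Optional[int]]:
        # The project's own company plus the company of its client, if any.
        yield self.company_id
        yield self.client.company_id if self.client else None


def is_consistent(entity: MultiReferenceEntity) -> bool:
    """Convenience wrapper that turns the exception into a boolean."""
    try:
        entity.check_references()
        return True
    except InconsistentCompanyError:
        return False
```

In the setup described above, the equivalent of `check_references` would run from the validation hook just before the entity is written.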

How to separate automatically populated tables from manually populated tables, properly, in SQL Server?

Let's say I have the following 2 tables in a database:
[Movies] (Scheme: Automatic)
[Comments] (Scheme: Manual)
The "Movies" table gets updated by a service every 10 minutes and the "Comments" table gets updated manually by the users of the database.
Normally you'd just create a simple foreign-key relationship between the two tables with cascading updates and deletes, but in this case I want to be able to keep the manually entered data even if the movie it refers to gets deleted (the update service isn't that reliable). This should only be a problem in one-to-many relationships from an automatic table to a manual table. How would you separate the manual and the automatically populated parts of the database?
I was planning to add a foreign key that doesn't maintain referential integrity and only cascades updates, not deletions. But are there any pitfalls I should be aware of in doing it this way? I mean, apart from the fact that I might end up with some manual data that doesn't actually reference anything.
Edit / Clarification:
Just to clarify. The example tables are totally made up. In reality the DB will contain objects like servers, applications, application notes, versions numbers etc. Server related information will be populated automatically but some application details will be filled in manually. It could be information like special configurations and such. Even if the server record gets deleted the application notes on that server are still valuable and shouldn't be deleted.
I'd suggest you use an import table that gets updated by the service and then populate the movies table from that. Then you get to keep movies that are deleted in the movies table, possibly tagging them as deleted or obsolete, but you'd still be able to keep them for historical purposes.
I think you should use a soft delete for that scenario. I don't think you want to have comments you don't know which movie they belong to.
Agree; an example route would be to copy the movies table and add a status field which indicates each record's present state (live/checking/deleted). Then the autoimport should go into a temporary table, set the status of all movies to 'checking', then use the temporary table to update the real movies table, setting the movie status to live when it's found in the temporary table. Once complete, set any movie which still has a status of 'checking' to deleted, since they weren't found in the autoimport. At the application end, select any movie which doesn't have status = deleted.
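The live/checking/deleted routine described above might be sketched like this in SQLite via Python's sqlite3 module. The table `movies_import`, the function `run_import`, and the sample data are assumptions for illustration, and the whole import is wrapped in one transaction so readers never observe the intermediate 'checking' state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE movies (
    movie_id INTEGER PRIMARY KEY,
    title    TEXT NOT NULL,
    status   TEXT NOT NULL DEFAULT 'live'
             CHECK (status IN ('live', 'checking', 'deleted'))
);
CREATE TABLE comments (
    comment_id INTEGER PRIMARY KEY,
    movie_id   INTEGER NOT NULL REFERENCES movies (movie_id),
    body       TEXT NOT NULL
);
CREATE TABLE movies_import (movie_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
""")

def run_import(conn, rows):
    """Load one feed snapshot and soft-delete movies missing from it."""
    with conn:  # commit on success, roll back on error
        conn.execute("DELETE FROM movies_import")
        conn.executemany("INSERT INTO movies_import VALUES (?, ?)", rows)
        conn.execute("UPDATE movies SET status = 'checking'")
        # Movies present in the snapshot go back to 'live'.
        conn.execute(
            "UPDATE movies SET status = 'live', "
            "title = (SELECT i.title FROM movies_import i "
            "         WHERE i.movie_id = movies.movie_id) "
            "WHERE movie_id IN (SELECT movie_id FROM movies_import)"
        )
        # Movies seen for the first time are added as 'live'.
        conn.execute(
            "INSERT INTO movies (movie_id, title, status) "
            "SELECT i.movie_id, i.title, 'live' FROM movies_import i "
            "WHERE i.movie_id NOT IN (SELECT movie_id FROM movies)"
        )
        # Whatever is still 'checking' was not in the snapshot.
        conn.execute("UPDATE movies SET status = 'deleted' WHERE status = 'checking'")

run_import(conn, [(1, "Alien"), (2, "Brazil")])
conn.execute("INSERT INTO comments VALUES (1, 2, 'great film')")
run_import(conn, [(1, "Alien")])          # movie 2 vanished from the feed

statuses = dict(conn.execute("SELECT movie_id, status FROM movies"))
visible = [t for (t,) in conn.execute(
    "SELECT title FROM movies WHERE status <> 'deleted' ORDER BY movie_id")]
comment_count = conn.execute("SELECT COUNT(*) FROM comments").fetchone()[0]
```

The comment on movie 2 survives with a valid foreign key, because the movie row is only marked as deleted rather than removed.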
"I was planning to add a foreign-key
that isn't maintaining referencial
integrity and only cascades updates,
not deletions."
Since you appear to be using surrogate keys, updates will not be relevant to foreign elements. Additionally, since you do not care about orphaning data, then why use the referential constraint at all? You use constraints to ensure that something exists, which you do not appear to require in this situation.

DB Schema Organization

I'm currently in the planning phase of building a scheduling web app (for volunteer staffing of events), and I've got a question for those with more experience.
There's a calendar of events, and any user at any time can register for any of the events. At a later time, but before that event, one of the admins will step in and select a "Staff List" out of those that registered, and the rest will be put into an "Alternate List".
What I've been thinking so far is that there will be an Event table, a User table, and then three others:
UserEvent: Maps users to events they registered for. Does not imply membership in either the Staff or the Alt list.
UserStaff: Maps users to events they registered for and are also staffing.
UserAlt: Similar to UserStaff.
The question then becomes two part:
Is this a good way to do it?
Should each of those three associative tables have the user id and the event id?
That second question is really the one I'd like to see discussed. That seems like a lot of duplicated material (everything in either UserStaff or UserAlt will always be in UserEvent), so I was thinking of creating a unique key for the UserEvent table, in addition to the composite key, that the other tables (UserStaff and UserAlt) will refer to. On the plus side, there is less duplicated content, on the down side there's an intermediary table (UserEvent) that needs to be referenced in almost every query this way.
Hopefully I've been clear enough, and thanks in advance.
I would have the following tables:
User (UserID, firstname, lastname, etc.)
Event (EventID, Name, Date, Location, Capacity, etc.)
EventRegistration (EventRegistrationID, UserID, EventID, ParticipantTypeID, etc.)
ParticipantType (ParticipantTypeID, Name)
ParticipantType.Name is one of "participant" or "staff".
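A rough sketch of this single-registration-table layout in SQLite via Python's sqlite3 module. The UNIQUE (UserID, EventID) constraint and the sample rows are assumptions, and most of the "etc." columns are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE User (UserID INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
CREATE TABLE Event (EventID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE ParticipantType (
    ParticipantTypeID INTEGER PRIMARY KEY,
    Name              TEXT NOT NULL UNIQUE
);
CREATE TABLE EventRegistration (
    EventRegistrationID INTEGER PRIMARY KEY,
    UserID            INTEGER NOT NULL REFERENCES User (UserID),
    EventID           INTEGER NOT NULL REFERENCES Event (EventID),
    ParticipantTypeID INTEGER NOT NULL
        REFERENCES ParticipantType (ParticipantTypeID),
    UNIQUE (UserID, EventID)          -- assumed: one registration per event
);
INSERT INTO ParticipantType VALUES (1, 'participant'), (2, 'staff');
INSERT INTO User VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray');
INSERT INTO Event VALUES (1, 'Food drive');
-- Both users register; nobody is staff yet.
INSERT INTO EventRegistration VALUES (1, 1, 1, 1), (2, 2, 1, 1);
""")

# An admin later promotes user 2 to the staff list for event 1.
conn.execute(
    "UPDATE EventRegistration "
    "SET ParticipantTypeID = (SELECT ParticipantTypeID FROM ParticipantType "
    "                         WHERE Name = 'staff') "
    "WHERE UserID = 2 AND EventID = 1"
)

staff_ids = [
    row[0]
    for row in conn.execute(
        "SELECT r.UserID FROM EventRegistration r "
        "JOIN ParticipantType p ON p.ParticipantTypeID = r.ParticipantTypeID "
        "WHERE r.EventID = 1 AND p.Name = 'staff' ORDER BY r.UserID"
    )
]
```

Moving someone between lists is a single UPDATE on one row, and no user/event pair is stored twice.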
This seems good, although you might want to consider combining your User - Event association tables into one, and having a column on that table that indicates the purpose of the association, i.e. Event, Staff, or Alt. This would effectively obviate the need for the duplication you describe in the UserEvent tables, since Staff and Alt could be considered to be supersets of Event for most purposes.
One benefit of this approach is that it allows for there to be multiple types of User - Event associations, such as if you have a User who is a Staffer for an Event but not a Participant, or a User who is just an Alt; this approach saves you from having to enumerate all the possible combinations. Now, if your design explicitly specifies that you can only have a certain set of User Participation types, this might introduce a level of dissociation you don't want; you may prefer to have explicit constraints on the set of participation levels that a User may have on an Event. If you don't have that tightly specified set, on the other hand, this system allows for adding more Participation roles easily (and without disturbing existing Participation roles).
Not a direct answer to your question, but here's a site I like. It's got tons (and tons) of sample schema. I generally don't use it as definitive (of course), but sometimes it will give me an idea on something I wasn't thinking of.
