How do I define a one-to-one relationship over a one-to-many relationship in a relational database?

I'm creating a database schema with the following tables (sorry for the bad pseudocode):
User
====
user_id, PK
Collection
==========
collection_id, PK
user_id, FK(User->user_id)
Issue
=====
issue_id, PK
collection_id, FK(Collection->collection_id)
There is a one-to-many relationship from User to Collection, and also from Collection to Issue. So, a single user may maintain multiple collections, each with many issues.
The problem: I would like to designate a "default" collection to be displayed when the user first logs in to the application. For the record, I'm doing this in the Django framework, but I'm more interested in the elegant platform-independent solution. When I try to make a column in User that is a Foreign Key to Collection, it complains that Collection does not exist yet (I suppose because User is created first). I could add a "default" boolean column to Collection and enforce through my application that only one record per User be "true", but that seems inelegant. I could also have a separate table, say, User_Default_Collection, which has user_id as a Foreign, Unique Key, and a collection_id column which is a Foreign Key to Collection. But I'm certain this is also less than 3rd normal form. Any suggestions?

If you want to enforce that every user must and will always have his "default" collection, then because of the obvious cycle in the inclusion dependencies you are forced into either deferred constraint checking (if your DBMS allows the FK cycle to be declared in the first place) or application-enforced integrity.
If you can tolerate users not having any default collection at all, then create a separate table DFT_COLL(userid, dft_coll_id) with key userid and FK's to both USER and COLLECTION.
If it gives you trouble in cases when a user has no default collection, maybe this can still be addressed by having the system just pick one (e.g. the one with the lowest [or highest] id) and implement this with a UNION view (so that if you need the default then you read the UNION view and you're guaranteed (*) to get some result).
(*) If the user has a collection at all, that is. Note that requiring a default collection and requiring that to exist, implies requiring at least one collection per user. (And the corollary of this is that if it must be allowed for a user to have no collection at all, it is nonsensical and a contradiction to require him to have a default one.)
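A minimal sketch of the separate-table idea with a fallback view, using Python's sqlite3 module so it can be run as-is. The view name effective_default, the sample ids, and the "lowest id" fallback rule are assumptions for illustration; DFT_COLL and its columns follow the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE user (user_id INTEGER PRIMARY KEY);
CREATE TABLE collection (
    collection_id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES user (user_id)
);
-- Optional explicit default: the primary key allows at most one row per user.
CREATE TABLE dft_coll (
    user_id INTEGER PRIMARY KEY REFERENCES user (user_id),
    dft_coll_id INTEGER NOT NULL REFERENCES collection (collection_id)
);
-- Explicit default if one exists, otherwise the user's lowest collection id.
CREATE VIEW effective_default AS
    SELECT user_id, dft_coll_id AS collection_id FROM dft_coll
    UNION
    SELECT user_id, MIN(collection_id) FROM collection
    WHERE user_id NOT IN (SELECT user_id FROM dft_coll)
    GROUP BY user_id;

INSERT INTO user VALUES (1), (2), (3);
INSERT INTO collection VALUES (10, 1), (11, 1), (20, 2), (21, 2);
INSERT INTO dft_coll VALUES (1, 11);   -- only user 1 picked a default
""")

defaults = dict(conn.execute(
    "SELECT user_id, collection_id FROM effective_default"))
print(defaults)
```

User 3 has no collections, so the view returns no row for that user, which matches the footnote above. The sketch does not check that the default collection belongs to the same user; a composite foreign key on (user_id, collection_id) could add that.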

The most plausible solution, I think, would be:
Add a nullable "default" field to the Collection table.
Create a UNIQUE constraint on user_id and default.
Keep only "true" values and NULLs (no "false" values) in the "default" column.
This prevents multiple Collections associated with the same user_id from having the same non-null "default" value, provided your DBMS treats NULLs as distinct in unique constraints. You don't need to develop any application logic. However, this design does not force every user to have a default collection.
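A small sketch of this nullable-flag approach in SQLite, assuming the column is named is_default (DEFAULT is a reserved word in SQL) and that 1 stands for "true". The CHECK constraint is an addition that keeps "false" values out of the column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE collection (
    collection_id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    -- 1 marks the default, NULL means "not the default"; 0 is rejected.
    is_default INTEGER CHECK (is_default = 1),
    UNIQUE (user_id, is_default)
);
INSERT INTO collection VALUES (1, 7, 1), (2, 7, NULL), (3, 7, NULL);
""")

# A second default for the same user violates the UNIQUE constraint.
try:
    conn.execute("INSERT INTO collection VALUES (4, 7, 1)")
    second_default_rejected = False
except sqlite3.IntegrityError:
    second_default_rejected = True

default_count = conn.execute(
    "SELECT COUNT(*) FROM collection WHERE user_id = 7 AND is_default = 1"
).fetchone()[0]
print(second_default_rejected, default_count)
```

This works because SQLite, like PostgreSQL, treats NULLs as distinct in a UNIQUE constraint, so any number of non-default rows can coexist. Not every DBMS behaves this way, so check yours before relying on it.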

Related

Dealing with 3NF in terms of database domain modelling where attributes are added knowing they create transitive dependency

I am currently setting up a database domain model in which normalization will be a challenge because of a transitive dependency. For this particular model, however, we add the transitive dependency deliberately, and I am wondering how you would deal with such cases with respect to normalization.
Let me show what I mean:
I have a table called UserSubscription that have the following attributes:
id {dbgenerated}
created
user
price
currency
subscriptionid
The values for:
price
currency
depend on subscriptionid, a FK that references the PK of a second table, Subscription. One might ask why I would even consider duplicating values from the Subscription table in the UserSubscription table. The reason is that the Subscription might change at any point in time, and for reference we want to store the original values in UserSubscription so that, even if the subscription changes, we still have the values the user originally signed up for.
I know from the perspective of normalization, that this transitive dependency I create should be fixed, and ideally I would move the values back into the subscription table, and just not allow the values to be modified, and instead create a new subscription whenever it is necessary.
But ideally I do not want to create new subscriptions every time something needs to change in those that exist, simply because it is expected these change often - following say market competition values. At the same time for every new subscription created any user will have more to choose from.
This also means that if we no longer want to use a subscription, we would need to: Remove it, and Create a new one. This can be fixed by simply Updating, since we will no longer need the old one anyways.
The above is a school project, I just wonder whether it would ever be "ok" in terms of normalization to choose such approach, when I choose to do so by choice, and to reduce the tasks associated with removing and creating new subscriptions when I expect these would change frequently.
Why don't you instead create an M:N mapping table, USER_SUBSCRIPTION, that holds the relationships between USER and SUBSCRIPTION? You can store all values there historically, and you don't have to remove or create anything when something changes. If the user decides to opt out, you only update the flag_active, flag_deleted, flag_dtime_end, or whatever works for you.
Here is a simple model for demonstration:
USER
id_user PK
name
... other details
SUBSCRIPTION
id_subscription PK
name
details
flag_active (TRUE|FALSE or 1|0 values)
... other details
USER_SUBSCRIPTION
id_user FK
id_subscription FK
dtime_start -- when the subscription started
dtime_end -- when the subscription ended
flag_valid (T|F or 1|0) -- optional; gives you a quick heads-up about active subscriptions, but it is somewhat redundant, since you can derive it from dtime_start vs dtime_end. Up to you.
This will give you a very generic (and therefore flexible and scalable) model for working with users' subscriptions: no duplication, everything handled by FK/PK referential constraints, etc.
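The model above could be sketched in SQLite as follows. The sample rows ('alice', 'basic', 'premium', the dates) and the choice of primary key on the mapping table are assumptions; the table and column names follow the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (id_user INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE subscription (
    id_subscription INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    flag_active INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE user_subscription (
    id_user INTEGER NOT NULL REFERENCES user (id_user),
    id_subscription INTEGER NOT NULL REFERENCES subscription (id_subscription),
    dtime_start TEXT NOT NULL,
    dtime_end TEXT,              -- NULL while the subscription is running
    PRIMARY KEY (id_user, id_subscription, dtime_start)
);
INSERT INTO user VALUES (1, 'alice');
INSERT INTO subscription VALUES (1, 'basic', 1), (2, 'premium', 1);
INSERT INTO user_subscription VALUES (1, 1, '2024-01-01', NULL);
""")

# Switching plans: close the old row and open a new one. Nothing is deleted.
conn.execute("UPDATE user_subscription SET dtime_end = '2024-06-01' "
             "WHERE id_user = 1 AND dtime_end IS NULL")
conn.execute("INSERT INTO user_subscription VALUES (1, 2, '2024-06-01', NULL)")

active = conn.execute(
    "SELECT s.name FROM user_subscription us "
    "JOIN subscription s USING (id_subscription) "
    "WHERE us.id_user = 1 AND us.dtime_end IS NULL").fetchall()
history = conn.execute(
    "SELECT COUNT(*) FROM user_subscription WHERE id_user = 1").fetchone()[0]
print(active, history)
```

The history stays queryable because ended subscriptions keep their rows; only dtime_end changes.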

Database theory: best way to have a "flags" table which could apply to many entities?

I'm building a data model for a new project I'm developing. Part of this data model involves entities having "flags".
A flag is simply a boolean value - either an entity has the flag or it does not have the flag. To that end I have a table simply called "flags" that has an ID, a string name, and a description. (An example of a flag might be "is active" or "should be displayed" or "belongs to group".)
So for example, any user in my users table could have none, one, or many flags. So I create a userFlags bridge table with user ID and flag ID. If the table contains a row for the given flag ID and user ID, that user has that flag.
Ok, so now I add another entity - say "section". Each section can also have flags. So I create a sectionFlags table to accommodate this.
Now I have another entity - "content", so again, "contentFlags".
And so on.
My final data model has basically two tables per entity, one to hold the entity and one for flags.
While this certainly works, it seems like there may be a better way to design my model, so I don't have to have so many bridge tables. One idea I had was a master "hasFlags" table with flag ID, item ID and item type. The item type could be an enumerated field only accepting values corresponding to known entities. The only problem there is that my foreign key for the entity will not work because each "item ID" could refer to a different entity. (I have actually used this technique in other data models, and while it certainly works, you lose referential integrity as well as things like cascade updates.)
Or, perhaps my data model is fine as-is and that's just the nature of the beast.
Any more-advanced experienced DB devs care to chime in?
The many-to-many relationships are one way to do it (and possibly faster than what I'm about to suggest because they can use integer key indexes).
The other way to do this is with a polymorphic relationship.
Your entity-to-flag table needs 2 columns as well as the foreign key link to the flag table:
other_key integer not null
other_type varchar(...) not null
And in those fields you store the foreign key of the relation in the integer and the type of the relation in the varchar. Full-on ORMs that support this sometimes store the class name of the foreign relation in the type column, to aid with object loading.
The downside here is that the integer can't be a true foreign key as it will contain duplicates from many tables. It also makes your querying a bit more interesting per-join than the many-to-many tables, but it does allow you to generalise your join in code.
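A rough sketch of the polymorphic table in SQLite. The table name has_flags, the helper flags_for, and the sample rows are hypothetical; other_key and other_type are the two columns described above, and the CHECK constraint stands in for the "enumerated field" idea from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE flags (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sections (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE has_flags (
    flag_id INTEGER NOT NULL REFERENCES flags (id),
    other_key INTEGER NOT NULL,   -- id in users OR sections, so no real FK
    other_type TEXT NOT NULL CHECK (other_type IN ('user', 'section')),
    PRIMARY KEY (flag_id, other_key, other_type)
);
INSERT INTO flags VALUES (1, 'is active'), (2, 'should be displayed');
INSERT INTO users VALUES (5, 'alice');
INSERT INTO sections VALUES (5, 'news');
-- The same other_key (5) refers to two different entities.
INSERT INTO has_flags VALUES (1, 5, 'user'), (2, 5, 'section');
""")

def flags_for(other_type, other_key):
    """Return the flag names attached to one entity of the given type."""
    rows = conn.execute(
        "SELECT f.name FROM has_flags h JOIN flags f ON f.id = h.flag_id "
        "WHERE h.other_type = ? AND h.other_key = ? ORDER BY f.name",
        (other_type, other_key))
    return [name for (name,) in rows]

print(flags_for("user", 5), flags_for("section", 5))
```

One generic lookup serves every entity type, but nothing stops other_key from pointing at a row that does not exist, which is the referential-integrity cost mentioned above.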

How to have a 1:n relationship with one prominent Member

I'm working on a program that manages customers and their application packaging requests. I want to store the information in an MS SQL database and have different default values depending on the customer, because different customers have a different set of relevant or used values.
My database has 2 relevant tables: Customer and Application. One Customer can have many Applications (1:n, foreign key in Application). But each Customer also has exactly one set of default values (1:1, foreign key in Customer).
After some research I could not find anyone who had tried something similar, and I have a really bad feeling about these two references. Is there a more elegant way to achieve one outstanding member on the N side of a 1:N relationship?
There are several approaches:
Your customer and the set of defaults is 1:1.
The customer with all other application entities is 1:n
You might put the defaults directly into your customer table (easy and fast but not clean)
You might define two tables with the same structure. One with non-nullable columns to define defaults and bind them 1:1 and the second as 1:n relation (You need a UNION query to put them together)
You might use a marker in your application table to mark the "default" row (You need to make sure, that there is only one marked record)
You might - which seems to be your current approach - set a FK-ID into your customer table to store the ID of the default row.
My approach was: put a rank column into your application table. You might make the combination of customerID and rank unique. This lets you define the row with the lowest rank as the default and, similar to a cascading stylesheet, start with the highest rank and move backward until you find a value other than NULL.
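One possible reading of the rank approach, sketched in SQLite. The columns install_dir and language, the helper effective, and the sample data are hypothetical. Rank 0 is assumed to hold the customer's defaults, and higher ranks override them wherever they are not NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE application (
    application_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    rank INTEGER NOT NULL,
    install_dir TEXT,
    language TEXT,
    UNIQUE (customer_id, rank)     -- one row per rank and customer
);
INSERT INTO application VALUES
    (1, 42, 0, '/opt/apps', 'en'),   -- lowest rank: the default row
    (2, 42, 1, NULL, 'de'),
    (3, 42, 2, NULL, NULL);
""")

def effective(customer_id, column, up_to_rank):
    """Walk from up_to_rank down to rank 0 and return the first non-NULL value."""
    # The column name is interpolated into SQL, so only allow known names.
    if column not in ("install_dir", "language"):
        raise ValueError(column)
    row = conn.execute(
        f"SELECT {column} FROM application "
        f"WHERE customer_id = ? AND rank <= ? AND {column} IS NOT NULL "
        f"ORDER BY rank DESC LIMIT 1", (customer_id, up_to_rank)).fetchone()
    return row[0] if row else None

print(effective(42, "language", 2), effective(42, "install_dir", 2))
```

Here the rank 2 row leaves both values empty, so language falls back to the rank 1 row and install_dir falls all the way back to the default row.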

Should primary key be constant int?

In one of my school projects, we have to create a database for a pseudo e-commerce website. The project instructions ask us to implement our database in Boyce–Codd normal form, but I think there is some ambiguity about this normal form.
Let's say that we implement the entity Users like this:
Users(*email, username, password, some_other_fields)
(note: * means primary key)
First of all, if I understand BCNF correctly, this entity isn't in BCNF. If usernames are unique as well as emails, then we can also define this entity like this:
Users(*username, email, password, some_other_fields)
My first question is how to create this entity in Boyce–Codd normal form.
I have another issue with BCNF: the missing id. Assume a user can change his username and his email. The primary key will then change too. My issue is that I don't really have a value that stays constant over time to identify an element of my entity. This causes problems with logging, for example: assuming we log a user's actions with the primary key, if foo#smth.com changes his email to foo2#smth.com, we can get logs like these:
[foo#smth.com] : action xxx
[foo#smth.com] : action yyy
[foo2#smth.com] : action zzz
Then, if we don't catch the email change, all our previous logs mean nothing: we no longer know who foo#smth.com is.
So my second question is: don't you think that using an id that stays constant over time (an integer, for example) is more secure?
Uniqueness is not enough for BCNF. BCNF is about functional dependency, that is, whether the attributes are functionally dependent on the key.
In this case the attributes cannot depend on the email. Emails can be changed, deactivated, or reclaimed by someone else. Being unique is therefore not enough to justify it as a candidate key. The username may be more dependable if the functionality prevents it from being changed.
Functional Dependency inherently depends on Functional Design. If the application you are creating the table for assumes that usernames will never be allowed to change, then the attributes can depend on username to be a candidate key. If the functional design does allow the username to be changed, then you need to introduce or combine a key that is both unique and functionally dependable.
In case of introduced additional unique ids, they are not 'inherently ' more 'secure' than username here. But they 'feel' or 'become' secure, because presumably the functionality and functional design do not expect the id to be changed. Again, if your functional design allows that id to be changed, then that will not remain secure. Eventually it all depends on your functionality, requirements, and how your attributes are expected to behave according to that functional spec.
If you do have to introduce an ID because the username is not dependable enough, then consider a GUID rather than an int/integer, for reasons such as the following:
Fixed-width integers have a limited range set by the platform (for example, 16-bit signed ints run from -32768 to 32767), so a counter eventually wraps around or runs out. GUIDs may repeat too, but the chance is statistically far smaller.
For operations that take place in different subsystems and need to be synced later, duplicate ids may get created. Consider two shops of a chain that can each register customers in offline mode and later sync to the cloud. The first store creates a customer with an ID of, say, 3000, and the second store does the same. When their data sync, you have to use a different composite key structure to accommodate that. GUIDs, being far more likely to be unique, can solve this.
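A minimal sketch of a surrogate GUID key, assuming a hypothetical action_log table and example.com addresses in place of the ones in the question. It uses Python's uuid.uuid4 to generate the id on the client, which is what lets separate subsystems create ids without a shared counter.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    user_id TEXT PRIMARY KEY,        -- surrogate key, never edited
    email TEXT NOT NULL UNIQUE,      -- still unique, but allowed to change
    username TEXT NOT NULL UNIQUE
);
CREATE TABLE action_log (
    user_id TEXT NOT NULL REFERENCES users (user_id),
    action TEXT NOT NULL
);
""")

uid = str(uuid.uuid4())  # random GUID, generated without a central counter
conn.execute("INSERT INTO users VALUES (?, ?, ?)",
             (uid, "foo@example.com", "foo"))
conn.execute("INSERT INTO action_log VALUES (?, 'xxx')", (uid,))
conn.execute("UPDATE users SET email = 'foo2@example.com' WHERE user_id = ?",
             (uid,))
conn.execute("INSERT INTO action_log VALUES (?, 'zzz')", (uid,))

# Every log row still resolves to the same user after the email change.
log = conn.execute(
    "SELECT u.email, l.action FROM action_log l "
    "JOIN users u USING (user_id) ORDER BY l.rowid").fetchall()
print(log)
```

Because the log references user_id rather than the email, the entries written before and after the change stay attached to the same user.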

Database design - system default items and custom user items

This question applies to any database table design, where you would have system default items and custom user defaults of the same type (ie user can add his own custom items/settings).
Here is an example with invoicing and payment types. By default an invoice can have payment terms of DueOnReceipt, NET10, NET15, NET30 (this is the default for all users!), so you would have two tables, "INVOICE" and "PAYMENT_TERM":
INVOICE
Id
...
PaymentTermId
PAYMENT_TERM (System default)
Id
Name
Now what is the best way to allow a user to store their own custom "PaymentTerms" and why? (ie user can use system default payment terms OR user's own custom payment terms that he created/added)
Option 1) Add UserId to PaymentTerm, set userid for the user that has added the custom item and system default userid set to null.
INVOICE
Id
...
PaymentTermId
PaymentTerm
Id
Name
UserId (System Default, UserId=null)
Option 2) Add a flag to Invoice "IsPaymentTermCustom" and Create a custom table "PAYMENT_TERM_CUSTOM"
INVOICE
Id
...
PaymentTermId
PaymentTermCustomId
IsPaymentTermCustom (True for custom, otherwise false for system default)
PaymentTerm
Id
Name
PAYMENT_TERM_CUSTOM
Id
Name
UserId
Now check via an SQL query whether the user is using a custom payment term: if IsPaymentTermCustom=True, the user is using a custom payment term; otherwise the user is using a system default.
Option 3) ????
...
As a general rule:
Prefer adding columns to adding tables
Prefer adding rows to adding columns
Generally speaking, the considerations are:
Effects of adding a table
Requires the most changes to the app: You're supporting a new kind of "thing"
Requires more complicated SQL: You'll have to join to it somehow
May require changes to other tables to add a foreign key column referencing the new table
Impacts performance because more I/O is needed to join to and read from the new table
Note that I am not saying "never add tables". Just know the costs.
Effects of adding a column
Can be expensive to add a column if the table is large (the ALTER TABLE ADD COLUMN can take hours to complete, and during this time the table will be locked, effectively bringing your site "down"), but this is a one-time thing
The cost to the project is low: Easy to code/maintain
Usually requires minimal changes to the app - it's a new aspect of a thing, rather than a new thing
Will perform with negligible performance difference. Will not be measurably worse, but may be a lot faster depending on the situation (if having the new column avoids joining or expensive calculations).
Effects of adding rows
Zero: If your data model can handle your new business idea by just adding more rows, that's the best option
(Pedants kindly refrain from making comments such as "there is no such thing as 'zero' impact", or "but there will still be more disk used for more rows" etc - I'm talking about material impact to the DB/project/code)
To answer the question: Option 1 is best (i.e. add a column to the payment option table).
The reasoning is based on the guidelines above and this situation is a good fit for those guidelines.
Further,
I would also store "standard" payment options in the same table, but with a NULL userid; that way you only have to add new payment options when you really have one, rather than for every customer even if they use a standard one.
It also means your invoice table does not need changing, which is a good thing - it means minimal impact to that part of your app.
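Option 1 could look roughly like this in SQLite. The custom terms NET45 and NET60, the user ids 7 and 8, and the helper terms_for are made up for the example; the table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payment_term (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    user_id INTEGER              -- NULL marks a system default term
);
CREATE TABLE invoice (
    id INTEGER PRIMARY KEY,
    payment_term_id INTEGER NOT NULL REFERENCES payment_term (id)
);
INSERT INTO payment_term VALUES
    (1, 'DueOnReceipt', NULL), (2, 'NET10', NULL),
    (3, 'NET15', NULL), (4, 'NET30', NULL),
    (5, 'NET45', 7), (6, 'NET60', 8);
""")

def terms_for(user_id):
    """System defaults plus the terms this user created."""
    rows = conn.execute(
        "SELECT name FROM payment_term "
        "WHERE user_id IS NULL OR user_id = ? ORDER BY id", (user_id,))
    return [name for (name,) in rows]

print(terms_for(7))
```

The invoice table keeps a single foreign key, and one WHERE clause covers both standard and custom terms.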
It seems to me that there are merely "Payment Terms" and "Users". The decision of what are the "Default" payment terms is a business rule, and therefore would be best represented in the business layer of your application.
Assuming that you would like to have a set of pre-defined "default" payment terms present in your application from the start, these would already be present in the payment terms table. However, I would put a reference table in between USERS and PAYMENT TERMS:
USERS:
user-id
user_name
USER_PAYMENT_TERMS:
userID
payment_term_id
PAYMENT_TERMS:
payment_term_id
payment_term
Your business layer should offer the user (or more likely, the administrator), through a GUI, the ability to:
Assign 0 to many payment term options to a particular user (some users may not want one of the defaults to even be available, for example).
Add custom payment terms, which then become available for assignment to one or more users (which avoids different users creating duplicate payment terms).
Assign a custom payment term to more than one user. Say the user's company has a unique payment process that requires all of its users to use a payment term other than one of the defaults: create the custom term once, and assign it to all of those users.
Your application business layer would establish rules governing access to payment terms, which could then be accessed by your user interface.
Your UI would then (again, likely through an administrator function) allow the set up of one or more payment terms in addition to the standards you describe, and then make them available to one or more users through something like a checked list box (for example).
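A sketch of the reference-table design in SQLite. The helpers assign and terms_of stand in for the business layer and are hypothetical, as are the sample users; the three tables follow the layout above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT NOT NULL);
CREATE TABLE payment_terms (
    payment_term_id INTEGER PRIMARY KEY,
    payment_term TEXT NOT NULL UNIQUE   -- each term is stored once
);
CREATE TABLE user_payment_terms (
    user_id INTEGER NOT NULL REFERENCES users (user_id),
    payment_term_id INTEGER NOT NULL REFERENCES payment_terms (payment_term_id),
    PRIMARY KEY (user_id, payment_term_id)
);
INSERT INTO users VALUES (1, 'ann'), (2, 'bob');
INSERT INTO payment_terms (payment_term) VALUES ('NET10'), ('NET30');
""")

def assign(user_id, term):
    """Make a term available to a user, creating the term only if it is new."""
    conn.execute(
        "INSERT OR IGNORE INTO payment_terms (payment_term) VALUES (?)", (term,))
    conn.execute(
        "INSERT OR IGNORE INTO user_payment_terms (user_id, payment_term_id) "
        "SELECT ?, payment_term_id FROM payment_terms WHERE payment_term = ?",
        (user_id, term))

def terms_of(user_id):
    rows = conn.execute(
        "SELECT t.payment_term FROM user_payment_terms ut "
        "JOIN payment_terms t USING (payment_term_id) "
        "WHERE ut.user_id = ? ORDER BY t.payment_term", (user_id,))
    return [term for (term,) in rows]

assign(1, "NET30")
assign(2, "NET30")
assign(2, "2% 10 NET 30")   # custom term, created once and reusable
print(terms_of(1), terms_of(2))
```

Assigning NET30 to both users adds two link rows but no second NET30 term, which is the duplicate-avoidance point made above.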
Option 1 is definitely better, for the following reasons:
Correctness
You can implement a database constraint for uniqueness of the payment term name
You can implement a foreign key constraint from Invoice to PaymentTerm
Ease of Use
Queries will be much simpler because you will always join from Invoice to PaymentTerm rather than requiring a more complex join. Most of the time when you select you will not care whether it is a built-in or custom payment term. The optimizer will have an easier time with a normal join than with one that depends on another column to decide which table to join.
Easier to display a list of PaymentTerms coming from one table
We use Option 1 in our data model quite a lot.
Part of the problem, as I see it, is that different payment terms lead to different calculations, too. If I were still in the welding supply business, I'd want to add "2% 10 NET 30", which would mean a 2% discount if the payment is made in full within 10 days; otherwise, net 30.
Setting that issue aside, I think ownership of the payment terms makes sense. Assume that the table of users (not shown) includes the user "system" as, say, user_id 0.
create table payment_terms (
  payment_term_id integer primary key,
  -- owner 0 is the "system" user; any other id is an ordinary user
  payment_term_owner_id integer not null references users (user_id),
  payment_term_desc varchar(30) not null unique
);
insert into payment_terms values (1, 0, 'Net 10');
insert into payment_terms values (2, 0, 'Net 15');
...
insert into payment_terms values (5, 1, '2% 10, Net 30');
This keeps foreign keys simple, and it makes it easy to select payment terms at run time for presentation in the user interface.
Be very careful here. You probably want to store the description, not the ID number, with your invoices. (It's unique; you can set a foreign key reference to it.) If you store only the ID number, updating a user's custom description might subtly corrupt all the data that references it.
For example, let's say that the user created a custom payment term number 5, '2% 10, Net 30'. You store the ID number 5 in your table of invoices. Then the user decides that things will be different starting today, and updates that description to '2% 10, Net 20'. Now on all your past invoices, the arithmetic no longer matches the payment terms.
Your auditor will kill you. Twice.
You'll want to prevent ordinary users from deleting rows owned by the system user. There are several ways to do that.
Use a BEFORE DELETE trigger.
Add another table with foreign key references to the rows owned by the system user.
Restrict all access through stored procedures that prevent deleting system rows.
(And flags are almost never the best idea.)
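The first of those options, a BEFORE DELETE trigger, might look like this in SQLite. The trigger name, the error message, and the helper try_delete are assumptions, and trigger syntax differs between DBMSs. SQLite has no notion of a current database user, so this sketch blocks every delete of a system-owned row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payment_terms (
    payment_term_id INTEGER PRIMARY KEY,
    payment_term_owner_id INTEGER NOT NULL,   -- 0 is the "system" user
    payment_term_desc TEXT NOT NULL UNIQUE
);
CREATE TRIGGER protect_system_terms
BEFORE DELETE ON payment_terms
FOR EACH ROW WHEN OLD.payment_term_owner_id = 0
BEGIN
    SELECT RAISE(ABORT, 'system payment terms cannot be deleted');
END;
INSERT INTO payment_terms VALUES (1, 0, 'Net 10'), (5, 1, '2% 10, Net 30');
""")

def try_delete(term_id):
    """Return True if the row was deleted, False if the trigger blocked it."""
    try:
        conn.execute(
            "DELETE FROM payment_terms WHERE payment_term_id = ?", (term_id,))
        return True
    except sqlite3.DatabaseError:
        return False

system_deleted = try_delete(1)
custom_deleted = try_delete(5)
remaining = [r[0] for r in conn.execute(
    "SELECT payment_term_id FROM payment_terms")]
print(system_deleted, custom_deleted, remaining)
```

The user-owned row is removed while the system row survives the delete attempt.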
Applying general rules of database design to the problem at hand:
one table for system payment terms
one table for user payment terms
a view of join of the two above
Now you can join invoice on the view of payment terms.
Benefits:
No flag columns
No nulls
You separate system defaults from user data
Things become straightforward for the db
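A sketch of the two-table layout, reading "a view of join" as a UNION ALL view over both tables. The source column, the table names, and the sample rows are assumptions. Ids can collide between the two tables, so the invoice stores the source together with the id, and because a foreign key cannot reference a view, that pair is not enforced by the database here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE system_payment_term (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE user_payment_term (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    name TEXT NOT NULL
);
-- One read-only list of all terms, tagged with where each row came from.
CREATE VIEW payment_term AS
    SELECT 'system' AS source, id, name FROM system_payment_term
    UNION ALL
    SELECT 'user' AS source, id, name FROM user_payment_term;
CREATE TABLE invoice (
    id INTEGER PRIMARY KEY,
    term_source TEXT NOT NULL,
    term_id INTEGER NOT NULL
);
INSERT INTO system_payment_term VALUES (1, 'NET10'), (2, 'NET30');
INSERT INTO user_payment_term VALUES (1, 7, 'NET45');
INSERT INTO invoice VALUES (100, 'system', 2), (101, 'user', 1);
""")

rows = conn.execute(
    "SELECT i.id, t.name FROM invoice i "
    "JOIN payment_term t ON t.source = i.term_source AND t.id = i.term_id "
    "ORDER BY i.id").fetchall()
print(rows)
```

Neither base table needs a flag or a nullable column, at the cost of a two-part reference from invoice.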
