One-To-Many join table to avoid nullable columns [closed] - sql-server

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 10 months ago.
I wonder whether I'm the first programmer to struggle with this problem, but I can't find anything on SO about it.
The point of my question: is it a good idea to create a one-to-many join table in order to prevent NULL references?
To explain: in our business requirements, we have some activities that give rise to payments, e.g. sales, loans, rents, services, etc. Each activity can have zero, one, or more payments.
When designing the DB, we have a table for each activity (Sales, Loans, Rents, Services, etc.) and a Payment table. The relationship between the activities and the payments is one-to-many: each loan can have many payments, and each rent can have many payments.
But there is a problem: each payment can belong to a loan, a sale, or any other activity, and we need to relate it to its corresponding activity. I can think of two options:
1) Add a foreign key to the Payments table for each kind of activity (LoanID, RentID, ServiceID, etc.) and make them nullable, because a loan is neither a service nor a rent.
I personally don't like this solution. It is very error-prone: one can easily forget to set the matching FK because it is nullable, and then we don't know what the payment is for, so we lose referential integrity. It is possible to overcome this by creating a constraint that ensures exactly one FK is set, but it is not easy to write the right constraint covering all possible combinations, and the constraint has to be recreated whenever a new FK column is added.
Not to mention the ugliness of such a table, or the main issue of leaving unnecessary nullable columns in a table.
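For illustration, the "exactly one FK" constraint mentioned in option 1 can be sketched as below. This is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQL Server; the Amount column and the sample IDs are assumptions made for the example. SQLite evaluates each `IS NOT NULL` test to 0 or 1, so the tests can simply be added up; in SQL Server each term would need to be written as a CASE expression instead.

```python
import sqlite3

# Autocommit connection; SQLite only enforces foreign keys when the pragma is on.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Loans    (LoanID    INTEGER PRIMARY KEY);
CREATE TABLE Rents    (RentID    INTEGER PRIMARY KEY);
CREATE TABLE Services (ServiceID INTEGER PRIMARY KEY);
CREATE TABLE Payments (
    PaymentID INTEGER PRIMARY KEY,
    Amount    NUMERIC NOT NULL,              -- hypothetical column
    LoanID    INTEGER REFERENCES Loans(LoanID),
    RentID    INTEGER REFERENCES Rents(RentID),
    ServiceID INTEGER REFERENCES Services(ServiceID),
    -- Each test yields 0 or 1, so the sum counts the non-NULL keys.
    CHECK ((LoanID IS NOT NULL) + (RentID IS NOT NULL) + (ServiceID IS NOT NULL) = 1)
);
INSERT INTO Loans VALUES (1);
INSERT INTO Rents VALUES (1);
""")


def try_insert(loan_id, rent_id, service_id):
    """Return True if the payment row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO Payments (Amount, LoanID, RentID, ServiceID) VALUES (100, ?, ?, ?)",
            (loan_id, rent_id, service_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


one_key = try_insert(1, None, None)      # exactly one activity: accepted
no_key = try_insert(None, None, None)    # no activity: rejected by the CHECK
two_keys = try_insert(1, 1, None)        # two activities: rejected by the CHECK
```

The drawback raised above still applies: the CHECK expression has to be edited every time a new activity column is added.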
2) Create an in-between join table for each kind of activity (ActivityPayments, i.e. LoanPayments etc.) that holds the activity ID and the payment ID, like a many-to-many table.
This avoids the problems described above: each payment is related to its corresponding activity, there is no loss of referential integrity, and there are no nullable columns.
The problem, however, is that it enlarges the database, adds another layer between the tables, and requires more work when joining in queries.
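A rough sketch of option 2 for a single activity, again using sqlite3 as a stand-in for SQL Server. Making PaymentID the key of LoanPayments is an assumption added here to keep the relationship one-to-many rather than many-to-many; the Amount column and the sample rows are also hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Loans    (LoanID INTEGER PRIMARY KEY);
CREATE TABLE Payments (PaymentID INTEGER PRIMARY KEY, Amount NUMERIC NOT NULL);
CREATE TABLE LoanPayments (
    LoanID    INTEGER NOT NULL REFERENCES Loans(LoanID),
    -- PaymentID alone is the key, so a payment maps to at most one loan.
    PaymentID INTEGER PRIMARY KEY REFERENCES Payments(PaymentID)
);
INSERT INTO Loans VALUES (1), (2);
INSERT INTO Payments VALUES (10, 50), (11, 75);
INSERT INTO LoanPayments VALUES (1, 10), (1, 11);
""")

# The extra join layer the question mentions: Loans -> LoanPayments -> Payments.
loan_total = conn.execute("""
    SELECT SUM(p.Amount)
    FROM Loans l
    JOIN LoanPayments lp ON lp.LoanID = l.LoanID
    JOIN Payments p      ON p.PaymentID = lp.PaymentID
    WHERE l.LoanID = 1
""").fetchone()[0]


def attach_payment_to_second_loan():
    """Try to link payment 10, already linked to loan 1, to loan 2 as well."""
    try:
        conn.execute("INSERT INTO LoanPayments VALUES (2, 10)")
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_accepted = attach_payment_to_second_loan()
```

Note that this sketch does not by itself stop a payment from appearing in two different join tables (say LoanPayments and RentPayments), or in none at all.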
Does anyone have any ideas?

Another option is to create a supertype table, say Activity, with all of the common attributes:
This should keep the number of tables small, and still allow you to identify the activity type for a payment. Note that this assumes that common attributes exist between the different activities. If that is not the case, the second option you listed is probably the way to go.
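A minimal sketch of such a supertype layout, not the answerer's original diagram. It uses sqlite3 as a stand-in for SQL Server, and the ActivityType, ActivityDate and Amount columns are assumptions made for the example. Payments reference Activity only, so no nullable foreign keys are needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Activity (
    ActivityID   INTEGER PRIMARY KEY,
    ActivityType TEXT NOT NULL CHECK (ActivityType IN ('Sale', 'Loan')),
    ActivityDate TEXT NOT NULL               -- hypothetical common attribute
);
-- Subtype keys are both PK and FK: they reuse the ActivityID value.
CREATE TABLE Sales (SalesID INTEGER PRIMARY KEY REFERENCES Activity(ActivityID));
CREATE TABLE Loans (LoanID  INTEGER PRIMARY KEY REFERENCES Activity(ActivityID));
CREATE TABLE Payments (
    PaymentID  INTEGER PRIMARY KEY,
    ActivityID INTEGER NOT NULL REFERENCES Activity(ActivityID),
    Amount     NUMERIC NOT NULL
);
INSERT INTO Activity VALUES (1, 'Loan', '2020-01-01'), (2, 'Sale', '2020-01-02');
INSERT INTO Loans VALUES (1);
INSERT INTO Sales VALUES (2);
INSERT INTO Payments VALUES (100, 1, 40), (101, 1, 60), (102, 2, 25);
""")

# The activity type of every payment is one join away.
totals_by_type = dict(conn.execute("""
    SELECT a.ActivityType, SUM(p.Amount)
    FROM Payments p
    JOIN Activity a ON a.ActivityID = p.ActivityID
    GROUP BY a.ActivityType
""").fetchall())


def orphan_payment_accepted():
    """Try to record a payment for an activity that does not exist."""
    try:
        conn.execute("INSERT INTO Payments VALUES (103, 999, 10)")
        return True
    except sqlite3.IntegrityError:
        return False


orphan_accepted = orphan_payment_accepted()
```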

Look up the following tags in SO.
single-table-inheritance
class-table-inheritance
shared-primary-key
The info tab on these tags gives you a brief explanation, and the questions grouped under the tag will give you some examples.
Single table inheritance is similar to the solution you presented and are unhappy with. Yes, it does involve NULLs. Generally, user errors here are prevented by the application.
Class table inheritance is like the solution offered by AMS. Note that SalesID and LoanID are listed as both a PK and an FK. This hints at the technique of shared primary key. With this, SalesID and LoanID are copies of a value in ActivityID. Again, it's the application layer that does the necessary work to make sure the copies are right.

In this specific case (not necessarily applicable in similar situations), we usually calculate dynamically, in a view or function, what each payment was for (in chronological order).
In other instances we had one sale table where each product can be a physical product, a service, or any other for-pay offer, so that limits all debit transactions to one table.
HTH

Related

Is it better to use a field to nominate a Company as a Customer, or to have a related Customer table with probably only one field (FK and PK in one) [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed last year.
I'm redesigning our service app, and getting rid of some really awful schema problems while I'm at it. Trying to build the replacement with best practices as much as possible.
I'm having a company table rather than just customer, as it's often useful to identify companies that are not customers (suppliers, contractors, etc etc). I'm trying to decide whether it's better to simply include a boolean field represented in the relevant part of the app by a checkbox that identifies relevant companies as customers (which would become uneditable once the customer has services attached to them), or if I should, instead, have a separate table that's basically just a single field referencing the Company ID that is in turn referenced by any child records.
This similar question asks about records that can be one of several subtypes. While the question is materially different (every policy seems to be only one of the potential subtypes, whereas Companies can be any or all of Customer/Supplier/Contractor etc) its similarity combined with the fact that it has multiple conflicting answers raises the possibility that there is no industry-wide consensus, so:
Is there an established best practice here? I'm not immediately seeing any reasons that other fields should be included in the prospective Customer table, but I'm open to the idea that there might... is that a good enough reason to go with B? Or is this a clear YMMV situation, where both options have benefits, either being equally valid?
"I should, instead, have a separate table that's basically just a single field referencing the Company ID that is in turn referenced by any child records."
There are probably several attributes that apply to a customer that don't apply to a non-customer Company, so CompanyID probably won't end up being the only attribute of Customer.
So if that's the case, the clear choice is to have a separate Customer table.
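As a rough sketch of the separate-table option, using sqlite3 purely for illustration: the Service table and its columns are hypothetical stand-ins for the "child records" in the question. With Customer as its own table, the database itself refuses services for non-customers and refuses to drop a customer that still has services, which is the behavior the question wanted from an uneditable checkbox.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Company (CompanyID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
-- Customer's key is both PK and FK: a company is a customer iff a row exists here.
CREATE TABLE Customer (
    CompanyID INTEGER PRIMARY KEY REFERENCES Company(CompanyID)
);
CREATE TABLE Service (                      -- hypothetical child table
    ServiceID  INTEGER PRIMARY KEY,
    CustomerID INTEGER NOT NULL REFERENCES Customer(CompanyID)
);
INSERT INTO Company VALUES (1, 'Acme'), (2, 'Supplier Ltd');
INSERT INTO Customer VALUES (1);
""")


def accepted(sql, params=()):
    """Run a statement; return False if an integrity constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


service_for_customer = accepted("INSERT INTO Service (CustomerID) VALUES (?)", (1,))
service_for_non_customer = accepted("INSERT INTO Service (CustomerID) VALUES (?)", (2,))
# Company 1 still has a service attached, so it cannot stop being a customer.
customer_removed = accepted("DELETE FROM Customer WHERE CompanyID = ?", (1,))
```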

Database design, an included attribute vs multiple joins? Confused

So I am taking a class in database design and management and am kind of confused from a design perspective. My example is an invoice system. I just made it up quick so it doesn't have a ton of complexity in it.
There are Customers, Orders, Invoices and Payments entities
Customers: CustId (PK), Street, Zip, City, ..
Orders: OrderID (PK), CustID (FK), Date, Amt, ....
Invoices: InvoiceID (PK), OrderID (FK), Date, AmtDue, AmtPaid, ....
Payments: PaymentNo (PK), InvoiceID (FK), PayMethod, Date, Amt, ...
The Customers entity has a one-to-many relationship with Orders.
The Orders entity has a one-to-many relationship with Invoices.
The Invoices entity has a one-to-many relationship with Payments.
To get the results of a query to list all Payments made by a Customer the query would have to join Payments with the Invoice table, the Invoice table with the Orders table and the Orders table with the Customer table.
Is this the correct way to do it? One could also just put a custID in the payment entity which would then just require one join, but then there is unneeded information in the payment entity. Is this just a design thing or is it a performance issue?
Bonus question. Let's say there should be a report that shows the total customer balance. Does there need to be a customer balance field in the database, or can this be a calculated item that is produced by joining tables and adding up the amount billed vs. the amount paid?
Thanks!
"Is this the correct way to do it?"
Yes. Based on the information provided, it looks reasonable.
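For concreteness, the three-join query the question describes might look like the sketch below. It uses sqlite3 with a cut-down version of the question's tables; the sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Customers (CustId INTEGER PRIMARY KEY);
CREATE TABLE Orders (
    OrderID INTEGER PRIMARY KEY,
    CustID  INTEGER NOT NULL REFERENCES Customers(CustId)
);
CREATE TABLE Invoices (
    InvoiceID INTEGER PRIMARY KEY,
    OrderID   INTEGER NOT NULL REFERENCES Orders(OrderID)
);
CREATE TABLE Payments (
    PaymentNo INTEGER PRIMARY KEY,
    InvoiceID INTEGER NOT NULL REFERENCES Invoices(InvoiceID),
    Amt       NUMERIC NOT NULL
);
INSERT INTO Customers VALUES (1), (2);
INSERT INTO Orders    VALUES (10, 1), (11, 2);
INSERT INTO Invoices  VALUES (100, 10), (101, 10), (102, 11);
INSERT INTO Payments  VALUES (1000, 100, 80), (1001, 101, 5), (1002, 102, 50);
""")


def payments_for_customer(cust_id):
    """List (PaymentNo, Amt) for one customer via Payments -> Invoices -> Orders -> Customers."""
    return conn.execute("""
        SELECT p.PaymentNo, p.Amt
        FROM Payments p
        JOIN Invoices  i ON i.InvoiceID = p.InvoiceID
        JOIN Orders    o ON o.OrderID   = i.OrderID
        JOIN Customers c ON c.CustId    = o.CustID
        WHERE c.CustId = ?
        ORDER BY p.PaymentNo
    """, (cust_id,)).fetchall()


customer_1_payments = payments_for_customer(1)
```

Because the customer is always reached through the order, there is only one place that says who a payment belongs to.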
"One could also just put a custID in the payment entity which would then just require one join, but then there is unneeded information in the payment entity. Is this just a design thing or is it a performance issue?"
The question you're asking falls under "normal forms", often called normalization. Your target should be Boyce-Codd normal form (similar to 3NF), which should be described in your textbook. I will warn you that misinformation about and misunderstanding of database design issues are abundant on the interwebs, so be careful which answers you pay attention to.
The goal of normalization is to eliminate redundancy, and thus to eliminate "anomalies", whereby two logically equivalent queries produce inconsistent results. If the same information is kept in two places and is updated in only one, then two queries against the two different values will produce different -- i.e., inconsistent -- results.
In your example, if there is a Payments.CustID, should I believe that one, or the one derived by joining Payments through Invoices to Orders? The same goes for total customer balance: do I believe the stored total, or the one I computed from the constituents?
If you are going to "denormalize for performance", as is so often alleged to be necessary, what are you going to do to ensure the redundant values are consistent?
"Bonus question. Let's say there should be a report that says what the total customer balance is."
As a matter of fact, in practice balances are sort of a special case. It's often necessary to know the balance at points in time. While it's possible to compute, say, monthly account balances from inception based on transactions, as a practical matter applications usually "draw a line in the sand" and record the balance for future reference. Steps are taken -- must be, for the sake of the business -- to ensure the historical information does not change or, if it does, that the recorded balance is updated to reflect the change. From that description alone, you can imagine that enforcing consistency throughout the system is much more work than relying on the DBMS to enforce it. And that is why, insofar as is feasible, it's better to eliminate all redundant data and let the DBMS do the job it was designed to do.
In your analysis, seek Boyce-Codd normal form. Understand your data, eliminate the redundancies, and recognize the relations. Let the DBMS enforce referential integrity. Countless errors will be avoided, and time saved. Only when specific circumstances conspire to show that specific business requirements cannot be satisfied on a particular system with a given, correct design, does one begin the tedious and error-prone work of introducing redundant information and compensating for it with external controls.
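A minimal sketch of computing the customer balance instead of storing it, using sqlite3 and a cut-down version of the question's tables. It assumes that AmtDue holds the amount billed on each invoice; the view name CustomerBalance and the sample rows are invented. The two sums are taken in separate subqueries so that an invoice with several payments is not counted more than once.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Customers (CustId INTEGER PRIMARY KEY);
CREATE TABLE Orders   (OrderID INTEGER PRIMARY KEY,
                       CustID  INTEGER NOT NULL REFERENCES Customers(CustId));
CREATE TABLE Invoices (InvoiceID INTEGER PRIMARY KEY,
                       OrderID   INTEGER NOT NULL REFERENCES Orders(OrderID),
                       AmtDue    NUMERIC NOT NULL);   -- assumed: amount billed
CREATE TABLE Payments (PaymentNo INTEGER PRIMARY KEY,
                       InvoiceID INTEGER NOT NULL REFERENCES Invoices(InvoiceID),
                       Amt       NUMERIC NOT NULL);

-- Hypothetical view: billed minus paid, derived on demand from the base tables.
CREATE VIEW CustomerBalance AS
SELECT c.CustId,
       COALESCE((SELECT SUM(i.AmtDue)
                 FROM Invoices i
                 JOIN Orders o ON o.OrderID = i.OrderID
                 WHERE o.CustID = c.CustId), 0)
     - COALESCE((SELECT SUM(p.Amt)
                 FROM Payments p
                 JOIN Invoices i ON i.InvoiceID = p.InvoiceID
                 JOIN Orders o   ON o.OrderID = i.OrderID
                 WHERE o.CustID = c.CustId), 0) AS Balance
FROM Customers c;

INSERT INTO Customers VALUES (1), (2), (3);
INSERT INTO Orders    VALUES (10, 1), (11, 2);
INSERT INTO Invoices  VALUES (100, 10, 80), (101, 10, 20), (102, 11, 50);
INSERT INTO Payments  VALUES (1000, 100, 30), (1001, 100, 50), (1002, 101, 5);
""")

balances = dict(conn.execute("SELECT CustId, Balance FROM CustomerBalance ORDER BY CustId"))
```

With these rows, customer 1 was billed 100 and has paid 85, customer 2 has paid nothing against an invoice of 50, and customer 3 has no orders at all.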
"Is this the correct way to do it?" Of course, given your current design. But it's not the ONLY way. So you're studying DB "normalization" and seeing the pros and cons of the various "forms" of normalization. In the "real world" things can change on a dime, due to a management decision or whatever. I tend to use "compound primary keys" instead of simply one field for primary and others as FK. I handle my "FK" programmatically instead of relegating that responsibility to the DB.
I also create and utilize a number of "intermediate" tables, or sometimes "VIEWS", that I use more easily than a bunch of code with too many JOINs. (3rd Normal form addicts can hate, but my code runs faster than a scalded rabbit).
An Order means nothing without a Customer; an Invoice means nothing without an Order; a Payment is great, but means nothing without both an Order and Invoice. So lemme throw this out there -- what's wrong with having a "summary" type of entity that has Cust, Order, Invoice #, and Payment Id ?

How to enforce a one to many relationship when existing tables cannot be altered [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
For background, my situation is I have a database that is missing a lot of foreign key relationships. One in particular, let's call it Orders, which represents orders with a composite primary key of OrderID and LocationID. The other table we'll call OrderDetails which has an OrderID but no LocationID. In reality, it is impossible to have an order in two locations at once, so it was assumed that there was no need to have LocationID in the details table. I didn't design it, and I can't change that.
We also have to work under the assumption there will be no support to add location id to the details table for various reasons. We are also working with Oracle and a high volume database with many concurrent users in many locations. Finally, there will be minimal time to change any applications that use this table.
So my question is: is this solution feasible, or is there anything else I should try?
Say I create an intersection table, for lack of a better name AllOrders or whatever with primary key OrderID. Now we link Order.OrderID to AllOrders.OrderID and link OrderDetails.OrderID to AllOrders.OrderID. Would it be reasonable then to fill in AllOrders via a trigger on each insert to Orders to enforce the integrity? I am assuming all applications are inserting details after orders or the changes to enforce would be minimal and allowed.
Are there any better solutions? I understand we would do this differently if in charge of designing or given more leeway for fixing, but I'm trying to make the most given the constraints.
Edit --
To clarify what I am looking to accomplish: I want to treat all orders with the same ID as an equivalence class modulo location, and ensure that deleting any order requires deleting all orders with the same ID and all child order details. Of primary importance is that there are no orphan details. This has to be done with minimal application changes and no redesign of existing tables, if possible.
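The AllOrders idea can be sketched roughly as follows. This uses SQLite through Python's sqlite3 module only because it is easy to run; Oracle's trigger and constraint syntax differ. The sketch also assumes that foreign key constraints may be added to the existing tables even though their columns cannot change, and the DetailID column and trigger name are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE AllOrders (OrderID INTEGER PRIMARY KEY);
CREATE TABLE Orders (
    OrderID    INTEGER NOT NULL REFERENCES AllOrders(OrderID) ON DELETE CASCADE,
    LocationID INTEGER NOT NULL,
    PRIMARY KEY (OrderID, LocationID)
);
CREATE TABLE OrderDetails (
    DetailID INTEGER PRIMARY KEY,            -- hypothetical column
    OrderID  INTEGER NOT NULL REFERENCES AllOrders(OrderID) ON DELETE CASCADE
);
-- Register each new OrderID in AllOrders before the Orders row goes in,
-- so applications that insert into Orders need no changes.
CREATE TRIGGER Orders_register BEFORE INSERT ON Orders
BEGIN
    INSERT INTO AllOrders (OrderID)
    SELECT NEW.OrderID
    WHERE NOT EXISTS (SELECT 1 FROM AllOrders WHERE OrderID = NEW.OrderID);
END;
INSERT INTO Orders VALUES (1, 10), (1, 20);
INSERT INTO OrderDetails (OrderID) VALUES (1), (1);
""")


def orphan_detail_accepted():
    """Try to insert a detail row whose OrderID was never registered."""
    try:
        conn.execute("INSERT INTO OrderDetails (OrderID) VALUES (99)")
        return True
    except sqlite3.IntegrityError:
        return False


orphan_accepted = orphan_detail_accepted()

# Deleting the AllOrders row removes every order with that ID and all its details.
conn.execute("DELETE FROM AllOrders WHERE OrderID = 1")
remaining_orders = conn.execute("SELECT COUNT(*) FROM Orders").fetchone()[0]
remaining_details = conn.execute("SELECT COUNT(*) FROM OrderDetails").fetchone()[0]
```

In this sketch the delete has to go through AllOrders; deleting a single row from Orders directly would leave the other locations and the details in place, so a further trigger or an application convention would be needed to cover that path.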
Create a new table to handle the mapping going forward.
Table: Tb_order_orderdetails
Columns: OrderID, LocationID, OrderDetailsID

Designing a database with several different kinds of products? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
As part of a recent project I have started planning out, I am required to build the structure of a database which will contain several products. As an example, think of the way Amazon is structured. It has several categories and within those categories, several sub-categories.
My problem is that conceptually I am unsure on how to build the database tables. I have thought of creating a self-referencing table for the categories and sub-categories, but since I do plan to have a wide variety of products within the database, I don't know if I should just group them into one table called "Products" or put them all in separate tables.
For example, a toilet would be one product while a television could be another. Even though they have different categories/sub-categories, they are both products. By placing them in one "Products" table, they would share attributes that would make no sense for both of them. A toilet would not need an attribute for resolution or display size(unless it is a very special toilet?) and a television wouldn't need a seat size attribute.
I thought that one way to get around this and still keep everything in one table would be to create a bunch of nullable attributes that could be left empty for items where they weren't necessary, but common sense is telling me that this is probably not the best way to go about things.
So at this point, I feel that my real problem is figuring out how to structure this database and its tables with several categories/sub-categories and different kinds of items. Would I create a table for televisions and a table for toilets? How would this all be structured? How are these sort of problems normally planned out?
Thanks
A generic products table is a good way to go. You're not going to want to create a new table in your schema every time you have a new type of product.
Similarly with the categories: a self-referencing table with a parent/child relationship is better, so you don't have to create a new table each time you want a new level of sub-category.
Your products table should contain information that's common amongst all your products. E.g. name and possibly price (although if you have different prices for an individual product, then price is best stored in another table that references the product).
If you have a bunch of other information that relates to characteristics for each product, then maybe create an attributes table and another table that references each attribute's value for that product.
Here's a simple example schema:
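One way such a schema could look, as a minimal sketch rather than the answerer's actual diagram. It uses sqlite3 for illustration, and every table and column name below is an assumption. The categories table references itself through parent_id, and attribute values live in their own table keyed by product and attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE categories (
    category_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    parent_id   INTEGER REFERENCES categories(category_id)  -- NULL for a top-level category
);
CREATE TABLE products (
    product_id  INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    category_id INTEGER NOT NULL REFERENCES categories(category_id)
);
CREATE TABLE attributes (
    attribute_id INTEGER PRIMARY KEY,
    name         TEXT NOT NULL UNIQUE
);
CREATE TABLE product_attribute_values (
    product_id   INTEGER NOT NULL REFERENCES products(product_id),
    attribute_id INTEGER NOT NULL REFERENCES attributes(attribute_id),
    value        TEXT NOT NULL,
    PRIMARY KEY (product_id, attribute_id)
);
INSERT INTO categories VALUES (1, 'Household', NULL), (2, 'Bathroom', 1), (3, 'Electronics', NULL);
INSERT INTO products   VALUES (1, 'Ceramic toilet', 2), (2, 'Television', 3);
INSERT INTO attributes VALUES (1, 'seat size'), (2, 'resolution'), (3, 'display size');
INSERT INTO product_attribute_values VALUES (1, 1, 'standard'), (2, 2, '1080p'), (2, 3, '42 in');
""")


def attributes_of(product_name):
    """Return {attribute name: value} for the attributes a product actually has."""
    return dict(conn.execute("""
        SELECT a.name, v.value
        FROM products p
        JOIN product_attribute_values v ON v.product_id = p.product_id
        JOIN attributes a               ON a.attribute_id = v.attribute_id
        WHERE p.name = ?
    """, (product_name,)).fetchall())


def category_path(product_name):
    """Walk parent_id links upward with a recursive CTE; root category first."""
    rows = conn.execute("""
        WITH RECURSIVE chain(category_id, name, parent_id, depth) AS (
            SELECT c.category_id, c.name, c.parent_id, 0
            FROM products p
            JOIN categories c ON c.category_id = p.category_id
            WHERE p.name = ?
            UNION ALL
            SELECT c.category_id, c.name, c.parent_id, chain.depth + 1
            FROM categories c
            JOIN chain ON c.category_id = chain.parent_id
        )
        SELECT name FROM chain ORDER BY depth DESC
    """, (product_name,)).fetchall()
    return [name for (name,) in rows]


tv_attributes = attributes_of("Television")
toilet_path = category_path("Ceramic toilet")
```

A television never gets a seat size row and a toilet never gets a resolution row, so no column has to be left empty.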
This is more of a design decision than anything else.
This is how I would separate the tables:
categories (e.g. household)
sub_categories (e.g. bathroom, with a foreign key to household)
products (e.g. Ceramic toilet)
As for the extra attributes, you can either store these directly within the products table or create another table called products_extra_attributes and store an optional (nullable) foreign key within the products table pointing to the additional attributes for the individual product.
Make sense? I'll make an edit later on if not as I'm answering this question from my phone.
Depends on how many products. If you only sold toilets and televisions, I'd say go ahead and make totally separate tables for them. However, if you have hundreds of different product types, all of which have different attributes, I might suggest creating a products table that stores common attributes (they all have a cost and, probably, a size), then a product type table that specifies a set of attributes for each product type, then an attributes table to define the attributes, and lastly a product values table.
So for example, take a Sony TV. It would be in products with the price and a link to the product type, which would be TV. The product type would have a one-to-many join to the attributes that all TVs have, and the Sony TV would have entries in product values for each of those attributes. This way, you wouldn't have to redefine shared attributes, so when you started selling other things that had a resolution, you could just add that attribute to their product type.
Make sense?
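The four tables described in this answer could be sketched along these lines. This is an illustration using sqlite3; the table and column names, the bridge table product_type_attributes, and the 'Monitor' type are all assumptions added for the example. Because each type declares which attributes it expects, a query can list the values a product is still missing.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE product_types (type_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE attributes    (attribute_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
-- Which attributes every product of a given type is expected to have.
CREATE TABLE product_type_attributes (
    type_id      INTEGER NOT NULL REFERENCES product_types(type_id),
    attribute_id INTEGER NOT NULL REFERENCES attributes(attribute_id),
    PRIMARY KEY (type_id, attribute_id)
);
CREATE TABLE products (
    product_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    price      NUMERIC NOT NULL,
    type_id    INTEGER NOT NULL REFERENCES product_types(type_id)
);
CREATE TABLE product_values (
    product_id   INTEGER NOT NULL REFERENCES products(product_id),
    attribute_id INTEGER NOT NULL REFERENCES attributes(attribute_id),
    value        TEXT NOT NULL,
    PRIMARY KEY (product_id, attribute_id)
);
INSERT INTO product_types VALUES (1, 'TV'), (2, 'Monitor');
INSERT INTO attributes    VALUES (1, 'resolution'), (2, 'display size');
-- 'resolution' is defined once and shared by both product types.
INSERT INTO product_type_attributes VALUES (1, 1), (1, 2), (2, 1);
INSERT INTO products       VALUES (1, 'Sony TV', 499, 1);
INSERT INTO product_values VALUES (1, 1, '1080p');
""")


def missing_attributes(product_name):
    """Attributes the product's type calls for that have no value recorded yet."""
    rows = conn.execute("""
        SELECT a.name
        FROM products p
        JOIN product_type_attributes ta ON ta.type_id = p.type_id
        JOIN attributes a               ON a.attribute_id = ta.attribute_id
        LEFT JOIN product_values v
               ON v.product_id = p.product_id AND v.attribute_id = ta.attribute_id
        WHERE p.name = ? AND v.product_id IS NULL
        ORDER BY a.name
    """, (product_name,)).fetchall()
    return [name for (name,) in rows]


sony_missing = missing_attributes("Sony TV")
```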

Database: Schema Verification [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed last year.
Can I please get input on the following subset of a schema?
One of the goals of this database is to be able to store the membership info for two completely different types of members. In this schema I just named them Users and Businesses. I am far enough along in the design of this database and know that Users and Businesses will come from different tables as represented here. The concern is tracking their membership information.
Here are some knowns:
Both types of members will be paying parties
Memberships can lapse and it is important to check when memberships are due
In tracking the status of a membership, subscription dates will need to be posted for the members to see, and reminders sent out for renewal of membership
Suspended members will still exist in the DB for reactivation but will not have access until then
Each member, regardless of type, will have its own unique member id and each user/business can only have one membership
The Membership_Types table will hold information in regards to whether or not a member is a paying member or a comp member or part of any group memberships.
In the User_Memberships and Business_Memberships tables I have identified a member_status attribute as I will need a quick look into the active state of a membership. Instead of using a boolean status here should I switch it out with a membership_suspended_date and perform a calculation off of that instead?
Any input into the good or bad of this design will be greatly appreciated. Thanks
EDIT
Attempt #2 trying to take into consideration input from dportas.
Since there can only be one unique instance of a given member (user or business), I added membership_change_date to capture the history of a member if they switch from free to paid to free, etc.
Any input here is welcome, still considering the original criteria listed above.
The two inline graphics do not appear in my browsers, so I am going by your text, and Ken's answer.
I do not believe this question has been dealt with fully.
Your description of Membership_Type seems to me to be Subscription_Type
SubscriptionType holds generic info re pricing, terms, etc
Subscription holds info re the specific pricing, expiration dates, etc for a Member.
Yes, this is a classic case for Supertype-Subtypes or Orthogonal Design (commonly required but unfortunately not commonly understood)
Member is the Supertype; User and Business are Exclusive Subtypes. The Relation is 1::0-or-1, and one Subtype must exist for each Member
UserId and BusinessId are RoleNames for MemberId, implemented as Primary Keys in the Subtypes, which is also the Foreign Key to Member; there is no additional Id column in the Subtypes.
Easily implemented declaratively in SQL
This is pure Fifth Normal Form
Full Referential and Data Integrity is maintained in any Standard SQL (code in the Non-SQLs)
The Status of a Member is easily derived from the latest Subscription row, MAX(Subscription.Date).
Any flag or boolean in Member for that purpose is duplicate data and will introduce an Update Anomaly (where the Normalised model has none).
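A minimal sketch of the supertype-subtype structure described in the points above, using sqlite3 for illustration; it is not taken from the linked diagram. The MemberType discriminator, the ExpiryDate column, and the 'active'/'lapsed' labels are assumptions. Repeating the discriminator in each subtype and including it in the foreign key is one common way to make the subtypes exclusive declaratively; the sketch does not enforce that every Member has a subtype row.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Member (
    MemberId   INTEGER PRIMARY KEY,
    MemberType TEXT NOT NULL CHECK (MemberType IN ('U', 'B')),
    UNIQUE (MemberId, MemberType)            -- target of the subtype foreign keys
);
-- UserId and BusinessId are role names for MemberId: PK and FK at once.
CREATE TABLE "User" (
    UserId     INTEGER PRIMARY KEY,
    MemberType TEXT NOT NULL DEFAULT 'U' CHECK (MemberType = 'U'),
    FOREIGN KEY (UserId, MemberType) REFERENCES Member(MemberId, MemberType)
);
CREATE TABLE Business (
    BusinessId INTEGER PRIMARY KEY,
    MemberType TEXT NOT NULL DEFAULT 'B' CHECK (MemberType = 'B'),
    FOREIGN KEY (BusinessId, MemberType) REFERENCES Member(MemberId, MemberType)
);
CREATE TABLE Subscription (
    MemberId   INTEGER NOT NULL REFERENCES Member(MemberId),
    Date       TEXT NOT NULL,                -- ISO 8601 start date
    ExpiryDate TEXT NOT NULL,                -- hypothetical column
    PRIMARY KEY (MemberId, Date)
);
INSERT INTO Member VALUES (1, 'U'), (2, 'B');
INSERT INTO "User" (UserId) VALUES (1);
INSERT INTO Business (BusinessId) VALUES (2);
INSERT INTO Subscription VALUES
    (1, '2023-01-01', '2023-12-31'),
    (1, '2024-01-01', '2024-12-31'),
    (2, '2023-01-01', '2023-12-31');
""")


def member_status(member_id, as_of):
    """Derive the status from the member's latest Subscription row; nothing is stored."""
    row = conn.execute("""
        SELECT s.ExpiryDate >= ?
        FROM Subscription s
        WHERE s.MemberId = ?
          AND s.Date = (SELECT MAX(Date) FROM Subscription WHERE MemberId = s.MemberId)
    """, (as_of, member_id)).fetchone()
    return "active" if row and row[0] else "lapsed"


def user_added_as_business():
    """Member 1 is a User, so a Business row with the same ID should be refused."""
    try:
        conn.execute("INSERT INTO Business (BusinessId) VALUES (1)")
        return True
    except sqlite3.IntegrityError:
        return False


status_user = member_status(1, "2024-06-01")
status_business = member_status(2, "2024-06-01")
wrong_subtype_accepted = user_added_as_business()
```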
▶Membership Entity Relation Diagram◀
Readers who are unfamiliar with the Standard for Modelling Relational Databases may find ▶IDEF1X Notational◀ useful.
If you provide the Group::Member info, I can model that.
"each user/business can only have one membership"
The table design you have displayed seems "over-normalized" and does not model what you are describing. The key insight is that a member of any kind is recorded only once regardless of whether they are a business or a "user", and they retain their account forever even if it lapses and gets reinstated repeatedly. This means you are only tracking one thing: users=members=businesses. That means, so far, one table.
Your second table is a transaction history for each member/user/business. Note that a comp goes in as a payment with 0.00 dollars.
"The Membership_Types table will hold information in regards to whether or not a member is a paying member or a comp member or part of any group memberships."
OK, this is the third table, membership types, with details on pricing.
You would have to tell us more about the group memberships before I can say what to do with those.
As for most of the rest of these requirements, they are all about notifications, those come out of the transaction table.
I suggest you create a new supertype table for all the data common to both types of membership (type code, status, date, duration). As a rule, I think it would be better for those columns to appear in one table, not two. In fact there's a name for this rule: The Principle of Orthogonal Design.
This pattern might also be useful to you: http://www.tdan.com/view-articles/5014
