DB table many-to-many connection to children and grandchildren directly - database

I'm designing a database with a connection I haven't encountered before and wondering the best approach.
Let's say I have an Invoice, and that invoice can be assigned to an Organization, or an Individual, and in some cases that individual can be part of an Organization.
The way I have this thought-out so far is as follows:
Organizations   Invoices           Individuals
-pk             -pk                -pk
-name           -organization_id   -org_member_id
-address_id     -individual_id     -name
-...            -...               -...
So if an invoice is assigned to an individual, the individual_id is used. If that individual is associated with an organization, then a through association would pick that up (but I imagine organization_id would remain nil?). However, if only an organization is assigned to the invoice, then individual_id would of course be nil.
Not sure what the best way to go about this is. Thanks in advance for any advice.

There are multiple ways in which you can approach this.
One approach: as you mentioned, if the invoice is for an individual, fill only individual_id and leave organization_id NULL. If that individual is part of an organization, you can also fill that organization's ID into organization_id, so this column must be nullable in your schema. If the invoice is assigned to an organization only, fill organization_id and leave individual_id NULL.
Another approach: introduce a column named assignee_type [char(1)] holding either O or I to indicate the type of assignment, and fill a single assignee_id column with the Individual or Organization ID. When you query the data, you need to read the assignee_type column and, based on it, join with either the Organization table or the Individual table, which can add overhead.
Both approaches have their own pros and cons. Which one to take depends on how you are going to retrieve data from the Invoice table.
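A minimal sketch of the first approach, using Python's built-in sqlite3 module. The column individuals.organization_id and the CHECK constraint are assumptions added for illustration; the question's own schema uses org_member_id and has no such constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE organizations (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE individuals (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    organization_id INTEGER REFERENCES organizations(id)  -- NULL if unaffiliated
);
CREATE TABLE invoices (
    id INTEGER PRIMARY KEY,
    organization_id INTEGER REFERENCES organizations(id),  -- nullable
    individual_id INTEGER REFERENCES individuals(id),      -- nullable
    -- assumption: every invoice must be assigned to at least one of the two
    CHECK (organization_id IS NOT NULL OR individual_id IS NOT NULL)
);
""")

conn.execute("INSERT INTO organizations (id, name) VALUES (1, 'Acme')")
conn.execute("INSERT INTO individuals (id, name, organization_id) VALUES (1, 'Ann', 1)")
conn.execute("INSERT INTO individuals (id, name, organization_id) VALUES (2, 'Bob', NULL)")
# organization only
conn.execute("INSERT INTO invoices (id, organization_id, individual_id) VALUES (1, 1, NULL)")
# individual who belongs to an organization: both columns filled
conn.execute("INSERT INTO invoices (id, organization_id, individual_id) VALUES (2, 1, 1)")
# individual only
conn.execute("INSERT INTO invoices (id, organization_id, individual_id) VALUES (3, NULL, 2)")


def invoices_for_organization(org_id):
    """Return invoice ids assigned to an organization, directly or via a member."""
    rows = conn.execute(
        "SELECT id FROM invoices WHERE organization_id = ? ORDER BY id", (org_id,)
    ).fetchall()
    return [row[0] for row in rows]


def unassigned_invoice_is_rejected():
    """Check that an invoice with neither assignee violates the CHECK constraint."""
    try:
        conn.execute("INSERT INTO invoices (id) VALUES (99)")
    except sqlite3.IntegrityError:
        return True
    return False
```

Because organization_id is filled whenever an organization is involved, finding all invoices for an organization needs no join, and both columns keep real foreign keys.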


Redundant relation: Is this a violation of database normalization?

I have a table with products that I offer. For each product ever sold, an entry is created in the ProductInstance table. This refers to this instance of the product and contains information such as the next due date (if the product is to be billed monthly) and other information relevant to this instance (e.g. personal branding).
For understanding: The products are service contracts. The template of the contract is stored in the product table (e.g. "Monthly lawn mowing"). The product instance is then e.g. "Monthly lawn mowing in sample street" and contains information like the size of the garden or something specific to this instance of the service instead of the general product.
An invoice is created for a product instance either one time or recurring. An Invoice may consist of several entries. Each entry is represented by an element in the InvoiceEntry table. This is linked to the ProductInstance to create the reference to the invoice.
I want to extend the database with purchase orders. To do this, a record is created in the Order table. This contains a relation to the customer and e.g. the order date. The single products of the order are mapped by an OrderEntry. The initial invoice generated for the order is linked via the field "invoice_id" in the table order. The invoice items from the initial order are created per OrderEntry and create one InvoiceEntry each. However, I want the ProductInstance to be created only after the invoice is paid. Therefore the OrderEntry has a relation to the product and not only to the ProductInstance. Once the order has been created, the instance is created and linked to the OrderEntry.
I see the problem that the relation between Order and Invoice is doubled: once Order <-> Invoice and once Order <-> OrderEntry <-> InvoiceEntry <-> Invoice.
And for the Product: OrderEntry <-> Product and OrderEntry <-> ProductInstance <-> Product.
Model of the described database
My question is whether this "duplicate" relation is problematic or could cause problems later. One case that feels messy to me: what should I do if I want to upgrade the ProductInstance later to another product (e.g. upgrade to a bigger service)? The order would still show the old product_id but the instance would point to a new product_id.
This is a nice example of real-life messy requirements, where the 'pure' theory of normalisation has to be tempered by compromises. There's no 'slam-dunk right' approach; there are some definitely 'wrong' approaches -- your proposed schema exhibits some of those. I suspect there's not even a 'best' approach. Thank you for expanding the description of the business context -- especially for the ProductInstance table.
But still your description won't support legally required behaviour:
An invoice is created for a product instance either one time or recurring. An Invoice may consist of several entries. Each entry is represented by an element in the InvoiceEntry table.
... I want the ProductInstance to be created only after the invoice is paid.
An invoice represents an indebtedness from customer to supplier. It applies at one date only, not "recurring". (So leaving out the Invoice date has exactly got in the way of you "thinking about relations".) A recurring or cyclical billing arrangement would be represented by something like a 'contract' table, from which an Invoice is generated by some scheduled process.
Or ... your "recurring" means the invoice is paid once up-front for a recurring service(?) Still you need an Invoice date. The terms of service/its recurrence would be on the ProductInstance table.
I can see no merit in delaying recording the ProductInstance 'til after invoice payment. Where are you going to hold the terms of service in the meantime? If you're raising an invoice, your auditors/the statutory authorities will want you to provide records of what the indebtedness relates to. Create ProductInstance ab initio and put a status on it. (Or in the application look up the Invoice's paid status before actually providing the service.)
There's something else about Invoices you're currently failing to capture -- and that has also led you to a wrong design: in general there is more making up the total $ value of an invoice than product lines, such as discounts applying to the invoice overall rather than particular products; delivery charges; installation costs or inspection/certification; taxes (local/State/Federal).
From your description perhaps the only one applying is taxes. ("in this world nothing can be said to be certain, except death and taxes.") And taxes are not specific to products/no product_instance_id is applicable on an InvoiceEntry.
For this reason, on ERP schemas in general, there is no foreign key declared from InvoiceEntry to Product/Instance. (In your case you might get away with product_instance_id being nullable, but yeuch.) There might be a system-generated XRef text column, which contains different content according to what the InvoiceEntry represents, but any referencing can't be declared to the schema. (There might be a 'fully normalised' way to represent that with an auxiliary linkage table, but maintaining that in step adds too much complexity to the application.)
I see the problem that the relation between Order and Invoice is doubled: once Order <-> Invoice and once Order <-> OrderEntry <-> InvoiceEntry <-> Invoice.
Again think about the business sequence of operations that generate these records: ordering happens as a prelude to invoicing. You can't put an invoice_id on Order, because you haven't created the Invoice yet. You might put the order_id on Invoice. But here you're again in the situation that not all Invoices arrive via Orders -- some might be cash sales/immediate delivery. (You could make order_id nullable, but yeuch.) For this reason on ERP schemas in general, there is no foreign key declared from Invoice to Order, etc, etc.
And the same thinking with OrderEntry <-> InvoiceEntry: your proposed schema has the sequencing wrong/the reference points the wrong way. (And not every InvoiceEntry will have corresponding OrderEntry.)
On OrderEntry, having all of (OrderEntry)id and product_id and product_instance_id seems to me to give you way too many opportunities for tangling it all up. Can an Order have multiple Entrys for the same product_id? -- why/how? Can it have multiple Entrys for the same product_instance_id? -- why/how? Can there be a product_instance_id which refers to a different product_id than OrderEntry.product_id? This is exactly the sort of risk for confusing entanglement that normalisation aims to remove/reduce.
The customer is ordering a ProductInstance: mowing a particular size of garden at a particular address, fortnightly on a Tuesday afternoon. So OrderEntry.product_instance_id is what you want; .product_id is wrong. So (again) you need to create ProductInstance at time of recording the Order. Furthermore I strongly suspect you don't need an id on OrderEntry; instead give it a compound key (order_id, product_instance_id). [**]
[**] I see you're using 'eloquent'. I suspect this is requiring id on every table. So you're not even using a relational database; this is some sort of Object-Relational hybrid. Insisting on a dedicated single id as key on every table is toxic. It has led schema designers astray every time I get called in to help -- as here. Please, if you can at all avoid it, don't do that.
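The advice above (create the ProductInstance up front with a status, and key OrderEntry on the order and the instance) might look roughly like this in Python with sqlite3. All table and column names, the status values, and the sample data are assumptions for illustration; this sketch reads the compound key as (order_id, product_instance_id).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE product_instance (
    id INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES product(id),
    -- hypothetical status values: the instance exists before payment
    status TEXT NOT NULL DEFAULT 'pending'
        CHECK (status IN ('pending', 'active'))
);
CREATE TABLE customer_order (id INTEGER PRIMARY KEY, order_date TEXT NOT NULL);
CREATE TABLE order_entry (
    order_id INTEGER NOT NULL REFERENCES customer_order(id),
    product_instance_id INTEGER NOT NULL REFERENCES product_instance(id),
    -- compound key: no surrogate id, and no separate product_id to drift
    PRIMARY KEY (order_id, product_instance_id)
);
""")

conn.execute("INSERT INTO product (id, name) VALUES (1, 'Monthly lawn mowing')")
conn.execute("INSERT INTO product_instance (id, product_id) VALUES (10, 1)")
conn.execute("INSERT INTO customer_order (id, order_date) VALUES (100, '2024-01-15')")
conn.execute("INSERT INTO order_entry (order_id, product_instance_id) VALUES (100, 10)")


def duplicate_entry_is_rejected():
    """The same instance cannot appear twice on one order."""
    try:
        conn.execute(
            "INSERT INTO order_entry (order_id, product_instance_id) VALUES (100, 10)"
        )
    except sqlite3.IntegrityError:
        return True
    return False


def products_on_order(order_id):
    """The product is always reached through the instance, never stored twice."""
    return conn.execute(
        """
        SELECT p.name, inst.status
        FROM order_entry oe
        JOIN product_instance inst ON inst.id = oe.product_instance_id
        JOIN product p ON p.id = inst.product_id
        WHERE oe.order_id = ?
        """,
        (order_id,),
    ).fetchall()
```

With only one path from OrderEntry to Product, upgrading an instance to a different product cannot leave the order entry pointing at a stale product_id.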

What's the proper way to associate different account types (database types) to payments and invoices?

I've run into a bit of a pickle during my development of a web application. I've boiled down the complexity of the application for sake of simplicity in this question.
The purpose of this web application is to sell insurance. Insurance can be purchased through an agent (Agency) or over the phone directly (Customer). Insurance policies can be paid through the agency or the customer can pay for the policy directly. So money is owed (invoiced) and received (payments) from multiple sources (Agencies/Customers).
Billing Options:
Agency (Agency collects from customer outside of app)
Customer
Here's where it gets complicated. Agencies are stored in a separate database table than customers (for obvious reasons). However, both agencies and customers need to be able to make payments and have invoices assigned to them. I'm having difficulty figuring out how to create the proper database schema to allow for both types of database records to be connected to their invoices and payments.
My initial plan was to set up separate relationship (joining) tables that link the agencies and customers to invoices/payments.
However, now that I've been thinking about the problem more, I think it might be beneficial to merge both agencies and customers into a single "Payee" table which would then be associated with payments/invoices. The payee table would only store a primary key. It would not contain actual names or info for the payee - instead I would pull that data via a JOIN with either the agencies or customers tables.
Regardless of whatever solution I choose I am still faced with the problem when creating a new payment record is that I need to scan both the agencies and customers table for possible payees. I'm wondering if there's a proper way to approach this from a database schema standpoint (or from an accounting/e-commerce standpoint).
What is the correct way to handle this type of situation? All ideas and possible solutions are most welcome!
Update 01:
After a few helpful suggestions (see below) I've come up with a possible solution that may solve this issue while keeping the data normalized.
The one thing about this method that rubs me the wrong way is that I will have to make multiple table selects to get a list of all the people who can potentially make payments and/or have invoices assigned to them.
Perhaps this is unavoidable though in this situation since indeed there are different "types" of people that can be associated with payments and invoices. I'm stuck with a situation where I have two different types of records that need to be associated to the same thing. In the above approach I'm using the FKs to link each table (Agencies/Customers) to a Payee record (the table that unifies both Agencies/Customers) and then ultimately links them to Payments and Invoices.
Is this the proper solution? Or is there something I've overlooked?
There are several options:
1. Model it as you would with OOP inheritance.
   There is one table Person which holds a unique ID and a type (Agency, Customer, more in the future). Additionally you might add columns with metadata such as who inserted the row, when and why, plus columns for status/soft-delete/???
   There are two tables, Agency and Customer, both holding a PersonID as FK.
   Your Payee is the Person.
2. Use a schema-bound VIEW with a UNION ALL to return both tables of your model in one result. A unique index on this view should ensure that you have a unique key, at least as a combination of the table source and the ID there.
3. Use a middle table with the table source and the ID there as a unique key, and use this two-column ID in your payment process.
For sure there are several more...
My favourite would be the first option...
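As a rough sketch of the first option, here is one way the Person supertype could look with Python's sqlite3. Column names such as agency_name and full_name, and making PersonID the primary key of each subtype table, are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE person (
    id INTEGER PRIMARY KEY,
    type TEXT NOT NULL CHECK (type IN ('agency', 'customer'))
);
CREATE TABLE agency (
    person_id INTEGER PRIMARY KEY REFERENCES person(id),
    agency_name TEXT NOT NULL
);
CREATE TABLE customer (
    person_id INTEGER PRIMARY KEY REFERENCES person(id),
    full_name TEXT NOT NULL
);
CREATE TABLE invoice (
    id INTEGER PRIMARY KEY,
    payee_id INTEGER NOT NULL REFERENCES person(id),  -- the payee is the person
    amount_cents INTEGER NOT NULL
);
""")

conn.execute("INSERT INTO person (id, type) VALUES (1, 'agency')")
conn.execute("INSERT INTO agency (person_id, agency_name) VALUES (1, 'Best Insurance')")
conn.execute("INSERT INTO person (id, type) VALUES (2, 'customer')")
conn.execute("INSERT INTO customer (person_id, full_name) VALUES (2, 'Jane Doe')")
conn.execute("INSERT INTO invoice (id, payee_id, amount_cents) VALUES (1, 2, 12000)")


def list_payees():
    """One query returns every possible payee, whatever its subtype."""
    return conn.execute(
        """
        SELECT p.id, p.type, COALESCE(a.agency_name, c.full_name) AS display_name
        FROM person p
        LEFT JOIN agency a ON a.person_id = p.id
        LEFT JOIN customer c ON c.person_id = p.id
        ORDER BY p.id
        """
    ).fetchall()


def unknown_payee_is_rejected():
    """Invoices keep a real foreign key, so a dangling payee cannot be stored."""
    try:
        conn.execute("INSERT INTO invoice (payee_id, amount_cents) VALUES (999, 100)")
    except sqlite3.IntegrityError:
        return True
    return False
```

The payee list still touches both subtype tables, but it does so in a single select rather than one scan per table from application code.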
My suggestion would be: instead of Payees table - to have two linking tables:
PayeeInvoices {
    Id, --PK
    PayeeId,
    PayeeType,
    InvoiceId --FK to Invoices table
}
and
PayeePayments {
    Id, --PK
    PayeeId,
    PayeeType,
    PaymentId --FK to Payments table
}
PayeeType is an option of two: Customer or Agency. When creating a new payment record you can query PayeeInvoices by InvoiceId to get PayeeType and corresponding PayeeId, and then lookup the rest of the data in corresponding tables.
EDIT:
Having second thoughts now. Instead of two extra tables PayeeInvoices and PayeePayments, you can just have PayeeId and PayeeType columns right in the Invoices and Payments tables, assuming that an Invoice or Payment belongs to only one Payee (Customer or Agency). Neither of my solutions is really normalized, though.
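To illustrate the PayeeId/PayeeType variant and its trade-off, here is a small sketch with Python's sqlite3. The table layouts, the PAYEE_TABLES mapping and the payee_name helper are hypothetical names introduced for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE agencies (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE invoices (
    id INTEGER PRIMARY KEY,
    -- no foreign key can be declared: the target table depends on payee_type
    payee_id INTEGER NOT NULL,
    payee_type TEXT NOT NULL CHECK (payee_type IN ('Agency', 'Customer'))
);
""")

# The same id exists in both tables, which is why payee_type is required.
conn.execute("INSERT INTO agencies (id, name) VALUES (1, 'Best Insurance')")
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Jane Doe')")
conn.execute("INSERT INTO invoices (id, payee_id, payee_type) VALUES (1, 1, 'Agency')")
conn.execute("INSERT INTO invoices (id, payee_id, payee_type) VALUES (2, 1, 'Customer')")
# Nothing stops a dangling reference from being stored:
conn.execute("INSERT INTO invoices (id, payee_id, payee_type) VALUES (3, 42, 'Customer')")

# Fixed whitelist mapping payee_type to a table name.
PAYEE_TABLES = {"Agency": "agencies", "Customer": "customers"}


def payee_name(invoice_id):
    """Resolve the payee in two steps: read the type, then query that table."""
    row = conn.execute(
        "SELECT payee_id, payee_type FROM invoices WHERE id = ?", (invoice_id,)
    ).fetchone()
    if row is None:
        return None
    payee_id, payee_type = row
    table = PAYEE_TABLES[payee_type]  # safe to interpolate: comes from the whitelist
    found = conn.execute(
        f"SELECT name FROM {table} WHERE id = ?", (payee_id,)
    ).fetchone()
    return found[0] if found else None
```

Invoice 3 shows the cost of this design: the database accepts a payee that does not exist, so the application has to guard against it.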

Database Schema Recommendation

I am having a brain freeze on a data problem that I need to model. I will do my best to outline the tables and relationships.
users (basic user information name/etc)
    users.id
hospitals (basic information about hospital name/etc)
    hospitals.id
pages
    pages.id
    user_id (page can be affiliated with a user)
    hospital_id (page can be affiliated with a hospital)
Here is where the new data begins, and I am having an issue
groups (name of a group of pages)
    groups.id
groups_pages (linking table)
    group_id
    page_id
Now here is the tricky part: a group can be 'owned' by either a user or a hospital, but those pages aren't necessarily affiliated with that user/hospital. In addition, there is another type of entity (company) that can 'own' the group.
When displaying the group, I will need to know what type of owner (user / hospital / company) the group has and be able to get the correct affiliated data (name, address, etc).
I'm drawing a blank on how to link a group to its respective owner, knowing that the owner can be of different types.
Party is a generic term for person or organization.
Keep all common fields (phone no, address..) in the Party table.
Person and Hospital should have only specific fields for the sub-type.
If the company has a different set of columns from Hospital, simply add it as another subtype.
If Hospital and company have the same columns, rename Hospital to the more generic Organization.
PartyType is the discriminator {P,H}
You'd have to use some form of discriminator, such as a column named "owner_type". You could then use an enum, a varchar, or just an int to represent what type of owner the row refers to.
Here is a good tutorial on how to model inheritance in a database while maintaining a reasonable normal form and referential integrity.
Condensed version for you: Create another table, owners, and let it keep a minimal set of attributes (what users and hospitals have in common, maybe a full name, address, and of course an id). Users and hospitals will have their respective id columns that will simultaneously be their primary keys and also foreign keys referencing owners.id. Give users the attributes that hospitals don't have and vice versa. Now each hospital is represented by two easily joined rows, one from owners and one from hospitals.
This allows you to reference owners.id from groups.owner_id.
(There is also a simpler alternative where you create just one table for users and hospitals and put NULLs to all columns that do not apply to a particular row, but that quickly gets unwieldy.)
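A possible sketch of the owners supertype in Python with sqlite3. The owner_type column with its composite foreign key is an extra assumption layered on top of the description above (it stops a hospitals row from attaching to an owner of another type), and the table is called page_groups here only to avoid a name that resembles an SQL keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE owners (
    id INTEGER PRIMARY KEY,
    owner_type TEXT NOT NULL
        CHECK (owner_type IN ('user', 'hospital', 'company')),
    name TEXT NOT NULL,            -- attributes common to every owner
    UNIQUE (id, owner_type)        -- target for the composite foreign keys
);
CREATE TABLE users (
    id INTEGER PRIMARY KEY,        -- same value as owners.id
    owner_type TEXT NOT NULL DEFAULT 'user' CHECK (owner_type = 'user'),
    email TEXT,                    -- user-only attribute (hypothetical)
    FOREIGN KEY (id, owner_type) REFERENCES owners (id, owner_type)
);
CREATE TABLE hospitals (
    id INTEGER PRIMARY KEY,        -- same value as owners.id
    owner_type TEXT NOT NULL DEFAULT 'hospital' CHECK (owner_type = 'hospital'),
    bed_count INTEGER,             -- hospital-only attribute (hypothetical)
    FOREIGN KEY (id, owner_type) REFERENCES owners (id, owner_type)
);
CREATE TABLE page_groups (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    owner_id INTEGER NOT NULL REFERENCES owners (id)
);
""")

conn.execute("INSERT INTO owners (id, owner_type, name) VALUES (1, 'user', 'Dr. Smith')")
conn.execute("INSERT INTO users (id, email) VALUES (1, 'smith@example.com')")
conn.execute(
    "INSERT INTO owners (id, owner_type, name) VALUES (2, 'hospital', 'General Hospital')"
)
conn.execute("INSERT INTO hospitals (id, bed_count) VALUES (2, 300)")
conn.execute("INSERT INTO page_groups (id, name, owner_id) VALUES (1, 'Cardiology pages', 2)")


def group_owner(group_id):
    """Return (owner_type, name) for a group with a single join."""
    return conn.execute(
        """
        SELECT o.owner_type, o.name
        FROM page_groups g
        JOIN owners o ON o.id = g.owner_id
        WHERE g.id = ?
        """,
        (group_id,),
    ).fetchone()


def mismatched_subtype_is_rejected():
    """Owner 1 is a user, so a hospitals row with id 1 must fail."""
    try:
        conn.execute("INSERT INTO hospitals (id, bed_count) VALUES (1, 10)")
    except sqlite3.IntegrityError:
        return True
    return False
```

Adding companies later would mean one more subtype table and no change to page_groups.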
HospitalGroups(HospitalID, GroupID)
UserGroups(UserID, GroupID)
CompanyGroups(CompanyID, GroupID)
Groups(GroupID,....)
GroupPages(GroupID, PageID)
Pages(PageID, ...)
Would be the classic way.
The discriminator idea mentioned by @Robert would also work, but you lose referential integrity, so you need more code instead of more tables.
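For comparison, the "classic" layout with one linking table per owner type could be sketched like this in Python with sqlite3. The snake_case names, the name columns, and making group_id the primary key of each linking table (so a group has at most one owner of each kind) are assumptions for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE hospitals (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE page_groups (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE user_groups (
    user_id INTEGER NOT NULL REFERENCES users(id),
    group_id INTEGER PRIMARY KEY REFERENCES page_groups(id)
);
CREATE TABLE hospital_groups (
    hospital_id INTEGER NOT NULL REFERENCES hospitals(id),
    group_id INTEGER PRIMARY KEY REFERENCES page_groups(id)
);
CREATE TABLE company_groups (
    company_id INTEGER NOT NULL REFERENCES companies(id),
    group_id INTEGER PRIMARY KEY REFERENCES page_groups(id)
);
""")

conn.execute("INSERT INTO users (id, name) VALUES (1, 'Dr. Smith')")
conn.execute("INSERT INTO hospitals (id, name) VALUES (1, 'General Hospital')")
conn.execute("INSERT INTO page_groups (id, name) VALUES (1, 'Cardiology pages')")
conn.execute("INSERT INTO page_groups (id, name) VALUES (2, 'Personal pages')")
conn.execute("INSERT INTO hospital_groups (hospital_id, group_id) VALUES (1, 1)")
conn.execute("INSERT INTO user_groups (user_id, group_id) VALUES (1, 2)")


def group_owners(group_id):
    """Collect owners of a group across all three linking tables."""
    return conn.execute(
        """
        SELECT 'user', u.name FROM user_groups ug
            JOIN users u ON u.id = ug.user_id WHERE ug.group_id = ?
        UNION ALL
        SELECT 'hospital', h.name FROM hospital_groups hg
            JOIN hospitals h ON h.id = hg.hospital_id WHERE hg.group_id = ?
        UNION ALL
        SELECT 'company', c.name FROM company_groups cg
            JOIN companies c ON c.id = cg.company_id WHERE cg.group_id = ?
        """,
        (group_id, group_id, group_id),
    ).fetchall()


def dangling_link_is_rejected():
    """Every link is a real foreign key, so an unknown company cannot own a group."""
    try:
        conn.execute("INSERT INTO company_groups (company_id, group_id) VALUES (999, 1)")
    except sqlite3.IntegrityError:
        return True
    return False
```

Note that this schema alone does not stop a group from being linked to both a user and a hospital; if exactly one owner is required, that rule still has to be enforced elsewhere.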

Many tables to a single row in relational database

Consider we have a database that has a table, which is a record of a sale. You sell both products and services, so you also have a product and service table.
Each sale can either be a product or a service, which leaves the options for designing the database to be something like the following:
Add columns for each type, ie. add Service_id and Product_id to Invoice_Row, both columns of which are nullable. If they're both null, it's an ad-hoc charge not relating to anything, but if one of them is satisfied then it is a row relating to that type.
Add a weird string/id based system, for instance: Type_table, Type_id. This would be a string/varchar and integer respectively, the former would contain for example 'Service', and the latter the id within the Service table. This is obviously loose coupling and horrible, but is a way of solving it so long as you're only accessing the DB from code, as such.
Abstract out the concept of "something that is chargeable" with new tables, of which Product and Service are now specializations, and on the Invoice_Row table link to something like ChargeableEntity_id. However, the ChargeableEntity table here would essentially be redundant, as it too would need some way to link to an abstract "backend" table, which brings us all the way back around to the same problem.
Which way would you choose, or what are the other alternatives to solving this problem?
What you are essentially asking is how to achieve polymorphism in a relational database. There are many approaches (as you yourself demonstrate) to this problem. One solution is to use "table per class" inheritance. In this setup, there will be a parent table (akin to your "chargeable item") that contains a unique identifier and the fields that are common to both products and services. There will be two child tables, products and services: each will contain the unique identifier for that entity and the fields specific to it.
One benefit to this approach over others is you don't end up with one table with many nullable columns that essentially becomes a dumping ground to describe anything ("schema-less").
One downside is as your inheritance hierarchy grows, the number of joins needed to grab all the data for an entity also grows.
I believe it depends on use case(s).
You could put the common columns in one table and put the product-specific and service-specific columns in their own tables. The catch is that you need to join them.
Otherwise, maintain two separate tables, one for product sales and another for service sales, and use application logic to determine which table to insert into. Getting all sales then essentially means taking the union of all product sales and all service sales.
I would go for approach 2 personally to avoid joins and inserting into two tables whenever a sale is made.
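A small sketch of this second approach in Python with sqlite3, reading it as one sale table per kind. The table names product_sale and service_sale, the amount_cents column and the record_sale helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE service (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE product_sale (
    id INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES product(id),
    amount_cents INTEGER NOT NULL
);
CREATE TABLE service_sale (
    id INTEGER PRIMARY KEY,
    service_id INTEGER NOT NULL REFERENCES service(id),
    amount_cents INTEGER NOT NULL
);
""")
conn.execute("INSERT INTO product (id, name) VALUES (1, 'Mower')")
conn.execute("INSERT INTO service (id, name) VALUES (1, 'Lawn mowing')")


def record_sale(kind, item_id, amount_cents):
    """Application logic picks the table; each sale is a single insert."""
    if kind == "product":
        conn.execute(
            "INSERT INTO product_sale (product_id, amount_cents) VALUES (?, ?)",
            (item_id, amount_cents),
        )
    elif kind == "service":
        conn.execute(
            "INSERT INTO service_sale (service_id, amount_cents) VALUES (?, ?)",
            (item_id, amount_cents),
        )
    else:
        raise ValueError(f"unknown sale kind: {kind!r}")


def all_sales():
    """All sales are the union of the two tables, tagged with their kind."""
    return conn.execute(
        """
        SELECT 'product' AS kind, product_id AS item_id, amount_cents
        FROM product_sale
        UNION ALL
        SELECT 'service', service_id, amount_cents
        FROM service_sale
        ORDER BY 1, 2
        """
    ).fetchall()


record_sale("product", 1, 25000)
record_sale("service", 1, 4000)
```

Each table keeps a proper foreign key to its own parent, at the price of touching every sale table whenever a report needs all sales.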

What's more readable naming conventions for lookup tables?

We always name lookup tables - such as Countries,Cities,Regions ... etc - as below :
EntityName_LK OR LK_EntityName ( Countries_LK OR LK_Countries )
Does anyone have a better naming convention for lookup tables?
Edit:
We are considering a suffix or prefix to resolve conflicts like the following:
Suppose we have a User table and a lookup table UserTypes (ID, Name), with a many-to-many relation between User and UserTypes. That relation needs its own table, which we might name Types_For_User, and that can be confused with UserTypes. So we would like to name the lookup table UserTypesLK to make its role obvious to everyone.
Before you decide you need the "lookup" moniker, you should try to understand why you are designating some tables as "lookups" and not others. Each table should represent an entity unto itself.
What happens when a table that was designated as a "lookup" grows in scope and is no longer considered a "lookup"? You are either left with changing the table name which can be onerous or leaving it as is and having to explain to everyone that a given table isn't really a "lookup".
A common scenario mentioned in the comments related to a junction table. For example, suppose a User can have multiple "Types" which are expressed in a junction table with two foreign keys. Should that table be called User_UserTypes? To this scenario, I would first say that I prefer to use the suffix Member on the junction table. So we would have Users, UserTypes, UserTypeMembers. Secondly, the word "type" in this context is quite generic. Does a UserType really mean a Role? The term you use can make all the difference. If UserTypes are really Roles, then our table names become Users, Roles, RoleMembers which seems quite clear.
Here are two concerns for whether to use a prefix or suffix.
In a sorted list of tables, do you want the LK tables to be together, or do you want all tables pertaining to EntityName to appear together?
When programming in environments with auto-complete, are you likely to want to type "LK" to get the list of tables or the beginning of EntityName?
I think there are arguments for either, but I would choose to start with EntityName.
Every table can become a lookup table.
Consider that a person is a lookup in an Invoice table.
So in my opinion, tables should just be named the (singular) entity name, e.g. Person, Invoice.
What you do want is a standard for the column names and constraints, such as
FK_Invoice_Person (in table invoice, link to person)
PersonID or Person_ID (column in table invoice, linking to entity Person)
At the end of the day, it is all up to personal preference (if you can get away with dictating it) or team standards.
updated
If you have lookups that pertain only to entities, like Invoice_Terms which is a lookup from a list of 4 scenarios, then you could name it as Invoice_LK_Terms which would make it appear by name grouped under Invoice. Another way is to have a single lookup table for simple single-value lookups, separated by the function (table+column) it is for, e.g.
Lookups
Table | Column | Value
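As a rough illustration of the single Lookups table, here is a sketch in Python with sqlite3. The column names table_name and column_name, the four invoice terms, and the helper functions are all made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lookups (
    table_name TEXT NOT NULL,   -- which table the value is for
    column_name TEXT NOT NULL,  -- which column of that table
    value TEXT NOT NULL,
    PRIMARY KEY (table_name, column_name, value)
);
""")

# Hypothetical sample data: four invoice terms plus one unrelated lookup.
conn.executemany(
    "INSERT INTO lookups (table_name, column_name, value) VALUES (?, ?, ?)",
    [
        ("Invoice", "Terms", "Net 30"),
        ("Invoice", "Terms", "Net 60"),
        ("Invoice", "Terms", "Prepaid"),
        ("Invoice", "Terms", "Due on receipt"),
        ("User", "Type", "Admin"),
    ],
)


def allowed_values(table_name, column_name):
    """List the permitted values for one table/column pair."""
    rows = conn.execute(
        "SELECT value FROM lookups WHERE table_name = ? AND column_name = ? "
        "ORDER BY value",
        (table_name, column_name),
    ).fetchall()
    return [row[0] for row in rows]


def is_allowed(table_name, column_name, value):
    """Application-side validation against the shared lookup table."""
    row = conn.execute(
        "SELECT 1 FROM lookups WHERE table_name = ? AND column_name = ? AND value = ?",
        (table_name, column_name, value),
    ).fetchone()
    return row is not None
```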
There is only one type of table and I don't believe there is any good reason for calling some tables "lookup" tables. Use a naming convention that works equally for every table.
One area where table naming conventions can help is data migration between environments. We often have to move data in lookup tables (which constrain values which may appear in other tables) along with schema changes, as these allowed value lists change. Currently we don't name lookup tables differently, but we are considering it to prevent the migration guy asking "which tables are lookup tables again?" every time.
