Perplexing Complexity - Tricky Table Join - sql-server

I've been struggling with this specific set of tables for some time. There are three tables (used to be two, but it was necessary to add a third). The tables are Request, PartInfo, and Status. Request is used by the customer via a form that enters data into the table. Status is for our service agents to keep track of progress on customer requests. PartInfo is the new table, containing common data accessed by both parties.
The trick is that with each request, there is a running log of changes to that request which are stored in the same table, and linked to the original request in that series via a self-joining key called FirstRequestID (which I'll abbreviate as FID). The same is true for the Status table. Here is my basic table structure as I have currently designed it (Note: It's not too late to change the architecture if there is a better approach):
Request PartInfo Status
------- -------- ------
ID ID ID
FID FID FID
PartInfoID PartNum PartInfoID
ProductID Revision StatusID
CategoryID Description Comments
Now say I want to display the information on a particular request (including part info and status changes) in a ASP.NET GridView table. The "particular request" is identified by the FID.
Question:
How can I ensure that when I'm looking at either the Request history or the Status history, it's always pulling the proper information from the PartInfo (shared) table? In other words, what's the best way to link these three tables with the proper relationship without having 50 different junction tables to account for all the exceptions?

I apologize, but my first take on this schema was “this is a mess”. This data needs to be normalized. Unfortunately, there’s not enough information here to determine how to best do so. Based on your descriptions and part names, I come up with the following ideas.
The main entity is the Request.
Reqeusts contain Products
Requests contain Categories (unless the Category is an attribute of a Product)
Products contain Parts (unless it’s Categories that contain Parts)
The implication is that a Requested Product is associated with an arbitrary number of Parts (as opposed to a standardized set of Parts for that Product).
Status is used to track change in state of a Request over time (and is not completely dependant upon Products, Categories, or Parts)
This suggests the following tables
REQUESTS
RequestId
DateTimeCreated
PRODUCTS
ProductId
-- Add CategoryId, if it’s a Product attribute
CATEGORIES
CategoryId
REQUESTPRODUCTS
RequestProductId
RequestId
ProductId
DateTimeAdded
-- Add StatusId if a status entry must be made every time a product is requested
-- Note extra surrogate key. ReqeustId + ProductId + DateTimeAdded should be the
-- natural key, unless two identical products can be requested at the same time
-- (in which case add an “Quantity” column)
REQUESTCATEGORIES
RequestId
CategoryId
DateTimeAdded
-- Suorrogate key optional, as it’s not referenced by other tables
-- Drop, if categories are product attributes
PARTS
PartId
REQUESTPRODUCTPARTS
RequestProductId
PartId
-- Add StatusId if a status entry must be made every time a part is requested
STATUS
StatusId
RequestId
DateTimeAdded
Comments
There’s a log of ways this could go. You may end up with a lot of “junction” tables, but then your data will have referential integrity and accurate SQL queries become much, much simpler to write.

Related

Redundant relation: Is this a violation of database normalization?

I have a table with products that I offer. For each product ever sold, an entry is created in the ProductInstance table. This refers to this instance of the product and contains information such as the next due date (if the product is to be billed monthly) and other information relevant to this instance (e.g. personal branding).
For understanding: The products are service contracts. The template of the contract is stored in the product table (e.g. "Monthly lawn mowing"). The product instance is then e.g. "Monthly lawn mowing in sample street" and contains information like the size of the garden or something specific to this instance of the service instead of the general product.
An invoice is created for a product instance either one time or recurring. An Invoice may consists of several entries. Each entry is represented by an element in the InvoiceEntry table. This is linked to the ProductInstance to create the reference to the invoice.
I want to extend the database with purchase orders. To do this, a record is created in the Order table. This contains a relation to the customer and e.g. the order date. The single products of the order are mapped by an OrderEntry. The initial invoice generated for the order is linked via the field "invoice_id" in the table order. The invoice items from the initial order are created per OrderEntry and create one InvoiceEntry each. However, I want the ProductInstance to be created only after the invoice is paid. Therefore the OrderEntry has a relation to the product and not only to the ProductInstance. Once the order has been created, the instance is created and linked to the OrderEntry.
I see the problem that the relation between Order and Invoice is doubled: once Order <-> Invoice and once Order <-> OrderEntry <-> InvoiceEntry <-> Invoice.
And for the Product: OrderEntry <-> Product and OrderEntry <-> ProductInstance <-> Product.
Model of the described database
My question is if this "duplicate" relation is problematic, or could cause problems later. One case that feels messy to me is, what should I do if I want to upgrade the ProductInstance later (to an other product [e.g. upgrade to bigger service])? The order would still show the old product_id but the instance would point to a new product_id.
This is a nice example of real-life messy requirements, where the 'pure' theory of normalisation has to be tempered by compromises. There's no 'slam-dunk right' approach; there's some definitely 'wrong' approaches -- your proposed schema exhibits some of those. I suspect there's not even a 'best' approach. Thank you for expanding the description of the business context -- especially for the ProductInstance table.
But still your description won't support legally required behaviour:
An invoice is created for a product instance either one time or recurring. An Invoice may consists of several entries. Each entry is represented by an element in the InvoiceEntry table.
... I want the ProductInstance to be created only after the invoice is paid.
An invoice represents an indebtedness from customer to supplier. It applies at one date only, not "recurring". (So leaving out the Invoice date has exactly got in the way of you "thinking about relations".) A recurring or cyclical billing arrangement would be represented by something like a 'contract' table, from which an Invoice is generated by some scheduled process.
Or ... your "recurring" means the invoice is paid once up-front for a recurring service(?) Still you need an Invoice date. The terms of service/its recurrence would be on the ProductInstance table.
I can see no merit in delaying recording the ProductInstance 'til after invoice payment. Where are you going to hold the terms of service in the meantime? If you're raising an invoice, your auditors/the statutory authorities will want you to provide records of what the indebtedness relates to. Create ProductInstance ab initio and put a status on it. (Or in the application look up the Invoice's paid status before actually providing the service.)
There's something else about Invoices you're currently failing to capture -- and that has also lead you to a wrong design: in general there is more making up the total $ value of an invoice than product lines, such as discounts applying to the invoice overall rather than particular products; delivery charges; installation costs or inspection/certification; taxes (local/State/Federal).
From your description perhaps the only one applying is taxes. ("in this world nothing can be said to be certain, except death and taxes.") And taxes are not specific to products/no product_instance_id is applicable on an InvoiceEntry.
For this reason, on ERP schemas in general, there is no foreign key declared from InvoiceEntry to Product/Instance. (In your case you might get away with product_instance_id being nullable, but yeuch.) There might be a system-generated XRef text column, which contains different content according to what the InvoiceEntry represents, but any referencing can't be declared to the schema. (There might be a 'fully normalised' way to represent that with an auxiliary linkage table, but maintaining that in step adds too much complexity to the application.)
I see the problem that the relation between Order and Invoice is doubled: once Order <-> Invoice and once Order <-> OrderEntry <-> InvoiceEntry <-> Invoice.
Again think about the business sequence of operations that generate these records: ordering happens as a prelude to invoicing. You can't put an invoice_id on Order, because you haven't created the Invoice yet. You might put the order_id on Invoice. But here you're again in the situation that not all Invoices arrive via Orders -- some might be cash sales/immediate delivery. (You could make order_id nullable, but yeuch.) For this reason on ERP schemas in general, there is no foreign key declared from Invoice to Order, etc, etc.
And the same thinking with OrderEntry <-> InvoiceEntry: your proposed schema has the sequencing wrong/the reference points the wrong way. (And not every InvoiceEntry will have corresponding OrderEntry.)
On OrderEntry, having all of (OrderEntry)id and product_id and product_instance_id seems to me to give you way too many opportunities for tangling it all up. Can an Order have multiple Entrys for the same product_id? -- why/how? Can it have multiple Entrys for the same product_instance_id? -- why/how? Can there be a product_instance_id which refers to a different product_id than OrderEntry.product_id? This is exactly the sort of risk for confusing entanglement that normalisation aims to remove/reduce.
The customer is ordering a ProductInstance: mowing a particular size of garden at a particular address, fortnightly on a Tuesday afternnon. So OrderEntry.product_instance_id is what you want; .product_id is wrong. So (again) you need to create ProductInstance at time of recording the Order. Furthermore I strongly suspect you don't need an id on OrderEntry; instead give it a compound key (order_entry_id, product_instance_id). [**]
[**] I see you're using 'eloquent'. I suspect this is requiring id on every table. So you're not even using a relational database, this is some sort of Object-Relational hybrid. Insisting on a dedicated single id as key on every table is toxic. It has lead schema designers astray every time I get called in to help -- as here. Please if you can at all avoid it, don't do that.

System that keeps track of inventory with a history table

I have a system that lets members rent equipment and the system should have a history of each item that was rented and by who. The system should also track who has what equipment rented/checked out and should also sort the equipment by type, status, name, etc. Lastly it should also send out notification email on equipment that are overdue.
I'm trying to understand the relationships and how I should model this. As of now my current tables and thinking is something like this:
Member Table:
Id (PK)
MemberId
FirstName
LastName
Email
EquipmentItem Table:
Id (PK)
EquipmentName
EquipmentType (FK)
EquipmentStatus (FK)
TotalQuantity
RemainingQuantity
EquipmentStatus Table:
Id (PK)
StatusName
EquipmentType Table:
Id (PK)
TypeName
EquipmentRentalHistory Table:
Id (PK)
MemberId (FK)
EquipmentId (FK)
CheckOutDate
ReturnedDate
1) I want to know the relationships between these would the rental history be a many to many relationship between the Member table and EquipmentItem table?
2) Would EquipmentItem table have a one to many relationship between the status and type, the way I see it is EquipmentItems can have many statuses or types but each status or each type can only belong to one EquipmentItem.
3) Does it make sense to have a quantity field in the EquipmentItem, I used to work in a grocery store so I'm basing the logic on barcodes where same products would usually have the same barcode e.g. (Cheetos Puff chips) all Cheetos Puff chips would have the same barcode but would have a quantity value on it. Or would it be better to have each item unique regardless if it's the same product/model?
My logic would be:
member rents out item
system logs it into the history table
system then checks how many of the same item has been checked out so far, if say we have total quantity of 4 on that item and 3 members has checked it out
we update the remaining quantity field to the difference so in this case to 1
system can then track who has what checked out by returning all records with a returned date of null
system will then check all records with a returned date of null and then do a date range on the checked out date to determine if the equipment is overdue
send notification to member emails associated with said records from step 6
I would just like some help better understanding the relationship between these and if I have modelled my tables correctly, if not, it would be great if someone can point me in the right direction of improving upon this.
To answer your questions
With respect to modelling in an ERD, I don't think that qualifies as a many-to-many relationship, but rather, EquipmentRentalHistory is its own entity that has a many-to-many relationship with both Member and EquipmentItem.
A many-to-many would be more like, "a Member has access to 0...n EquipmentItems, and each EquipmentItem can be accessed by 0...n Members".
I would disagree that they are a one-to-many relationships.
An oxygen tank and a pair of flippers can both be classified as 'Scuba Gear' and have the status 'Checked Out'.
You could have multiple 'Scuba Gear' tags and assign each unique 'Scuba Gear' tag to its very own EquipmentItem, but then you'll just be creating new tags for every new EquipmentItem, rather than reusing existing ones.
That really depends on whether you want to identify exactly which piece of equipment a member rented (maybe something is damaged you can track down everyone who rented that specific one?). If you do differentiate, then every item will just be its own row. You should also add a new column as an external identifier, but there would be no need to keep a tally.
If it's all the same to you, then I would only keep the total but not the available. If you kept the available column, then you would constantly have to update it whenever something is logged in EquipmentRentalHistory. This would be annoying if the tables fall out of sync. You could just query EquipmentRentalHistory for the Id of the equipment, and count up the entries where returnedDate IS NULL for the number of equipment that is currently in use
Additional Note
It might be good to have a 'due date' column in the rental history rather than hard code the date calculation in case you want to varying due dates. This way you can also grant extensions.

What's the proper way to associate different account types (database types) to payments and invoices?

I've run into a bit of a pickle during my development of a web application. I've boiled down the complexity of the application for sake of simplicity in this question.
The purpose of this web application is to sell insurance. Insurance can be purchased through an agent (Agency) or over the phone directly (Customer). Insurance policies can be paid through the agency or the customer can pay for the policy directly. So money is owed (invoiced) and received (payments) from multiple sources (Agencies/Customers).
Billing Options:
Agency (Agency collects from customer outside of app)
Customer
Here's where it gets complicated. Agencies are stored in a separate database table than customers (for obvious reasons). However, both agencies and customers need to be able to make payments and have invoices assigned to them. I'm having difficulty figuring out how to create the proper database schema to allow for both types of database records to be connected to their invoices and payments.
My initial plan was to set up separate relationship (joining) tables that link the agencies and customers to invoices/payments.
However, now that I've been thinking about the problem more, I think it might be beneficial to merge both agencies and customers into a single "Payee" table which would then be associated with payments/invoices. The payee table would only store a primary key. It would not contain actual names or info for the payee - instead I would pull that data via a JOIN with either the agencies or customers tables.
Regardless of whatever solution I choose I am still faced with the problem when creating a new payment record is that I need to scan both the agencies and customers table for possible payees. I'm wondering if there's a proper way to approach this from a database schema standpoint (or from an accounting/e-commerce standpoint).
What is the correct way to handle this type of situation? All ideas and possible solutions are most welcome!
Update 01:
After a few helpful suggestions (see below) I've come up with a possible solution that may solve this issue while keeping the data normalized.
The one thing about this method that rubs me the wrong way is that I will have to make multiple table selects to get a list of all the people who can potentially make payments and/or have invoices assigned to them.
Perhaps this is unavoidable though in this situation since indeed there are different "types" of people that can be associated with payments and invoices. I'm stuck with a situation where I have two different types of records that need to be associated to the same thing. In the above approach I'm using the FKs to link each table (Agencies/Customers) to a Payee record (the table that unifies both Agencies/Customers) and then ultimately links them to Payments and Invoices.
Is this the proper solution? Or is there something I've overlooked?
There are several options:
You might put this like you'd do it with OOP programming and inheritance.
There is one table Person which holds an uniqueID and a type (Agency, Customer, more in Future). Additionally you might add columns with meta-data like who inserted/when/why and columns for status/soft-delete/???
There are two tables Agency and Customer, both holding a PersonID as FK.
Your Payee is the Person
You might use a schema-bound VIEW with a UNION ALL to return both tables of your modell in one result. A unique index on this view should ensure, that you'll have a unique key, at least as combination of the table-source and the ID there.
You might use a middle table with the table-source and the ID there as unique Key and use this two-column-id in you payment process
For sure there are several more...
My best friend was the first option...
My suggestion would be: instead of Payees table - to have two linking tables:
PayeeInvoices {
Id, --PK
PayeeId,
PayeeType,
InvoiceId --FK to Invoices tabse
}
and
PayeePayments {
Id, --PK
PayeeId,
PayeeType,
PaymentId --FK to Payments table.
}.
PayeeType is an option of two: Customer or Agency. When creating a new payment record you can query PayeeInvoices by InvoiceId to get PayeeType and corresponding PayeeId, and then lookup the rest of the data in corresponding tables.
EDIT:
Having second thoughts now. Instead of two extra tables PayeeInvoices and PayeePayments, you can just have PayeeId and PayeeType columns right in Invocies and Payments tables, assuming that Invoice or Payment belongs only to one Payee (Customer or Agency). Both my solutions are not really normalized, though.

How to design database request/approval tables?

I have the following kind of request tables:
oversea_study_request
course_request
leave_request
in these request functions, approving officer can post multiple remarks and also approve or reject the request. The system must be able to capture the history of the actions taken.
What is the best way to design it?
Should I create a common table to store the approval information and remarks.
Should I store in each request table the approval information and remarks instead.
Can someone advise on the pros and cons of each approach?
Similar to the fields organisation question here: How to better organise database to account for changing status in users; and my answer there:
If the all the requests have the same fields, field types and info, including mandatory (NOT NULL) and optional, etc. then it's better to put all the requests into one requests table. Designate one field to be request_type, with an int for efficiency and SQL convenience, or an ENUM type. Example:
overseas study = 1
course = 2
leave = 3
Similarly, do that for the approvals table also...if the process is the same for each type then store those together. Store the request id (requests.id). Since you have multiple approval-comments and approval+rejection possible, store these in approvals.action and approvals.action_date. If "actions" are independent of "approve/reject" - that is, if you can post a comment without approving/rejecting OR if you can approve/reject without a comment - then store the actions and comments separately, and include the request.id.
So you have:
Table1: requests
id INT
request_type INT (or ENUM)
request_date DATETIME
...
Table2: approvals (or 'actions', to be general)
id
request_id # (refers to requests.id above)
action_type # (approve or reject)
action_date
comment
If comments and approvals are NOT necessarily together, then:
Table2: actions
id, request_id, action_type, action_date
Table3: comments
id, request_id, comment, comment_date
And of course, add the user_id, username, etc. tables/fields. (The id in each table is it's own Primary Key)
Each request + actions + comments can be found with a SELECT and LEFT JOINs
Btw, it's "overseas" study, not "oversea" study - it's not a course in an airplane ;-)

Any simple approaches for managing customer data change requests for global reference files?

For the first time, I am developing in an environment in which there is a central repository for a number of different industry standard reference data tables and many different customers who need to select records from these industry standard reference data tables to fill in foreign key information for their customer specific records.
Because these industry standard reference files are utilized by all customers, I want to reserve Create/Update/Delete access to these records for global product administrators. However, I would like to implement a (semi-)automated interface by which specific customers could request record additions, deletions or modifications to any of the industry standard reference files that are shared among all customers.
I know I need something like a "data change request" table specifying:
user id,
user request datetime,
request type (insert, modify, delete),
a user entered text explanation of the change request,
the user request's current status (pending, declined, completed),
admin resolution datetime,
admin id,
an admin entered text description of the resolution,
etc.
What I can't figure out is how to elegantly handle the fact that these data change requests could apply to dozens of different tables with differing table column definitions. I would like to give the customer users making these data change requests a convenient way to enter their proposed record additions/modifications directly into CRUD screens that look very much like the reference table CRUD screens they don't have write/delete permissions for (with an additional text explanation and perhaps request priority field). I would also like to give the global admins a tool that allows them to view all the outstanding data change requests for the users they oversee sorted by date requested or user/date requested. Upon selecting a data change request record off the list, the admin would be directed to another CRUD screen that would be populated with the fields the customer users requested for the new/modified industry standard reference table record along with customer's text explanation, the request status and the text resolution explanation field. At this point the admin could accept/edit/reject the requested change and if accepted the affected industry standard reference file would be automatically updated with the appropriate fields and the data change request record's status, text resolution explanation and resolution datetime would all also be appropriately updated.
However, I want to keep the actual production reference tables as simple as possible and free from these extraneous and typically null customer change request fields. I'd also like the data change request file to aggregate all data change requests across all the reference tables yet somehow "point to" the specific reference table and primary key in question for modification & deletion requests or the specific reference table and associated customer user entered field values in question for record creation requests.
Does anybody have any ideas of how to design something like this effectively? Is there a cleaner, simpler way I am missing?
Option 1
If preserving the base tables is important then I would create a "change details" table as a child to your change request table. I'm envisioning something like
ChangeID
TableName
TableKeyValue
FieldName
ProposedValue
Add/Change/Delete Indicator
So you'd have a row in this table for every proposed field change. The challenge in this scenario is maintaining the mapping of TableName and FieldName values to the actual tables and fields. If your database structure if fairly static then this may not be an issue.
Option 2
Add a ChangeID field to each of your base tables. When a change is proposed add a record to the base table with the ChangeID populated. So as an example if you have a Company table, for a single company you could have multiple records:
CompanyCode ChangeID CompanyName CompanyAddress
----------- -------- ----------- --------------
COMP1 My Company Boston <-- The "live" record
COMP1 1 New Name Boston <-- A proposed change
When the admin commits the change the existing live record is deleted or archived and the ChangeID value is removed from the proposed record making it the live record. It may be a little tricky to handle proposed deletions with this option. This option also has the potential for impacting performance of selecting live data for normal usage. However it does save you the hassle of maintaining a list of table names and field names somewhere in your code.
I'm sure others will have some opinions!

Resources