Redundant relation: Is this a violation of database normalization?

I have a table with products that I offer. For each product ever sold, an entry is created in the ProductInstance table. This entry represents that particular instance of the product and contains information such as the next due date (if the product is billed monthly) and other information relevant to this instance (e.g. personal branding).
For context: the products are service contracts. The template of the contract is stored in the Product table (e.g. "Monthly lawn mowing"). The product instance is then, e.g., "Monthly lawn mowing in Sample Street" and contains information such as the size of the garden, or anything else specific to this instance of the service rather than to the general product.
An invoice is created for a product instance either one-time or recurring. An Invoice may consist of several entries. Each entry is represented by a row in the InvoiceEntry table, which is linked to the ProductInstance to create the reference to the invoice.
I want to extend the database with purchase orders. To do this, a record is created in the Order table. It holds a relation to the customer and, e.g., the order date. The individual products of the order are each represented by an OrderEntry. The initial invoice generated for the order is linked via the field "invoice_id" in the Order table. The invoice items for the initial order are created per OrderEntry, one InvoiceEntry each. However, I want the ProductInstance to be created only after the invoice is paid. Therefore the OrderEntry has a relation to the Product and not only to the ProductInstance. Once the order has been created, the instance is created and linked to the OrderEntry.
I see the problem that the relation between Order and Invoice is doubled: once Order <-> Invoice and once Order <-> OrderEntry <-> InvoiceEntry <-> Invoice.
The same goes for the Product: OrderEntry <-> Product and OrderEntry <-> ProductInstance <-> Product.
[Image: model of the described database]
My question is whether this "duplicate" relation is problematic, or could cause problems later. One case that feels messy to me: what should I do if I want to upgrade the ProductInstance later to another product (e.g. an upgrade to a bigger service)? The order would still show the old product_id, but the instance would point to a new product_id.

This is a nice example of real-life messy requirements, where the 'pure' theory of normalisation has to be tempered by compromises. There's no 'slam-dunk right' approach; there are some definitely 'wrong' approaches -- your proposed schema exhibits some of those. I suspect there's not even a 'best' approach. Thank you for expanding the description of the business context -- especially for the ProductInstance table.
But your description still won't support legally required behaviour:
An invoice is created for a product instance either one-time or recurring. An Invoice may consist of several entries. Each entry is represented by a row in the InvoiceEntry table.
... I want the ProductInstance to be created only after the invoice is paid.
An invoice represents an indebtedness from customer to supplier. It applies at one date only, not "recurring". (Leaving out the Invoice date is exactly what has got in the way of your "thinking about relations".) A recurring or cyclical billing arrangement would be represented by something like a 'contract' table, from which an Invoice is generated by some scheduled process.
Or ... your "recurring" means the invoice is paid once up-front for a recurring service(?) You still need an Invoice date. The terms of service/its recurrence would be on the ProductInstance table.
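A minimal sketch of the contract-plus-scheduled-process arrangement, using Python's built-in sqlite3 module so it runs as-is. The Contract table, its columns, the one-month interval and the run_billing job are hypothetical illustrations, not part of the question's schema; amounts are integer cents to avoid floating-point rounding. The recurrence lives on Contract, and every Invoice the job raises carries exactly one date.

```python
import sqlite3

# Hypothetical schema: the recurrence lives on Contract; each Invoice is
# one debt raised on one date.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Contract (
    contract_id          INTEGER PRIMARY KEY,
    customer_id          INTEGER NOT NULL,
    monthly_amount_cents INTEGER NOT NULL,
    next_due_date        TEXT NOT NULL      -- ISO date, e.g. '2024-01-15'
);
CREATE TABLE Invoice (
    invoice_id    INTEGER PRIMARY KEY,
    contract_id   INTEGER REFERENCES Contract(contract_id),
    invoice_date  TEXT NOT NULL,            -- exactly one date per invoice
    amount_cents  INTEGER NOT NULL
);
""")

def run_billing(conn, today):
    """Scheduled job: raise one dated Invoice for every contract that is due,
    then move that contract's next_due_date on by one month."""
    with conn:  # both statements commit together, or neither does
        conn.execute("""
            INSERT INTO Invoice (contract_id, invoice_date, amount_cents)
            SELECT contract_id, ?, monthly_amount_cents
            FROM Contract WHERE next_due_date <= ?""", (today, today))
        conn.execute("""
            UPDATE Contract SET next_due_date = date(next_due_date, '+1 month')
            WHERE next_due_date <= ?""", (today,))

conn.execute("INSERT INTO Contract VALUES (1, 42, 3000, '2024-01-15')")
conn.commit()
for day in ("2024-01-15", "2024-01-20", "2024-02-15"):
    run_billing(conn, day)

invoices = conn.execute(
    "SELECT contract_id, invoice_date, amount_cents FROM Invoice ORDER BY invoice_id"
).fetchall()
print(invoices)  # [(1, '2024-01-15', 3000), (1, '2024-02-15', 3000)]
```

Keeping the schedule on Contract means Invoice stays a plain dated record of an indebtedness.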
I can see no merit in delaying recording the ProductInstance 'til after invoice payment. Where are you going to hold the terms of service in the meantime? If you're raising an invoice, your auditors/the statutory authorities will want you to provide records of what the indebtedness relates to. Create ProductInstance ab initio and put a status on it. (Or in the application look up the Invoice's paid status before actually providing the service.)
There's something else about Invoices you're currently failing to capture -- and that has also led you to a wrong design: in general there is more making up the total $ value of an invoice than product lines, such as discounts applying to the invoice overall rather than to particular products; delivery charges; installation costs or inspection/certification; taxes (local/State/Federal).
From your description perhaps the only one that applies is taxes. ("in this world nothing can be said to be certain, except death and taxes.") And taxes are not specific to products: no product_instance_id is applicable to a tax InvoiceEntry.
For this reason, in ERP schemas in general, there is no foreign key declared from InvoiceEntry to Product/Instance. (In your case you might get away with product_instance_id being nullable, but yeuch.) There might be a system-generated XRef text column, whose content differs according to what the InvoiceEntry represents, but any such reference can't be declared to the schema. (There might be a 'fully normalised' way to represent that with an auxiliary linkage table, but keeping that in step adds too much complexity to the application.)
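As a rough illustration of that style (a sketch, not a description of any particular ERP), the InvoiceEntry below has no declared foreign key to Product or ProductInstance. The entry_type values, the xref formats such as 'PI:17', and the integer-cents amounts are hypothetical. The xref column is plain text whose meaning depends on entry_type, so the database cannot enforce it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Invoice (
    invoice_id    INTEGER PRIMARY KEY,
    invoice_date  TEXT NOT NULL
);
-- No declared FK to Product/ProductInstance: a line may be a product,
-- a discount, a delivery charge or a tax.
CREATE TABLE InvoiceEntry (
    invoice_id    INTEGER NOT NULL REFERENCES Invoice(invoice_id),
    line_no       INTEGER NOT NULL,
    entry_type    TEXT NOT NULL
                  CHECK (entry_type IN ('PRODUCT', 'DISCOUNT', 'DELIVERY', 'TAX')),
    xref          TEXT,          -- system-generated text; meaning depends on entry_type
    amount_cents  INTEGER NOT NULL,
    PRIMARY KEY (invoice_id, line_no)
);
""")
with conn:
    conn.execute("INSERT INTO Invoice VALUES (1, '2024-03-01')")
    conn.executemany("INSERT INTO InvoiceEntry VALUES (?, ?, ?, ?, ?)", [
        (1, 1, "PRODUCT",  "PI:17",     4000),   # 'PI:17' is a made-up xref format
        (1, 2, "DISCOUNT", "SPRING-10", -400),
        (1, 3, "TAX",      "VAT 20%",   720),
    ])

total = conn.execute(
    "SELECT SUM(amount_cents) FROM InvoiceEntry WHERE invoice_id = 1").fetchone()[0]
print(total)  # 4320
```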
I see the problem that the relation between Order and Invoice is doubled: once Order <-> Invoice and once Order <-> OrderEntry <-> InvoiceEntry <-> Invoice.
Again think about the business sequence of operations that generate these records: ordering happens as a prelude to invoicing. You can't put an invoice_id on Order, because you haven't created the Invoice yet. You might put the order_id on Invoice. But here you're again in the situation that not all Invoices arrive via Orders -- some might be cash sales/immediate delivery. (You could make order_id nullable, but yeuch.) For this reason, in ERP schemas in general, there is no foreign key declared from Invoice to Order, etc, etc.
And the same thinking with OrderEntry <-> InvoiceEntry: your proposed schema has the sequencing wrong/the reference points the wrong way. (And not every InvoiceEntry will have corresponding OrderEntry.)
On OrderEntry, having all of (OrderEntry)id and product_id and product_instance_id seems to me to give you way too many opportunities for tangling it all up. Can an Order have multiple Entries for the same product_id? -- why/how? Can it have multiple Entries for the same product_instance_id? -- why/how? Can there be a product_instance_id which refers to a different product_id than OrderEntry.product_id? This is exactly the sort of risk of confusing entanglement that normalisation aims to remove/reduce.
The customer is ordering a ProductInstance: mowing a particular size of garden at a particular address, fortnightly on a Tuesday afternoon. So OrderEntry.product_instance_id is what you want; .product_id is wrong. So (again) you need to create the ProductInstance at the time of recording the Order. Furthermore I strongly suspect you don't need an id on OrderEntry; instead give it a compound key (order_id, product_instance_id). [**]
[**] I see you're using Eloquent. I suspect this is requiring an id on every table. So you're not even using a relational database; this is some sort of Object-Relational hybrid. Insisting on a dedicated single id as key on every table is toxic. It has led schema designers astray every time I get called in to help -- as here. Please, if you can at all avoid it, don't do that.
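A sketch of what that could look like, again in Python's sqlite3. The column names (site_address, garden_size_m2, status) and the status values are assumptions for illustration. ProductInstance is created when the order is taken, with a status as suggested above; OrderEntry is keyed on the order plus the product instance, with no surrogate id and no product_id, so its product cannot disagree with its instance's product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Product (
    product_id  INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE ProductInstance (
    product_instance_id  INTEGER PRIMARY KEY,
    product_id           INTEGER NOT NULL REFERENCES Product(product_id),
    site_address         TEXT NOT NULL,
    garden_size_m2       INTEGER,
    status               TEXT NOT NULL DEFAULT 'PENDING'
                         CHECK (status IN ('PENDING', 'ACTIVE', 'CANCELLED'))
);
CREATE TABLE "Order" (              -- ORDER is a reserved word, hence the quotes
    order_id     INTEGER PRIMARY KEY,
    customer_id  INTEGER NOT NULL,
    order_date   TEXT NOT NULL
);
-- No surrogate id and no product_id: the entry is identified by its order and
-- its instance, and the product is reached only through ProductInstance.
CREATE TABLE OrderEntry (
    order_id             INTEGER NOT NULL REFERENCES "Order"(order_id),
    product_instance_id  INTEGER NOT NULL
                         REFERENCES ProductInstance(product_instance_id),
    PRIMARY KEY (order_id, product_instance_id)
);
""")
with conn:  # everything is recorded when the order is taken
    conn.execute("INSERT INTO Product VALUES (1, 'Monthly lawn mowing')")
    conn.execute("INSERT INTO ProductInstance (product_instance_id, product_id, "
                 "site_address, garden_size_m2) VALUES (10, 1, 'Sample Street 1', 400)")
    conn.execute("""INSERT INTO "Order" VALUES (100, 7, '2024-04-02')""")
    conn.execute("INSERT INTO OrderEntry VALUES (100, 10)")

# Later, once the invoice is paid, only the status changes.
with conn:
    conn.execute("UPDATE ProductInstance SET status = 'ACTIVE' WHERE product_instance_id = 10")

ordered = conn.execute("""
    SELECT p.name, pi.site_address, pi.status
    FROM OrderEntry oe
    JOIN ProductInstance pi USING (product_instance_id)
    JOIN Product p USING (product_id)
    WHERE oe.order_id = 100""").fetchall()
print(ordered)  # [('Monthly lawn mowing', 'Sample Street 1', 'ACTIVE')]
```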

Data in one table affects another

I'm attempting to create a relational database for a tech company that performs sales and leases and offers support. I must store data for each of these, but the items being sold can be either hardware or software based. This means that for sales that relate to hardware, a delivery address must be stored, whereas this would not be required for software.
So far I have attempted modelling this conceptually and have decided to have tables "sales", "leasing" and "support". Then linking to this, I have "product", which will have an id and generic product information, linking to separate "hardware" and "software" tables.
[Image: part of the conceptual model]
My concern is that if the product is hardware-based, the sales/leasing/support table's attributes would need to be different to allow for an address entry.
This has left me really stuck with how to model this part, and I would really appreciate any input that anyone could give.
Thanks in advance!
I think you want to look into normalization more and see if this answers your question. I think you should focus on one problem point and really expand on it with data/explanation/ERD to show us what data is available in what scenarios.
Let me expand on some assumptions in what you said:
This means that for sales that relate to hardware, a delivery address must be stored, whereas this would not be required for software.
So let's say a PRODUCT is SOLD. The "Sale" is an entity that holds information such as
Date of sale
Price sold
Qty
Then, if the product is hardware or software, extra data is stored. Let's say you only store extra details for hardware - namely, a delivery address:
Delivery address (stored for hardware sales only)
So it sounds like "Sale_Hardware" is a sub-entity of "Sale".
PRODUCT ---> SALE (one product can have many sales, but one sale can only have one product) - see note below.
SALE ----- SALE_HARDWARE (this is a one to one relationship, and SALE_HARDWARE will only have data for some SALEs which are hardware based).
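A minimal sketch of the SALE / SALE_HARDWARE split in Python's sqlite3, with hypothetical column names. The sub-entity's primary key is also its foreign key to Sale, which is what makes the relationship one-to-one. This sketch does not stop someone adding a Sale_Hardware row for a software sale; enforcing that would need a trigger or a product-kind column carried into the key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Product (
    product_id  INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    kind        TEXT NOT NULL CHECK (kind IN ('HARDWARE', 'SOFTWARE'))
);
CREATE TABLE Sale (
    sale_id      INTEGER PRIMARY KEY,
    product_id   INTEGER NOT NULL REFERENCES Product(product_id),
    sale_date    TEXT NOT NULL,
    price_cents  INTEGER NOT NULL,
    qty          INTEGER NOT NULL CHECK (qty > 0)
);
-- One-to-one sub-entity: the primary key is also the FK to Sale,
-- so a Sale can have at most one Sale_Hardware row.
CREATE TABLE Sale_Hardware (
    sale_id           INTEGER PRIMARY KEY REFERENCES Sale(sale_id),
    delivery_address  TEXT NOT NULL
);
INSERT INTO Product VALUES (1, 'Laptop', 'HARDWARE'), (2, 'Office licence', 'SOFTWARE');
INSERT INTO Sale VALUES (1, 1, '2024-05-01', 99900, 1), (2, 2, '2024-05-01', 12000, 3);
INSERT INTO Sale_Hardware VALUES (1, '1 High Street');
""")

# LEFT JOIN: software sales simply have no delivery address.
rows = conn.execute("""
    SELECT s.sale_id, p.kind, h.delivery_address
    FROM Sale s
    JOIN Product p USING (product_id)
    LEFT JOIN Sale_Hardware h USING (sale_id)
    ORDER BY s.sale_id""").fetchall()
print(rows)  # [(1, 'HARDWARE', '1 High Street'), (2, 'SOFTWARE', None)]
```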
--
Note: This is a very simplistic example. Above I mentioned that PRODUCT ---> SALE (one to many) but in reality this wouldn't be true. A sale can contain many products. That is why a SALE or ORDER is usually divided into ORDER_ITEMS and each ORDER_ITEM contains a single PRODUCT.
Hope this makes sense, and I hope this touches on how to design your database using normalization. Let me know if you have questions, or if you want to change your question to focus on a specific few entities that you want to further normalize.
It's missing a lot of details, but I would go for something like this. Obviously you need to fill in the gaps!
So you get a product table and a customer table. The hardware/software distinction does not really matter, but if you must, add a Type column to your products.
A line is one product with a quantity. An Order is a bunch of lines grouped together. Then again, a Lease is also a bunch of lines grouped together, but with more conditions than a sale.
Product: Productid, ProductName, Price, ToShip (boolean: can this product be shipped or not?), ...
Customer: Customerid, Firstname, Lastname, ShippingAddress, BillingAddress, Phone, ...
Order: Orderid, Buyer (FK to Customer.Customerid), ShippingAddress (boolean: true = use the address from Customer, false = use the address stored on this Order), ShiptoAddress, Shipped, TrackingNumber
Line: Lineid, Productid (FK to Product.Productid), Quantity, DiscountPercentage
Order <-> Line: Orderid (FK to Order.Orderid), Lineid (FK to Line.Lineid)
Lease: Leaseid, Leaser (FK to Customer.Customerid), Terms, ...
Lease <-> Line: Leaseid (FK to Lease.Leaseid), Lineid (FK to Line.Lineid)
Support (support contract): Supportid, Term, SupportLevel, ...
Support <-> Line: Supportid (FK to Support.Supportid), Lineid (FK to Line.Lineid)
Lease could be linked to another table that specifies the LeaseType (standard conditions, lease agreement, ...). Something similar for Support.
This allows the same product to be purchased, leased or supported.
Just ideas, adapt as required.
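To show how the Line and junction tables fit together, here is a cut-down sketch in Python's sqlite3. Column types, the sample data and the UNIQUE(Lineid) constraint on the junction tables (so a line belongs to only one order, or only one lease) are assumptions; Support follows the same pattern as Lease and is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Customer (Customerid INTEGER PRIMARY KEY, Lastname TEXT NOT NULL);
CREATE TABLE Product (
    Productid    INTEGER PRIMARY KEY,
    ProductName  TEXT NOT NULL,
    Price        INTEGER NOT NULL,                        -- cents
    ToShip       INTEGER NOT NULL CHECK (ToShip IN (0, 1))
);
CREATE TABLE Line (
    Lineid              INTEGER PRIMARY KEY,
    Productid           INTEGER NOT NULL REFERENCES Product(Productid),
    Quantity            INTEGER NOT NULL CHECK (Quantity > 0),
    DiscountPercentage  INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE "Order" (Orderid INTEGER PRIMARY KEY,
                      Buyer INTEGER NOT NULL REFERENCES Customer(Customerid));
CREATE TABLE Lease (Leaseid INTEGER PRIMARY KEY,
                    Leaser INTEGER NOT NULL REFERENCES Customer(Customerid),
                    Terms TEXT);
-- Junction tables. UNIQUE(Lineid) is an added assumption: a Line belongs
-- to at most one Order (or one Lease).
CREATE TABLE OrderLine (
    Orderid INTEGER NOT NULL REFERENCES "Order"(Orderid),
    Lineid  INTEGER NOT NULL UNIQUE REFERENCES Line(Lineid),
    PRIMARY KEY (Orderid, Lineid)
);
CREATE TABLE LeaseLine (
    Leaseid INTEGER NOT NULL REFERENCES Lease(Leaseid),
    Lineid  INTEGER NOT NULL UNIQUE REFERENCES Line(Lineid),
    PRIMARY KEY (Leaseid, Lineid)
);
INSERT INTO Customer VALUES (1, 'Smith');
INSERT INTO Product VALUES (1, 'Router', 5000, 1), (2, 'Antivirus', 3000, 0);
INSERT INTO Line (Lineid, Productid, Quantity) VALUES (1, 1, 2), (2, 2, 1), (3, 1, 1);
INSERT INTO "Order" VALUES (1, 1);
INSERT INTO Lease VALUES (1, 1, '12 months');
INSERT INTO OrderLine VALUES (1, 1), (1, 2);
INSERT INTO LeaseLine VALUES (1, 3);
""")

# Which lines of order 1 need shipping?
to_ship = conn.execute("""
    SELECT p.ProductName, l.Quantity
    FROM OrderLine ol JOIN Line l USING (Lineid) JOIN Product p USING (Productid)
    WHERE ol.Orderid = 1 AND p.ToShip = 1""").fetchall()
print(to_ship)  # [('Router', 2)]

# The same product can be both sold and leased.
router_use = conn.execute("""
    SELECT (SELECT COUNT(*) FROM OrderLine JOIN Line USING (Lineid) WHERE Productid = 1),
           (SELECT COUNT(*) FROM LeaseLine JOIN Line USING (Lineid) WHERE Productid = 1)
""").fetchone()
print(router_use)  # (1, 1)
```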

Multiple stores (sId), multiple products (pId), different prices: how do I design an efficient database?

I am designing the database right now, so I don't have any code yet. I am planning to use SQL Server and ASP.NET, if that is relevant.
I have a large number of stores and a large number of products too, both in the thousands. For the same pId, prices may vary by sId. I would build it like this:
1. one "store" table containing fields (sId, name, location),
2. one "products" table containing fields (pId, name, size, category, sub-category) and
3. "max(sId)" number of price tables containing fields (pId, mrp, availability).
where max(sId) is the total number of stores.
I would rather not make "max(pId)" price tables containing fields (sId, mrp, availability), as I need to provide a UI to each store so that it can update product prices and availability at that store. I also need to display the products of a particular store, but I never need to display the stores for a specific product. That is, searching for stores by product is not required, but listing products by store is.
Is this a good way or can I do better?
You appear to be on the right track and I will offer some recommendations. Although there is no requirement to display some stores for any specific products, you should always think about how the requirements will change and how your system can handle that. Build your system so that you can answer questions like these easily - What stores have product ABC priced under $3/piece?
The Store table should contain, as you mentioned, information about stores. Take Aaron Bertrand's comment seriously. Name the fields in a way that the next developer can read and figure out what they are. Use StoreID instead of sID.
StoreID  StoreName         ...other fields
-------  ----------------
1        North Chicago
2        East Los Angeles
The Product table should contain information about products. It would be better to store category and sub-category in a different table.
ProductID  ProductName  ...other fields
---------  -----------
1          Bread
2          Soap
Categories can be held in their own table with a hierarchical structure. See Hierarchical Data and how to use the hierarchyid data type. This may help in finding out the depth of each top-level category and help management decide whether they are going overboard with categorization and, unknowingly, making life miserable for everybody, including themselves.
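hierarchyid is specific to SQL Server. As a portable alternative (a different technique, not hierarchyid itself), an adjacency list walked with a recursive CTE can answer the "how deep is each top-level category" question. The sketch below uses Python's sqlite3; the Category table and its sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Category (
    category_id  INTEGER PRIMARY KEY,
    parent_id    INTEGER REFERENCES Category(category_id),  -- NULL = top level
    name         TEXT NOT NULL
);
INSERT INTO Category VALUES
    (1, NULL, 'Household'),
    (2, 1,    'Cleaning'),
    (3, 2,    'Hand soap'),
    (4, NULL, 'Food'),
    (5, 4,    'Bakery');
""")

# Walk down from each top-level category, counting levels as we go.
depth_by_root = dict(conn.execute("""
    WITH RECURSIVE tree(root_name, category_id, depth) AS (
        SELECT name, category_id, 1 FROM Category WHERE parent_id IS NULL
        UNION ALL
        SELECT t.root_name, c.category_id, t.depth + 1
        FROM Category c JOIN tree t ON c.parent_id = t.category_id
    )
    SELECT root_name, MAX(depth) FROM tree GROUP BY root_name
""").fetchall())
print(depth_by_root)  # depth 3 for Household, 2 for Food
```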
A many-to-many ProductCategory table can link products to categories. Also keep a history table: when a product's category is changed, record what it was and what it is set to. It may help in answering questions such as: how many products were moved from the Agriculture to the Construction category in the last 6 months?
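A minimal sketch of the history idea, assuming a hypothetical ProductCategoryHistory table and a move_product helper that updates the link and logs the change in one transaction. The reference date '2024-03-01' is passed in explicitly so the "last 6 months" result is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Category (category_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE ProductCategory (
    product_id   INTEGER NOT NULL,
    category_id  INTEGER NOT NULL REFERENCES Category(category_id),
    PRIMARY KEY (product_id, category_id)
);
-- One row per change: what the category was, what it became, and when.
CREATE TABLE ProductCategoryHistory (
    product_id       INTEGER NOT NULL,
    old_category_id  INTEGER NOT NULL,
    new_category_id  INTEGER NOT NULL,
    changed_on       TEXT NOT NULL
);
INSERT INTO Category VALUES (1, 'Agriculture'), (2, 'Construction');
INSERT INTO ProductCategory VALUES (10, 1), (11, 1), (12, 1);
""")

def move_product(conn, product_id, old_cat, new_cat, changed_on):
    """Re-link a product and log the change in a single transaction."""
    with conn:
        conn.execute("UPDATE ProductCategory SET category_id = ? "
                     "WHERE product_id = ? AND category_id = ?",
                     (new_cat, product_id, old_cat))
        conn.execute("INSERT INTO ProductCategoryHistory VALUES (?, ?, ?, ?)",
                     (product_id, old_cat, new_cat, changed_on))

move_product(conn, 10, 1, 2, "2023-06-01")  # outside the 6-month window
move_product(conn, 11, 1, 2, "2024-02-10")  # inside it

moved = conn.execute("""
    SELECT COUNT(*) FROM ProductCategoryHistory h
    JOIN Category o ON o.category_id = h.old_category_id
    JOIN Category n ON n.category_id = h.new_category_id
    WHERE o.name = 'Agriculture' AND n.name = 'Construction'
      AND h.changed_on >= date(?, '-6 months')""", ("2024-03-01",)).fetchone()[0]
print(moved)  # 1
```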
A many-to-many StoreProductPrice table can bring together store and product, and a price can be defined there. Also remember that prices may differ by customer as well: some customers may get discounts at a certain level. Although this may be too much to discuss here, it should be kept in the back of your mind in case a requirement to support a customer discount structure comes up.
StoreProductID  StoreID  ProductID  Price
--------------  -------  ---------  -----
1               1        1          $4.00
2               1        2          $1.00
3               2        1          $4.05
4               2        2          $1.02
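The StoreProductPrice design can be sketched in Python's sqlite3 as follows. Prices are held as integer cents (PriceCents) rather than the $ values shown above, and the extra index on (ProductID, PriceCents) is an assumption aimed at the "which stores have product X under a given price" question; the UNIQUE (StoreID, ProductID) constraint already serves listing products by store.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Store   (StoreID INTEGER PRIMARY KEY, StoreName TEXT NOT NULL);
CREATE TABLE Product (ProductID INTEGER PRIMARY KEY, ProductName TEXT NOT NULL);
CREATE TABLE StoreProductPrice (
    StoreProductID  INTEGER PRIMARY KEY,
    StoreID         INTEGER NOT NULL REFERENCES Store(StoreID),
    ProductID       INTEGER NOT NULL REFERENCES Product(ProductID),
    PriceCents      INTEGER NOT NULL CHECK (PriceCents >= 0),
    UNIQUE (StoreID, ProductID)   -- one price per store/product; also indexes by store
);
CREATE INDEX ix_spp_product_price ON StoreProductPrice (ProductID, PriceCents);
INSERT INTO Store   VALUES (1, 'North Chicago'), (2, 'East Los Angeles');
INSERT INTO Product VALUES (1, 'Bread'), (2, 'Soap');
INSERT INTO StoreProductPrice VALUES
    (1, 1, 1, 400), (2, 1, 2, 100), (3, 2, 1, 405), (4, 2, 2, 102);
""")

# Listing products by store (the stated requirement).
store_1 = conn.execute("""
    SELECT p.ProductName, spp.PriceCents
    FROM StoreProductPrice spp JOIN Product p USING (ProductID)
    WHERE spp.StoreID = ? ORDER BY p.ProductName""", (1,)).fetchall()
print(store_1)  # [('Bread', 400), ('Soap', 100)]

# The "what if" question: which stores sell a product below a given price?
cheap = conn.execute("""
    SELECT s.StoreName
    FROM StoreProductPrice spp
    JOIN Store s USING (StoreID)
    JOIN Product p USING (ProductID)
    WHERE p.ProductName = ? AND spp.PriceCents < ?
    ORDER BY s.StoreName""", ("Bread", 401)).fetchall()
print(cheap)  # [('North Chicago',)]
```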
Availability of the product should be handled through the inventory management table(s). For example, you may have a master table of Warehouse and a master table of Location. Bringing them together would be a WarehouseLocation table. A WarehouseProduct table may bring together warehouse, product and units available.
Alternatively, your production or procurement facility might be dumping data into a ProcuredProduct table. Your manufacturing unit might be putting locks on a subset of products while building something out of them. Your sales unit might be putting locks on a subset of products they are trying to sell. In other words, your products may continually get allocated. Running queries to work out the availability of a certain product can be a little taxing, so during any such allocation, the number of available units can be updated in a single table (which contains calculated availability figures that you can comfortably rely on).
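A sketch of keeping a calculated availability figure in one table, under assumed table names (WarehouseProduct, Allocation) and a hypothetical allocate helper. The allocation row and the decrement happen in one transaction, and a CHECK constraint makes an over-allocation fail and roll back as a whole.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE WarehouseProduct (
    warehouse_id     INTEGER NOT NULL,
    product_id       INTEGER NOT NULL,
    units_available  INTEGER NOT NULL CHECK (units_available >= 0),
    PRIMARY KEY (warehouse_id, product_id)
);
CREATE TABLE Allocation (
    allocation_id  INTEGER PRIMARY KEY,
    warehouse_id   INTEGER NOT NULL,
    product_id     INTEGER NOT NULL,
    units          INTEGER NOT NULL CHECK (units > 0),
    allocated_to   TEXT NOT NULL          -- e.g. 'manufacturing' or 'sales'
);
INSERT INTO WarehouseProduct VALUES (1, 1, 10);
""")

def allocate(conn, warehouse_id, product_id, units, allocated_to):
    """Record the allocation and decrement the stored availability together.
    Returns False (and changes nothing) if a constraint fails, e.g. stock
    would go below zero. This sketch does not check that the row exists."""
    try:
        with conn:
            conn.execute("INSERT INTO Allocation (warehouse_id, product_id, units, allocated_to) "
                         "VALUES (?, ?, ?, ?)", (warehouse_id, product_id, units, allocated_to))
            conn.execute("UPDATE WarehouseProduct SET units_available = units_available - ? "
                         "WHERE warehouse_id = ? AND product_id = ?",
                         (units, warehouse_id, product_id))
        return True
    except sqlite3.IntegrityError:
        return False  # the with-block already rolled back the Allocation insert

ok_mfg = allocate(conn, 1, 1, 4, "manufacturing")   # True: 6 units left
ok_sales = allocate(conn, 1, 1, 7, "sales")         # False: would leave -1
available = conn.execute("SELECT units_available FROM WarehouseProduct").fetchone()[0]
print(ok_mfg, ok_sales, available)  # True False 6
```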
So...depending on your customer's needs, the system you are building can get fairly complicated. I am recommending that you think about these things and keep your database structure flexible to anticipated changes. Normalization is a good thing, and de-normalization has its place also. Use them wisely.

A product has one price, a price has one product: HasOne NHibernate relationship?

I was a little miffed about the one-to-one relationship explanation in the 'I Think You Mean A Many To One' article.
In this instance, for example, a product has one price because the business in question is small, niche, localized and supports only a single currency. Multiple prices per product make no sense in this case? I doubt I'm grasping the concept correctly, though, because everything I read says it will probably be a many-to-one even if you think it isn't.
Can somebody enlighten me please? :)
(I'm posting this as an answer because I don't yet have enough reputation to comment.) The difference between one-to-many and one-to-one is this:
View a one-to-one as an extension of the table you are looking at.
Table B extends Table A, meaning the information wasn't necessarily relevant enough to include in Table A directly, but the two tables have a bidirectional relationship with each other. As Table A, I am not dependent on the information in Table B, but Table B's information is very dependent on me. For the price example, it means that a row in Table A has a related row in Table B. So if you are entering unique information in your Price table for every item in Table A, this would be useful; say, for example, you had a description column about the item in your price table. Otherwise the price table in this case may just be irrelevant to have in the schema.
In a one-to-many relationship, Table B usually has no reference back to Table A. So in the case of price, the items you are looking at do have a price, but prices aren't exclusive to items. To define it better: a number of things may have the price 9.99, but 9.99 only needs to exist in your pricing table once.
I am not familiar with the article you refer to. However, price is a classic example of a slowly changing dimension. Price may be constant at any point in time, but over time, the price changes.
Such dimensions are typically implemented by having effective and end dates for the period in question.
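A minimal sketch of the effective/end-date approach in Python's sqlite3. The ProductPrice table, the half-open date ranges (effective_date inclusive, end_date exclusive, NULL meaning still current) and the price_on helper are assumptions for illustration; this schema does not by itself prevent overlapping periods for one product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ProductPrice (
    product_id      INTEGER NOT NULL,
    effective_date  TEXT NOT NULL,   -- inclusive
    end_date        TEXT,            -- exclusive; NULL = still in effect
    price_cents     INTEGER NOT NULL,
    PRIMARY KEY (product_id, effective_date),
    CHECK (end_date IS NULL OR end_date > effective_date)
);
INSERT INTO ProductPrice VALUES
    (1, '2023-01-01', '2023-07-01', 999),
    (1, '2023-07-01', NULL,         1099);
""")

def price_on(conn, product_id, on_date):
    """Return the price in effect on on_date, or None if there was none."""
    row = conn.execute("""
        SELECT price_cents FROM ProductPrice
        WHERE product_id = ? AND effective_date <= ?
          AND (end_date IS NULL OR ? < end_date)""",
        (product_id, on_date, on_date)).fetchone()
    return row[0] if row else None

print(price_on(conn, 1, "2023-03-15"))  # 999
print(price_on(conn, 1, "2023-07-01"))  # 1099
print(price_on(conn, 1, "2022-12-31"))  # None
```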
Now, at a given point in time, a product probably does have only one price. Things that affect the price -- coupons, discounts for the purchaser, volume discounts, for example -- are not properties of the product. These are properties of the transaction.
That said, there may be circumstances where a fixed volume discount does not make sense. So, the "price" for a product might include volume, as well as time.
In any case, I would agree with you that price is not a good example of a 1-1 relationship. There are other factors such as time and volume that affect it.

Suggestion on database design: multiple tables involved in a relation

My application needs to implement a one-to-one relation between multiple tables. I have a table which stores companies (which can be customers, suppliers, or both). There are two bit fields, Customer and Supplier.
Then I have different tables for various operations: Invoices, Bank operations, Cashdesk operations. And I need to pair payments with invoices. A payment does not have to be the exact amount of an invoice; it can be split over any number of invoices. Also, an invoice can be split over multiple payments. Payments can come from either bank or cashdesk operations.
My original approach was to have a table, PaymentRelations, with foreign keys InvoiceID, BankOpID, CashOpID and an Amount. For any payment between them, I create a record with only two of the foreign keys filled, plus the corresponding amount. This way, at any moment, I can know for each operation (invoice or payment) how much was paid.
Also there are referential integrity (RI) requirements: if a document is involved in a payment relation, it cannot be deleted (or there is a cascade delete, so if a payment or invoice document is deleted, the related PaymentRelations records are deleted and the counterpart operations are freed: they are no longer involved in payment relations, so their amount can be fully used in other payment relations).
But another situation has come up. Since partners can be both customers and suppliers, it is possible to compensate between the same type of operation on the customer and supplier side of the same partner (e.g. a partner is both customer and supplier: he issued an invoice as supplier for 100 and received an invoice as customer for 150; 50 was compensated between the received and the sent invoice, and the rest of each is paid through one or more payment operations).
This can also happen for the other operations (e.g. he paid 100 through one bank operation, received 200 through another bank operation, and 50 needs to be compensated between those two operations; the same applies to cashdesk operations).
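The PaymentRelations approach from the question can be sketched in Python's sqlite3, with a CHECK constraint that exactly two of the three references are filled in and ON DELETE CASCADE to free the counterpart operations. Table and column names follow the question where it gives them; the amounts are made up, and the sketch does not stop relations from adding up to more than an invoice's amount.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for ON DELETE CASCADE in SQLite
conn.executescript("""
CREATE TABLE Invoices (InvoiceID INTEGER PRIMARY KEY, Amount INTEGER NOT NULL);
CREATE TABLE BankOps  (BankOpID  INTEGER PRIMARY KEY, Amount INTEGER NOT NULL);
CREATE TABLE CashOps  (CashOpID  INTEGER PRIMARY KEY, Amount INTEGER NOT NULL);
CREATE TABLE PaymentRelations (
    PaymentRelationID  INTEGER PRIMARY KEY,
    InvoiceID  INTEGER REFERENCES Invoices(InvoiceID) ON DELETE CASCADE,
    BankOpID   INTEGER REFERENCES BankOps(BankOpID)   ON DELETE CASCADE,
    CashOpID   INTEGER REFERENCES CashOps(CashOpID)   ON DELETE CASCADE,
    Amount     INTEGER NOT NULL CHECK (Amount > 0),
    -- exactly two of the three references must be filled in
    CHECK ((InvoiceID IS NOT NULL) + (BankOpID IS NOT NULL)
           + (CashOpID IS NOT NULL) = 2)
);
INSERT INTO Invoices VALUES (1, 300), (2, 200);
INSERT INTO BankOps  VALUES (1, 250);
INSERT INTO CashOps  VALUES (1, 100);
INSERT INTO PaymentRelations (InvoiceID, BankOpID, CashOpID, Amount) VALUES
    (1, 1, NULL, 200),   -- one bank payment split over two invoices
    (2, 1, NULL, 50),
    (1, NULL, 1, 100);   -- the rest of invoice 1 paid in cash
""")

def paid_per_invoice(conn):
    return conn.execute("""
        SELECT i.InvoiceID, i.Amount, COALESCE(SUM(pr.Amount), 0)
        FROM Invoices i LEFT JOIN PaymentRelations pr ON pr.InvoiceID = i.InvoiceID
        GROUP BY i.InvoiceID ORDER BY i.InvoiceID""").fetchall()

before = paid_per_invoice(conn)   # [(1, 300, 300), (2, 200, 50)]
with conn:
    conn.execute("DELETE FROM CashOps WHERE CashOpID = 1")  # cascades to its relation
after = paid_per_invoice(conn)    # [(1, 300, 200), (2, 200, 50)]
```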
What approach would you use to model this kind of relations?
I would buy accounting software instead of writing it. Some wheels are worth reinventing; this isn't one of them.
But if you must . . .
Bitfields are the wrong way to identify customers and suppliers. This SO answer should get you over the issues with customers and suppliers.
If I had to design an accounting system, I think I'd start with a spreadsheet. I'd design a table of transactions in that spreadsheet, so I could get the feel of how certain transactions were alike, and how others were different. At this stage, I wouldn't worry about NULLs, about repeating groups, about transitive dependencies, or anything else like that.
Having developed a working(ish) model in the spreadsheet, I'd then try to normalize it to 5NF.

How to handle an immutable table referencing mutable tables?

In making a pretty standard online store in .NET, I've run into a bit of an architectural conundrum regarding my database. I have a table "Orders", referenced by a table "OrderItems". The latter references a table "Products".
Now, the Orders and OrderItems tables are immutable in most respects; that is, an order and its order items should look the same no matter when you look at the tables (for instance, printing a receipt for an order for bookkeeping each year should yield the same receipt the customer got at the time of the order).
I can think of two ways of achieving this behavior, one of which is in use today:
1. Denormalization, where values such as the price of a product are copied to the OrderItems table.
2. Making referenced tables immutable. The code that handles products could create a new product whenever a value such as the price is changed. Mutable tables referencing the product would have their references updated, whereas the immutable ones would be fine and dandy with their old reference.
What is your preferred way of doing this? Is there a better, more clever way of doing this?
It depends. I'm working on quite complex enterprise software that includes a kind of document management and auditing, and is used in pharmacy.
Normally, primitive values are denormalized. For instance, if you just need the state of the customer at the time the order was created, I would store it in the order.
There is always more complex data that needs to be available at almost any point in time. There are two approaches: you create a history of it, or you implement a revision control system, which is almost the same thing.
The history means that every state that ever existed is stored as a separate record, in the same or another table.
I implemented a revision control system, where I split records into two tables: one for the actual item, let's say a product, and the other for its versions. This way I can reference the product as a whole, or any specific version of it, because both have their own primary key.
This system is used for many entities. I can safely reference an object under revision control from an audit trail, for instance, or from other immutable records. At the beginning such a system seems more complex, but in the end it is very straightforward and solves many problems at once.
Storing the price in both the Product table and the OrderItem table is NOT denormalizing if the price can change over time. Normalization rules say that every "fact" should be recorded only once in the database. But in this case, just because both numbers are called "price" doesn't make them the same thing. One is the current price, the other is the price as of the date of the sale. These are very different things. Just like "customer zip code" and "store zip code" are completely different fields; the fact that both might be called "zip code" for short does not make them the same thing. Personally, I have a strong aversion to giving fields that hold different data the same name because it creates confusion. I would not call them both "Price": I would call one "Current_Price" and the other "Sale_Price" or something like that.
Not keeping the price at the time of the sale is clearly wrong. If we need to know this -- which we almost surely do -- then we need to save it.
Duplicating the entire product record for every sale or every time the price changes is also wrong. You almost surely have constant data about a product, like description and supplier, that does not change every time the price changes. If you duplicate the product record, you will be duplicating all this data, which definitely IS denormalization. This creates many potential problems. For example, if someone fixes a spelling error in the product description, we might now have the new record saying "4-slice toaster" while the old record says "4-slice taster". If we produce a report and sort on the description, they'll get separated and look like different products. Etc.
If the only data that changes about the product and that you care about is the price, then I'd just post the price into the OrderItem record.
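Posting the price into the order line can be sketched like this in Python's sqlite3. Following the naming advice above, the product holds current_price_cents and the order line holds sale_price_cents; these names, the integer-cents representation and the add_item helper are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Product (
    product_id           INTEGER PRIMARY KEY,
    description          TEXT NOT NULL,
    current_price_cents  INTEGER NOT NULL
);
CREATE TABLE OrderItem (
    order_id          INTEGER NOT NULL,
    product_id        INTEGER NOT NULL REFERENCES Product(product_id),
    quantity          INTEGER NOT NULL,
    sale_price_cents  INTEGER NOT NULL,  -- a different fact: the price when sold
    PRIMARY KEY (order_id, product_id)
);
INSERT INTO Product VALUES (1, '4-slice toaster', 2999);
""")

def add_item(conn, order_id, product_id, quantity):
    """Copy the product's current price into the order line at the moment of sale."""
    with conn:
        conn.execute("""
            INSERT INTO OrderItem (order_id, product_id, quantity, sale_price_cents)
            SELECT ?, product_id, ?, current_price_cents
            FROM Product WHERE product_id = ?""", (order_id, quantity, product_id))

add_item(conn, 100, 1, 2)
with conn:  # a later price rise does not touch the existing order
    conn.execute("UPDATE Product SET current_price_cents = 3499 WHERE product_id = 1")

receipt = conn.execute("""
    SELECT p.description, oi.quantity, oi.sale_price_cents
    FROM OrderItem oi JOIN Product p USING (product_id)""").fetchall()
print(receipt)  # [('4-slice toaster', 2, 2999)]
```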
If there's lots of data that changes, then you want to break the Product table into two tables: One for the data that is constant or whose history you don't care about, and another for data where you need to track the history. Like, have a ProductBase table with description, vendor, stock number, shipping weight, etc.; and a ProductMutable table with our cost, sale price, and anything else that routinely changes. You probably also want an as-of date, or at least an indication of which is current. The primary key of ProductMutable could then be Product_id plus As_of_date, or if you prefer simple sequential keys for all tables, fine, it at least has a reference to product_id. The OrderItem table references ProductMutable, NOT ProductBase. We find ProductBase via ProductMutable.
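A sketch of the ProductBase / ProductMutable split in Python's sqlite3, using the (product_id, as_of_date) key and a composite foreign key from OrderItem. The column names, the sample vendor and the add_item helper, which picks the latest ProductMutable row on or before the order date, are assumptions. Fixing the "taster" typo in ProductBase corrects old and new orders alike, while each keeps its own price.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE ProductBase (              -- constant data, no history kept
    product_id   INTEGER PRIMARY KEY,
    description  TEXT NOT NULL,
    vendor       TEXT NOT NULL
);
CREATE TABLE ProductMutable (           -- data whose history matters
    product_id        INTEGER NOT NULL REFERENCES ProductBase(product_id),
    as_of_date        TEXT NOT NULL,
    our_cost_cents    INTEGER NOT NULL,
    sale_price_cents  INTEGER NOT NULL,
    PRIMARY KEY (product_id, as_of_date)
);
-- OrderItem references ProductMutable, NOT ProductBase.
CREATE TABLE OrderItem (
    order_id    INTEGER NOT NULL,
    product_id  INTEGER NOT NULL,
    as_of_date  TEXT NOT NULL,
    quantity    INTEGER NOT NULL,
    PRIMARY KEY (order_id, product_id),
    FOREIGN KEY (product_id, as_of_date)
        REFERENCES ProductMutable (product_id, as_of_date)
);
INSERT INTO ProductBase VALUES (1, '4-slice taster', 'Acme');
INSERT INTO ProductMutable VALUES (1, '2024-01-01', 1500, 2999),
                                  (1, '2024-04-01', 1700, 3499);
""")

def add_item(conn, order_id, product_id, quantity, order_date):
    """Point the order line at the latest ProductMutable row on or before order_date."""
    with conn:
        conn.execute("""
            INSERT INTO OrderItem (order_id, product_id, as_of_date, quantity)
            VALUES (?, ?, (SELECT MAX(as_of_date) FROM ProductMutable
                           WHERE product_id = ? AND as_of_date <= ?), ?)""",
            (order_id, product_id, product_id, order_date, quantity))

add_item(conn, 1, 1, 1, "2024-02-15")
add_item(conn, 2, 1, 1, "2024-05-20")
with conn:  # fix the spelling once; every order sees the correction
    conn.execute("UPDATE ProductBase SET description = '4-slice toaster' WHERE product_id = 1")

rows = conn.execute("""
    SELECT oi.order_id, b.description, m.sale_price_cents
    FROM OrderItem oi
    JOIN ProductMutable m USING (product_id, as_of_date)
    JOIN ProductBase b USING (product_id)
    ORDER BY oi.order_id""").fetchall()
print(rows)  # [(1, '4-slice toaster', 2999), (2, '4-slice toaster', 3499)]
```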
I think denormalization is the way to go.
Also, Product should not have a price (when it changes from time to time, and when price means different values to different people: retailers, customers, bulk sellers, etc.).
You could also have a price history table where it contains ProductID, FromDate, ToDate, Price, IsActive - to maintain the price history for a product.
