Same Fact Table Column; Records with Multiple Reasons - sql-server

I am in a situation similar to the one below:
Think for instance we need to store customer sales in a fact table (under a data warehouse built with dimensional modelling). I have sales, discounts related to the sale, sales returns and cancellations to be stored.
Do you think it would be advisable to store sales for a day to a customer in a particular product (when the day is the grain) as a positive value while the returns and discounts are stored as minuses?
Also if a discount is enforced to a customer at a level other than the product (for instance brand), do you think it is alright to persist it with a key particularly assigned to the brand (product is the grain) while the product column being given an N/A, for the particular record?
Thanks in advance.

If your sales are considered a good thing (I'm assuming they are) then recording sales as positive numbers makes perfect sense. Any transaction that reduces sales (i.e. discounts and returns) should therefore be recorded as negative numbers. This will make reporting your sales very natural.
If you have diffent dimensions that might account for a record, you should populate the dimensions that make sense. So yes, attribute a discount to a brand rather than a product if that is what happened in your business transaction. This way your reporting will be able to look at all discounts, at discounts for particular products and discounts for entire brands. If your fact table shows the most direct "cause" of the discount (product or brand) then your reports will be more useful than if you link the fact to brand through a relationship to product.

Related

Redundant relation: Is this a violation of database normalization?

I have a table with products that I offer. For each product ever sold, an entry is created in the ProductInstance table. This refers to this instance of the product and contains information such as the next due date (if the product is to be billed monthly) and other information relevant to this instance (e.g. personal branding).
For understanding: The products are service contracts. The template of the contract is stored in the product table (e.g. "Monthly lawn mowing"). The product instance is then e.g. "Monthly lawn mowing in sample street" and contains information like the size of the garden or something specific to this instance of the service instead of the general product.
An invoice is created for a product instance either one time or recurring. An Invoice may consists of several entries. Each entry is represented by an element in the InvoiceEntry table. This is linked to the ProductInstance to create the reference to the invoice.
I want to extend the database with purchase orders. To do this, a record is created in the Order table. This contains a relation to the customer and e.g. the order date. The single products of the order are mapped by an OrderEntry. The initial invoice generated for the order is linked via the field "invoice_id" in the table order. The invoice items from the initial order are created per OrderEntry and create one InvoiceEntry each. However, I want the ProductInstance to be created only after the invoice is paid. Therefore the OrderEntry has a relation to the product and not only to the ProductInstance. Once the order has been created, the instance is created and linked to the OrderEntry.
I see the problem that the relation between Order and Invoice is doubled: once Order <-> Invoice and once Order <-> OrderEntry <-> InvoiceEntry <-> Invoice.
And for the Product: OrderEntry <-> Product and OrderEntry <-> ProductInstance <-> Product.
Model of the described database
My question is if this "duplicate" relation is problematic, or could cause problems later. One case that feels messy to me is, what should I do if I want to upgrade the ProductInstance later (to an other product [e.g. upgrade to bigger service])? The order would still show the old product_id but the instance would point to a new product_id.
This is a nice example of real-life messy requirements, where the 'pure' theory of normalisation has to be tempered by compromises. There's no 'slam-dunk right' approach; there's some definitely 'wrong' approaches -- your proposed schema exhibits some of those. I suspect there's not even a 'best' approach. Thank you for expanding the description of the business context -- especially for the ProductInstance table.
But still your description won't support legally required behaviour:
An invoice is created for a product instance either one time or recurring. An Invoice may consists of several entries. Each entry is represented by an element in the InvoiceEntry table.
... I want the ProductInstance to be created only after the invoice is paid.
An invoice represents an indebtedness from customer to supplier. It applies at one date only, not "recurring". (So leaving out the Invoice date has exactly got in the way of you "thinking about relations".) A recurring or cyclical billing arrangement would be represented by something like a 'contract' table, from which an Invoice is generated by some scheduled process.
Or ... your "recurring" means the invoice is paid once up-front for a recurring service(?) Still you need an Invoice date. The terms of service/its recurrence would be on the ProductInstance table.
I can see no merit in delaying recording the ProductInstance 'til after invoice payment. Where are you going to hold the terms of service in the meantime? If you're raising an invoice, your auditors/the statutory authorities will want you to provide records of what the indebtedness relates to. Create ProductInstance ab initio and put a status on it. (Or in the application look up the Invoice's paid status before actually providing the service.)
There's something else about Invoices you're currently failing to capture -- and that has also lead you to a wrong design: in general there is more making up the total $ value of an invoice than product lines, such as discounts applying to the invoice overall rather than particular products; delivery charges; installation costs or inspection/certification; taxes (local/State/Federal).
From your description perhaps the only one applying is taxes. ("in this world nothing can be said to be certain, except death and taxes.") And taxes are not specific to products/no product_instance_id is applicable on an InvoiceEntry.
For this reason, on ERP schemas in general, there is no foreign key declared from InvoiceEntry to Product/Instance. (In your case you might get away with product_instance_id being nullable, but yeuch.) There might be a system-generated XRef text column, which contains different content according to what the InvoiceEntry represents, but any referencing can't be declared to the schema. (There might be a 'fully normalised' way to represent that with an auxiliary linkage table, but maintaining that in step adds too much complexity to the application.)
I see the problem that the relation between Order and Invoice is doubled: once Order <-> Invoice and once Order <-> OrderEntry <-> InvoiceEntry <-> Invoice.
Again think about the business sequence of operations that generate these records: ordering happens as a prelude to invoicing. You can't put an invoice_id on Order, because you haven't created the Invoice yet. You might put the order_id on Invoice. But here you're again in the situation that not all Invoices arrive via Orders -- some might be cash sales/immediate delivery. (You could make order_id nullable, but yeuch.) For this reason on ERP schemas in general, there is no foreign key declared from Invoice to Order, etc, etc.
And the same thinking with OrderEntry <-> InvoiceEntry: your proposed schema has the sequencing wrong/the reference points the wrong way. (And not every InvoiceEntry will have corresponding OrderEntry.)
On OrderEntry, having all of (OrderEntry)id and product_id and product_instance_id seems to me to give you way too many opportunities for tangling it all up. Can an Order have multiple Entrys for the same product_id? -- why/how? Can it have multiple Entrys for the same product_instance_id? -- why/how? Can there be a product_instance_id which refers to a different product_id than OrderEntry.product_id? This is exactly the sort of risk for confusing entanglement that normalisation aims to remove/reduce.
The customer is ordering a ProductInstance: mowing a particular size of garden at a particular address, fortnightly on a Tuesday afternnon. So OrderEntry.product_instance_id is what you want; .product_id is wrong. So (again) you need to create ProductInstance at time of recording the Order. Furthermore I strongly suspect you don't need an id on OrderEntry; instead give it a compound key (order_entry_id, product_instance_id). [**]
[**] I see you're using 'eloquent'. I suspect this is requiring id on every table. So you're not even using a relational database, this is some sort of Object-Relational hybrid. Insisting on a dedicated single id as key on every table is toxic. It has lead schema designers astray every time I get called in to help -- as here. Please if you can at all avoid it, don't do that.

How do I persist sales price in an orders details table?

I'm writing a simple transactional database to practice my T-SQL skills.
If I sell an umbrella in my sales.orderdetails table and it's getting the current retailprice of that umbrella from the items table and putting it in the invoice, how do I keep from having incorrect historical report data 6 months from now when I jack up the retail price of the umbrella by $10?
How do i store that umbrella sold price in the orderdetails table so it's unaffected by any changes in the items table in the future?
I know you can use an SCD for a datawarehouse for this kind of issue but was wondering how to do it in an OLTP system. Computed persisted column? Can't seem to get that to work in the object explorer when I try to enter the items.retailprice as the computed value for the salesorderdetails.cost column.
The way I have seen this done in the past, without using a technique like SCD, was to have the order detail have the price that was charged and then use a foreign key to another table, possibly products or productprices, that contains the current price.
In a full-on transactional system, you'd want the order detail row to record full retail (MSRP, or what have you), current price (in case you had the item posted at a discount that day), and price charged (in case the customer used a promo/coupon code to reduce the price themselves). Unless you log all three, you're at the mercy of whatever the price changes to tomorrow or next week or next year, which makes for bad analytics.
You probably also want to capture current cost of goods, too, since that's subject to change over time, especially in an average costing scenario. Otherwise, margin calculations will be suspect.
But then, yes, a foreign key or keys to any other descriptive tables for those less ephemeral characteristics of the product.

How to relate a product dimension with a sales fact

I have been studying datawarehouse in the last couple days, particularly, i have been reading The Data Wharehouse Toolkit - The Definitive Guide to Dimensional Modeling by Kimball and Ross.
Uppon that reading, i came to the 1st exapmle where there is a sales fact and it related to a product dimension, as you can see in the bellow image:
I think i can grasp the gist of how this relationship allows us to rotate the "cube" slicing and dicing data, however this is where i get lost:
In this example and many others on the web product is a one-to-one relationship with sales, which is fine i guess for most cases. But this generates a sales registtry for at least each kind of product that was in one sale.
So supposing i bought 1 banana, 2 apples and 1 orange, this would yield at least 3 sales registry. Again, which is fine i guess as it is storing the sale's ticket ID in the sales fact, we still can relate all itens in a given sale.
However if this was an use case: relate products on sales say i want to get every sale that had a banana and get stuff like: how many items each of these sales had, their price cost, their profit, stuff like that...
Wouldn't be better if the fact-product relation were Fact-one_to_many-Product relationship? Where fact would hold the sale's ticket ID and products would have its foreign key referencing where they are from or something?
I reckon these metrics should be in the fact table, and not in the product table as i think i would want. So, is this me not fighting my urge to normalize it or does it make sense in the way i would want to do that kind of filtering -> [given all sales with X product, get data from other products in the same sale].
If i were to follow the guidelines, product dimension would have one registry for every exclusive kind of product the store would have correct? And all the measurements i want i would store it on the fact itself, like price cost, sales price, profit, etc...
On the other hand, if i were to one-to-many product dimension would have many copies of each product. Which is bad, i think. However, i think it would give me better queries in that regard.
As you can see, i'm a beginer and really in the early stages of this path, so if you would endulge me in a Explain Like I'm Five kind of answer I would appreciate.
EDITED:
Sorry #Nick.McDermaid, you are right. I meant from the perspective of the sales fact where for every sale fact i will have only one product, but are correct that for one product it can have N sales related. And so, we have one record of product in the database for every different product on our store. This is the right way to do it, how to rightfully model it. Also, the many indicator is the "sales quantity" i'm guessing.
Anyhow, while this allows for slicing and dicing when/if we have sales as the point of view, but what if i want to for example:
Get all sales that had a banana in it, with all the other items in those sales. We can still do it with this structure but its harder than if the products were repeated and we had the sale id as a foreign key in the product table.
Cuz ultimetly i want to get all the sales(and products within that sale) that had a banana. And then take metrics out of them.
What you are somewhat hinting at would be a degenerate dimension, consisting of the sales id/invoice #/purchase order # of the transaction that took place. The whole purpose of a degenerate dimension is to group items that are related by a meaningless piece of data. For example, a PO # of A1234 is meaningless on its own, it doesn't tell you anything about the purchase. However, it can be used to identify other meaningful data, such as the date of purchase of the products for the customer. In that context, the PO # is defined by the collection of the entities it brings together to describe an event.
Another critical concept in data-warehousing is the abstraction of the schema in the database from the model in the cube. You don't join and group data in a cube model. You slice and filter. There are no foreign keys in a cube model. Those are used in the underlying data schema, but all of that work is handled behind the scenes of the cube model.

database normalization of a table

Let's consider I have the following not normalized table
1) warehouse
id
item_id
residual
purchase cost
sale cost
Currency
I tried to normalize this and I obtained this tables:
1) warehouse table
id
product_id
residual
cost_id
2) costs table
id
purchase cost
sale cost
Currency
Does that comply with database normal forms?
Thanks much in advance!!!
This should be a comment, but it's too verbose.
There's not enough information to provide an answer - we have to infer structure from context - and the context is confusing. Your initial record looks like a description of a product to be bought and sold - but you've named it as warehouse - which is a place for storing products. I've no idea what you mean by residual. Do you have multiple purchase costs for a specific product? If so how are they differentiated. Similar for sale cost. If ther are multiple costs involved why is the selling price tied to the purchase cost?
I don't know what "residual" means in this context. But just ignoring that ...
I doubt that there's anything to be gained by breaking cost out into a separate table. Let's say we have two products, "toaster model 14" and "men's shirt style X7". Both have a cost of $12. So you create a cost record for $12, and point both records to this. Then you realize that you made a mistake and the toaster really cost $13. So you update the cost record. But that will then update the cost for the shirt also, which is almost surely wrong. Having a separate cost table would mean that you would always create a new cost record every time you created a stock record. Nothing is gained.
The fields you have listed look to me like they all belong in one table. You'd also need an item table that would have data like the description, maybe manufacturer, product specs, etc.
Your warehouse table appears to really be a stocked item table, as it lists items and not warehouses, but whatever. I suspect it also needs some sort of serial number, or how will you link a given physical item in the warehouse to the corresponding record?
If by "sale cost" you mean the price that you will charge to the customer when you sell it, I doubt this belongs in the warehouse table. When a customer buys a product, do you tell him, "I can sell you the one that's in bin 40 in the warehouse for $20 or the one that's in bin 42 for $22. Which do you want?" Probably not. I suspect you charge the same price regardless of which particular unit the customer gets. The fact that the price you have to pay to your supplier went up between when you bought the first one and when you bought the second one normally does not mean that you will charge your customer a different price. You may raise the price, but you will have one price regardless of which unit is sold. Therefore, the selling price goes in the item table, not the warehouse table. If "sale cost" is something else, maybe this whole paragraph is irrelevant.

what are the differences between inventory and products in a POS system?

I'm trying to make the database system point of sale, however I am confused between the entity and product inventory entity. What are the differences between product and inventory?
I know that the inventory should control the amount of product available .... but i have all that in products.
product code
name
description
cost
unit price
Subcategory code
brand code
amount available
Minimum quantity for rehearing
state
tax code
weight
amount wholesales
wholesales price
perishable
due date
creation date
upgrade date
what i should have in inventory? I have researched and according to what I read I need to have the product, the description, the quantity, purchase price, sale price, profit or gain and date of the transactions. But almost everything is in the Products table, what should I do?
A Product is an abstract Good or Service. A Good is the specification of an Asset.
Example "2014 Mazda 3" is a good. A "2014 Mazda 3 with VIN 12345" is an Asset.
A Catalog is the list of products that you want to sell. They don't need to exist yet, or you could be selling them for someone else.
Items Held For Sale are assets that you keep around to sell. These could be consigned (owned by someone else).
Inventory is an accounting concept. It is the dollar value of the items held for sale that you own, plus inbound and outbound goods that you're responsible for, plus any costs associated with holding that inventory.
You can track the value of inventory in a variety of ways such as FIFO and LIFO
I think you can store the inventory in the products table. There will certainly be transaction tables for purchases for the products and sales and even adjusting records (when items get count and the number differs from what's stored in the database), but you can easily work with the stock stored in the production table itself, thus not having to scan the whole database and sum up all purchases and sales and corrections every time (and never being able to delete old transaction data from the database, as that would invalidate the calculations).
However there are reasons to have stock stored in an inventory table instead. For instance if you want to store different statusses, e.g. you have 100 pieces in store plus twenty just arrived and still unchecked. Or you have a store with goods plus a warehouse housing additional stock. Or you have charges (different model numbers for example for a slightly altered product) which you offer as the same product, but still want to know how many old and how many new ones are in stock. And so on.
So make your mind up, if you want to store additional data with product stock, which would result in an 1:n relation instead of 1:1 which you have now.

Resources