Say I have a product with attributes - id,name,UPC,price
You can see it as a database table 'product'.
Now I'm having issues wrapping my head around this problem.
Say this same product can be in the UK so the - name, UPC, price might be different.
It would make sense to do one product table for USA, and another for the UK but it sounds redundant. Is there a better solution?
You could do a price table where the key is product id and a location. You could add other dimensions in there as well. e.g. maybe some prices are sensitive to time of year?
Related
I'm writing a simple transactional database to practice my T-SQL skills.
If I sell an umbrella in my sales.orderdetails table and it's getting the current retailprice of that umbrella from the items table and putting it in the invoice, how do I keep from having incorrect historical report data 6 months from now when I jack up the retail price of the umbrella by $10?
How do i store that umbrella sold price in the orderdetails table so it's unaffected by any changes in the items table in the future?
I know you can use an SCD for a datawarehouse for this kind of issue but was wondering how to do it in an OLTP system. Computed persisted column? Can't seem to get that to work in the object explorer when I try to enter the items.retailprice as the computed value for the salesorderdetails.cost column.
The way I have seen this done in the past, without using a technique like SCD, was to have the order detail have the price that was charged and then use a foreign key to another table, possibly products or productprices, that contains the current price.
In a full-on transactional system, you'd want the order detail row to record full retail (MSRP, or what have you), current price (in case you had the item posted at a discount that day), and price charged (in case the customer used a promo/coupon code to reduce the price themselves). Unless you log all three, you're at the mercy of whatever the price changes to tomorrow or next week or next year, which makes for bad analytics.
You probably also want to capture current cost of goods, too, since that's subject to change over time, especially in an average costing scenario. Otherwise, margin calculations will be suspect.
But then, yes, a foreign key or keys to any other descriptive tables for those less ephemeral characteristics of the product.
I have two tables. 'Products' and 'Discounts'.
Then I create a joining table 'discount_product' for Many-to-many relationship. So far so good.
Now if I want a discount to belong to ALL of the products I have to make insertions into the joining table for as many products I have. That means that having 10000+ products I'll have to insert 10000+ rows for one discount into the joining table? And that's only for one discount! What if I have 1000?
That's compelling me into returning to the old (wrong) way of doing it when I just have a column 'product_ids' in the 'Discounts' table with something like this '1|2|4|7|23|...' (or '*' for 'belongs to all') and then make a small piece of PHP code to check if discount belongs to all or to some products. I know it's wrong way of doing it. So is there a better way to make this properly?
Structure:
**products**
id
description
price
**discounts**
id
procent
value
**discount_product**
product_id
discount_id
I propose to try to change some business logic.
If the discount is not in the discount_product then this means that it applies to all products.
If the discount is in the discount_product then it means that it works only for a certain product.
If you need to ensure that the discount is not applied to any product, add the field is_active in discounts.
It's just my thoughts.
I believe that sometimes it is useful to denormalize the database because of optimization, and I would do as you suggested with the product_ids field.
I have found org tables to be very powerful and useful. I feel like I have movement, table restructuring and basic formulas down fairly well. But I am having a difficult time wrapping my head around how I should structure this for tracking large collections. Not sure if I can do this in one table or if I need multiple tables.
Say I have a business that buys and sells trading cards. There are baseball, basketball and football cards. I want to track purchase price, sale price, purchase date, sale date, average sale price, last sale price, quantity in stock, and item condition for every card sold or in stock.
Is it possible to do this in a single table or do I need multiple tables?
I'd like to track statistics such as:
"What is the average price of all football cards sold in the last six months?"
"In the last month, did I buy more basketball cards or baseball cards?
And for a more lengthy example:
"Last year I sold 4 Mickey Mantle cards. 2 in Mint condition, 1 in Excellent condition, 1 in Poor condition and 1 unsold. What percentage of Mint Mickey Mantle cards were sold last year?"
To reiterate, in org-mode can all this be accomplished within a single table? How would it be structured if say, you knew Tops only made 2000 unique cards in a particular year, would table only contain 2000 rows? (plus the header)
If it can't be accomplished in a single table, I'm just going to use a postgres database structured much like the one mentioned here. I was really hoping there was a snazzy way to do this with org-table alone. But it looks like there are other ways to manipulate databases within emacs.
Sorry if most of this sounds like a high school math problem with no code but I'm sure most people (at least here) know what a single org table with the mentioned columns and a finite set of rows would look like.
Edit1: Can org references be used to link tables together to help get the results I'm looking for?
Edit2: The reason why I thought this was possible in org-mode, was because I did not think a foreign key was necessary. Here is a very similar example not using a foreign key. When reading about construction of spreadsheets in org-mode, foreign keys seemed to be the only obvious hurdle. Anyone have thoughts on this?
I have been studying datawarehouse in the last couple days, particularly, i have been reading The Data Wharehouse Toolkit - The Definitive Guide to Dimensional Modeling by Kimball and Ross.
Uppon that reading, i came to the 1st exapmle where there is a sales fact and it related to a product dimension, as you can see in the bellow image:
I think i can grasp the gist of how this relationship allows us to rotate the "cube" slicing and dicing data, however this is where i get lost:
In this example and many others on the web product is a one-to-one relationship with sales, which is fine i guess for most cases. But this generates a sales registtry for at least each kind of product that was in one sale.
So supposing i bought 1 banana, 2 apples and 1 orange, this would yield at least 3 sales registry. Again, which is fine i guess as it is storing the sale's ticket ID in the sales fact, we still can relate all itens in a given sale.
However if this was an use case: relate products on sales say i want to get every sale that had a banana and get stuff like: how many items each of these sales had, their price cost, their profit, stuff like that...
Wouldn't be better if the fact-product relation were Fact-one_to_many-Product relationship? Where fact would hold the sale's ticket ID and products would have its foreign key referencing where they are from or something?
I reckon these metrics should be in the fact table, and not in the product table as i think i would want. So, is this me not fighting my urge to normalize it or does it make sense in the way i would want to do that kind of filtering -> [given all sales with X product, get data from other products in the same sale].
If i were to follow the guidelines, product dimension would have one registry for every exclusive kind of product the store would have correct? And all the measurements i want i would store it on the fact itself, like price cost, sales price, profit, etc...
On the other hand, if i were to one-to-many product dimension would have many copies of each product. Which is bad, i think. However, i think it would give me better queries in that regard.
As you can see, i'm a beginer and really in the early stages of this path, so if you would endulge me in a Explain Like I'm Five kind of answer I would appreciate.
EDITED:
Sorry #Nick.McDermaid, you are right. I meant from the perspective of the sales fact where for every sale fact i will have only one product, but are correct that for one product it can have N sales related. And so, we have one record of product in the database for every different product on our store. This is the right way to do it, how to rightfully model it. Also, the many indicator is the "sales quantity" i'm guessing.
Anyhow, while this allows for slicing and dicing when/if we have sales as the point of view, but what if i want to for example:
Get all sales that had a banana in it, with all the other items in those sales. We can still do it with this structure but its harder than if the products were repeated and we had the sale id as a foreign key in the product table.
Cuz ultimetly i want to get all the sales(and products within that sale) that had a banana. And then take metrics out of them.
What you are somewhat hinting at would be a degenerate dimension, consisting of the sales id/invoice #/purchase order # of the transaction that took place. The whole purpose of a degenerate dimension is to group items that are related by a meaningless piece of data. For example, a PO # of A1234 is meaningless on its own, it doesn't tell you anything about the purchase. However, it can be used to identify other meaningful data, such as the date of purchase of the products for the customer. In that context, the PO # is defined by the collection of the entities it brings together to describe an event.
Another critical concept in data-warehousing is the abstraction of the schema in the database from the model in the cube. You don't join and group data in a cube model. You slice and filter. There are no foreign keys in a cube model. Those are used in the underlying data schema, but all of that work is handled behind the scenes of the cube model.
Using the snowflake schema image from wikipedia:
http://en.wikipedia.org/wiki/File:Snowflake-schema-example.png
Would it ever make sense to have a "Brand_Id" foreign key in Fact_Sales as you do in Dim_Product? There is a many-to-one relationship of sales/brands just like sales/products or products/brands, so is there any logical reason not to? You may want to join directly to the Dim_Brand table.
I'm probably not seeing something obvious.
The type of relationship you're looking at is a has-a relationship.
A product has a brand. A sale has a product; it's the thing that was sold. But a sale does not have a brand. Or, a better way of saying this, you cannot sell a brand. (don't read too far into that one...)
So, no, you wouldn't want to add brand to sales.
If you are working in a dimensional model (the Star/Snowflake schema note in your question makes be think you are), then adding the BRAND_ID to the sales fact makes sense from a performance perspective, if the questions that the business is trying to answer are "what were the sales for brand X across all products in this time frame".
It also may be useful if the product dimension is a Type 1 SCD, and a product changes brands. You may want to preserve the prior sales as being of the "old" brand.
Keep in mind you are not doing entity - relationship modeling when you build a star/snowflake reporting schema. Questions of is-a or has-a aren't pertinent to a dimensional model.
I think that would be nice as a way to cache the data... but in all honesty, your probably better off just relying on the links as they are.
The reasoning is that you already have that definition of what that table does, store sales. To add in what brand those products are that the store sold is going to muddy the 'topic' or 'theme' of that table, recording sales of a store.
Now if by some way you had a product that can be sold under different brands (heck if I know how a package can have split personalities...) then yea, it would make sense to a degree, but a more reasonable solution is to give each product it's own SKU code then.