database design scenario - database

What's the right way to do this:
I have the following relationship between entities RAW_MATERIAL_PRODUCT and FINISHED_PRODUCT: A FINISHED PRODUCT has to be made of one ore more Raw Material Products and a Raw Material Product may be part of a Finished Product.( so a Many-Many). I have the intersection entity which i called ASSEMBLY that tells me exactly of what Raw Material Products is a Finished Product made of.
Good. Now i need to sell the Finished Products and compute the production cost. PRODUCT_OUT entity comes in, which can contain only one FINISHED PRODUCT and a FINISHED PRODUCT may be part of multiple PRODUCT_OUT.
It would be easy if, for example, Finished Product A was always made of 3 pieces of Raw Material Product a1, 2 of a2 etc. Problem is that the quantities may change.
The stock of a Raw Material Product is computed as
TotalIn - TotalOut
so i can't put quantity Attribute in ASSEMBLY because i would get incorrect data when calculating the Stock. (if quantites are changed)
My only idea is to give up to FINISHED_PRODUCT entity and make a join between PRODUCT_OUT and RAW_MATERIAL_PRODUCT with the intersection entity containing a quantity attribute. But this seems kind of stupid because almost all the time a FINISHED_PRODUCT is made of the same RAW_MATERIAL_PRODUCTS.
Is there a better way?

I'm not 100% sure I understand, but it sound like essentially the recipe can change, and your model needs to account for this?
But this seems kind of stupid because almost all the time a
FINISHED_PRODUCT is made of the same RAW_MATERIAL_PRODUCTS.
Almost all the time, or all the time? I think that's a pretty critical question.
It seems to me that when you change the recipe, you should create a new FINISHED_PRODUCT row, which has a different set of RAW_MATERIAL_PRODUCTS based on the association in the ASSEMBLY table.
If you want to group differnt recipies of the same FINISHED_PRODUCT together (kind of like versioning!), create a FINISHED_PRODUCT_TYPE table with a 1:m relationship to the FINISHED_PRODUCT table.
Edit (quote from comment):
I totally agree with you it should be a different product but if i add
one screw to a product i can't really name it Product A with 1 extra
screw. And it seems this can happen. I didn't quite get the use of
creating a FINISHED_PRODUCT_TYPE table. Could you please explain?
Sure. So your FINISHED_PRODUCT_TYPE defines the name of the product, and possibly some other data (description, category, etc.). Then each row in FINISHED_PRODUCT is essentially a "version" of that product. So "Product A" would only exist in one place, a row in the FINISHED_PRODUCT_TYPE table, but there could be one or many versions of it in the FINISHED_PRODUCT table.

Related

Modeling Fact Tables that have direct relationships, but at a detail and not a dimension layer

This is very similar to my issue.
http://forum.kimballgroup.com/t2534-modeling-fact-tables-that-have-direct-relationships-but-at-a-detail-and-not-a-dimension-layer
I’ve got a fact table for POs, Supplier Invoices, Payments, Receipts, etc. They have some dimensions in common, others not. Problem is, for example, say if they are looking at invoices by their gl account, (using an excel pivot table connected to the cube) then they expect to be able drop in a column for the PO number, the buyer of the PO, etc. Even though the buyer dimension is only related to the PO, and the account dimension is only related to the invoice. But they say, well the PO is related to the invoice, so you should be able to pull it in.
I do have a PO Ref field on the invoice fact table, but it is only filled out 50% of the time. Even when it is, you could have a one to many relationship in either way between a PO and an invoice, as far as I understand it at least.
Anyway, they expect to be able to throw in any measure from any measure group, and every single possible dimension to work, and then be able to drill down to the detail to see the POs, Invoices, Payments and Receipts and how they match up. Best practice is to keep the fact tables separate if they are different grains according to Kimball, but then all the business problems aren't solved this way.
The only solutions I can come up with are:
to either tack on a bunch of detail related columns to the degenerate dimensions when I load them. i.e. add PO to invoice and invoice to PO etc., but have it as a comma separated list in that column when it is many to one.
Create every possible relationship with every fact and dimension table. This would be a lot of work though, and some still may not have a relationship to certain dimensions.
Create a monstrous fact table with all the current ones joined together, and somehow figure out logic to only display the measure values once for the many to one joins.
This is probably a bad idea, but thought maybe somehow I could create a relationship between every measure group and the corresponding degenerate dimensions reference field. Like create a relationship between the supplier invoice degenerate dimension PO Ref field and the purchase order line measure group PO field.
Lower their expectations, lol.
Here's a screen shot of the dimension usage tab to give an idea of what it looks like currently.
I tried option 3 once. The performance was terrible. The output was misleading. Never ever again.
Your best bet is to work with the business. Where the data is not readily available (invoice without PO, for example) agree what should be done. You could show a default value (PO not recorded on invoice). You could agree on a logic, implemented in the ETL, that extracts the most likely PO.
Whatever approach you choose you must discuss it. If you do not the business will make decisions based on false assumptions. The business will find itself looking at reporting it does not understand. You must help your users to avoid these outcomes.
Once the approach has been agreed, document it. When queries arise, share the documentation. Make sure the documentation highlights all calculations, difficulties and missing source data.
Work with the teams that generate your source date. If an important field is sparsely populated arrange a meeting. See if the capture processes can be improved. Let your users know that you are investigating this area. Keep them informed of the outcome. If the source data cannot be improved (invoices continue to be raised without a PO), inform your users of the reasons for this.
Managing your customers can be challenging. Especially those who hold senior positions in the company. Transparency and solid documentation will help you.

I'm unable to normalize my Product table as I have 4 different product types

So because I have 4 different product types (books, magazines, gifts, food) I can't just put all products in one "products" table without having a bunch of null values. So I decided to break each product up into their own tables but I know this is just wrong (https://c1.staticflickr.com/1/742/23126857873_438655b10f_b.jpg).
I also tried creating an EAV model for this (https://c2.staticflickr.com/6/5734/23479108770_8ae693053a_b.jpg), but I got stuck as I'm not sure how to link the publishers and authors tables.
I know this question has been asked a lot but I don't understand ANY of the answer's I've seen. I think this is because I'm a very visual learner and this makes it hard to understand what's being talked about when not a lot of information is given.
Your model is on the right track, except that the product name should be sufficient you don't need Gift name, book name etc. What you put in those tables is the information that is specific to the type of product that the other products don't need. The Product table contains all the common fields. I would use productid in the child tables rather than renaming it giftID, magazineID etc. It is easier to remember what things are celled when you are consistent in nameing them.
Now to be practical, you put as much as you can into the product table especially if you are going to do calculations. I prefer the child tables in this specific case to have what is mostly display information. So product contains the product name, the cost, the type of product, the units the product is sold in etc. The stuff that generally is needed to calculate the cost of an order or to have a report of what was ordered. There may be one or two fields that can contain nulls, but it simplifies the calculation type queries so much it might be worth it.
The meat of the descriptive details though would go in the child table for the type of product. These would usually only be referenced when displaying the product in the shopping area and only one at a time, so you can use the product type to let you only join to the one child table you need for display. So while the order cares about the product number and name and cost calculations, it probably doesn't need to go line by line describing the book ISBN number or the megapixels in a camera. But the description page of the product does need those things.
This approach is not purely relational, although it mostly is, but it does group the information by the meanings of the data and how they will be used which will make the database easier to understand and query. I am a big fan of relational tables because database just work better when they hit at least the third normal form but sometimes you can go too far for practicality, so the meaning of the data and the way you are grouping to use the data (and not just for the user interface, but for later reporting as well) is almost always one of my considerations in design.
Breaking each product type into its own table is fine - let the child tables use the same id as the parent Product table, and create views for the child tables that join with Product
Your case is a classic case of types and subtypes. This is often called class/subclass in object modeling and generalization/specialization in ER modeling. It's a well understood pattern. There are known techniques for dealing with this pattern.
Visit the following tabs, and read the description under the info tab (presented as "learn more"). Also look over the questions grouped under these tags.
single-table-inheritance class-table-inheritance shared-primary-key
If you want to rean in more depth use these buzzwords to search for articles on the web.
You've already discovered and discarded single table inheritance on your own. Other answers have pointed you at shared primary key. Class table inheritance involves a single table for generalized data as well as the four specialized tables. Shared primary key is generally used in conjunction with class table inheritance.

Table with hierarchy or multiple tables?

I am designing database model for some application, and I have one table Post which belong to some category. OK, Category will logically be other table.
But, more categories belong to some super category or domain or area, and my question is next:
Whether create other table for super categories or domains, or to do this hierarchy in table Category with some combination of key to point to parent.
I hope I was clear with problem?
PS.I know that I can do this problem with both solution, but is there any benefits with using first over second solution, and contrary.
Thanks
It depends: if nearly each category has a parent, you could add a parent serial as a column. Then your category table will look like
+--+----+------+
|ID|Name|Parent|
+--+----+------+
The problem with this representation is that, as long the hierarchy is not cyclic, some categories will have no parent. Furthermore a category can only have one parent.
Therefore I would suggest using a category_hierarchy table. An additional table:
+-----+------+
|Child|Parent|
+-----+------+
The disadvantage of this approach is that nearly each category will be repeated. And therefore if nearly all categories have parents, the redundancy will approximately scale with that number. If relations however are quite sparse, one saves space. Furthermore using an intelligent join will prevent the second representation from taking long execution times. You can for instance define a view to handle such requests.
Furthermore there are situations where the second approach can improve speed. For instance if you don't need the hierarchy all the time (for instance when mapping serials to the category-name), lookups in the category table can be faster, simply because the table is more compact and thus more parts of the table will be cached.

Modeling A Food Recipes Database

I'm trying to design a "recipe box" database and I'm having trouble getting it right. I have no idea if I'm on the right track or not, but here's what I have.
recipes(recipeID, etc.)
ingredient(ingredientID, etc.)
recipeIngredient(recipeID, ingredientID, amount)
category(categoryID, name)
recipeCategory(recipeID, categoryID, name, etc.)
So I have a couple of questions.
How am I doing so far? Is this design okay from what you all know?
How would I implement the preparation steps? Should I create an additional many-to-many implementation (something like preparation(prepID, etc.) and recipePrep(recipeID, prepID)) or just add the directions in the recipes table? I would like this to be an ordered list in the UI (webpage).
Thank you for your help.
Have you looked at any of the existing schemas out there, such as this one at DatabaseAnswers?
some thoughts:
You might want to use the same table for Recipe and Ingredient, with a type indicator column. The reason is that Recipes can contain sub-recipes. Let's call the combined table "Item". Then your RecipeIngredient table would look like
RecipeIngredient (RecipeId, ItemId, Amount).
I'd expect that the table would also have a sequencing column.
If you want to do any calculations with these recipes (e.g., scaling, nutritional analysis, production planning) then your quantities will need to specify a unit of measure. You can do that explicitly (by having a separate column for uofm) or you can use a text field for quantity and expect the user to enter values like "1 cup", or "2 tbs". If you take that approach, you'll need to make sure that what they enter is recognizable, and parse it every time you need to use it. This can become surprising complex, especially if you want to represent recipe yields in a formalized manner.
Assuming you want 1:M from recipe to category, I'm still not sure why your RecipeCategory table would have a Name column. I'd think that the name comes from the Category definition.
I agree with Dave that it's unlikely that you'd reuse preparation steps from recipe to recipe, and so a RecipePreparationSteps table (or something like it) would be more appropriate.
However, recipes are often presented with ingredients and instructions intermixed. eg.
Intro text
some ingredients.
prep instructions
some more ingredients
baking instructions.
To accomodate that, you need to cleverly set sequencing values in the RecipeIngredient and RecipePreparation step tables so that you can combine data from both in the proper order for presentation. Another approach would be, instead of these two tables, use a "RecipeLine" table such that each row can represent either an instruction OR an ingredient. I think that may be what you were suggesting. Purists would frown on this kind of table overloading, but I'm not a purist.
This is a topic I happen to know a lot about, so ask anything.
Looks like a good start. A few thoughts:
Don't see a need for a recpieCategory table. One-to-many between recipe and category should do fine.
A PreparationSteps table should contain 1-n steps for each recipe. I wouldn't try to reuse steps between recipes.

How to handle an immutable table referencing mutable tables?

In making a pretty standard online store in .NET, I've run in to a bit of an architectural conundrum regarding my database. I have a table "Orders", referenced by a table "OrderItems". The latter references a table "Products".
Now, the orders and orderitems tables are in most aspects immutable, that is, an order created and its orderitems should look the same no matter when you're looking at the tables (for instance, printing a receipt for an order for bookkeeping each year should yield the same receipt the customer got at the time of the order).
I can think of two ways of achieving this behavior, one of which is in use today:
1. Denormalization, where values such as price of a product are copied to the orderitem table.
2. Making referenced tables immutable. The code that handles products could create a new product whenever a value such as the price is changed. Mutable tables referencing the products one would have their references updated, whereas the immutable ones would be fine and dandy with their old reference
What is your preferred way of doing this? Is there a better, more clever way of doing this?
It depends. I'm writing on a quite complex enterprise software that includes a kind of document management and auditing and is used in pharmacy.
Normally, primitive values are denormalized. For instance, if you just need a current state of the customer when the order was created, I would stored it to the order.
There are always more complex data that that need to be available of almost every point in time. There are two approaches: you create a history of them, or you implement a revision control system, which is almost the same.
The history means that every state that ever existed is stored as a separate record, in the same or another table.
I implemented a revision control system, where I split records into two tables, one for the actual item, lets say a product, and the other one for its versions. This way I can reference the product as a whole, or any specific version of it, because both have its own primary key.
This system is used for many entities. I can safely reference an object under revision control from audit trail for instance or other immutable records. At the beginning it seems to be more complex to have such a system, but at the end it is very straight forward and solves many problems at once.
Storing the price in both the Product table and the OrderItem table is NOT denormalizing if the price can change over time. Normalization rules say that every "fact" should be recorded only once in the database. But in this case, just because both numbers are called "price" doesn't make them the same thing. One is the current price, the other is the price as of the date of the sale. These are very different things. Just like "customer zip code" and "store zip code" are completely different fields; the fact that both might be called "zip code" for short does not make them the same thing. Personally, I have a strong aversion to giving fields that hold different data the same name because it creates confusion. I would not call them both "Price": I would call one "Current_Price" and the other "Sale_Price" or something like that.
Not keeping the price at the time of the sale is clearly wrong. If we need to know this -- which we almost surely do -- than we need to save it.
Duplicating the entire product record for every sale or every time the price changes is also wrong. You almost surely have constant data about a product, like description and supplier, that does not change every time the price changes. If you duplicate the product record, you will be duplicating all this data, which definately IS denormalization. This creates many potential problems. Like, if someone fixes a spelling error in the product description, we might now have the new record saying "4-slice toaster" while the old record says "4-slice taster". If we produce a report and sort on the description, they'll get separated and look like different products. Etc.
If the only data that changes about the product and that you care about is the price, then I'd just post the price into the OrderItem record.
If there's lots of data that changes, then you want to break the Product table into two tables: One for the data that is constant or whose history you don't care about, and another for data where you need to track the history. Like, have a ProductBase table with description, vendor, stock number, shipping weight, etc.; and a ProductMutable table with our cost, sale price, and anything else that routinely changes. You probably also want an as-of date, or at least an indication of which is current. The primary key of ProductMutable could then be Product_id plus As_of_date, or if you prefer simple sequential keys for all tables, fine, it at least has a reference to product_id. The OrderItem table references ProductMutable, NOT ProductBase. We find ProductBase via ProductMutable.
I think Denormalization is the way to go.
Also, Product should not have price (when it changes from time to time & when price mean different value to different people -> retailers, customers, bulk sellers etc).
You could also have a price history table where it contains ProductID, FromDate, ToDate, Price, IsActive - to maintain the price history for a product.

Resources