I am trying to create a database to store my recipes. However, I am not sure how to implement it. I looked at other questions like this, but they do not have the same focus as mine.
I assume any dish is actually just an ingredient, which can then be used in other dishes, or in this case in other ingredients. Any ingredient may have multiple recipes. For now, each recipe indicates how much of each ingredient is needed, but I also want to know how these ingredients are combined without having a long text description of it.
For example, in text, I would describe one (very bad) scrambled eggs recipe like this:
Scrambled eggs:
Cooked for 5 minutes(
    1g Butter,
    Whisked(
        1g Salt,
        1g Pepper,
        2 Eggs
    )
)
and then Scrambled eggs could be used in another recipe as an ingredient.
But how would that translate into a database? I don't need that database to be SQL based since this is a personal project, but I don't know any other kind of database so far.
I thought about defining an Ingredient, as having an optional Technique associated with it but that means Whisked(1g salt, 1g pepper, 2 eggs) would have to be an Ingredient. Which I guess could work and I could also make the name of ingredients optional, but it seems awkward.
I also thought about defining a Recipe as having multiple TransformedIngredients which would contain a Technique applied to many Ingredients but sometimes a Recipe contains raw, untransformed, Ingredients and sometimes TransformedIngredients would need to be applied to TransformedIngredient. From what I know of databases that wouldn't work.
PS: I stumbled onto a functional programming Tiramisu recipe which, though very much focused on the techniques, displays fairly well what I'm trying to implement for my database.
I think what's confusing is that there are two different things to think about with a recipe, 'Items' and 'Steps'.
One database structure that comes to mind for this is a Star Schema structure which separates these ideas nicely (into Dimension and Fact tables, respectively).
A quick description of each:
Dimension
"The state of something" i.e. a record is merely there to describe what the thing is. A customer's address table would be an example of a dimension table.
Fact
"Things changing over time" i.e. each record relates to a dimension table, but has changing values. An example would be shipped purchases from a website to a customer's address. The address stays the same, but the shipments are getting constantly added to the table.
This isn't to say that Dimension tables don't change, too; obviously new users sign up for websites all the time. In the above address example, if a customer were to change his address, a new primary key value would be added for the new address.
Now on to your recipe examples:
Imagine you're cooking something. I would put anything that you hold in your hands in a "dimension" table. For example: DIM_INGREDIENT (with columns such as INGREDIENT_ID, INGREDIENT_NAME), and DIM_AMOUNT (AMOUNT_ID, AMOUNT, UNITS) to describe the amounts. And DIM_ACTION (ACTION_ID, TYPE, LENGTH, UNITS) to describe the action. There are more you can come up with; these are a few to get started.
Any steps I'd be taking could go in a FACT_RECIPE_STEPS table that would map to all the dimension tables. Any dimension that doesn't apply to a step would have a null value (e.g. "stir for 5 minutes" would have null for INGREDIENT_ID).
The FACT_RECIPE_STEPS could look like this:
RECIPE_ID, RECIPE_STEP, ACTION_STEP_ID, INGREDIENT_ID, AMOUNT_ID, ACTION_ID
What gets confusing is the "substep" of whisking the stuff together. I put that in another FACT table called FCT_ACTION_STEP since "whisking" is one action in the recipe list, but to perform the action you actually need to do three things.
I think the following is what some of the tables would look like with your data:
DIM_INGREDIENT
INGREDIENT_ID  INGREDIENT_NAME
1              'Scrambled eggs'
2              'Salt'
3              'Pepper'
4              'Eggs'
5              'Butter'
DIM_ACTION
ACTION_ID  TYPE     LENGTH  UNITS
1          'Cook'   5       'minutes'
2          'Whisk'  null    null
FCT_ACTION_STEP
STEP_ID  ACTION_ID
1        2
DIM_AMOUNT
AMOUNT_ID  AMOUNT  UNITS
1          1       'grams'
2          2       null
FACT_RECIPE_STEPS
RECIPE_ID, RECIPE_STEP, ACTION_STEP_ID, INGREDIENT_ID, AMOUNT_ID, ACTION_ID
EDIT:
I was a bit unsure myself as to how to do the "Whisked" part of the recipe and thought that, when you add the whisked mixture to the final result, it's like adding one ingredient to the recipe. However, you need to prepare the mixture beforehand, and it has three steps. It's basically its own little recipe, and the FACT_ACTION_STEP table takes that other 'recipe' into account so that the result can be added as one row in the FACT_RECIPE_STEPS table.
Now that I think about it a bit more, it might be better to just assign "Whisked" as its own recipe in FACT_RECIPE_STEPS and DIM_INGREDIENT (called something like "Whisked spices for eggs") and get rid of the FACT_ACTION_STEP table altogether. That way you can easily make more complex recipes, such as "Eggs and Pancake Breakfast" where the Eggs part is the result of this recipe.
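To make the "a sub-step is just another recipe" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not the star schema above: the tables ingredient, recipe and recipe_item and all of their columns are hypothetical names chosen for illustration, and the sketch assumes each recipe applies one action to its inputs and produces exactly one ingredient. Quantities are stored but not scaled through sub-recipes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ingredient (
    ingredient_id   INTEGER PRIMARY KEY,
    ingredient_name TEXT NOT NULL
);
-- A recipe applies one action to its items and produces one ingredient.
-- Several recipes may produce the same ingredient.
CREATE TABLE recipe (
    recipe_id   INTEGER PRIMARY KEY,
    produces_id INTEGER NOT NULL REFERENCES ingredient(ingredient_id),
    action      TEXT NOT NULL,   -- e.g. 'Whisk', 'Cook'
    minutes     INTEGER          -- NULL when the action has no duration
);
CREATE TABLE recipe_item (
    recipe_id     INTEGER NOT NULL REFERENCES recipe(recipe_id),
    ingredient_id INTEGER NOT NULL REFERENCES ingredient(ingredient_id),
    amount        REAL NOT NULL,
    units         TEXT,          -- NULL for counted items such as eggs
    PRIMARY KEY (recipe_id, ingredient_id)
);
""")

conn.executemany("INSERT INTO ingredient VALUES (?, ?)", [
    (1, "Scrambled eggs"), (2, "Salt"), (3, "Pepper"),
    (4, "Eggs"), (5, "Butter"), (6, "Whisked spices for eggs"),
])
conn.executemany("INSERT INTO recipe VALUES (?, ?, ?, ?)", [
    (1, 6, "Whisk", None),  # whisking produces the intermediate mixture
    (2, 1, "Cook", 5),      # cooking produces the finished dish
])
conn.executemany("INSERT INTO recipe_item VALUES (?, ?, ?, ?)", [
    (1, 2, 1, "grams"), (1, 3, 1, "grams"), (1, 4, 2, None),
    (2, 5, 1, "grams"), (2, 6, 1, None),  # the mixture is used like any ingredient
])

def raw_ingredients(conn, dish_name):
    """Names of all ingredients of a dish that no recipe produces."""
    rows = conn.execute("""
        WITH RECURSIVE needs(ingredient_id) AS (
            SELECT ingredient_id FROM ingredient WHERE ingredient_name = ?
            UNION
            SELECT ri.ingredient_id
            FROM needs AS n
            JOIN recipe AS r ON r.produces_id = n.ingredient_id
            JOIN recipe_item AS ri ON ri.recipe_id = r.recipe_id
        )
        SELECT i.ingredient_name
        FROM needs AS n
        JOIN ingredient AS i ON i.ingredient_id = n.ingredient_id
        WHERE n.ingredient_id NOT IN (SELECT produces_id FROM recipe)
        ORDER BY i.ingredient_name
    """, (dish_name,))
    return [name for (name,) in rows]

print(raw_ingredients(conn, "Scrambled eggs"))
```

Because "Scrambled eggs" is itself a row in ingredient, a later "Eggs and Pancake Breakfast" recipe could list it in recipe_item without any schema change.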
You can add some other fields to tables but I believe this schema works for you.
recipe
------------
r_id PK
recipe_name
cooking_time
recipe_of_recipes
-----------------
ror_id PK
ror_name
recipe_ror (table for many to many relation-> defining a recipe as an ingredient)
-------------
r_ror_id PK
r_id FK
ror_id FK
ingredients
-------------
i_id PK
t_id FK
r_id FK
ror_id FK (added later)
ingredient_name
quantity
technique
-------------
t_id PK
technique_name
EDIT
Let's say you want to store a recipe (X) which is a combination of x and y recipes plus z ingredient.
To prepare X recipe (big X),
in recipe,ingredients and technique tables you store
the x recipe and w,t,r ingredients with technique of p
the y recipe and b,n,m ingredients with technique of v
also z ingredient with technique of f (for this I forgot to add field ror_id as a FK in ingredients table)
You can define 2 different recipes (x and y) as ingredients of a recipe (X) using the recipe_ror table. This table ties the different recipes together as one (a many-to-many relationship between the recipe and recipe_of_recipes tables).
If you also want to store the technique for the X, x or y recipes (like "cook" in your example), you can also add a t_id field as an FK to the recipe and recipe_of_recipes tables.
Caveat: very new to database design/modeling, so bear with me :)
I'm trying to design a simple database that stores information about images in an archive. Along with file_name (which is one distinct string), I have fields like genre and starring where each field might contain multiple strings (if an image is associated with multiple genres, and/or if an image has multiple actors in it).
Right now the database is just a single table keyed on file_name, and the fields like starring and genre just have multiple comma-separated values stored. I can query it fine by using wildcards and like and in operators, but I'm wondering if there's a more elegant way to break out the data such that it is easier to use/query. For instance, I'd like to be able to find how many unique actors are represented in the archive, but I don't think that's possible with the current model.
I realize this is a pretty elementary question about data modeling, but any guidance anyone can provide or reading you can direct me to would be greatly appreciated!
Thanks!
You need to create extra tables in order to stick with the normalization. In your situation you need 4 extra tables to represent these n->m relations(2 extra would be enough if the relations were 1->n).
Tables:
image(id, file_name)
genre(id, name)
image_genres(image_id, genre_id)
stars(id, name, ...)
image_stars(image_id, star_id)
And some data in tables:
image table
id  file_name
1   /users/home/song/empire.png
2   /users/home/song/promiscuous.png
genre table
id  name
1   pop
2   blues
3   rock
image_genres table
image_id  genre_id
1         2
1         3
2         1
stars table
id  name
1   Jay-Z
2   Alicia Keys
3   Nelly Furtado
4   Timbaland
image_stars table
image_id  star_id
1         1
1         2
2         3
2         4
To count the unique actors in the database, you can simply run the SQL query below:
SELECT COUNT(name) FROM stars
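As a rough, runnable sketch of the star-related tables above, the following uses Python's built-in sqlite3 module with the sample rows from this answer. The column types and constraints are assumptions. Note that counting rows in stars counts every actor on record; the variant here counts only actors linked to at least one image through image_stars, which is the same number for this sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE image (
    id        INTEGER PRIMARY KEY,
    file_name TEXT NOT NULL UNIQUE
);
CREATE TABLE stars (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- Junction table: one row per (image, star) pair.
CREATE TABLE image_stars (
    image_id INTEGER NOT NULL REFERENCES image(id),
    star_id  INTEGER NOT NULL REFERENCES stars(id),
    PRIMARY KEY (image_id, star_id)
);
""")

conn.executemany("INSERT INTO image VALUES (?, ?)", [
    (1, "/users/home/song/empire.png"),
    (2, "/users/home/song/promiscuous.png"),
])
conn.executemany("INSERT INTO stars VALUES (?, ?)", [
    (1, "Jay-Z"), (2, "Alicia Keys"), (3, "Nelly Furtado"), (4, "Timbaland"),
])
conn.executemany("INSERT INTO image_stars VALUES (?, ?)", [
    (1, 1), (1, 2), (2, 3), (2, 4),
])

# Actors that appear in at least one image.
(actor_count,) = conn.execute(
    "SELECT COUNT(DISTINCT star_id) FROM image_stars"
).fetchone()

def files_starring(conn, star_name):
    """File names of all images a given actor appears in."""
    rows = conn.execute("""
        SELECT i.file_name
        FROM image AS i
        JOIN image_stars AS s ON s.image_id = i.id
        JOIN stars AS st ON st.id = s.star_id
        WHERE st.name = ?
        ORDER BY i.file_name
    """, (star_name,))
    return [file_name for (file_name,) in rows]

print(actor_count, files_starring(conn, "Timbaland"))
```

The join replaces the wildcard LIKE search over comma-separated values that the single-table design needs.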
I'm using SSAS Tabular (PowerPivot) and need to design a data model and write some DAX.
I have 4 tables in my relational database-model:
Orders(order_id, order_name, order_type)
Spots (spot_id,order_id, spot_name, spot_time, spot_price)
SpotDiscount (spot_id, discount_id, discount_value)
Discounts (discount_id, discount_name)
One order can include multiple spots but one spot (spot_id 1) can only belong to one order.
One spot can include different discounts and every discount has one discount_value.
Ex:
Order_1 has spot_1 (spot_price 10), spot_2 (spot_price 20)
Spot_1 has discount_name_1(discount_value 10) and discount_name_2 (discount_value 20)
Spot_2 has discount_name_1(discount_value 15) and discount_name_3 (discount_value 30)
I need to write two measures: price(sum) and discount_value(average)
How do I correctly design a star schema with fact table (or maybe two fact tables) so that I in my powerpivot cube can get:
If I choose discount_name_1 I should get
order_1 with spot_1 and spot_2 and price on order_1 level will have value 30 (10 + 20) and discount_value = 12.5
If I choose discount_name_3 I should get
order_1 with only spot_2 and price on order level = 20 and discount_value = 30
Fact(OrderKey, SpotKey, DiscountKey, DateKey, TimeKey, Spot_Price, Discount_Value,...)
DimOrder, DimSpot, DimDiscount, etc....
TotalPrice:=
SUMX(
    SUMMARIZE(
        Fact
        ,Fact[OrderKey]
        ,Fact[SpotKey]
        ,Fact[Spot_Price]
    )
    ,Fact[Spot_Price]
)
AverageDiscount:=
AVERAGE(Fact[Discount_Value])
The fact table is denormalized and you end up with the simplest star schema you can have.
First measure deserves some explanation. [Spot_Price] is duplicated for any spot with multiple discounts, and we would get wrong results with a simple SUM(). SUMMARIZE() does a group by on all the columns passed to it, following relationships (if necessary, we're looking at a single table here so nothing to follow).
SUMX() iterates over this table and accumulates the value of the expression in its second argument. The SUMMARIZE() has removed our duplicate [Spot_Price]s so we accumulate the unique ones (per unique combination of [OrderKey] and [SpotKey]) in a sum.
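The de-duplication described above can be imitated in plain Python to check the arithmetic. This is only an illustration of the logic, not DAX: the list fact holds the example rows from the question in denormalized form, and the function names are made up for this sketch.

```python
from statistics import mean

# Denormalized fact rows built from the question's example:
# (order, spot, spot_price, discount_name, discount_value)
fact = [
    ("order_1", "spot_1", 10, "discount_name_1", 10),
    ("order_1", "spot_1", 10, "discount_name_2", 20),
    ("order_1", "spot_2", 20, "discount_name_1", 15),
    ("order_1", "spot_2", 20, "discount_name_3", 30),
]

def naive_total(rows):
    """Plain SUM over spot_price: counts a spot once per discount row."""
    return sum(price for _, _, price, _, _ in rows)

def total_price(rows):
    """Group by (order, spot, price) first, then sum the unique prices."""
    unique = {(order, spot, price) for order, spot, price, _, _ in rows}
    return sum(price for _, _, price in unique)

def average_discount(rows):
    """Average of discount_value over the visible rows."""
    return mean(value for *_, value in rows)

def filter_discount(rows, discount_name):
    """Rows left after selecting one discount, like a slicer would."""
    return [row for row in rows if row[3] == discount_name]

selected = filter_discount(fact, "discount_name_1")
print(naive_total(fact), total_price(fact))
print(total_price(selected), average_discount(selected))
```

With no filter the naive sum is 60 while the de-duplicated sum is 30, which is why the measure wraps SUMMARIZE() in SUMX().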
You say
One order can include multiple spots but one spot (spot_id 1) can only
belong to one order.
That is not supported in the table definitions you give just above that statement. In the table definitions, one order has only one spot but (unless you've added a unique index to Orders on spot_id) each Spot can have multiple orders. Each Spot can also have multiple discounts.
If you want to have the relationship described in your words, the table definitions should be:
Orders(order_id, order_name, order_type)
OrderSpot(order_id, spot_id) -- with a unique index on spot_id
Spots (spot_id, spot_name, spot_time, price)
or:
Orders(order_id, order_name, order_type)
Spots (spot_id, spot_name, spot_time, order_id, price)
You can create the SSAS cube with Order as the fact table, with one dimension in the Spot table. If you then add the SpotDiscount and Discount tables with their relations (SpotDiscount to Spot, Discount to SpotDiscount), you have a one-dimensional cube.
EDIT as per comments
Well, the Fact table would have order_id, order_name, order_type
The Dimension would be made up of the other 3 tables and have the columns you're interested in: probably spot_name, spot_time, spot_price, discount_name, discount_value.
Trying to figure out how to change a structure from what I currently have which is this:
tblHaulLogs
intLogID
intHaulType
intSerial
intOriginSource
intOrigin
intDestinationSource
intDestination
dtmHaulDate
ccyLogPay
intHauler
txtLogNotes
intInvoiceID
In this table, what I am doing is using the origin and destination source fields to determine which table the fk for the origin and destination comes from. This feels very wrong to me.
tblHaulTypes
intHaulTypeID
chrHaulType
intOriginSourceType
intDestinationSourceType
Data in the Haul Types Table:
LOT, 1, 1
DEL, 1, 2
RPO, 2, 1
Now let me explain:
The first type happens when an item goes from a sales lot to another sales lot.
The second type happens when an item goes from a sales lot to a customer(sale gets delivered).
The third type happens when an item returns from the customer back to the sales lot.
Then the Item can be resold/returned/resold/returned(rent-to-own system).
Now, here are the problems I have:
A haul log's origin will always be the destination of the last move. Therefore I thought that the origin field is redundant. However, it's the relation between the destination of the last move and the destination of the new move that defines what the shipper gets paid and what type of haul it is.
In other words, even though the first type and the third type technically have the same fields, the type of move is not the same because of the previous move type. What do I need to do here? Am I totally missing the boat on what the structure should be?
The questions I need to answer based on this data are:
How many items do I have on my sales lots that are new inventory (have never been sold)?
How many items do I have that have been sold and returned (doesn't matter how many times)?
I'm guessing at the relationship between the various fields and tables.
Your tblHaulTypes table looks fine.
intHaulTypeID
chrHaulType
intOriginSourceType
intDestinationSourceType
You're missing a haul type that accounts for deliveries from suppliers to your lots.
There has to be some table that lists your lots. I'd call it tblHaulLot.
intLotNumber
txtLotName
...
I'd make a tblHaulTransaction table that looks like this.
intTransactionID
intHaulTypeID
intHauler
intOriginOrganizationID
intDestinationOrganizationID
intOriginLot (null if origin is supplier)
intDestinationLot (null if destination is customer)
dtmHaulDate
txtLogNotes
Now, we need a tblOrganization table.
intOrganizationID
txtOrganizationName
txtOrganizationAddress
...
The organization at ID 0 is your organization. Suppliers and customers would fill the rest of the table.
I'd make a tblHaulInvoice table that looks like this.
intInvoiceID
intTransactionID
ccyTransactionPay
dtmDateInvoiced
AmountInvoiced
The amount invoiced (and amount paid) have to be accounted for in some table. I don't know what ccy stands for, and I don't know your 3 letter code for a decimal (money) field.
How many Items do I have on my sales lots that are new inventory(have never been sold). How many Items do I have that have been sold and returned(doesn't matter how many times).
Nowhere in your data model is there any kind of inventory table. I'd need to know a lot more about your business to create one or more inventory tables.
I try to build a database model for the following structure:
I have companies with up to 3 hierarchical levels. For each unit I have a value; these values are assigned randomly, and duplicates between companies (but not within a company) are possible. For example: Level 1: 222-Amazon; Level 2: 441-Amazon: Germany, 542-Britain; Level 3: 6-Distribution, 99-Shop, 124-Programming, 5-HR.
Of course for each company this is different. What I did is:
Table1:
ID_Worker
CompanyName
ID_CompanyLvL1
ID_CompanyLvL2
ID_CompanyLvL3
...
Table2:
ID_CompanyLevel1
Slot1
Slot2
...
Table3:
ID_CompanyLevel2
Slot1
Slot2
...
But with this approach I have the following problem: if two companies have the same number for a CompanyLevel1 (or 2 or 3) unit, I cannot distinguish them anymore.
Another approach that is not working is
Table1:
ID_Company
ID_Worker
ID_CompanyLevel1
...
Table2:
ID_CompanyLevel1
Slot1
ID_CompanyLevel2
...
Table3:
ID_CompanyLevel2
Slot
ID_CompanyLevel3
...
With this approach I cannot identify which person is in, e.g., which level 2 unit. Could anyone help me with this? I just cannot come up with the right design.
You need to decide whether the organization structure is purely hierarchical (an org unit can only belong to 0 or 1 other org unit), or whether it is graphical (an org unit can belong to 0, 1, or 1+ org units).
Your limit of three is a business rule, and should be enforced by database logic (trigger) and not the database schema.
Why the codes with the names?
If hierarchical, this is your schema:
create table organizations (
    organization_id int primary key,
    name varchar(255) not null,  -- length is arbitrary; pick what fits your data
    parent_id int null references organizations(organization_id)
);
Use Recursive Common Table Expressions to query them.
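A minimal sketch of such a recursive query, run through Python's built-in sqlite3 module. The unit names come from the question, but which level 3 unit sits under which level 2 unit is an assumption made for the example, as are the surrogate ids 1 to 7.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE organizations (
    organization_id INTEGER PRIMARY KEY,
    name            TEXT NOT NULL,
    parent_id       INTEGER REFERENCES organizations(organization_id)
);
-- Assumed hierarchy; ids are surrogate keys, not the question's unit codes.
INSERT INTO organizations VALUES
    (1, 'Amazon', NULL),
    (2, 'Germany', 1),
    (3, 'Britain', 1),
    (4, 'Distribution', 2),
    (5, 'Shop', 2),
    (6, 'Programming', 3),
    (7, 'HR', 3);
""")

def subtree(conn, root_id):
    """Return (name, depth) for the root unit and everything below it."""
    rows = conn.execute("""
        WITH RECURSIVE tree(organization_id, name, depth) AS (
            SELECT organization_id, name, 0
            FROM organizations
            WHERE organization_id = ?
            UNION ALL
            SELECT o.organization_id, o.name, t.depth + 1
            FROM organizations AS o
            JOIN tree AS t ON o.parent_id = t.organization_id
        )
        SELECT name, depth FROM tree ORDER BY depth, name
    """, (root_id,))
    return rows.fetchall()

print(subtree(conn, 2))
```

The depth column is what a trigger or application check could use to enforce the three-level business rule without baking it into the table layout.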
If graphical, this is your schema:
create table organizations (
    organization_id int primary key,
    name varchar(255) not null  -- length is arbitrary; pick what fits your data
);
create table organizations_structure (
    parent_organization_id int references organizations(organization_id),
    child_organization_id int references organizations(organization_id),
    primary key (parent_organization_id, child_organization_id),
    check (parent_organization_id <> child_organization_id)
);
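To see what the two constraints on organizations_structure buy you, here is a small sketch with Python's built-in sqlite3 module. The sample units and the helper functions link and parents_of are hypothetical; the HR unit is deliberately attached to two parents, which the purely hierarchical schema cannot express.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE organizations (
    organization_id INTEGER PRIMARY KEY,
    name            TEXT NOT NULL
);
CREATE TABLE organizations_structure (
    parent_organization_id INTEGER REFERENCES organizations(organization_id),
    child_organization_id  INTEGER REFERENCES organizations(organization_id),
    PRIMARY KEY (parent_organization_id, child_organization_id),
    CHECK (parent_organization_id <> child_organization_id)
);
INSERT INTO organizations VALUES (1, 'Amazon'), (2, 'Germany'), (3, 'HR');
""")

def link(conn, parent_id, child_id):
    """Try to add a parent/child edge; return False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO organizations_structure VALUES (?, ?)",
            (parent_id, child_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False

def parents_of(conn, child_id):
    """Names of all units the given unit belongs to."""
    rows = conn.execute("""
        SELECT o.name
        FROM organizations_structure AS s
        JOIN organizations AS o ON o.organization_id = s.parent_organization_id
        WHERE s.child_organization_id = ?
        ORDER BY o.name
    """, (child_id,))
    return [name for (name,) in rows]

link(conn, 1, 2)  # Germany belongs to Amazon
link(conn, 1, 3)  # HR belongs to Amazon ...
link(conn, 2, 3)  # ... and also to Germany
print(parents_of(conn, 3))
print(link(conn, 3, 3))  # rejected by the CHECK constraint
```

The check only blocks a unit from being its own direct parent; longer cycles (A under B under A) would still need a trigger or application-level validation.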
For anything like that, make sure you do not put yourself into a corner. For example:
I have companies with up to 3 hierarchical levels
No. You do have companies with CURRENTLY up to 3 hierarchical levels. And you do not want them to scream at you when one of them decides to have 4.
I would suggest reading The Data Model Resource Book, Volume 1. It describes all kinds of standard data schemas, among them organizations as entities (entity as in "legal, human or organizational entity"), which includes organigrams. Things are a lot more complex than you think when you do not want to put yourself into a corner that WILL force a rewrite of the program in the not too distant future.
I have an application where the database back-end has around 15 lookup tables. For instance there is a table for Counties like this:
CountyID(PK) County
49001 Beaver
49005 Cache
49007 Carbon
49009 Daggett
49011 Davis
49015 Emery
49029 Morgan
49031 Piute
49033 Rich
49035 Salt Lake
49037 San Juan
49041 Sevier
49043 Summit
49045 Tooele
49049 Utah
49051 Wasatch
49057 Weber
The UI for this app has a number of combo boxes in various places for these lookup tables, and my client has asked that the boxes list in this case:
CountyID(PK) County
49035 Salt Lake
49049 Utah
49011 Davis
49057 Weber
49045 Tooele
'The Rest Alphabetically
The best plan I have for accomplishing this is to add a column to each lookup table for SortOrder(numeric). I had a colleague tell me he thought that would cause the tables to violate 3rd-Normal-Form, but I think the sort order still depends on the key and only the key (even though the rest of the list is alphabetical).
Is adding the SortOrder column the best way to do this, or is there a better way I am just not seeing?
I agree with @cletus that a sort order column is a good way to go and it does not violate 3NF (because, as you said, the sort order column entries are functionally dependent on the candidate keys of the table).
I'm not sure I agree that alphanumeric is better than numeric. In the specific case of counties, there are seldom new ones created. But there is no requirement that the numbers assigned are sequential; you can allocate them with numbers that are a multiple of a hundred, for example, leaving ample room for insertions.
Yes I agree a sort order column is the best solution when the requirements call for a custom sort order like the one you cite. I wouldn't go with a numeric column however. If the data is alphanumeric, the sort order should be alphanumeric. That way you can seed the value with whatever is in the county field.
If you use a numeric field you'll have to resequence the entire table (potentially) whenever you add a new entry. So:
Columns: ID, County, SortOrder
Seed:
UPDATE County SET SortOrder = CONCAT('M-', County)
and for the special cases:
UPDATE County
SET SortOrder = CONCAT('E-', County)
WHERE County IN ('Salt Lake', 'Utah', 'Davis', 'Weber', 'Tooele')
Arguably you may want to put another marker column in to indicate those entries are special.
I went with numeric and large multiples.
Even with the CONCAT('E-'.. example, I don't get the required sort order. That would give me Davis, SL, Tooele... and Salt Lake needs to be first.
I ended up using multiples of 10 and assigned the non-special-sort entries a value like 10000. That way the view for each lookup can have
ORDER BY SortOrder ASC, OtherField ASC
Another programmer suggested using DECODE in Oracle, or CASE statements in SQL Server, but this is a more general solution. YMMV.
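For reference, the numeric approach can be sketched with Python's built-in sqlite3 module, using the county list from the question. The default of 10000 and the multiples of 10 follow the description above; the exact column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE County (
        CountyID  INTEGER PRIMARY KEY,
        County    TEXT NOT NULL,
        SortOrder INTEGER NOT NULL DEFAULT 10000  -- "no special position"
    )
""")

counties = [
    (49001, "Beaver"), (49005, "Cache"), (49007, "Carbon"), (49009, "Daggett"),
    (49011, "Davis"), (49015, "Emery"), (49029, "Morgan"), (49031, "Piute"),
    (49033, "Rich"), (49035, "Salt Lake"), (49037, "San Juan"), (49041, "Sevier"),
    (49043, "Summit"), (49045, "Tooele"), (49049, "Utah"), (49051, "Wasatch"),
    (49057, "Weber"),
]
conn.executemany("INSERT INTO County (CountyID, County) VALUES (?, ?)", counties)

# Pin the client's five counties to the top, leaving gaps of 10 for insertions.
pinned = ["Salt Lake", "Utah", "Davis", "Weber", "Tooele"]
for position, name in enumerate(pinned, start=1):
    conn.execute(
        "UPDATE County SET SortOrder = ? WHERE County = ?",
        (position * 10, name),
    )

# Ties on SortOrder (all the 10000 rows) fall back to alphabetical order.
names = [
    county
    for (county,) in conn.execute(
        "SELECT County FROM County ORDER BY SortOrder ASC, County ASC"
    )
]
print(names)
```

Adding a new county needs no resequencing: it picks up the default 10000 and lands in the alphabetical tail, and a newly pinned county can take any unused value between the existing multiples of 10.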