My colleague and I are learning more about database design. We found a situation where we don't know what the next step is. I like to think of it as having 3 combo boxes: I choose the first 2 values, and the last one is filtered according to the first 2. Imagine we have 3 tables: Color Table, Material Table, Size Table.
Below is Color Table Design:
ID | Color_Name | Material_ID | Size_ID
Choosing a Color depends on filtering by Material_ID and Size_ID. We have over 600 Colors to choose from and 6 different Materials. Let's say Color Red can be used by 4 Materials. Then the Color Table would have at least 4 records for Red, so this table could technically hold a maximum of 600 * 6 * (# of sizes) records.
The problem with this is that we would have to enter all those records. The Color Table would be a static table (we would very rarely enter more colors). Would it be best practice to enter every possible combination in this table?
Or do we use a matrix to find every possible combination? I would assume the matrix would be a table, but I'm not sure how you would create a matrix table that compares more than 2 fields (we would have to create multidimensional tables).
We would like to follow best practice for designing the database which would help with maintenance. We are open to all suggestions/ideas on how this is handled in the real world. Thank you for your time!
Your Color table is poorly designed and in need of normalization. Every color should be recorded only once:
Color (ID PK, Color_Name)
And then the combinations of materials, sizes and colors can be represented as a ternary relation:
ColorMaterialSize (Color_ID, Material_ID, Size_ID)
Take some time to think about this one. Is any combination of one or two columns unique/identifying, or are all combinations of the three columns valid? Define your primary key on the smallest unique set of columns.
You can then select colors by joining and filtering:
SELECT c.ID, c.Color_Name
FROM Color AS c
INNER JOIN ColorMaterialSize AS cms ON c.ID = cms.Color_ID
WHERE cms.Material_ID = 1 AND cms.Size_ID = 2
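As a rough sketch of how the normalized design could look end to end, here is the schema and the filtering query run against an in-memory SQLite database from Python. SQLite is only a stand-in, since the question does not name a database engine. The sample rows, the Material_Name and Size_Name columns, and the three-column primary key are all assumptions made for illustration; pick the key according to which combinations are actually unique in your data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Color    (ID INTEGER PRIMARY KEY, Color_Name TEXT NOT NULL UNIQUE);
CREATE TABLE Material (ID INTEGER PRIMARY KEY, Material_Name TEXT NOT NULL);  -- hypothetical column
CREATE TABLE Size     (ID INTEGER PRIMARY KEY, Size_Name TEXT NOT NULL);      -- hypothetical column
-- One row per valid combination; the three-column key is an assumption.
CREATE TABLE ColorMaterialSize (
    Color_ID    INTEGER NOT NULL REFERENCES Color(ID),
    Material_ID INTEGER NOT NULL REFERENCES Material(ID),
    Size_ID     INTEGER NOT NULL REFERENCES Size(ID),
    PRIMARY KEY (Color_ID, Material_ID, Size_ID)
);
-- Made-up sample data.
INSERT INTO Color VALUES (1, 'Red'), (2, 'Blue');
INSERT INTO Material VALUES (1, 'Cotton'), (2, 'Wool');
INSERT INTO Size VALUES (1, 'Small'), (2, 'Large');
INSERT INTO ColorMaterialSize VALUES (1, 1, 2), (1, 2, 2), (2, 1, 1);
""")

def colors_for(material_id, size_id):
    """Return the colors valid for the chosen material and size."""
    return conn.execute(
        """SELECT c.ID, c.Color_Name
           FROM Color AS c
           INNER JOIN ColorMaterialSize AS cms ON c.ID = cms.Color_ID
           WHERE cms.Material_ID = ? AND cms.Size_ID = ?
           ORDER BY c.ID""",
        (material_id, size_id),
    ).fetchall()

print(colors_for(1, 2))  # [(1, 'Red')]
```

Red is stored once in Color, and only the combinations that are really allowed get a row in ColorMaterialSize, so the third combo box is filled by a single query.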
Caveat: very new to database design/modeling, so bear with me :)
I'm trying to design a simple database that stores information about images in an archive. Along with file_name (which is one distinct string), I have fields like genre and starring where each field might contain multiple strings (if an image is associated with multiple genres, and/or if an image has multiple actors in it).
Right now the database is just a single table keyed on file_name, and fields like starring and genre just store multiple comma-separated values. I can query it fine by using wildcards and the LIKE and IN operators, but I'm wondering if there's a more elegant way to break out the data so that it is easier to use/query. For instance, I'd like to be able to find how many unique actors are represented in the archive, but I don't think that's possible with the current model.
I realize this is a pretty elementary question about data modeling, but any guidance anyone can provide or reading you can direct me to would be greatly appreciated!
Thanks!
You need to create extra tables to keep the data normalized. In your situation you need 4 extra tables to represent these n->m relations (2 extra would be enough if the relations were 1->n).
Tables:
image(id, file_name)
genre(id, name)
image_genres(image_id, genre_id)
stars(id, name, ...)
image_stars(image_id, star_id)
And some data in tables:
image table
id | file_name
1 | /users/home/song/empire.png
2 | /users/home/song/promiscuous.png
genre table
id | name
1 | pop
2 | blues
3 | rock
image_genres table
image_id | genre_id
1 | 2
1 | 3
2 | 1
stars table
id | name
1 | Jay-Z
2 | Alicia Keys
3 | Nelly Furtado
4 | Timbaland
image_stars table
image_id | star_id
1 | 1
1 | 2
2 | 3
2 | 4
To count the unique actors in the database, you can simply run the SQL query below:
SELECT COUNT(name) FROM stars
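To make this concrete, here is a small sketch that builds the image, stars and image_stars tables in an in-memory SQLite database from Python, loads the sample rows listed above, and runs the count. SQLite, the UNIQUE constraints and the composite primary key are assumptions for illustration, and the genre tables are left out for brevity since they follow the same pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE image (id INTEGER PRIMARY KEY, file_name TEXT NOT NULL UNIQUE);
CREATE TABLE stars (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
-- Junction table: one row per (image, star) pair.
CREATE TABLE image_stars (
    image_id INTEGER NOT NULL REFERENCES image(id),
    star_id  INTEGER NOT NULL REFERENCES stars(id),
    PRIMARY KEY (image_id, star_id)
);
INSERT INTO image VALUES (1, '/users/home/song/empire.png'),
                         (2, '/users/home/song/promiscuous.png');
INSERT INTO stars VALUES (1, 'Jay-Z'), (2, 'Alicia Keys'),
                         (3, 'Nelly Furtado'), (4, 'Timbaland');
INSERT INTO image_stars VALUES (1, 1), (1, 2), (2, 3), (2, 4);
""")

# Every star is stored exactly once, so counting rows counts unique actors.
actor_count = conn.execute("SELECT COUNT(name) FROM stars").fetchone()[0]

# Stars appearing in one image, looked up through the junction table.
stars_in_image_1 = [row[0] for row in conn.execute(
    """SELECT s.name
       FROM stars AS s
       JOIN image_stars AS ist ON ist.star_id = s.id
       WHERE ist.image_id = ?
       ORDER BY s.id""", (1,))]

print(actor_count, stars_in_image_1)  # 4 ['Jay-Z', 'Alicia Keys']
```

The count is only "unique" because each actor has a single row in stars; the UNIQUE constraint on name is one way to guard that, assuming names identify actors in your archive.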
I have a key column in two tables with a many-to-many relationship. A sample representation of the data is -
(attaching sample version of the table to get the point across as there are other numerous columns not contributing to this visual)
table 1 -
table 2 -
I am making a line graph with date on the x-axis and value1 and value2 on the y-axis. value1 applies to all dates; it is basically a standard target value. I want the sum of all value1 entries to show in my visual as value1, not just the ones for which I have data on those dates. To explain it better, I want 1717 on the graph as well, like the total in the table in the following image -
visual -
Is there a way to do this in Power BI? I tried making a shared dimension of all unique keys as a separate table and connecting both tables to it, but that did not change the visual.
You can follow the steps below to achieve your required output-
Step-1 Create a custom column in your table 1 as below-
value_1_sum =
CALCULATE(
SUM(table_2[value1]),
ALL(table_2)
)
Step-2 Configure your line chart as below. Remember, the aggregation for the new custom column should be Average, as shown in the image
And here is the final output-
Additional Reference: below is the list of options you get after right-clicking the measure name-
I'm using SSAS Tabular (PowerPivot) and need to design a data model and write some DAX.
I have 4 tables in my relational database-model:
Orders(order_id, order_name, order_type)
Spots (spot_id,order_id, spot_name, spot_time, spot_price)
SpotDiscount (spot_id, discount_id, discount_value)
Discounts (discount_id, discount_name)
One order can include multiple spots but one spot (spot_id 1) can only belong to one order.
One spot can include different discounts, and every discount has one discount_value.
Ex:
Order_1 has spot_1 (spot_price 10), spot_2 (spot_price 20)
Spot_1 has discount_name_1(discount_value 10) and discount_name_2 (discount_value 20)
Spot_2 has discount_name_1(discount_value 15) and discount_name_3 (discount_value 30)
I need to write two measures: price(sum) and discount_value(average)
How do I correctly design a star schema with a fact table (or maybe two fact tables) so that in my PowerPivot cube I get the following?
If I choose discount_name_1 I should get
order_1 with spot_1 and spot_2, where price at the order_1 level will be 30 (10 + 20) and discount_value = 12.5
If I choose discount_name_3 I should get
order_1 with only spot_2 and price on order level = 20 and discount_value = 30
Fact(OrderKey, SpotKey, DiscountKey, DateKey, TimeKey, Spot_Price, Discount_Value,...)
DimOrder, DimSpot, DimDiscount, etc....
TotalPrice:=
SUMX(
SUMMARIZE(
Fact
,Fact[OrderKey]
,Fact[SpotKey]
,Fact[Spot_Price]
)
,Fact[Spot_Price]
)
AverageDiscount:=
AVERAGE(Fact[Discount_Value])
The fact table is denormalized, and you end up with the simplest star schema you can have.
The first measure deserves some explanation. [Spot_Price] is duplicated for any spot with multiple discounts, so a simple SUM() would give wrong results. SUMMARIZE() does a group by on all the columns passed to it, following relationships if necessary (we're looking at a single table here, so there is nothing to follow).
SUMX() iterates over this table and accumulates the value of the expression in its second argument. The SUMMARIZE() has removed our duplicate [Spot_Price]s so we accumulate the unique ones (per unique combination of [OrderKey] and [SpotKey]) in a sum.
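The deduplicate-then-sum logic can be checked outside DAX. The sketch below is plain Python, not DAX: it mimics what the two measures compute on the example rows from the question (spot prices 10 and 20, with the listed discounts). The tuple layout and the function names are assumptions made for illustration.

```python
# Denormalized fact rows built from the question's example:
# (order_key, spot_key, discount_name, spot_price, discount_value)
fact = [
    ("order_1", "spot_1", "discount_name_1", 10, 10),
    ("order_1", "spot_1", "discount_name_2", 10, 20),
    ("order_1", "spot_2", "discount_name_1", 20, 15),
    ("order_1", "spot_2", "discount_name_3", 20, 30),
]

def total_price(rows):
    # Like SUMMARIZE(): keep one row per distinct (order, spot, price).
    distinct = {(r[0], r[1], r[3]) for r in rows}
    # Like SUMX(): iterate the grouped rows and accumulate the price.
    return sum(price for _, _, price in distinct)

def average_discount(rows):
    # Like AVERAGE(Fact[Discount_Value]) over the filtered rows.
    return sum(r[4] for r in rows) / len(rows)

def filtered(discount_name):
    # Stands in for a slicer selection on the discount dimension.
    return [r for r in fact if r[2] == discount_name]

print(sum(r[3] for r in fact))   # 60: naive SUM double-counts each spot
print(total_price(fact))         # 30: each spot counted once
print(total_price(filtered("discount_name_1")),
      average_discount(filtered("discount_name_1")))  # 30 12.5
print(total_price(filtered("discount_name_3")),
      average_discount(filtered("discount_name_3")))  # 20 30.0
```

With no filter, a plain sum over the fact rows gives 60 because each spot appears once per discount, while the grouped version gives 30.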
You say
One order can include multiple spots but one spot (spot_id 1) can only
belong to one order.
That is not supported by the table definitions you give just above that statement. In the table definitions, one order has only one spot but (unless you've added a unique index to Orders on spot_id) each Spot can have multiple orders. Each Spot can also have multiple discounts.
If you want to have the relationship described in your words, the table definitions should be:
Orders(order_id, order_name, order_type)
OrderSpot(order_id, spot_id) -- with a unique index on spot_id
Spots (spot_id, spot_name, spot_time, price)
or:
Orders(order_id, order_name, order_type)
Spots (spot_id, spot_name, spot_time, order_id, price)
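A quick way to see that the unique index in the first variant enforces "one spot belongs to only one order" is to try to break it. The sketch below uses an in-memory SQLite database from Python as a stand-in for the real relational source; the column types, the index name, the sample rows and the assign helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (order_id INTEGER PRIMARY KEY, order_name TEXT, order_type TEXT);
CREATE TABLE Spots  (spot_id INTEGER PRIMARY KEY, spot_name TEXT, spot_time TEXT, price REAL);
CREATE TABLE OrderSpot (
    order_id INTEGER NOT NULL REFERENCES Orders(order_id),
    spot_id  INTEGER NOT NULL REFERENCES Spots(spot_id)
);
-- The unique index is what limits each spot to a single order.
CREATE UNIQUE INDEX ux_orderspot_spot ON OrderSpot(spot_id);
-- Made-up sample data.
INSERT INTO Orders VALUES (1, 'order_1', 'a'), (2, 'order_2', 'a');
INSERT INTO Spots  VALUES (1, 'spot_1', '10:00', 10), (2, 'spot_2', '11:00', 20);
INSERT INTO OrderSpot VALUES (1, 1), (1, 2);  -- one order, two spots: allowed
""")

def assign(order_id, spot_id):
    """Try to link a spot to an order; return False if the spot is taken."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO OrderSpot VALUES (?, ?)", (order_id, spot_id))
        return True
    except sqlite3.IntegrityError:
        return False

print(assign(2, 1))  # False: spot_1 already belongs to order_1
```

The second variant gets the same guarantee for free, because a single order_id column on Spots can hold only one order per spot.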
You can create the SSAS cube with Order as the fact table and one dimension in the Spot table. If you then add the SpotDiscount and Discount tables with their relations (SpotDiscount to Spot, Discount to SpotDiscount), you have a one-dimensional cube.
EDIT as per comments
Well, the Fact table would have order_id, order_name, order_type
The Dimension would be made up of the other 3 tables and have the columns you're interested in: probably spot_name, spot_time, spot_price, discount_name, discount_value.
If I have the following data:
Results Table
.[Required]
I want one grape
I want one orange
I want one apple
I want one carrot
I want one watermelon
Fruit Table
.[Name]
grape
orange
apple
What I want to do is essentially say "give me all results where users are looking for a fruit". This is all just an example; I am looking at a table with roughly 1 million records and a string field of 4000+ characters. I am expecting a somewhat slow result, and I know that the table could DEFINITELY be structured better, but I have no control over that. Here is the query I would essentially have, but it doesn't seem to do what I want. It gives every record. And yes, [#Fruit] is a temp table.
SELECT * FROM [Results]
JOIN [#Fruit] ON
'%'+[Results].[Required]+'%' LIKE [#Fruit].[Name]
Ideally my output should be the following 3 rows:
I want one grape
I want one orange
I want one apple
If that kind of thing is doable, I would try the other way round:
SELECT * FROM [Results]
JOIN [#Fruit] ON
[Results].[Required] LIKE '%'+[#Fruit].[Name]+'%'
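To check the direction of the comparison on the sample data, here is a small sketch using an in-memory SQLite database from Python. SQLite is only a stand-in for the original server, so the string concatenation uses || instead of +, and the temp table is modelled as an ordinary table named Fruit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Results (Required TEXT);
CREATE TABLE Fruit   (Name TEXT);
INSERT INTO Results VALUES ('I want one grape'), ('I want one orange'),
                           ('I want one apple'), ('I want one carrot'),
                           ('I want one watermelon');
INSERT INTO Fruit VALUES ('grape'), ('orange'), ('apple');
""")

# The long text goes on the left of LIKE; the pattern on the right is
# built from the short fruit name wrapped in wildcards.
matches = [row[0] for row in conn.execute(
    """SELECT r.Required
       FROM Results AS r
       JOIN Fruit AS f ON r.Required LIKE '%' || f.Name || '%'
       ORDER BY r.rowid""")]

print(matches)
# ['I want one grape', 'I want one orange', 'I want one apple']
```

The carrot and watermelon rows drop out because no fruit name occurs inside them. A row that mentions two fruits would come back twice, once per match.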
This topic interests me, so I did a little bit of searching.
Suggestion 1 : Full Text Search
I think what you are trying to do is Full Text Search.
You will need a Full-Text Index created on the table if it is not already there (CREATE FULLTEXT INDEX).
This should be faster than performing "Like".
Suggestion 2 : Meta Data Search
Another approach I'd take is to create a metadata table and maintain the information myself whenever the [Results].[Required] values are updated (or created).
This looks more or less doable, but I'd start from the Fruit table just for conceptual clarity.
Here's roughly how I would structure this, ignoring all performance / speed / normalization issues (note also that I've switched around the variables in the LIKE comparison):
SELECT f.name, r.required
FROM fruits f
JOIN results r ON r.required LIKE CONCAT('%', f.name, '%')
...and perhaps add a LIMIT 10 to keep the query from wasting time while you're testing it out.
This structure will:
give you one record per "match" (per Result row that matches a Fruit)
exclude Result rows that don't have a Fruit
probably be ungodly slow.
Good luck!
I am implementing a voting feature to allow users to vote for their favourite images. They are able to vote for only 3 images. Nothing more or less. Therefore, I am using checkboxes to do validation for it. I need to store these votes in my database.
Here is what I have so far:
voteID | name | emailAddress | ICNo | imageID
(where imageID is a foreign key to the Images table)
I'm still learning about database systems and I feel like this isn't a good database design considering some of the fields like email address and IC Number have to be repeated.
For example,
voteID | name | emailAddress | ICNo | imageID
1 | BG | email#example.com | G822A28A | 10
2 | BG | email#example.com | G822A28A | 11
3 | BG | email#example.com | G822A28A | 12
4 | MO | email2#example.com | G111283Z | 10
You have three "things" in your system - images, people, and votes.
An image can have multiple votes (from different people), and a person can have multiple votes (for different images).
One way to represent this in a diagram is as follows:
So you store information about a person in one place (the Person table), about Images in one place (the Images table), and Votes in one place. The "chicken feet" relationships between them show that one person can have many votes, and one image can have many votes. ("Many" meaning "more than one").
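As a hedged sketch of what those three tables might look like, here is one possible layout built in an in-memory SQLite database from Python, loaded with the example votes from the question. The personID column, the UNIQUE constraints and the choice of SQLite are assumptions for illustration; the "exactly 3 votes" rule is not enforced here and would still need to be validated by the application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Each person's details are stored once.
CREATE TABLE Person (
    personID     INTEGER PRIMARY KEY,   -- hypothetical surrogate key
    name         TEXT NOT NULL,
    emailAddress TEXT NOT NULL,
    ICNo         TEXT NOT NULL UNIQUE
);
CREATE TABLE Images (imageID INTEGER PRIMARY KEY);
-- A vote only links a person to an image.
CREATE TABLE Votes (
    voteID   INTEGER PRIMARY KEY,
    personID INTEGER NOT NULL REFERENCES Person(personID),
    imageID  INTEGER NOT NULL REFERENCES Images(imageID),
    UNIQUE (personID, imageID)  -- assumption: one vote per person per image
);
INSERT INTO Person VALUES (1, 'BG', 'email#example.com', 'G822A28A'),
                          (2, 'MO', 'email2#example.com', 'G111283Z');
INSERT INTO Images VALUES (10), (11), (12);
INSERT INTO Votes (personID, imageID) VALUES (1, 10), (1, 11), (1, 12), (2, 10);
""")

votes_per_image = conn.execute(
    "SELECT imageID, COUNT(*) FROM Votes GROUP BY imageID ORDER BY imageID"
).fetchall()
print(votes_per_image)  # [(10, 2), (11, 1), (12, 1)]
```

The name, email address and IC number now appear once per person, and the Votes table carries only the two foreign keys.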