how to design a database table with similar entries - database

I currently have a table in which the vast majority (about 99%) of the values in one field depend on a single other field, while the remaining 1% depend on additional fields.
For example, I have the following price table
product PK
color PK
cost
The following are some entries in that table
product|color|cost
pen|red|$1.00
pen|blue|$1.00
pen|green|$1.00
etc....
pen|black|$0.90
pen|white|$0.85
pencil|red|$0.50
pencil|blue|$0.50
pencil|green|$0.50
etc...
pencil|black|$0.35
pencil|gray|$0.40
The problem I'm having with this table is that whenever I have to add a single product or color I have to add hundreds of similar entries to this table.
I'm currently thinking of storing the data in the following way
pen|all_other_colors|$1.00
pen|black|$0.90
pen|white|$0.85
pencil|all_other_colors|$0.50
pencil|black|$0.35
pencil|gray|$0.40
Am I on the right track or is there a better database design that handles this problem? Any help or links would be appreciated. I can't get the right wording to google for this problem.

You need to normalize the database tables.
Break the table into three tables as below:
Products
id | product
colors
id | color
product_cost
id | Product_id | color_id | Cost

BaseProduct has all product names and their prices
Product has all available valid product-color combinations
ColorPrice is the difference (offset) from the base price.
ColorCharge has only rows with exception pricing (no rows for ColorPrice = 0)
To get info for a specific base product (specific_product_id):
select
b.ProductName
, c.ColorName
, b.ProductPrice + coalesce(x.ColorPrice, 0.0) as ProductPrice
from Product as p
join BaseProduct as b on b.BaseProductID = p.BaseProductID
join Color as c on c.ColorId = p.ColorId
left join ColorCharge as x on x.BaseProductID = p.BaseProductID and x.ColorID = p.ColorID
where p.BaseProductID = specific_product_id;
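The base-price-plus-offset idea above can be tried out with a small sketch. This one uses Python's built-in sqlite3 module rather than the asker's actual database, stores prices as integer cents to avoid floating-point rounding (an assumption, not part of the answer), and fills in sample IDs that are purely illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE BaseProduct (BaseProductID INTEGER PRIMARY KEY, ProductName TEXT, ProductPrice INTEGER);
CREATE TABLE Color (ColorID INTEGER PRIMARY KEY, ColorName TEXT);
CREATE TABLE Product (BaseProductID INTEGER, ColorID INTEGER, PRIMARY KEY (BaseProductID, ColorID));
CREATE TABLE ColorCharge (BaseProductID INTEGER, ColorID INTEGER, ColorPrice INTEGER,
                          PRIMARY KEY (BaseProductID, ColorID));

INSERT INTO BaseProduct VALUES (1, 'pen', 100);              -- prices in cents (assumption)
INSERT INTO Color VALUES (1, 'red'), (2, 'blue'), (3, 'black'), (4, 'white');
INSERT INTO Product VALUES (1, 1), (1, 2), (1, 3), (1, 4);   -- valid product-color combinations
-- Only the exceptions get a row: black is 10 cents cheaper, white 15 cents cheaper.
INSERT INTO ColorCharge VALUES (1, 3, -10), (1, 4, -15);
""")

rows = conn.execute("""
    SELECT b.ProductName, c.ColorName,
           b.ProductPrice + COALESCE(x.ColorPrice, 0) AS ProductPrice
    FROM Product AS p
    JOIN BaseProduct AS b ON b.BaseProductID = p.BaseProductID
    JOIN Color AS c ON c.ColorID = p.ColorID
    LEFT JOIN ColorCharge AS x
           ON x.BaseProductID = p.BaseProductID AND x.ColorID = p.ColorID
    WHERE p.BaseProductID = ?
    ORDER BY c.ColorID
""", (1,)).fetchall()

# Map each color name to its effective price in cents.
prices = {color: price for _, color, price in rows}
print(prices)
```

Colors without a ColorCharge row fall back to the base price through COALESCE, so adding an ordinary color needs one Color row and one Product row, with no price row at all.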

You could group colors together and then insert the price for the whole group.
Your example could be represented similarly to this:
PRICE:
PRODUCT_ID GROUP_ID PRICE
pen 1 $1.00
pen 2 $0.90
pen 3 $0.85
pencil 1 $0.50
pencil 2 $0.35
pencil 4 $0.40
GROUP:
GROUP_ID
1
2
3
4
COLOR_GROUP:
GROUP_ID COLOR_ID
1 red
1 blue
1 green
2 black
3 white
4 gray
COLOR:
COLOR_ID
red
blue
green
black
white
gray
Whether the increased complexity is worth it is up to you...
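To get a feel for that trade-off, here is a rough sketch of the grouped design using Python's sqlite3 module. The lowercase table names, the price_cents column, the price_of helper, and the extra color "purple" are all assumptions made for illustration; the GROUP and COLOR lookup tables are left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE color_group (group_id INTEGER, color_id TEXT, PRIMARY KEY (group_id, color_id));
CREATE TABLE price (product_id TEXT, group_id INTEGER, price_cents INTEGER,
                    PRIMARY KEY (product_id, group_id));

INSERT INTO color_group VALUES
    (1, 'red'), (1, 'blue'), (1, 'green'), (2, 'black'), (3, 'white'), (4, 'gray');
INSERT INTO price VALUES
    ('pen', 1, 100), ('pen', 2, 90), ('pen', 3, 85),
    ('pencil', 1, 50), ('pencil', 2, 35), ('pencil', 4, 40);
""")

def price_of(product, color):
    """Return the price in cents, or None if the product is not sold in that color."""
    row = conn.execute("""
        SELECT p.price_cents
        FROM price AS p
        JOIN color_group AS g ON g.group_id = p.group_id
        WHERE p.product_id = ? AND g.color_id = ?
    """, (product, color)).fetchone()
    return row[0] if row else None

# Adding a new ordinary color is a single row in color_group;
# every product that prices group 1 picks it up automatically.
conn.execute("INSERT INTO color_group VALUES (1, 'purple')")

print(price_of("pen", "blue"), price_of("pencil", "gray"), price_of("pencil", "purple"))
```

The cost is an extra join on every price lookup, plus the need to decide which group a new color belongs to.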

Related

Group By with Where clause in SQL

I got some difference in the results when using a DateTime column in groupby. Can someone explain why?
Query:
Select Name, Source, Description, CreatedDate
From testTable
Where Source like '%Validating err%'
And CreatedDate >='2016-12-01'
Group By Name , Source, Description, CreatedDate
Result : 15 rows
The above query returns 15 rows. But when I remove the CreatedDate column from the GROUP BY clause, it returns only 4 rows.
Query:
Select Name, Source, Description
From testTable
Where Source like '%Validating err%'
And CreatedDate >='2016-12-01'
Group By Name , Source, Description
Result : 4 rows
I am adding this answer just for the benefit of @John so he can visually understand why his two result sets have differing numbers of records.
Imagine a table called shirts, which has only two columns, size and color. Here is some sample data:
size | color
S | red
S | green
S | blue
M | red
M | green
M | blue
L | red
L | green
L | blue
In other words, there are three sizes of shirts, and each size has three possible colors.
Now, if you execute the following query:
SELECT size
FROM shirts
GROUP BY size
you will get three records back, containing only the three sizes. However, if you do the following:
SELECT size, color
FROM shirts
GROUP BY size, color
Then you would get back nine records, or groups. All that is happening here is that the addition of another column creates new possible group combinations, and hence more groups. And the same concept applies to what you are seeing in your two queries.
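The shirts example can be reproduced in a few lines. This is a minimal sketch using Python's sqlite3 module with the same nine rows; the variable names are only illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shirts (size TEXT, color TEXT)")
conn.executemany(
    "INSERT INTO shirts VALUES (?, ?)",
    [(s, c) for s in ("S", "M", "L") for c in ("red", "green", "blue")],
)

# One group per distinct size.
by_size = conn.execute("SELECT size FROM shirts GROUP BY size").fetchall()

# One group per distinct (size, color) pair.
by_size_color = conn.execute(
    "SELECT size, color FROM shirts GROUP BY size, color"
).fetchall()

print(len(by_size), len(by_size_color))  # prints: 3 9
```

In the original question, CreatedDate plays the role of color: each distinct date-time value splits a Name/Source/Description group into further groups.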

SQL Server table identity specification and composite key

Hi, I have two basic tables: company and items. The relationship between them is 1 to M (one company has many items associated with it, and each item belongs to only one company).
Company = {companyid,companyname}
_________
items = {itemid,itemname,companyid}
_______ ---------
I have set the identity specification of itemid to YES, so the item ID is now incremented automatically.
If I have two companies with IDs 1 and 2, a sample data table would show this:
itemid itemname idcompany
----- ------- ---------
1 car 1
2 bus 2
3 bike 1
4 motorcycle 2
My issue is when showing company specific data I get this
company 1
itemid itemname idcompany
----- ------- ---------
1 car 1
3 bike 1
company 2
itemid itemname idcompany
----- ------- ---------
2 bus 2
4 motorcycle 2
How do I keep the item ID sequential for each company?
Thank you
Question
What does "sequential" even mean?
Suggestion
The sequence could change depending on the business question. For instance, does sequence always mean the order in which the row was inserted into the table, or does it mean the time at which the item was added? Regardless, you may want to implement the concept of a sequence independently of how the data is stored. For instance, based on your need, you could do something like this (which gives you the sequence of each item by company, based solely on the itemid itself):
select *,
itemsequence = row_number() over (
partition by (idcompany)
order by (itemid))
from items;
Hope this helps.
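For anyone who wants to try the per-company numbering without a SQL Server instance, here is a small sketch using Python's sqlite3 module. It assumes the bundled SQLite is version 3.25 or later (needed for window functions) and uses AUTOINCREMENT as a stand-in for the identity column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE items (
        itemid INTEGER PRIMARY KEY AUTOINCREMENT,  -- stand-in for the identity column
        itemname TEXT,
        idcompany INTEGER
    )
""")
conn.executemany(
    "INSERT INTO items (itemname, idcompany) VALUES (?, ?)",
    [("car", 1), ("bus", 2), ("bike", 1), ("motorcycle", 2)],
)

# Number the items within each company, ordered by the stored itemid.
rows = conn.execute("""
    SELECT itemid, itemname, idcompany,
           ROW_NUMBER() OVER (PARTITION BY idcompany ORDER BY itemid) AS itemsequence
    FROM items
    ORDER BY idcompany, itemid
""").fetchall()

for row in rows:
    print(row)
```

The stored itemid stays globally unique, and the per-company sequence is computed at query time, so it never has to be renumbered when rows are deleted.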

return a list of unmatched records between queries

I apologize if this has been asked already; I couldn't find anything that was quite what I wanted.
Is there a way to return a list of records that do not match between two queries? For example:
ID | Name | Color
1 crayon blue
2 marker red
3 paint green
"Select Id, Name, color from TableA" =
ID | Name | Color
1 crayon blue
2 marker red
3 paint green
"Select Id, Name, color from TableA where color = 'blue'" =
ID | Name | Color
1 crayon blue
I was hoping there was some functionality that would take the two queries above and provide a result set like:
ID | Name | Color
2 marker red
3 paint green
That is, the records that the two queries do not have in common.
Thanks in advance!
I am going to assume that your queries are really more complicated and this is just an example. One way is by using left join:
with q1 as (<query1 here>),
q2 as (<query2 here>)
select q1.*
from q1 left join
q2
on q1.id = q2.id
where q2.id is null;
This assumes the match is on id. If there are more columns that need to be the same, add them to the ON clause.
What about this:
<query 1 here>
EXCEPT
<query 2 here>
Of course, EXCEPT requires that:
The number and the order of the columns are the same in both queries.
The data types are compatible.
You can try the following query to get your desired result set:
Select Id, Name, color from TableA
EXCEPT
Select Id, Name, color from TableA where color = 'blue'
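Both suggestions (the left join with an IS NULL filter and EXCEPT) can be checked side by side. The sketch below uses Python's sqlite3 module with the three sample rows from the question; SQLite is only a stand-in here, and the variable names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableA (Id INTEGER PRIMARY KEY, Name TEXT, Color TEXT)")
conn.executemany(
    "INSERT INTO TableA VALUES (?, ?, ?)",
    [(1, "crayon", "blue"), (2, "marker", "red"), (3, "paint", "green")],
)

# Set difference: rows of the first query that the second query does not return.
except_rows = sorted(conn.execute("""
    SELECT Id, Name, Color FROM TableA
    EXCEPT
    SELECT Id, Name, Color FROM TableA WHERE Color = 'blue'
""").fetchall())

# Anti-join: keep the rows of q1 that have no partner in q2 (matching on Id only).
anti_join_rows = sorted(conn.execute("""
    WITH q1 AS (SELECT Id, Name, Color FROM TableA),
         q2 AS (SELECT Id, Name, Color FROM TableA WHERE Color = 'blue')
    SELECT q1.*
    FROM q1
    LEFT JOIN q2 ON q1.Id = q2.Id
    WHERE q2.Id IS NULL
""").fetchall())

print(except_rows)
print(anti_join_rows)
```

The two differ in one respect worth knowing: EXCEPT compares whole rows and removes duplicates, while the anti-join compares only the columns named in the ON clause and keeps duplicates from the first query.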

One or four tables? (db structure)

Items are connected to:
one or more "region" and/or
one or more "county" and/or
one or more "city" and/or
one or more "place".
My question is how I should set up the relations:
id | thingID | regionId | countyId | cityId | placeId
or
4 tables?
id | thingId | regionId
id | thingId | countyId
id | thingId | cityId
id | thingId | placeId
or is there perhaps another better solution?
I may be overthinking this, but I think there's probably a relationship between "region", "county", "city" and "place" - an item that belongs to a "place" should also belong to the city, county and region.
You can solve this in both the designs you provide - but you need a fair amount of additional logic. In the first solution, you need to make sure that every time you insert a record, you populate the location from "left to right" - a record with only "place" is not valid.
In the second solution, you need to populate all relevant rows - an item in Chelsea must also have records for London, Middlesex and South East England.
There's another way...
Table: location
ID Name Parent
------------------------
1 South East England null
2 Middlesex 1
3 London 2
4 Chelsea 3
5 Kent 1
6 Canterbury 5
Table: item
Id name
-----------------
1 Posh Boy
2 Cricket ground
3 Rain
Table: item_location
ItemID LocationID
--------------------
1 4 //Posh boy in Chelsea
2 2 // Cricket ground in Middlesex
3 1 // Rain in the South East of England.
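With this layout, the full chain of locations for an item is derived at query time instead of being stored. A minimal sketch with Python's sqlite3 module and a recursive common table expression follows; the snake_case column names and the locations_of helper are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE location (id INTEGER PRIMARY KEY, name TEXT, parent INTEGER REFERENCES location(id));
CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE item_location (item_id INTEGER, location_id INTEGER,
                            PRIMARY KEY (item_id, location_id));

INSERT INTO location VALUES
    (1, 'South East England', NULL), (2, 'Middlesex', 1), (3, 'London', 2),
    (4, 'Chelsea', 3), (5, 'Kent', 1), (6, 'Canterbury', 5);
INSERT INTO item VALUES (1, 'Posh Boy'), (2, 'Cricket ground'), (3, 'Rain');
INSERT INTO item_location VALUES (1, 4), (2, 2), (3, 1);
""")

def locations_of(item_id):
    """Return the item's location followed by all of its ancestors, nearest first."""
    rows = conn.execute("""
        WITH RECURSIVE chain(id, name, parent, depth) AS (
            -- Start from the location(s) the item is linked to directly.
            SELECT l.id, l.name, l.parent, 0
            FROM item_location AS il
            JOIN location AS l ON l.id = il.location_id
            WHERE il.item_id = ?
            UNION ALL
            -- Then walk up one parent at a time.
            SELECT p.id, p.name, p.parent, chain.depth + 1
            FROM chain
            JOIN location AS p ON p.id = chain.parent
        )
        SELECT name FROM chain ORDER BY depth
    """, (item_id,)).fetchall()
    return [name for (name,) in rows]

print(locations_of(1))
print(locations_of(3))
```

Only one row per item is stored in item_location, so an item in Chelsea never needs separate rows for London, Middlesex and South East England.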
The second option is clearly better. It's a many-to-many relationship for each of these categories, and that's what the second option describes.
The first option would result in some very odd data. If you had a ThingId that was associated to all of the different types once, you would have one row that had all the columns filled in. Then if your ThingId needed to be tied to an additional city, you would have another row that had only the cityId filled in, with the other columns remaining null.
A table design that results in a lot of null values is usually (not always) a sign that your model is flawed.

SQL make rows into columns, PIVOT maybe

I have an MS SQL Server with a database for an E-commerce storefront.
These are some of the tables I have:
Products:
Id | Name | Price
ProductAttributeTypes: -Color, Size, Format
Id | Name
ProductAttributes: --Red, Green, 12x20 cm, Mirrored
Id | ProductAttributeTypeId | Name
Orders:
Id | DateCreated
OrderItems:
Id | OrderId | ProductId
OrderItemsToProductAttributes: --Relates an OrderItem to its product and selected attributes
OrderItemId | ProductAttributeId | ProductAttributeTypeId | ProductId
I want to select from the OrderItems table, to see which items have been purchased.
To see which variants (ProductAttributes) were selected, I want those as "dynamic" columns in the result set.
So the resultset should look like this:
OrderItemId | ProductId | ProductName | Color | Size | Format
1234 123 Mount. Bike Red 2x20 Mirror
I don't know if PIVOT is the thing to use? I'm not using any aggregate functions, so I guess not...
Are there any SQL ninjas who can help me out?
If you are using SQL Server 2005 or 2008 you can use the PIVOT command. See here.
In the example below the OrderAttributes set will look like:
OrderItemId AttName AttValue
----- ------ -----
100 Color Red
100 Size Small
101 Color Blue
101 Size Small
102 Color Red
102 Size Small
103 Color Blue
103 Size Large
The final results after the PIVOT will be:
OrderItemId Size Color
----- ------ -----
100 Small Red
101 Small Blue
102 Small Red
103 Large Blue
WITH OrderAttributes(OrderItemId, AttName, AttValue)
AS (
    SELECT
        OrderItemId,
        pat.Name AS AttName,
        pa.Name AS AttValue
    FROM OrderItemsToProductAttributes x
    INNER JOIN ProductAttributes pa
        ON x.ProductAttributeId = pa.id
    INNER JOIN ProductAttributeTypes pat
        ON pa.ProductAttributeTypeId = pat.Id
)
SELECT AttrPivot.OrderItemId,
    [Size] AS [Size],
    [Color] AS Color
FROM OrderAttributes
PIVOT (
    MAX([AttValue])
    FOR [AttName] IN ([Color],[Size])
) AS AttrPivot
ORDER BY AttrPivot.OrderItemId
There is a way to dynamically build the columns (i.e. the Color and Size columns), as can be seen here. Make sure the compatibility level of your database is set to 90 (SQL Server 2005) or higher, or you will get strange errors.
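The idea of building the column list dynamically can be sketched outside SQL Server as well. The example below uses Python's sqlite3 module; since SQLite has no PIVOT operator, it swaps in conditional aggregation (MAX over a CASE expression), which produces the same shape of result. The trimmed-down tables, the sample IDs, and the quote_ident helper are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ProductAttributeTypes (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE ProductAttributes (Id INTEGER PRIMARY KEY, ProductAttributeTypeId INTEGER, Name TEXT);
CREATE TABLE OrderItemsToProductAttributes (OrderItemId INTEGER, ProductAttributeId INTEGER);

INSERT INTO ProductAttributeTypes VALUES (1, 'Color'), (2, 'Size');
INSERT INTO ProductAttributes VALUES (1, 1, 'Red'), (2, 1, 'Blue'), (3, 2, 'Small'), (4, 2, 'Large');
INSERT INTO OrderItemsToProductAttributes VALUES
    (100, 1), (100, 3), (101, 2), (101, 3), (102, 1), (102, 3), (103, 2), (103, 4);
""")

def quote_ident(name):
    """Quote a value for use as a column alias (doubles any embedded quotes)."""
    return '"' + name.replace('"', '""') + '"'

# Read the attribute types at run time so new types become new columns.
type_names = [n for (n,) in conn.execute("SELECT Name FROM ProductAttributeTypes ORDER BY Id")]

# One MAX(CASE ...) column per attribute type; the type name is bound as a parameter.
columns = ", ".join(
    f"MAX(CASE WHEN pat.Name = ? THEN pa.Name END) AS {quote_ident(n)}"
    for n in type_names
)

sql = f"""
    SELECT x.OrderItemId, {columns}
    FROM OrderItemsToProductAttributes AS x
    JOIN ProductAttributes AS pa ON x.ProductAttributeId = pa.Id
    JOIN ProductAttributeTypes AS pat ON pa.ProductAttributeTypeId = pat.Id
    GROUP BY x.OrderItemId
    ORDER BY x.OrderItemId
"""
rows = conn.execute(sql, type_names).fetchall()

print(["OrderItemId"] + type_names)
for row in rows:
    print(row)
```

As in the PIVOT version, MAX is only there because an aggregate is required; with one value per order item and attribute type it simply returns that value.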
In the past, I've created physical tables for read purposes only. The structure you have above is GREAT for storage, but terrible for reporting.
So you could do the following:
Write a script (that is scheduled nightly) or a trigger (on data change) that does the following tasks:
First, you would dynamically go through each Product and build a static table "Product_[ProductName]"
Then go through each ProductAttributeTypes for each product and create/update/delete a physical column on the corresponding Product table.
Then, fill that table with the proper values based on OrderItemsToProductAttributes and ProductAttributes
This is just a rough idea. Make sure you are storing OrderID in the "Static"/"Flattened" tables. And make sure you do everything else you need to do. But after that, you should be able to start pulling from those flattened tables to get the data you need.
Pivot is your best bet. What I did for reporting purposes, and to make it work well with SSIS, was to create a view with this query:
SELECT [InputSetID], [InputSetName],
    CAST([470] AS int) AS [Created By],
    CAST([480] AS datetime) AS [Created],
    CAST([479] AS int) AS [Updated By],
    CAST([460] AS datetime) AS [Updated]
FROM (
    SELECT st.InputSetID, st.InputSetName, avt.InputSetID AS avtID, avt.AttributeID, avt.Value
    FROM app.InputSetAttributeValue avt
    JOIN app.InputSets st ON avt.InputSetID = st.InputSetID
) AS p
PIVOT (MAX(Value) FOR AttributeID IN ([470], [480], [479], [460])) AS pvt
Then I can just query the view. I also have a trigger on the table to which any new dynamic attributes must be added; it recreates this view, so I can assume the view is always correct.
