Table design for related products - sql-server

I’m new to SQL Server and I’ve been assigned to implement “related products” functionality in our existing database. We have a products table that looks like this:
ProductID int, PK
Name nvarchar(100)
Price decimal
CoverImageURL nvarchar(400)
-- couple more columns exist
I’m thinking about adding another column to the existing table, such as RelatedProductID, but I’m not sure whether this is good design.

Your solution will work only if each product can have at most one related product. If a product can have more than one related product, you need a separate table that holds all the relations between products.
Try adding a new table like this:
CREATE TABLE dbo.RelatedProducts
(
FirstProductId int not null,
SecondProductId int not null
)
So when you need all the related products for a given product, you can retrieve them with the following query:
SELECT SecondProductId
FROM dbo.RelatedProducts
WHERE FirstProductId = @ProductID

Depends on what kind of relationship you need.
I suspect what you actually need is a "many to many" relationship, in which case you'll need an additional "junction" table:
CREATE TABLE ProductRelation (
ProductID1 int REFERENCES Product (ProductID),
ProductID2 int REFERENCES Product (ProductID),
PRIMARY KEY (ProductID1, ProductID2)
)
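As a rough sketch of how such a junction table might be queried, assuming relations are meant to be symmetric (if A is related to B, then B is related to A). The CHECK constraint, its name, and the @ProductID value are illustrative assumptions, not part of the original design:

```sql
-- Assumption: each pair is stored once, with the smaller ID first.
-- This also prevents a product from being related to itself.
ALTER TABLE ProductRelation
ADD CONSTRAINT CK_ProductRelation_Order CHECK (ProductID1 < ProductID2);

DECLARE @ProductID int = 1;  -- hypothetical product id

-- All products related to @ProductID, whichever side of the pair it is stored on.
SELECT ProductID2 AS RelatedProductID
FROM ProductRelation
WHERE ProductID1 = @ProductID
UNION ALL
SELECT ProductID1
FROM ProductRelation
WHERE ProductID2 = @ProductID;
```

If relations are directional instead (A recommends B, but not the other way round), drop the CHECK constraint and query only one column.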


How can I insert rows of one table into multiple tables using a SQL Server stored procedure?

I am interested in inserting my rows of tempDataTable into two tables.
This is the table design of my tempdatatable:
I want to create the two tables via the stored procedure from my TempDataTable (the one in the image).
The design for the two new tables would be something like:
Table one (Product): ProductID (PK), ProductName, Product URL
Table two (ProductPricing): ProductPricingID(PK), ProductId (FK), price, priceperunit, Date
I have spent a full day searching for a solution and will keep looking, but I have not been able to find an exact one. I am not experienced with SQL, but this is something I have to do.
Okay, I'm not sure exactly where you are struggling, so here's a script that sort of does what you asked for. None of this is too hard to follow, so maybe have a scan through it, and then let me know which bits are confusing?
Set up the table structure:
CREATE TABLE tempDataTable (
TempProductId INT,
TempProductUrl VARCHAR(512),
TempProductPrice VARCHAR(50),
TempProductPricePerUnit VARCHAR(50),
TempProductName VARCHAR(512));
INSERT INTO tempDataTable SELECT 2491, 'https://yadayada1', '£1.65/unit', '46p/100g', 'Yeo Valley Little Yeos, blah';
INSERT INTO tempDataTable SELECT 2492, 'https://yadayada2', '60p/unit', '1p/ea', 'Sainsbury''s Little Ones, etc';
CREATE TABLE Product (
ProductId INT PRIMARY KEY,
ProductName VARCHAR(512),
ProductUrl VARCHAR(512));
CREATE TABLE ProductPricing (
ProductPricingId INT IDENTITY(1,1) PRIMARY KEY,
ProductId INT,
ProductPrice VARCHAR(50),
ProductPricePerUnit VARCHAR(50),
ProductPricingDate DATETIME);
ALTER TABLE ProductPricing ADD CONSTRAINT foreignkey$ProductPricing$Product FOREIGN KEY (ProductId) REFERENCES Product (ProductId);
This gives me three tables to play with, one with some temporary data in it, and two that you want to push the data into, with a couple of primary keys, and a foreign key constraint to ensure integrity between the two tables.
Good so far?
Now to split the data between the two tables is as simple as:
INSERT INTO Product (ProductId, ProductName, ProductUrl) SELECT TempProductId, TempProductName, TempProductUrl FROM tempDataTable;
INSERT INTO ProductPricing (ProductId, ProductPrice, ProductPricePerUnit, ProductPricingDate) SELECT TempProductId, TempProductPrice, TempProductPricePerUnit, GETDATE() FROM tempDataTable;
If you run that then you should end up with data in your two tables, like this:
Product
ProductId ProductName ProductUrl
2491 Yeo Valley Little Yeos, blah https://yadayada1
2492 Sainsbury's Little Ones, etc https://yadayada2
ProductPricing
ProductPricingId ProductId ProductPrice ProductPricePerUnit ProductPricingDate
1 2491 £1.65/unit 46p/100g 2020-04-27 14:29:14.657
2 2492 60p/unit 1p/ea 2020-04-27 14:29:14.657
Now there's a whole load of questions that arise from this:
how are you going to cope with running this more than once, because the second time you run it there will be primary key violations?
do you want to clear down the temporary data somehow on successful completion?
do you want to use the system date as the pricing date, or are there more columns off the edge of your image?
do you want to check the data for duplicates and deal with them before running the script, or should it just fail?
if you do get a duplicate then do you skip it, or update the data (MERGE)?
why do you want this as a stored procedure? I mean it's easy enough to make into one, but I don't see why this would need to be repeatable... without seeing the other "moving parts" in this system anyway.
I'm guessing that you are loading bulk data into that temporary table somehow, from an Excel workbook, or XML, or similar. So all you want is a way to "tear the data up" into multiple tables. If this is indeed the case, then using a tool like SSIS might be more practical?
Okay, so that's 90% there, but you need two other things:
cope with situations where the product id already exists - don't try to insert it a second time as it will fail;
where the product id already exists then update the price data.
This should handle the first tweak:
INSERT INTO Product (ProductId, ProductName, ProductUrl) SELECT t.TempProductId, t.TempProductName, t.TempProductUrl FROM tempDataTable t
WHERE NOT EXISTS (SELECT * FROM Product p WHERE p.ProductId = t.TempProductId);
...and to UPDATE prices where the data already exists, or INSERT them if they don't exist, well you can use a MERGE statement:
MERGE
ProductPricing AS [target]
USING (SELECT TempProductId, TempProductPrice, TempProductPricePerUnit, GETDATE() AS ProductPricingDate FROM tempDataTable)
AS [source] (
ProductId,
ProductPrice,
ProductPricePerUnit,
ProductPricingDate)
ON ([target].ProductId = [source].ProductId)
WHEN MATCHED THEN
UPDATE SET
ProductPrice = [source].ProductPrice,
ProductPricePerUnit = [source].ProductPricePerUnit,
ProductPricingDate = [source].ProductPricingDate
WHEN NOT MATCHED THEN
INSERT (
ProductId,
ProductPrice,
ProductPricePerUnit,
ProductPricingDate)
VALUES (
[source].ProductId,
[source].ProductPrice,
[source].ProductPricePerUnit,
[source].ProductPricingDate);
Actually, re-reading your comment, I don't think you even need a MERGE (but I'm going to leave it there anyway, as it took me a little effort to write it).
I think your second case is as simple as just letting the second INSERT always run. There are two scenarios:
if there's already an entry for that product - then just add a new row to the ProductPricing table, so you will have one product, and two (or more) prices, each with a different date;
if it's a new product - then add the product and the price, so you will have one product and one price (until a new price arrives).
...and I can't resist adding, this is because you are using a natural key, i.e. a key from your data, so it doesn't change as you load it. If you were using a surrogate key (e.g. an IDENTITY that you got when you inserted the Product) then this wouldn't work, you would need to go and look up the surrogate key, then use this so your foreign key constraint worked properly. It's probably best to not think about this too hard?
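For what it's worth, here is a minimal sketch of what that surrogate key lookup could look like. The ProductSurrogate and ProductSurrogatePricing tables and the ProductKey and SourceProductId columns are hypothetical names invented for this illustration; only tempDataTable comes from the script above:

```sql
-- Hypothetical variant of Product with an IDENTITY surrogate key;
-- SourceProductId keeps the natural key from the load data.
CREATE TABLE ProductSurrogate (
    ProductKey INT IDENTITY(1,1) PRIMARY KEY,
    SourceProductId INT NOT NULL UNIQUE,
    ProductName VARCHAR(512),
    ProductUrl VARCHAR(512));

-- Hypothetical pricing table that references the surrogate key.
CREATE TABLE ProductSurrogatePricing (
    ProductPricingId INT IDENTITY(1,1) PRIMARY KEY,
    ProductKey INT NOT NULL REFERENCES ProductSurrogate (ProductKey),
    ProductPrice VARCHAR(50),
    ProductPricePerUnit VARCHAR(50),
    ProductPricingDate DATETIME);

-- Step 1: add any products that are not there yet.
INSERT INTO ProductSurrogate (SourceProductId, ProductName, ProductUrl)
SELECT t.TempProductId, t.TempProductName, t.TempProductUrl
FROM tempDataTable t
WHERE NOT EXISTS (SELECT * FROM ProductSurrogate p WHERE p.SourceProductId = t.TempProductId);

-- Step 2: look up the generated key by joining back on the natural key.
INSERT INTO ProductSurrogatePricing (ProductKey, ProductPrice, ProductPricePerUnit, ProductPricingDate)
SELECT p.ProductKey, t.TempProductPrice, t.TempProductPricePerUnit, GETDATE()
FROM tempDataTable t
JOIN ProductSurrogate p ON p.SourceProductId = t.TempProductId;
```

The join in step 2 is the extra lookup that the natural key version avoids.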

Should I use NEWID() as my primary key in the table? What datatype I should use for that column?

I have read a few posts and articles about NEWID() in MS SQL. Before I decide whether to use it, I would like some more information. My single-page app has a few tables, and one of them should store a unique key for each customer. I'm wondering whether I should use NEWID(), and how I should store that ID in the table. I was looking over the data types and there is a uniqueidentifier type. A few articles mentioned potential performance problems, especially when joining over 100k rows to other tables, and I will have exactly that scenario. If anyone can provide some answers or suggestions, that would be great. Thanks in advance!
Here is an example of my table:
Column Name     Data Type          Allow Nulls
hm_id           int                Unchecked     // auto-increment id
hm_studentID    uniqueidentifier   Unchecked     // primary key
hm_firstName    varchar(50)        Checked
hm_lastName     varchar(50)        Checked
hm_dob          datetime           Checked

Relating a child table to multiple parent tables

I am trying to figure out the best way to relate these tables together. Suppose I have the following tables:
tblPerson
tblGroup
tblResource
Each row in each of these tables can have multiple email addresses associated with them so I would want a separate table and relate it back.
Is there a way to have a single table (tblEmail) relate back to each of these tables? I thought of using a uniqueidentifier field in each of the parent tables and using that as the key in the email table. It would be guaranteed unique; I just wouldn't be able to create an FK in the email table to preserve integrity. That is manageable, though.
Is there a fancy way to do this? I am creating these tables in SQL 2008 R2.
Thank you
Karl
While it may be tempting to try and use a single email table with a ParentType (Person/Group/Resource) and ParentID, this is dangerous and means you can't have the relationship defined in SQL (unless there's some feature I'm unaware of?).
If you want to have referential integrity in SQL you really need to create 3 tables, one for each parent table.
CREATE TABLE dbo.PersonEmail (
ID int IDENTITY PRIMARY KEY,
PersonID int,
EmailAddress varchar(500)
)
CREATE TABLE dbo.GroupEmail (
ID int IDENTITY PRIMARY KEY,
GroupID int,
EmailAddress varchar(500)
)
CREATE TABLE dbo.ResourceEmail (
ID int IDENTITY PRIMARY KEY,
ResourceID int,
EmailAddress varchar(500)
)
If you think you might extend your Email table to later include a DisplayName, and perhaps a BounceCount and others, create a table for Email and create many-to-many join tables to link them to Person/Group/Resource.
Be aware that edits might impact multiple links, you'll have to decide how you want to handle that.
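A minimal sketch of that shared Email table with one join table, assuming dbo.tblPerson has an int PersonID primary key. The dbo.Email and dbo.PersonEmailLink names are made up for the example, and the Group and Resource link tables would follow the same pattern:

```sql
-- One row per email address, with room for per-address attributes.
CREATE TABLE dbo.Email (
    EmailID int IDENTITY PRIMARY KEY,
    EmailAddress varchar(500) NOT NULL,
    DisplayName varchar(200) NULL,
    BounceCount int NOT NULL DEFAULT 0
);

-- Many-to-many link between people and emails.
-- Assumption: dbo.tblPerson has an int primary key named PersonID.
CREATE TABLE dbo.PersonEmailLink (
    PersonID int NOT NULL REFERENCES dbo.tblPerson (PersonID),
    EmailID int NOT NULL REFERENCES dbo.Email (EmailID),
    PRIMARY KEY (PersonID, EmailID)
);
```

Because an Email row can be linked from several parents, updating its EmailAddress changes it for all of them, which is the edit impact mentioned above.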
This is a core part of SQL. In a proper relational design, you don't relate email addresses to persons, groups, or resources -- you relate the persons, groups, and resources TO the email.
So, with an email table of:
CREATE TABLE dbo.tblEmail (
emailID int IDENTITY PRIMARY KEY,
email varchar(500)
)
If you only need one email per entity, you would just add an emailID column to each of the other tables that model something that may need an email.
ALTER TABLE dbo.tblPerson
ADD emailID int REFERENCES dbo.tblEmail(emailID);
ALTER TABLE dbo.tblGroup
ADD emailID int REFERENCES dbo.tblEmail(emailID);
ALTER TABLE dbo.tblResource
ADD emailID int REFERENCES dbo.tblEmail(emailID);
If you need multiple email addresses per entity, you need to add an additional table that maps each set of email addresses to the individual addresses in it. (I wouldn't do this unless you have a technical reason to handle the addresses individually, such as a bulk-email system where you want to avoid duplicates if someone uses the same email for their own use and their organization's use.)
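-- tblEmail now identifies a set of addresses (it replaces the single-address version above)
CREATE TABLE dbo.tblEmail (
emailID int IDENTITY PRIMARY KEY
)
CREATE TABLE dbo.tblEmailAddress (
eAddrID int IDENTITY PRIMARY KEY,
eAddr varchar(500)
)
-- links each set to the addresses it contains
CREATE TABLE dbo.tblEmailSet (
emailID int REFERENCES dbo.tblEmail(emailID),
eAddrID int REFERENCES dbo.tblEmailAddress(eAddrID)
)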
In order to, say, return a list of all emails to any Person, Group, or Resource named "Smith", you'd run the query below:
SELECT DISTINCT A.eAddr
FROM (
SELECT emailID FROM dbo.tblPerson WHERE Name = 'Smith'
UNION
SELECT emailID FROM dbo.tblGroup WHERE Name = 'Smith'
UNION
SELECT emailID FROM dbo.tblResource WHERE Name = 'Smith'
) AS PGR
INNER JOIN dbo.tblEmailSet AS S
ON PGR.emailID = S.emailID
INNER JOIN dbo.tblEmailAddress AS A
ON S.eAddrID = A.eAddrID
That ugly UNION, btw, is one of the reasons why you really don't want to do this unless you have a technical need to retrieve the data uniquely. While I've done this sort of many-to-many-to-many join on occasion, in this particular instance it's kind of a "code smell" and an indicator that instead of tracking "People", "Groups", and "Resources", you should be tracking "Contacts" with a "type" indicator to tell if a contact is a Person, a Group, or a Resource.
(Or maybe you never need to grab a bunch of email addresses, and just want a single table of emails you can check for whitelisting...)
So you want to allow multiple emails per Person/Group/Resource, and you want all those emails in one table, am I correct?
To do that, I would create a table dbo.EmailAddress such as this:
CREATE TABLE dbo.EmailAddress
(
EmailID BIGINT IDENTITY(1,1) NOT NULL PRIMARY KEY
,EmailAddress VARCHAR(250) NOT NULL
,CONSTRAINT UK_EmailAddress UNIQUE(EmailAddress) --to ensure that you never insert the same email address twice
)
Then I would create the relation between your Person/Group/Resource and your emails using another table:
CREATE TABLE dbo.EmailAddressParentXRef
(
EmailID BIGINT NOT NULL REFERENCES dbo.EmailAddress(EmailID) --must match the BIGINT type of dbo.EmailAddress.EmailID
,ParentTypeID INT NOT NULL
,PersonID INT NULL REFERENCES dbo.tblPerson(PersonID)
,GroupID INT NULL REFERENCES dbo.tblGroup(GroupID)
,ResourceID INT NULL REFERENCES dbo.tblResource(ResourceID)
,CONSTRAINT UK_EmailID_ParentTypeID UNIQUE(EmailID,ParentTypeID) --to make sure you don't put the same EmailID for the same type of Parent (e.g. EmailID=12 twice for a Person)
)
There you would have referential integrity + some checks to avoid duplicates when you load data. Note that I didn't put a check to make sure you actually fill in either the PersonID, GroupID or ResourceID. This can be added in different ways, but if you understand the principle of this table, you shouldn't load any line without those references (or they will just be useless).
A lot more checks can be added based on this, to take care of every type of duplication/error you might create when loading the data, but you get the point.
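One possible way to add the missing check, sketched here as a CHECK constraint that requires exactly one of the three parent columns to be filled in. The constraint name is an assumption, and this version does not verify that the filled column agrees with ParentTypeID:

```sql
-- Count the non-NULL parent columns and require exactly one per row.
ALTER TABLE dbo.EmailAddressParentXRef
ADD CONSTRAINT CK_EmailAddressParentXRef_OneParent CHECK (
    (CASE WHEN PersonID   IS NOT NULL THEN 1 ELSE 0 END
   + CASE WHEN GroupID    IS NOT NULL THEN 1 ELSE 0 END
   + CASE WHEN ResourceID IS NOT NULL THEN 1 ELSE 0 END) = 1
);
```

With this in place, a row with no parent, or with two parents at once, is rejected at insert time.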

store equal records in multiple tables

I'm developing a simple Access application that helps us order the right products for a project. I have a table for each contractor containing its products. I have a table "favorite-products" that relates to products and gives additional information about how and when they should be used.
Normally I'd have one big table (containing all products) with a contractor column. In my favorite-products table I could then easily relate to a product. But here I need to keep the products in separate tables. So what's the best way to connect my favorite-products table with the products in the contractor tables?
thanks :)
This is not the best design.
If you must keep it, you can UNION all the contractor tables together and JOIN with the result:
SELECT *
FROM (
SELECT product
FROM contractor1
UNION ALL
SELECT product
FROM contractor2
UNION ALL
…
) c
JOIN favorite f
ON f.product = c.product
You would be better off keeping a single table for your products, with contractor as a field.
It will be much easier to query and to manage.
I would create a contractors table, a products table, and then a many-to-many link table from contractors to products. I would also create a favorite-products table, which can likewise hold a many-to-many contractor-to-product link for those cases where a product can come from more than one contractor.
So, you'll have a Contractor, Product and Contractor_Product table. Something like:
create table Contractor (
id int primary key,
name varchar(50) not null
-- ...
)
create table Product (
id int primary key,
name varchar(50) not null
-- ...
)
-- link table: one row per (contractor, product) pair
create table Contractor_Product (
contractorid int references Contractor(id),
productid int references Product(id),
-- ...
primary key (contractorid, productid)
)
Now, I'm not 100% sure what you want from the "Favorites" table. It may not be a table, but rather a query. Or maybe you want a table that is similar to the Contractor_Product table? Or just another "isfavorite bool default=false" column on the Contractor_Product table?
Hope that helps!

Defining a one-to-one relationship in SQL Server

I need to define a one-to-one relationship, and can't seem to find the proper way of doing it in SQL Server.
Why a one-to-one relationship you ask?
I am using WCF as a DAL (Linq) and I have a table containing a BLOB column. The BLOB hardly ever changes and it would be a waste of bandwidth to transfer it across every time a query is made.
I had a look at this solution, and though it seems like a great idea, I can just see Linq having a little hissy fit when trying to implement this approach.
Any ideas?
One-to-one is actually frequently used in supertype/subtype relationships. In the child table, the primary key also serves as the foreign key to the parent table. Here is an example:
CREATE TABLE Organization
(
ID int PRIMARY KEY,
Name varchar(200),
Address varchar(200),
Phone varchar(12)
)
GO
CREATE TABLE Customer
(
ID int PRIMARY KEY,
AccountManager varchar(100)
)
GO
ALTER TABLE Customer
ADD FOREIGN KEY (ID) REFERENCES Organization(ID)
ON DELETE CASCADE
ON UPDATE CASCADE
GO
Why not make the foreign key of each table unique?
There is no such thing as an explicit one-to-one relationship.
But because tbl1.id and tbl2.id are primary keys and tbl2.id is a foreign key referencing tbl1.id, you have created an implicit 1:0..1 relationship.
Put 1:1 related items into the same row in the same table. That's where "relation" in "relational database" comes from - related things go into the same row.
If you want to reduce the amount of data traveling over the wire, consider either projecting only the columns you need:
SELECT c1, c2, c3 FROM t1
or creating a view that projects only the relevant columns and using that view when needed:
CREATE VIEW V1 AS SELECT c1, c2, c3 FROM t1
GO
SELECT * FROM V1
UPDATE V1 SET c1=5 WHERE c2=7
Note that large BLOB values are generally stored off-row in SQL Server, so you are not saving much disk IO by vertically partitioning your data. If these were non-BLOB columns, you might benefit from vertical partitioning as you described, because you would do less disk IO to scan the base table.
How about this. Link the primary key in the first table to the primary key in the second table.
Tab1.ID (PK) <-> Tab2.ID (PK)
My problem was I have a 2 stage process with mandatory fields in both. The whole process could be classed as one episode (put in the same table) but there is an initial stage and final stage.
In my opinion, a better way to avoid reading the BLOB with the LINQ query would be to create a view on the table that contains all the columns except the BLOB ones.
You can then create an EF entity based on the view.
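A small sketch of that idea, using a made-up dbo.Document table with a varbinary(max) column standing in for the BLOB. The table, column, and view names are all assumptions for illustration:

```sql
-- Hypothetical table with one BLOB column.
CREATE TABLE dbo.Document (
    DocumentID int PRIMARY KEY,
    Title varchar(200),
    Content varbinary(max)   -- the BLOB that should not travel with every query
);
GO
-- View exposing every column except the BLOB; map the entity to this.
CREATE VIEW dbo.DocumentSummary AS
SELECT DocumentID, Title
FROM dbo.Document;
GO
-- Routine queries go through the view and never touch Content.
SELECT DocumentID, Title FROM dbo.DocumentSummary;
```

The BLOB can still be fetched from dbo.Document by key on the rare occasions it is needed.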
