Visual Studio Data Generation and Many-to-Many - sql-server

The data generation tool that comes with Visual Studio allows you to populate tables with random/patterned data. As an example, you can tell it to populate an Employees table with 50,000 rows and you can tell it to add 5 address rows to an Addresses table for each Employee row.
I've got a junction table much like the permissions example referenced in this Wikipedia article. Pseudo code below...
create table Users (
UserId int primary key,
Username varchar(50)
)
create table Permissions (
PermissionId int primary key,
Description varchar(50)
)
create table UserPermission (
UserPermissionId int primary key,
UserId int references Users (UserId), -- foreign key to Users
PermissionId int references Permissions (PermissionId) -- foreign key to Permissions
)
Is it possible to use Visual Studio's Data Generation tool to populate Users, Permissions, and the associated junction table?
Using the GUI, I am only able to populate one of the two related tables by selecting it from the Related Table dropdown for the junction table.

This is a pretty good tool that allows you to enforce foreign key referential integrity when populating data. Give the trial a try; maybe it will do what you need...
http://www.red-gate.com/products/sql-development/sql-data-generator/

Related

SQL Regenerate Guid PK

Hello I have a scenario where I have multiple SQL databases, and a tool on a central database which connects to each user table on each database and builds a dataset.
The issue happens when, say, a user from one database is migrated to another. When the tool runs it encounters an issue because the user_id is a GUID PK, and since users have been migrated across databases, the final dataset ends up with duplicate primary keys.
My question, if I want to regenerate the user id guid for some user,
I of course have to also update all of the connecting foreign keys. Is
there a command in MS SQL to regenerate the GUID and also do so for
all connecting relationships?
ON UPDATE CASCADE is what you want, but you need to define it in the foreign key relationship ahead of time.
CREATE TABLE [User] (UserID UNIQUEIDENTIFIER PRIMARY KEY, UserName varchar(30), ... )
CREATE TABLE UserData (DataID UNIQUEIDENTIFIER, UserID UNIQUEIDENTIFIER, ...)
ALTER TABLE UserData
ADD CONSTRAINT FK_UserData_UserID FOREIGN KEY (UserID) REFERENCES [User](UserID)
ON UPDATE CASCADE;
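With the cascading key in place, regenerating a user's GUID should come down to a single UPDATE on the parent table. A minimal sketch, assuming the User and UserData tables above; the variable names and the literal GUID are placeholders, not values from the question:

```sql
-- Hypothetical: the existing key of the user that needs a new GUID.
DECLARE @OldUserID UNIQUEIDENTIFIER = '00000000-0000-0000-0000-000000000001';
DECLARE @NewUserID UNIQUEIDENTIFIER = NEWID();

-- Update the parent key; ON UPDATE CASCADE rewrites UserData.UserID to match.
UPDATE [User]
SET UserID = @NewUserID
WHERE UserID = @OldUserID;

-- Check that the child rows followed the parent.
SELECT DataID, UserID
FROM UserData
WHERE UserID = @NewUserID;
```

Any foreign key that references UserID without ON UPDATE CASCADE will reject the update while child rows exist, so every referencing table needs the cascading definition first.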

Relating a child table to multiple parent tables

I am trying to figure out the best way to relate these tables together. Suppose I have the following tables:
tblPerson
tblGroup
tblResource
Each row in each of these tables can have multiple email addresses associated with them so I would want a separate table and relate it back.
Are there methods to have a single table (tblEmail) relate back to each of the tables? I thought of using a uniqueidentifier field in each of the parent tables and using that as a key in the email table. It would be guaranteed unique. I just wouldn't be able to create an FK in the email table to preserve integrity. That is manageable though.
Is there a fancy way to do this? I am creating these tables in SQL 2008 R2.
Thank you
Karl
While it may be tempting to try and use a single email table with a ParentType (Person/Group/Resource) and ParentID, this is dangerous and means you can't have the relationship defined in SQL (unless there's some feature I'm unaware of?).
If you want to have referential integrity in SQL you really need to create 3 tables, one for each parent table.
CREATE TABLE dbo.PersonEmail (
ID int IDENTITY PRIMARY KEY,
PersonID int,
EmailAddress varchar(500)
)
CREATE TABLE dbo.GroupEmail (
ID int IDENTITY PRIMARY KEY,
GroupID int,
EmailAddress varchar(500)
)
CREATE TABLE dbo.ResourceEmail (
ID int IDENTITY PRIMARY KEY,
ResourceID int,
EmailAddress varchar(500)
)
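The three tables above carry the parent IDs but do not yet declare the relationships. A sketch of the constraints that would enforce them, assuming each parent table has an int primary key named PersonID, GroupID, and ResourceID respectively (those column names and the constraint names are assumptions, not taken from the question):

```sql
-- Each email table points at exactly one parent table.
ALTER TABLE dbo.PersonEmail
ADD CONSTRAINT FK_PersonEmail_Person
FOREIGN KEY (PersonID) REFERENCES dbo.tblPerson (PersonID);

ALTER TABLE dbo.GroupEmail
ADD CONSTRAINT FK_GroupEmail_Group
FOREIGN KEY (GroupID) REFERENCES dbo.tblGroup (GroupID);

ALTER TABLE dbo.ResourceEmail
ADD CONSTRAINT FK_ResourceEmail_Resource
FOREIGN KEY (ResourceID) REFERENCES dbo.tblResource (ResourceID);
```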
If you think you might extend your Email table to later include a DisplayName, and perhaps a BounceCount and others, create a table for Email and create many-to-many join tables to link them to Person/Group/Resource.
Be aware that edits might impact multiple links, you'll have to decide how you want to handle that.
This is a core part of SQL. In a proper relational design, you don't relate email addresses to persons, groups, or resources; you relate the persons, groups, and resources TO the email.
So, with an email table of:
CREATE TABLE dbo.tblEmail (
emailID int IDENTITY PRIMARY KEY,
email varchar(500)
)
If you only need one email per entity, you would just add an emailID column to each of the other tables that model something that may need an email.
ALTER TABLE dbo.tblPerson
ADD emailID int REFERENCES dbo.tblEmail(emailID);
ALTER TABLE dbo.tblGroup
ADD emailID int REFERENCES dbo.tblEmail(emailID);
ALTER TABLE dbo.tblResource
ADD emailID int REFERENCES dbo.tblEmail(emailID);
If you need multiple email addresses per entity, you need to insert an additional table that maps each set of email addresses to the individual addresses it contains. (I wouldn't do this unless you have a technical reason to handle the addresses individually, such as a bulk-email system where you want to avoid duplicates if someone uses the same email for their own use and their organization's use.)
CREATE TABLE dbo.tblEmail (
emailID int IDENTITY PRIMARY KEY
)
CREATE TABLE dbo.tblEmailAddress (
eAddrID int IDENTITY PRIMARY KEY,
eAddr varchar(500)
)
CREATE TABLE dbo.tblEmailSet (
emailID int REFERENCES dbo.tblEmail(emailID),
eAddrID int REFERENCES dbo.tblEmailAddress(eAddrID)
)
In order to, say, return a list of all emails to any Person, Group, or Resource named "Smith", you'd run the query below:
SELECT DISTINCT A.eAddr
FROM (
SELECT emailID FROM dbo.tblPerson WHERE Name = 'Smith'
UNION
SELECT emailID FROM dbo.tblGroup WHERE Name = 'Smith'
UNION
SELECT emailID FROM dbo.tblResource WHERE Name = 'Smith'
) AS PGR
INNER JOIN dbo.tblEmailSet AS S
ON PGR.emailID = S.emailID
INNER JOIN dbo.tblEmailAddress AS A
ON S.eAddrID = A.eAddrID
That ugly UNION, btw, is one of the reasons why you really don't want to do this unless you have a technical need to retrieve the data uniquely. While I've done this sort of many-to-many-to-many join on occasion, in this particular instance it's kind of a "code smell" and an indicator that instead of tracking "People", "Groups", and "Resources", you should be tracking "Contacts" with a "type" indicator to tell if a contact is a Person, a Group, or a Resource.
(Or maybe you never need to grab a bunch of email addresses, and just want a single table of emails you can check for whitelisting...)
So you want to have possibly multiple emails per Person/Group/Resource, and you want all those emails in one table, am I correct?
To do that, I would create a table dbo.EmailAddress such as this:
CREATE TABLE dbo.EmailAddress
(
EmailID BIGINT IDENTITY(1,1) NOT NULL PRIMARY KEY
,EmailAddress VARCHAR(250) NOT NULL
,CONSTRAINT UK_EmailAddress UNIQUE(EmailAddress) --to ensure that you never insert the same email address twice
)
Then I would create the relation between your Person/Group/Resource and your emails using another table:
CREATE TABLE dbo.EmailAddressParentXRef
(
EmailID BIGINT NOT NULL REFERENCES dbo.EmailAddress(EmailID) --must match the BIGINT type of the referenced column
,ParentTypeID INT NOT NULL
,PersonID INT NULL REFERENCES dbo.tblPerson(PersonID)
,GroupID INT NULL REFERENCES dbo.tblGroup(GroupID)
,ResourceID INT NULL REFERENCES dbo.tblResource(ResourceID)
,CONSTRAINT UK_EmailID_ParentTypeID UNIQUE(EmailID,ParentTypeID) --to make sure you don't put the same EmailID for the same type of Parent (e.g. EmailID=12 twice for a Person)
)
There you would have referential integrity + some checks to avoid duplicates when you load data. Note that I didn't put a check to make sure you actually fill in either the PersonID, GroupID or ResourceID. This can be added in different ways, but if you understand the principle of this table, you shouldn't load any line without those references (or they will just be useless).
A lot more checks can be added based on this, to take care of every type of duplication/error you might create when loading the data, but you get the point.
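One possible form of the missing check is a CHECK constraint that requires exactly one of the three parent columns to be filled in. This is only a sketch against the dbo.EmailAddressParentXRef table above, and the constraint name is made up for illustration:

```sql
-- Count the non-NULL parent references and require exactly one per row.
ALTER TABLE dbo.EmailAddressParentXRef
ADD CONSTRAINT CK_EmailAddressParentXRef_OneParent
CHECK (
    (CASE WHEN PersonID   IS NOT NULL THEN 1 ELSE 0 END
   + CASE WHEN GroupID    IS NOT NULL THEN 1 ELSE 0 END
   + CASE WHEN ResourceID IS NOT NULL THEN 1 ELSE 0 END) = 1
);
```

Changing `= 1` to `>= 1` would instead allow a row to link one email to several parents at once, if that is what the data needs.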

How do I configure Schema Compare to produce separate files for foreign keys?

To work with our database project in VS 2010, we make schema changes directly into our local project database using SSMS, then when we are ready to check in we do a Schema Compare, the local database vs the project, which identifies our changes. Then when we Write Changes, it alters or creates schema object scripts into our database project. If all looks well, we can then check in those changes to TFS.
Our standard on foreign keys and indices is to have those saved separately. That is, even though I define a column in a new table by saying something like this:
CREATE TABLE Billing.EntryPointProduct
(
EntryPointProductId INT IDENTITY(1,1) PRIMARY KEY,
EntryPointId INT FOREIGN KEY REFERENCES Billing.EntryPoint(EntryPointId),
ProductId INT FOREIGN KEY REFERENCES ProductCatalog.Product(ProductID)
)
What we really want, in the end, is a file for the EntryPointProduct table and a file for each of the Foreign Key objects. However, right now the schema compare is producing it all in one table script. I swear I have done this before with schema compare, but I can't seem to find the way to configure it to do this. Can anyone advise?
Can you change your DDL so it looks like this:
CREATE TABLE Billing.EntryPointProduct
(
EntryPointProductId INT IDENTITY(1,1),
EntryPointId INT,
ProductId INT,
CONSTRAINT [PK_EntryPointProduct] PRIMARY KEY CLUSTERED (EntryPointProductId)
)
ALTER TABLE Billing.EntryPointProduct
WITH CHECK ADD CONSTRAINT FK_EntryPointProduct_EntryPoint FOREIGN KEY(EntryPointId) REFERENCES Billing.EntryPoint(EntryPointId)
ALTER TABLE Billing.EntryPointProduct
WITH CHECK ADD CONSTRAINT FK_EntryPointProduct_ProductCatalog FOREIGN KEY(ProductId) REFERENCES ProductCatalog.Product(ProductID)
That way you'd have 3 different files, and your FK's would have real names (FK_*) instead of system-generated names which will be randomly generated each time they are created and therefore won't match if you did a schema compare between 2 separately scripted out databases. (Same reason why I modified the PK code)

SQL Server primary / foreign keys

I am developing a system in which I have a table Employees with multiple columns related to employees. I have a column for the JobTitle and another column for Department.
In my current design, the JobTitle & the Department columns are compound foreign keys in the Employees table and they are linked with the Groups table which has 2 columns compound primary key (JobTitle & Department) and an extra column for the job description.
I am not happy about this design because I think that linking 2 tables using 2 compound varchar columns is not good for performance, and I think it would be better to have an integer column (autonumber) JobTitleID used as the primary key in the Groups table and as a foreign key in the Employees table instead of the textual JobTitle & Department columns.
But I had to do this because when I import the employees list (Excel) into my Employees table it can just be directly mapped (JobTitle --> JobTitle & Department --> Department). Otherwise if I am using an integer index as primary key I would have then to manually rename the textual JobTitle column in the excel sheet to a number based on the generated keys from the Groups table in order to import.
Is it fine to keep my database design like this (textual compound primary key linked with textual compound foreign key)? If not, then if I used an integer column in the Groups table as primary key and the same as a foreign key in the Employees table then how can I import the employees list from excel directly to Employees table?
Is it possible to import the list from Excel to SQL Server in a way that the textual JobTitle from the Excel sheet will be automatically translated to the corresponding JobTitleID from the Groups table? This would be the best solution; I can then add a JobTitleID column in the Groups table as a primary key and as a foreign key in the Employees table.
Thank you,
It sounds like you are trying to make the database table design fit the import of the Excel file, which is not such a good idea. Forget the Excel file and design your db tables first with correct primary keys and relationships. This means either int, bigint or GUIDs for primary keys. This will keep you out of trouble unless you absolutely know the key is unique, such as an SSN. Then, when you import, first populate the departments and job titles into their respective tables, creating their primary keys. Once they are populated, add those keys to the Excel file so that it can be imported into the Employees table.
This is just an example of how I would solve this problem. It is not wrong to use multiple columns as the key, but it will definitely keep you out of harm's way if you stick with int, bigint or GUIDs for your primary keys.
Look at the answer in this post: how-to-use-bulk-insert...
I would create a simple Stored Procedure that imports your excel data into a temporary unrestricted STAGING table and then do the INSERT into your real table by doing the corresponding table joins to get the right foreign keys and dump the rows that failed to import into an IMPORT FAIL table. Just some thoughts...
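The staging approach could look roughly like the sketch below. It assumes Groups has an integer JobTitleID key as proposed in the question; the staging and fail tables (dbo.EmployeesStaging, dbo.EmployeesImportFail, assumed to share the same three columns) and the EmployeeName column are hypothetical names chosen for illustration:

```sql
-- Hypothetical staging table; the Excel sheet is loaded here unchanged.
CREATE TABLE dbo.EmployeesStaging (
    EmployeeName varchar(100),
    JobTitle     varchar(100),
    Department   varchar(100)
);

-- Rows whose JobTitle/Department pair exists in Groups get the integer key.
INSERT INTO dbo.Employees (EmployeeName, JobTitleID)
SELECT s.EmployeeName, g.JobTitleID
FROM dbo.EmployeesStaging AS s
INNER JOIN dbo.Groups AS g
    ON g.JobTitle = s.JobTitle
   AND g.Department = s.Department;

-- Rows with no matching group are kept for review instead of being lost.
INSERT INTO dbo.EmployeesImportFail (EmployeeName, JobTitle, Department)
SELECT s.EmployeeName, s.JobTitle, s.Department
FROM dbo.EmployeesStaging AS s
LEFT JOIN dbo.Groups AS g
    ON g.JobTitle = s.JobTitle
   AND g.Department = s.Department
WHERE g.JobTitleID IS NULL;
```

The join does the text-to-ID translation the question asks about, so the Excel sheet never needs to be edited by hand.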

Defining a one-to-one relationship in SQL Server

I need to define a one-to-one relationship, and can't seem to find the proper way of doing it in SQL Server.
Why a one-to-one relationship you ask?
I am using WCF as a DAL (Linq) and I have a table containing a BLOB column. The BLOB hardly ever changes and it would be a waste of bandwidth to transfer it across every time a query is made.
I had a look at this solution, and though it seems like a great idea, I can just see Linq having a little hissy fit when trying to implement this approach.
Any ideas?
One-to-one is actually frequently used in super-type/subtype relationships. In the child table, the primary key also serves as the foreign key to the parent table. Here is an example:
CREATE TABLE Organization
(
ID int PRIMARY KEY,
Name varchar(200),
Address varchar(200),
Phone varchar(12)
)
GO
CREATE TABLE Customer
(
ID int PRIMARY KEY,
AccountManager varchar(100)
)
GO
ALTER TABLE Customer
ADD FOREIGN KEY (ID) REFERENCES Organization(ID)
ON DELETE CASCADE
ON UPDATE CASCADE
GO
Why not make the foreign key of each table unique?
There is no such thing as an explicit one-to-one relationship.
But, by the fact that tbl1.id and tbl2.id are primary keys and tbl2.id is a foreign key referencing tbl1.id, you have created an implicit 1:0..1 relationship.
Put 1:1 related items into the same row in the same table. That's where "relation" in "relational database" comes from - related things go into the same row.
If you want to reduce size of data traveling over the wire consider either projecting only the needed columns:
SELECT c1, c2, c3 FROM t1
or create a view that only projects relevant columns and use that view when needed:
CREATE VIEW V1 AS SELECT c1, c2, c3 FROM t1
GO
SELECT * FROM V1
UPDATE V1 SET c1=5 WHERE c2=7
Note that large BLOB values are stored off-row in SQL Server, so you are not saving much disk IO by vertically partitioning your data. If these were non-BLOB columns you might benefit from vertical partitioning as you described, because you would do less disk IO to scan the base table.
How about this. Link the primary key in the first table to the primary key in the second table.
Tab1.ID (PK) <-> Tab2.ID (PK)
My problem was I have a 2 stage process with mandatory fields in both. The whole process could be classed as one episode (put in the same table) but there is an initial stage and final stage.
In my opinion, a better solution for not reading the BLOB with the LINQ query would be to create a view on the table that contains all the columns except for the BLOB ones.
You can then create an EF entity based on the view.
