Migrating a database with modifiable primary keys to MSSQL - sql-server

I have to migrate a Firebird database to MS SQL, and the original author designed the database in the way that it has mutable primary keys. For example, we have a tables
PRODUCTS with primary key (NAME,CATEGORY,SHORTNAME) and some data
PRICE_LISTING (NAME,CATEGORY,SHORTNAME) - foreign key to PRODUCT, and some data like price
PRODUCT_ALTERNATIVES (ORIGINAL_NAME,ORIGINAL_CATEGORY,ORIGINAL_SHORTNAME; REPLACEMENT_NAME,REPLACEMENT_CATEGORY,REPLACEMENT_SHORTNAME) - foreign keys to PRODUCTS
as you can see, whenever product name changes, lot of data in other tables has to change. The original author solved this with cascading update on the foreign keys. For some reasons, MS SQL doesn't allow me to use cascading update on the foreign key, so the database cannot be migrated so easily.
What makes things worse is that the software has a lot of editing screens, that use DataSets, DataGridViews and binding in Windows Forms, so the editor works pretty much automagically. For example, editor for PRODUCTS is a simple windows form, and whenever user renames a product (changes its NAME), the changes are propagated through database (via cascading update) to PRICE_LISTING and PRODUCT_ALTERNATIVES tables. And also, if the app wants to show price listings, it does simple SELECT * FROM PRICE_LISTING that includes product names and price.
One approach might be to add surrogate keys - ID identity columns to each table, then modify for example PRICE_LISTING table to (PRODUCT_ID, price) etc. But then, to show price, I'd have to define view that will
SELECT *,PRODUCT.NAME,PRODUCT.CATEGORY,PRODUCT.SHORTNAME FROM PRICE_LISTING join PRODUCT on ...
then, the automatically generated DataSets etc. will have to be modified to select from view, update the view.
In a nutshell and from the original author's perspective, we are breaking the program and making it more complicated. The only single thing that does not work when ported in a straightforward manner is the cascading update, but I see that the database design is very flawed and it will be a maintenance nightmare if not fixed.
What to do?

Related

Maintaining rows in a SQL Server junction table when rows in other table are inserted and updated

Can you show sample coding to create a trigger or stored procedure that maintains rows a SQL Server junction table when changes are made to the Authors and BookTitles tables such as inserting and updating rows in those tables?
We have the following tables:
Authors:
ID
NAME
ZIP
AND SOME MORE COLUMNS
BookTitles:
ID
TITLE
ISBN
AND SOME MORE COLUMNS
This is the table we will use as our junction table:
AuthorTitles:
ID
AUTHOR_ID
BOOK_TITLE_ID
We would like to do this in a trigger instead of doing the coding in our VB.Net form.
All help will be appreciated.
The above table structures were simplified to show what we are trying to do.
We are implementing a junction table for teachers and programs.
Here is a photo of the actual system:
Unless you have Foreign Key constraints that require at least one Book per Author and/or vice-versa, then the only cases that you should need special handling are for the Deletes to BookTitles or Authors. They can be done like this:
CREATE PROC BookTitle_Delete(#Book_ID As INT) As
-- First remove any children in the junction table
DELETE FROM AuthorTitles WHERE BOOK_TITLE_ID = #Book_ID
-- Now, remove the parent record on BookTitles
DELETE FROM BookTitles WHERE ID = #Book_ID
go
In general, you want to resist the temptation to do Table Maintenance and other things like this in Triggers. As triggers are invisible, add additional overhead, can cause maintenance problems for the DBA's, and can lead to many subtle transactional/locking complexities and performance issues. Triggers should be reserved for simple things that really should be hidden from the client application (like auditing) and that cannot be practically implemented in some other way. This is not one of those cases.
If you really want an "invisible" way of doing this then just implement a Cascading Foreign-Key. I do not recommend this either, but it is still preferable to a trigger.

How to implement this data structure in SQL tables

I have a problem that can be summarized as follow:
Assume that I am implementing an employee database. For each person depends on his position, different fields should be filled. So for example if the employee is a software engineer, I have the following columns:
Name
Family
Language
Technology
CanDevelopWeb
And if the employee is a business manager I have the following columns:
Name
Family
FieldOfExpertise
MaximumContractValue
BonusRate
And if the employee is a salesperson then some other columns and so on.
How can I implement this in database schema?
One way that I thought is to have some related tables:
CoreTable:
Name
Family
Type
And if type is one then the employee is a software developer and hence the remaining information should be in table SoftwareDeveloper:
Language
Technology
CanDevelopWeb
For business Managers I have another table with columns:
FieldOfExpertise
MaximumContractValue
BonusRate
The problem with this structure is that I am not sure how to make relationship between tables, as one table has relationship with several tables on one column.
How to enforce relational integrity?
There are a few schools of thought here.
(1) store nullable columns in a single table and only populate the relevant ones (check constraints can enforce integrity here). Some people don't like this because they are afraid of NULLs.
(2) your multi-table design where each type gets its own table. Tougher to enforce with DRI but probably trivial with application or trigger logic.
The only problem with either of those, is as soon as you add a new property (like CanReadUpsideDown), you have to make schema changes to accommodate for that - in (1) you need to add a new column and a new constraint, in (2) you need to add a new table if that represents a new "type" of employee.
(3) EAV, where you have a single table that stores property name and value pairs. You have less control over data integrity here, but you can certainly constraint the property names to certain strings. I wrote about this here:
What is so bad about EAV, anyway?
You are describing one ("class per table") of the 3 possible strategies for implementing the category (aka. inheritance, generalization, subclass) hierarchy.
The correct "propagation" of PK from the parent to child tables is naturally enforced by straightforward foreign keys between them, but ensuring both presence and the exclusivity of the child rows is another matter. It can be done (as noted in the link above), but the added complexity is probably not worth it and I'd generally recommend handling it at the application level.
I would add a field called EmployeeId in the EmployeeTable
I'd get rid of Type
For BusinessManager table and SoftwareDeveloper for example, I'll add EmployeeId
From here, you can then proceed to create Foreign Keys from BusinessManager, SoftwareDeveloper table to Employee
To further expand on your one way with the core table is to create a surrogate key based off an identity column. This will create a unique employee id for each employee (this will help you distinguish between employees with the same name as well).
The foreign keys preserve your referential integrity. You wouldn't necessarily need EmployeeTypeId as someone else mentioned as you could filter on existence in the SoftwareDeveloper or BusinessManagers tables. The column would instead act as a cached data point for easier querying.
You have to fill in the types in the below sample code and rename the foreign keys.
create table EmployeeType(
EmployeeTypeId
, EmployeeTypeName
, constraint PK_EmployeeType primary key (EmployeeTypeId)
)
create table Employees(
EmployeeId int identity(1,1)
, Name
, Family
, EmployeeTypeId
, constraint PK_Employees primary key (EmployeeId)
, constraint FK_blahblah foreign key (EmployeeTypeId) references EmployeeType(EmployeeTypeId)
)
create table SoftwareDeveloper(
EmployeeId
, Language
, Technology
, CanDevelopWeb
, constraint FK_blahblah foreign key (EmployeeId) references Employees(EmployeeId)
)
create table BusinessManagers(
EmployeeId
, FieldOfExpertise
, MaximumContractValue
, BonusRate
, constraint FK_blahblah foreign key (EmployeeId) references Employees(EmployeeId)
)
No existing SQL engine has solutions that make life easy on you in this situation.
Your problem is discussed at fairly large in "Practical Issues in Database Management", in the chapter on "entity subtyping". Commendable reading, not only for this particular chapter.
The proper solution, from a logical design perspective, would be similar to yours, but for the "type" column in the core table. You don't need that, since you can derive the 'type' from which non-core table the employee appears in.
What you need to look at is the business rules, aka data constraints, that will ensure the overall integrity (aka consistency) of the data (of course whether any of these actually apply is something your business users, not me, should tell you) :
Each named employee must have exactly one job, and thus some job detail somewhere. iow : (1) no named employees without any job detail whatsoever and (2) no named employees with >1 job detail.
(3) All job details must be for a named employee.
Of these, (3) is the only one you can implement declaratively if you are using an SQL engine. It's just a regular FK from the non-core tables to the core table.
(1) and (2) could be defined declaratively in standard SQL, using either CREATE ASSERTION or a CHECK CONSTRAINT involving references to other tables than the one the CHECK CONSTRAINT is defined on, but neither of those constructs are supported by any SQL engine I know.
One more thing about why [including] the 'type' column is a rather poor choice to make : it changes how constraint (3) must be formulated. For example, you can no longer say "all business managers must be named employees", but instead you'd have to say "all business managers are named employees whose type is <type here>". Iow, the "regular FK" to your core table has now become a reference to a VIEW on your core table, something you might want to declare as, say,
CREATE TABLE BUSMANS ... REFERENCES (SELECT ... FROM CORE WHERE TYPE='BM');
or
CREATE VIEW BM AS (SELECT ... FROM CORE WHERE TYPE='BM');
CREATE TABLE BUSMANS ... REFERENCES BM;
Once again something SQL doesn't allow you to do.
You can use all fields in the same table, but you'll need an extra table named Employee_Type (for example) and here you have to put Developer, Business Manager, ... of course with an unique ID. So your relation will be employee_type_id in Employee table.
Using PHP or ASP you can control what field you want to show depending the employee_type_id (or text) in a drop-down menu.
You are on the right track. You can set up PK/FK relationships from the general person table to each of the specialized tables. You should add a personID to all the tables to use for the relationship as you do not want to set up a relationship on name because it cannot be a PK as it is not unique. Also names change, they are a very poor choice for an FK relationship as a name change could cause many records to need to change. It is important to use separate tables rather than one because some of those things are in a one to many relationship. A Developer for instnce may have many differnt technologies and that sort of thing should NEVER be stored in a comma delimted list.
You could also set up trigger to enforce that records can only be added to a specialty table if the main record has a particular personType. However, be wary of doing this as you wil have peopl who change roles over time. Do you want to lose the history of wha the person knew when he was a developer when he gets promoted to a manager. Then if he decides to step back down to development (A frequent occurance) you would have to recreate his old record.

what are the advantages of defining a foreign key

What is the advantage of defining a foreign key when working with an MVC framework that handles the relation?
I'm using a relational database with a framework that allows model definitions with relations. Because the foreign keys are defined through the models, it seems like foreign keys are redundant. When it comes to managing the database of an application in development, editing/deleting tables that are using foreign keys is a hassle.
Is there any advantage to using foreign keys that I'm forgoing by dropping the use of them altogether?
Foreign keys with constraints(in some DB engines) give you data integrity on the low level(level of database).
It means you can't physically create a record that doesn't fulfill relation.
It's just a way to be more safe.
It gives you data integrity that's enforced at the database level. This helps guard against possibly error in application logic that might cause invalid data.
If any data manipulation is ever done directly in SQL that bypasses your application logic, it also guards against bad data that breaks those constraints.
An additional side-benefit is that it allows tools to automatically generating database diagrams with relationships inferred from the schema itself. Now in theory all the diagramming should be done before the database is created, but as the database evolves beyond its initial incarnation these diagrams often aren't kept up to date, and the ability to generate a diagram from an existing database is helpful both for reviewing, as well as for explaining the structure to new developers joining a project.
It might be a helpful to disable FKs while the database structure is still in flux, but they're good safeguard to have when the schema is more stabilized.
A foreign key guarantees a matching record exists in a foreign table. Imagine a table called Books that has a FK constraint on a table called Authors. Every book is guaranteed to have an Author.
Now, you can do a query such as:
SELECT B.Title, A.Name FROM Books B
INNER JOIN Authors A ON B.AuthorId = A.AuthorId;
Without the FK constraint, a missing Author row would cause the entire Book row to be dropped, resulting in missing books in your dataset.
Also, with the FK constraint, attempting to delete an author that was referred to by at least one Book would result in an error, rather than corrupting your database.
Whilst they may be a pain when manipulating development/test data, they have saved me a lot of hassle in production.
Think of them as a way to maintain data integrity, especially as a safeguard against orphaned records.
For example, if you had a database relating many PhoneNumber records to a Person, what happens to PhoneNumber records when the Person record is deleted for whatever reason?
They will still exist in the database, but the ID of the Person they relate to will no longer exist in the relevant Person table and you have orphaned records.
Yes, you could write a trigger to delete the PhoneNumber whenever a Person gets removed, but this could get messy if you accidentally delete a Person and need to rollback.
Yes, you may remember to get rid of the PhoneNumber records manually, but what about other developers or methods you write 9 months down the line?
By creating a Foreign Key that ensures any PhoneNumber is related to an existing Person, you both insure against destroying this relationship and also add 'clues' as to the intended data structure.
The main benefits are data integrity and cascading deletes. You can also get a performance gain when they're defined and those fields are properly indexed. For example, you wouldn't be able to create a phone number that didn't belong to a contact, or when you delete the contact you can set it to automatically delete all of their phone numbers. Yes, you can make those connections in your UI or middle tier, but you'll still end up with orphans if someone runs an update directly against the server using SQL rather than your UI. The "hassle" part is just forcing you to consider those connections before you make a bulk change. FKs have saved my bacon many times.

Clone a SQL Server Database w/ all new Primary Keys

We need to create an automated process for cloning small SQL Server databases, but in the destination database all primary keys should be distinct from the source (we are using UNIQUEIDENTIFIER ids for all primary keys). We have thousands of databases that all have the same schema, and need to use this "clone" process to create new databases with all non-key data matching, but referential integrity maintained.
Is there an easy way to do this?
Update - Example:
Each database has ~250 transactional tables that need to be cloned. Consider the following simple example of a few tables and their relationships (each table has a UniqueIdentifier primary key = id):
location
doctor
doctor_location (to doctor.id via doctor_id, to location.id via location_id)
patient
patient_address (to patient.id via patient_id)
patient_medical_history (to patient.id via patient_id)
patient_doctor (to patient.id via patient_id, to doctor.id via doctor_id)
patient_visit (to patient.id via patient_id)
patient_payment (to patient.id via patient_id)
The reason we need to clone the databases is due to offices being bought out or changing ownership (due to partnership changes, this happens relatively frequently). When this occurs, the tax and insurance information changes for the office. Legally this requires an entirely new corporate structure, and the financials between offices need to be completely separated.
However, most offices want to maintain all of their patient history, so they opt to "clone" the database. The new database will be stripped of financial history, but all patient/doctor data will be maintained. The old database will have all information up to the point of the "clone".
The reason new GUIDs are required is that we consolidate all databases into a single relational database for reporting purposes. Since all transactional tables have GUIDs, this works great ... except for the cases of the clones.
Our only solution so far has been to dump the database to a text file and search and replace GUIDs. This is ridiculously time consuming, so were hoping for a better way.
I'd do this by creating a basic restore of the database, and updating all values in the primary key to a new GUID.
To make this automatically update all the foreign keys you need to add constraints to the database with the CASCADE keyword i.e.
CREATE TABLE Orders
(
OrderID uniqueidentifier,
CustomerID uniqueidentifier REFERENCES Customer(CustomerID) ON UPDATE CASCADE,
etc...
Now when you update the Customer table's CustomerID the Order table's CustomerID is updated too.
You can do this to a whole table using a simple update query:
UPDATE TABLE Customer SET CustomerID = NewID();
You'd need to do this to each table with a uniqueidentifier as it's primary key.
You could create an Integration Services (SSIS) package to do this. You would create the new database in the control flow, then copy the data from the source to the destination using the data flow, which would also replace the GUIDs or make other needed transformations along the way.
If the DBs have a large number of tables, and only a few of them need to be modified, then you might be better off just making a copy of the MDF/LDF files, re-attaching them with a new DB name, and using a script to update the IDs.
The advantage of using SSIS is that it's easier to fully automate. The downside is that it might take a little longer to get things set up.

Database design

Our database is part of a (specialized) desktop application.
The primary goal is to keep data about certain events.
Events happen every few minutes.
The data collected about events changes frequently with new data groups being added in and old ones swapped out almost monthly (the data comes in definite groups).
I have to put together a database to track the events. A first stab at that might be to simply have a single big table where each row is an event and that is basically what our data looks like, but this seems undesirable because of our constantly changing groups of data (i.e. the number of columns would either keep growing perpetually or we would constantly having this months database incompatible with last months database - ugh!). Because of this I am leadning toward the following even though it creates circular references. (But maybe this is a stupid idea)
Create tables like
Table Events
Table Group of the Month 1
Table Group of the Month 2
...
Table Events has:
A primary key whose deletion cascade to delete rows with foreign keys referencing it
A nullable foreighn key for each data group table
Each data group table has:
A primary key, whose deletion cascades to null out foreign keys referencing it
Columns for the data in that group
A non-nullable foreign key back to the event
This still leaves you with a growing, changing Event Table (as you need to add new foreign key columns for each new data group), just much less drastically. However it seems more modular to me than one giant table. Is this a good solution to this situation? If not, what is?
Any suggestions?
P.S. We are using SQL Express or SQL Compact (we are currently experimenting with which one suits us best)
Why not use basically the single table approach and store the changing event data as XML in an XML column? You can even use XSD schemas to account for the changing data types, and you can add indexes on XML data if fast query performance on some XML data is required.
A permanently changing DB schema wouldn't really be an option if I were to implement such a database.
You should not have foreign keys for each data group table in the event table.
The event table already has an event_id which is in each data group table. So you can get from event to the child tables. Furthermore there will be old rows in the event table that are not in the latest data group table. So you really can't have a foreign key.
That said, I would wonder whether there is additional structure in the data group tables that can be used to clean up your design. Without knowing anything about what they look like I can't say. But if there is, consider taking advantage of it! (A schema that changes every month is a pretty bad code smell.)
Store your data at as granular a level as possible. It might be as simple as:
EventSource int FK
EventType int FK
Duration int
OccuredOn datetime
Get the data right and as simple as possible in the first place, and then
Aggregate via views or queries. Your instincts are correct about the ever changing nature of the columns - better to control that in T-SQL than in DDL.
I faced this problem a number of years ago with logfiles for massive armies of media players, and where I ultimately ended up was taking this data and creating a OLAP cube out of it. OLAP is another approach to database design where the important thing is optimizing it for reporting and "sliceability". It sounds like you're on that track, where it would be very useful to be able to look at a quick month's view of data, then a quarter's, and then back down to a week's. This is what OLAP is for.
Microsoft's technology for this is Analysis Services, which comes as part of Sql Server. If you didn't want to take the entire plunge (OLAP has a pretty steep learning curve), you could also look at doing a selectively denormalized database that you populated each night with ETL from your source database.
HTH.

Resources