SQL Server - Temporal Table - Selected Columns Only

One of the requirements of a recent project I was working on was maintaining a history of database table data as part of an audit trail. My first thought was to use triggers, but after some research I learned about SQL Server temporal tables (part of core SQL Server 2016). Having dug into them, I can see that temporal tables can be put to good use.
More on temporal tables: Managing Temporal Table History in SQL Server 2016
However, I want rows to be written to the temporal history table only when certain columns are changed.
CREATE TABLE dbo.Persons
(
    ID BIGINT IDENTITY(1,1) NOT NULL,
    FirstName NVARCHAR(50) NOT NULL,
    LastName NVARCHAR(50),
    PhoneNumber NVARCHAR(20)
)
Now if I enable system versioning on top of this (SYSTEM_VERSIONING = ON), I want rows to be inserted into the temporal history table only when PhoneNumber is changed, not when FirstName or LastName change.
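For reference, turning system versioning on for this table looks roughly like this (a sketch; the primary key constraint and the history-table name are assumptions, since system versioning requires a primary key that the DDL above omits):

```sql
-- System versioning requires a primary key, which the table above lacks.
ALTER TABLE dbo.Persons
    ADD CONSTRAINT PK_Persons PRIMARY KEY (ID);

-- Add the required period columns (HIDDEN keeps them out of SELECT *).
ALTER TABLE dbo.Persons ADD
    ValidFrom DATETIME2 GENERATED ALWAYS AS ROW START HIDDEN
        CONSTRAINT DF_Persons_ValidFrom DEFAULT SYSUTCDATETIME(),
    ValidTo DATETIME2 GENERATED ALWAYS AS ROW END HIDDEN
        CONSTRAINT DF_Persons_ValidTo DEFAULT CONVERT(DATETIME2, '9999-12-31 23:59:59.9999999'),
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo);

-- Turn versioning on; every UPDATE/DELETE now writes the old row to the history table.
ALTER TABLE dbo.Persons
    SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.PersonsHistory));
```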

Unfortunately, that's not the way it works. As the link in your post says, "system versioning is all-or-nothing". Honestly, your first instinct is likely your best option: every other method (CDC, replication, system versioning, ...) will capture more data than you want, and you will have to pare the results down after the fact.
If you really want to use system versioning, you'd just have to use one of the options presented in the provided link: delete unwanted rows and/or update unwanted columns to NULL values.
I would recommend going with your first instinct and use triggers to implement something like a type 4 slowly changing dimension. It's the most straightforward method of getting the specific data you want.

You could create one table for the attributes you want history for (and set SYSTEM_VERSIONING = ON on it) and a second table for the attributes you don't want history for, with a 1-to-1 relationship between the two tables.
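Sketched out, that split might look like this (table, column, and constraint names are made up for illustration, reusing the Persons example above):

```sql
-- Versioned half: only the columns whose history matters.
CREATE TABLE dbo.PersonsTracked
(
    ID BIGINT NOT NULL CONSTRAINT PK_PersonsTracked PRIMARY KEY,
    PhoneNumber NVARCHAR(20),
    ValidFrom DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo DATETIME2 GENERATED ALWAYS AS ROW END NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.PersonsTrackedHistory));

-- Unversioned half: 1-to-1 with the table above via the shared key.
CREATE TABLE dbo.PersonsUntracked
(
    ID BIGINT NOT NULL CONSTRAINT PK_PersonsUntracked PRIMARY KEY
        CONSTRAINT FK_PersonsUntracked_PersonsTracked REFERENCES dbo.PersonsTracked (ID),
    FirstName NVARCHAR(50) NOT NULL,
    LastName NVARCHAR(50)
);
```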

Related

Data Versioning/Auditing in SQL Database best patterns

I have a Job table where I store the job description, posted date, qualifications, etc., with the schema below:
Job (Id IDENTITY PK, Description VARCHAR(200), PostedOn DATETIME, Skills VARCHAR(50))
Other attributes of jobs that we would like to track, such as department and team, will be stored in another table as attributes of the job:
JobAttributesList (Id IDENTITY PK, AttributeName VARCHAR(50))
JobAttributes (JobID IDENTITY PK, AttributeID FK REFERENCES JobAttributesList.Id, AttributeValue VARCHAR(50))
Now if a job description changes, we do not want to lose the old one, so we need to keep track of versions. What are the best practices? We may have to scale later by adding more versioned tables.
One strategy would be to use a history table for every table we want to enable versioning on, but that adds more and more tables as versioning requirements grow, and it feels like schema duplication.
There is a difference between versioning and auditing. Versioning only requires that you keep the old versions of the data somewhere. Auditing typically requires that you also know who made a change.
If you want to keep the old versions in the database, do create an "old versions" table for each table you want to version, but don't create a new table for every different column change you want to audit.
You can create a new table for every column, whose only columns are audit_id, key, old_column_value, and created_datetime; that can save disk space if the original table is very wide, but it makes reconstructing the complete row for a given date and time extraordinarily expensive.
You could also keep the old data in the same table, and always do inserts, but over time that becomes a performance problem as your OLTP table gets way, way too big.
Just have a single history table with all the columns of the original table, which you always insert into from an UPDATE/DELETE trigger on the original table. You can tell which columns have changed either by adding a bit flag for every column, or by determining it at select time by comparing the data in one row with the data in the previously audited row for the given key.
I would absolutely not recommend creating a trigger which concatenates all of the values cast to varchar and dumps it all into a single, universal audit table with an "audited_data" column. It will be slow to write, and impossible to usefully read.
If you want to use this for actual auditing, and not just versioning, then either the user making the change must be captured in the original table so it is available to the trigger, or you need people to connect with specific logins (in which case you can use transport information like ORIGINAL_LOGIN()), or you need to set a value like CONTEXT_INFO or SESSION_CONTEXT on the client side.
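Put together, a trigger along these lines would capture the prior version of each changed row plus who changed it. This is a sketch against the Job table above; the history-table and trigger names are invented, and ORIGINAL_LOGIN() is only meaningful if users connect with their own logins:

```sql
CREATE TABLE dbo.Job_History
(
    AuditId     BIGINT IDENTITY(1,1) PRIMARY KEY,
    Id          INT NOT NULL,        -- key of the audited row
    Description VARCHAR(200),
    PostedOn    DATETIME,
    Skills      VARCHAR(50),
    AuditedBy   NVARCHAR(128) NOT NULL,
    AuditedAt   DATETIME2 NOT NULL
);
GO

CREATE TRIGGER dbo.trg_Job_Audit
ON dbo.Job
AFTER UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- "deleted" holds the pre-change image of every affected row.
    INSERT INTO dbo.Job_History (Id, Description, PostedOn, Skills, AuditedBy, AuditedAt)
    SELECT d.Id, d.Description, d.PostedOn, d.Skills,
           ORIGINAL_LOGIN(),
           SYSUTCDATETIME()
    FROM deleted AS d;
END;
```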

Maintaining rows in a SQL Server junction table when rows in other table are inserted and updated

Can you show sample code to create a trigger or stored procedure that maintains rows in a SQL Server junction table when changes are made to the Authors and BookTitles tables, such as inserting and updating rows in those tables?
We have the following tables:
Authors:
ID
NAME
ZIP
AND SOME MORE COLUMNS
BookTitles:
ID
TITLE
ISBN
AND SOME MORE COLUMNS
This is the table we will use as our junction table:
AuthorTitles:
ID
AUTHOR_ID
BOOK_TITLE_ID
We would like to do this in a trigger instead of doing the coding in our VB.Net form.
All help will be appreciated.
The above table structures were simplified to show what we are trying to do.
We are implementing a junction table for teachers and programs.
Unless you have foreign key constraints that require at least one book per author and/or vice versa, the only cases that need special handling are deletes from BookTitles or Authors. They can be done like this:
CREATE PROC BookTitle_Delete (@Book_ID INT)
AS
    -- First remove any children in the junction table
    DELETE FROM AuthorTitles WHERE BOOK_TITLE_ID = @Book_ID;

    -- Now remove the parent record in BookTitles
    DELETE FROM BookTitles WHERE ID = @Book_ID;
GO
In general, you want to resist the temptation to do table maintenance and other things like this in triggers. Triggers are invisible, add overhead, can cause maintenance problems for DBAs, and can lead to many subtle transactional/locking complexities and performance issues. Triggers should be reserved for simple things that really should be hidden from the client application (like auditing) and that cannot practically be implemented some other way. This is not one of those cases.
If you really want an "invisible" way of doing this, then just implement a cascading foreign key. I do not recommend this either, but it is still preferable to a trigger.
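For completeness, the cascading-foreign-key alternative would look something like this (constraint names are illustrative):

```sql
ALTER TABLE dbo.AuthorTitles
    ADD CONSTRAINT FK_AuthorTitles_Authors
        FOREIGN KEY (AUTHOR_ID) REFERENCES dbo.Authors (ID)
        ON DELETE CASCADE;

ALTER TABLE dbo.AuthorTitles
    ADD CONSTRAINT FK_AuthorTitles_BookTitles
        FOREIGN KEY (BOOK_TITLE_ID) REFERENCES dbo.BookTitles (ID)
        ON DELETE CASCADE;

-- Now a plain DELETE on either parent removes the junction rows automatically:
-- DELETE FROM dbo.BookTitles WHERE ID = @Book_ID;
```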

Allowing individual columns not to be tracked in Merge Replication

Using Merge Replication, I have a table that for the most part is synchronized normally. However, the table contains one column that is used to store temporary, client-side data, which is only meaningfully edited and used on the client, and which I don't have any desire to replicate back to the server. For example:
CREATE TABLE MyTable (
    ID UNIQUEIDENTIFIER NOT NULL PRIMARY KEY,
    Name NVARCHAR(200),
    ClientCode NVARCHAR(100)
)
In this case, even if subscribers make changes to the ClientCode column in the table, I don't want those changes getting back to the server. Does Merge Replication offer any means to accomplish this?
An alternate approach, which I may fall back on, would be to publish an additional table, and configure it to be "Download-only to subscriber, allow subscriber changes", and then reference MyTable.ID in that table, along with the ClientCode. But I'd rather not have to publish an additional table if I don't absolutely need to.
Yes. When you create the article in the publication, don't include this column. Then create a script that adds the column back to the table and, in the publication properties under Snapshot, specify that this script executes after the snapshot is applied.
This means that the column will exist on both the publisher and the subscriber but will be entirely ignored by replication. Of course, you can only use this technique if the column(s) to ignore are nullable.
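The post-snapshot script itself can be as small as this (a sketch; the existence check guards against re-runs, since a snapshot may be reapplied):

```sql
-- Runs on the subscriber after the snapshot is applied.
IF COL_LENGTH('dbo.MyTable', 'ClientCode') IS NULL
BEGIN
    ALTER TABLE dbo.MyTable
        ADD ClientCode NVARCHAR(100) NULL;  -- must be nullable, as noted above
END;
```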

Clone a SQL Server Database w/ all new Primary Keys

We need to create an automated process for cloning small SQL Server databases, but in the destination database all primary keys should be distinct from the source (we are using UNIQUEIDENTIFIER ids for all primary keys). We have thousands of databases that all have the same schema, and need to use this "clone" process to create new databases with all non-key data matching, but referential integrity maintained.
Is there an easy way to do this?
Update - Example:
Each database has ~250 transactional tables that need to be cloned. Consider the following simple example of a few tables and their relationships (each table has a UniqueIdentifier primary key = id):
location
doctor
doctor_location (to doctor.id via doctor_id, to location.id via location_id)
patient
patient_address (to patient.id via patient_id)
patient_medical_history (to patient.id via patient_id)
patient_doctor (to patient.id via patient_id, to doctor.id via doctor_id)
patient_visit (to patient.id via patient_id)
patient_payment (to patient.id via patient_id)
The reason we need to clone the databases is due to offices being bought out or changing ownership (due to partnership changes, this happens relatively frequently). When this occurs, the tax and insurance information changes for the office. Legally this requires an entirely new corporate structure, and the financials between offices need to be completely separated.
However, most offices want to maintain all of their patient history, so they opt to "clone" the database. The new database will be stripped of financial history, but all patient/doctor data will be maintained. The old database will have all information up to the point of the "clone".
The reason new GUIDs are required is that we consolidate all databases into a single relational database for reporting purposes. Since all transactional tables have GUIDs, this works great ... except for the cases of the clones.
Our only solution so far has been to dump the database to a text file and search-and-replace the GUIDs. This is ridiculously time consuming, so we were hoping for a better way.
I'd do this by creating a basic restore of the database and updating all values in the primary keys to new GUIDs.
To make this automatically update all the foreign keys, you need to add constraints to the database with the CASCADE keyword, i.e.:
CREATE TABLE Orders
(
    OrderID uniqueidentifier,
    CustomerID uniqueidentifier REFERENCES Customer (CustomerID) ON UPDATE CASCADE
    -- etc...
)
Now when you update the Customer table's CustomerID, the Orders table's CustomerID is updated too.
You can do this to a whole table using a simple update query:
UPDATE Customer SET CustomerID = NEWID();
You'd need to do this for each table with a uniqueidentifier as its primary key.
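With the ON UPDATE CASCADE constraints in place, those per-table updates can be generated from the catalog views rather than written by hand. This is a sketch that assumes every primary key is a single uniqueidentifier column:

```sql
DECLARE @sql NVARCHAR(MAX) = N'';

-- Build one UPDATE per table whose primary key column is a uniqueidentifier.
SELECT @sql += N'UPDATE ' + QUOTENAME(SCHEMA_NAME(t.schema_id)) + N'.'
             + QUOTENAME(t.name) + N' SET ' + QUOTENAME(c.name)
             + N' = NEWID();' + NCHAR(13)
FROM sys.tables AS t
JOIN sys.indexes AS i
    ON i.object_id = t.object_id AND i.is_primary_key = 1
JOIN sys.index_columns AS ic
    ON ic.object_id = i.object_id AND ic.index_id = i.index_id
JOIN sys.columns AS c
    ON c.object_id = ic.object_id AND c.column_id = ic.column_id
JOIN sys.types AS ty
    ON ty.user_type_id = c.user_type_id
WHERE ty.name = 'uniqueidentifier';

EXEC sys.sp_executesql @sql;
```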
You could create an Integration Services (SSIS) package to do this. You would create the new database in the control flow, then copy the data from the source to the destination using the data flow, which would also replace the GUIDs or make other needed transformations along the way.
If the DBs have a large number of tables, and only a few of them need to be modified, then you might be better off just making a copy of the MDF/LDF files, re-attaching them with a new DB name, and using a script to update the IDs.
The advantage of using SSIS is that it's easier to fully automate. The downside is that it might take a little longer to get things set up.

T-SQL Add Column In Specific Order

I'm a bit new to T-SQL. Coming from a MySQL background, I'm still adapting to the different nuances of the syntax.
I'm looking to add a new column AFTER a specific one. I've found that AFTER is a valid keyword, but I don't think it's the right one for the job.
ALTER TABLE [dbo].[InvStockStatus]
ADD [Abbreviation] [nvarchar](32) DEFAULT '' NOT NULL ;
This is my current query, which works well except that it adds the field at the end of the table; I'd prefer to add it after [Name]. What's the syntax I'm looking for to do this?
You can't do it like that. For example, if you have a table like this:
create table TestTable (id1 int, id3 int)
and you want to add another column id2 between id1 and id3, here is what SQL Server does behind the scenes if you use the designer:
BEGIN TRANSACTION
SET QUOTED_IDENTIFIER ON
SET ARITHABORT ON
SET NUMERIC_ROUNDABORT OFF
SET CONCAT_NULL_YIELDS_NULL ON
SET ANSI_NULLS ON
SET ANSI_PADDING ON
SET ANSI_WARNINGS ON
COMMIT
BEGIN TRANSACTION
GO
CREATE TABLE dbo.Tmp_TestTable
(
id1 int NULL,
id2 int NULL,
id3 int NULL
) ON [PRIMARY]
GO
ALTER TABLE dbo.Tmp_TestTable SET (LOCK_ESCALATION = TABLE)
GO
IF EXISTS(SELECT * FROM dbo.TestTable)
EXEC('INSERT INTO dbo.Tmp_TestTable (id1, id3)
SELECT id1, id3 FROM dbo.TestTable WITH (HOLDLOCK TABLOCKX)')
GO
DROP TABLE dbo.TestTable
GO
EXECUTE sp_rename N'dbo.Tmp_TestTable', N'TestTable', 'OBJECT'
GO
COMMIT
As you can see, if you have a lot of data this can be problematic. And why does it matter where the column is located? Just use:
select col1, col2, col3 from table
The sequence of columns is really irrelevant in a strict (functional) sense, in any RDBMS - it's just a "nicety" to have for documentation or humans to look at.
SQL Server doesn't support any T-SQL commands to order the columns in any way. So there is no syntax in T-SQL to accomplish this.
The only way to change that is to use the visual table designer in SSMS, which really recreates the whole table from scratch, when you move around columns or insert columns in the middle of a table.
While it is correct from a functional database perspective that the tuples can appear in any order, a database does not exist inside a vacuum. There is always a human (DBAs, devs) who will want to read the table schema, get their head around it, and maintain or write queries against it.
In the past, I've used conventions for a table's columns such as ordering a table with
primary key(s) first
then foreign keys
then frequently used columns
then other columns
and lastly audit related columns
and these conventions help when scanning a table. Unfortunately, it appears you have to jump through hoops to maintain any order, so now I have to question whether it's worth having and maintaining these conventions. My new rule is to just add it to the end.
If you are really worried about order from a readability perspective, you should create your own 'readability' views (perhaps in a different schema) in any order you feel like. You could have multiple views of the same table (one for just the core columns and another including stuff that usually isn't relevant).
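Using the table from the question, such a readability view might look like this (the column list beyond Abbreviation is illustrative, since the rest of the table's schema isn't shown):

```sql
-- Presentation-only ordering; the base table is untouched.
CREATE VIEW dbo.InvStockStatus_Readable
AS
SELECT ID,           -- assumed key column
       [Name],
       Abbreviation, -- the newly added column, shown right after [Name]
       [Status]      -- remaining columns, in whatever order reads best
FROM dbo.InvStockStatus;
```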
It would be nice to be able to re-order columns in SQL Server database diagrams (as a display thing only), but this isn't possible.
You should always add fields only at the end. You can select fields in whatever order you want, but never restructure an existing table to add a column in the middle. This is likely to break things where people did dumb things like SELECT * or inserts without specifying the columns (granted, people shouldn't do those things, but they do).
Recreating the table can be a long time-consuming process for no gain whatsoever and can cause lots of user complaints and lockups while it is going on.
The schema comparison tools I have seen will create a new table with the desired ordering and then copy the data from the old table to the new one (with some renaming magic to make the new one resemble the old). Given how awkward this approach is, I figure there isn't a T-SQL statement to add a new column in a specific place.
This is a safe workaround without using a temp table. After you add the column at the end, simply go to SQL Server Management Studio, click on the table, select Design (or Modify), and drag the last column to whatever position you want.
This way you do not have to worry about losing data and indexes. The only other option is to recreate the table, which can be problematic if you have a lot of data.
This answer is meant to help other people and is not intended to be accepted as the answer.
Technically, or perhaps I should say, academically, the order in which columns are added to a table, or the order in which they are stored in the database's internal storage model, should not be of any concern to you. You can simply list the columns in the Select clause of your SQL queries to control the order that columns or computed expressions appear in the output of any query you run. Internally the database is free to store the actual data any way it sees fit to optimize storage, and or help align data elements with disk and/or memory boundaries.
A fair amount of time has passed since this question was posted, but it's worth noting that while the underlying raw SQL needed to place columns in a specific order hasn't changed, the process of generating the scripts to do it is taken care of for you if you use SQL Server Data Tools within Visual Studio to manage your database. The database you deploy to will always have the columns in the order you specify in your project.
