Maintain Historical data changes in Parent-child table


One Employee has N Addresses. I need to keep a history of the changes to both Employee and Address whenever any user modifies either of these two tables.
Employee table:
CREATE TABLE Employee (
    EmpID BIGINT PRIMARY KEY IDENTITY(1,1),
    Name VARCHAR(200),
    EmpNumber VARCHAR(200),
    Createddate DATETIME2
);
Address table:
CREATE TABLE Address (
    AddID BIGINT PRIMARY KEY IDENTITY(1,1),
    AddressLine1 VARCHAR(300),
    AddressLine2 VARCHAR(300),
    EmpID BIGINT NULL,
    AddressType VARCHAR(100),
    Createddate DATETIME2
);
Above, EmpID is a foreign key to the Employee table.
The scenarios I have to satisfy:
I should be able to track the changes to an individual address record (child table) of any employee.
I should be able to track the changes to an Employee record (parent table) together with its child address records.
I thought of the following approaches.
Suppose the tables are initially in the state shown in the image below.
Solution 1:
Case: the child table gets updated
When I update the address record Add0001, I insert a new record into the Address table and mark the previous record inactive, as shown:
Case: the parent table gets updated
When the parent table gets updated, I move the old data into a history table for the parent and then update the current record in the parent table, as shown:
Solution 2:
Case: the child table gets updated
Same as in Solution 1.
Case: the parent table gets updated
I insert a new record into the parent table and mark the previous record inactive. The new record gets a new ID, and I update the foreign key in the child table to that new ID, as shown below:
Is this the best way of maintaining the history of parent and child tables together?
Or is there another design that would let me track the changes to parent and child records together?

There are quite a few ways to go about this sort of thing, and what you're proposing is a perfectly valid approach. At least you appear to be pointed in the right direction.
There are a couple of changes that I would suggest:
1) Get rid of the "status" flag and use "begin" and "end" dates. The specific names don't matter so long as you have them.
2) Both the begin and end date columns should be defined as NOT NULL, and the begin date should have a default constraint of GETDATE() or CURRENT_TIMESTAMP. The end date should default to '99991231'. Trust me and fight the urge to make the end date NULLable and give "active" rows NULL end dates. '99991231' is, for all practical purposes, the end of time, and it can be used to easily identify the currently active rows.
3) I would suggest adding a trigger to do the following:
a) Prevent updates and/or deletes. Ideally this would be an insert-only table.
b) When new rows are inserted, update (yes, I know what "a)" says) the "existing current" row's end date with the "new current" row's begin date. By doing this, you will have a continuous, gap-free history.
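The points above could be sketched roughly as follows. This is only an illustration under assumptions: the table name AddressVersion, its columns, and the trigger name are hypothetical, AddID is treated as a stable identifier of the logical address (not an identity), and each INSERT statement is assumed to carry at most one new version per address. The update/delete-blocking trigger from point a) is left out.

```sql
-- Hypothetical insert-only version table with NOT NULL begin/end dates.
CREATE TABLE dbo.AddressVersion (
    AddressVersionID BIGINT IDENTITY(1,1) PRIMARY KEY,
    AddID        BIGINT       NOT NULL,  -- stable id of the logical address (assumption)
    EmpID        BIGINT       NOT NULL,
    AddressLine1 VARCHAR(300) NULL,
    BeginDate    DATETIME2    NOT NULL
        CONSTRAINT DF_AddressVersion_Begin DEFAULT GETDATE(),
    EndDate      DATETIME2    NOT NULL
        CONSTRAINT DF_AddressVersion_End DEFAULT '99991231'
);
GO

CREATE TRIGGER dbo.trg_AddressVersion_CloseOld
ON dbo.AddressVersion
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- Close the previously current row of every address that just received
    -- a new version, so the old EndDate equals the new BeginDate (no gaps).
    UPDATE old
    SET    old.EndDate = new.BeginDate
    FROM   dbo.AddressVersion AS old
    JOIN   inserted AS new
           ON  new.AddID = old.AddID
           AND new.AddressVersionID <> old.AddressVersionID
    WHERE  old.EndDate = '99991231';
END;
GO

-- Currently active rows:
SELECT * FROM dbo.AddressVersion WHERE EndDate = '99991231';

-- Rows that were active at a given instant (example value):
DECLARE @AsOf DATETIME2 = '2024-01-01T00:00:00';
SELECT * FROM dbo.AddressVersion
WHERE  BeginDate <= @AsOf AND @AsOf < EndDate;
```

Because the end date is never NULL, both queries stay simple range or equality predicates with no ISNULL/COALESCE wrapping.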
Hope this helps. :)

Are you able to use temporal tables and their history tables, introduced in SQL Server 2016?
With system versioning, SQL Server maintains a history table for each table automatically, so the parent and the child are versioned independently and can be queried as of the same point in time without hand-written history logic.
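A minimal sketch of what that could look like for the Employee and Address tables from the question, assuming SQL Server 2016 or later and that the tables are created fresh. The period column names ValidFrom/ValidTo and the history table names are illustrative choices, not something from the original post.

```sql
-- Sketch: system-versioned variants of the question's two tables.
-- ValidFrom/ValidTo and the *History table names are hypothetical.
CREATE TABLE dbo.Employee (
    EmpID       BIGINT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    Name        VARCHAR(200),
    EmpNumber   VARCHAR(200),
    Createddate DATETIME2,
    ValidFrom   DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo     DATETIME2 GENERATED ALWAYS AS ROW END   NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.EmployeeHistory));
GO

CREATE TABLE dbo.Address (
    AddID        BIGINT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    AddressLine1 VARCHAR(300),
    AddressLine2 VARCHAR(300),
    EmpID        BIGINT NULL REFERENCES dbo.Employee (EmpID),
    AddressType  VARCHAR(100),
    Createddate  DATETIME2,
    ValidFrom    DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo      DATETIME2 GENERATED ALWAYS AS ROW END   NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.AddressHistory));
GO

-- An employee together with its addresses, as both looked at one instant.
-- System-versioning timestamps are stored in UTC.
DECLARE @AsOf DATETIME2 = '2024-01-01T00:00:00';  -- example value

SELECT e.EmpID, e.Name, a.AddID, a.AddressLine1, a.AddressType
FROM   dbo.Employee FOR SYSTEM_TIME AS OF @AsOf AS e
LEFT JOIN dbo.Address FOR SYSTEM_TIME AS OF @AsOf AS a
       ON a.EmpID = e.EmpID
WHERE  e.EmpID = 1;  -- example employee
```

Ordinary UPDATE and DELETE statements against the two tables then move the old row versions into the history tables without any triggers.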

If the parent data does not change that frequently, you can keep the parent's history records in the same table and update the foreign keys of the child table.
Before Changes to Parent
Now if you change the name of the employee and add a new address, update the employee ID in the child table (Address) to point to the new parent row.
After Changes to Parent
You can always get the addresses the employee had before the name change by using the valid time. This way, we need not create an additional history table. But fetching the history may be a little complex because of all the date comparisons.
Any suggestions are welcome.

Related

Adding last change timestamp to a table in Snowflake

I have a lot of tables in Snowflake that I update (basically re-create) every day with a Python script.
I can see the timestamp of the last change to those tables in the information schema of my database, but how can I add that information as a column to one of our tables?
Assume that I have a table customer and I want to be able to see when each row of that table was last changed. I can see the table-level timestamp here:
SELECT CONVERT_TIMEZONE('Etc/GMT+9','UTC',last_altered) AS last_changed
FROM "XXXX"."INFORMATION_SCHEMA"."TABLES"
WHERE table_name='CUSTOMERS';
How can I add this information to the customer table?

If you would like to see that information, your Python program should add it as additional columns in each row. We used to call these 'WHO columns'. Below are the WHO columns that we added to each table in the final schema:
Last Updated TimeStamp
Last Updated User
Creation Timestamp
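As a rough illustration of such WHO columns in Snowflake SQL: the table layout and the column names (creation_ts, last_updated_ts, last_updated_user) are hypothetical. A column default only fires on INSERT, so in this sketch the loading script sets the "last updated" columns explicitly on every UPDATE.

```sql
-- Hypothetical customer table with three WHO columns.
CREATE OR REPLACE TABLE customer (
    customer_id       NUMBER,
    name              VARCHAR,
    creation_ts       TIMESTAMP_LTZ DEFAULT CURRENT_TIMESTAMP(),
    last_updated_ts   TIMESTAMP_LTZ DEFAULT CURRENT_TIMESTAMP(),
    last_updated_user VARCHAR
);

-- On insert the two timestamps come from the defaults;
-- the user is supplied by the loading script.
INSERT INTO customer (customer_id, name, last_updated_user)
SELECT 1, 'test', CURRENT_USER();

-- On update the script has to refresh the WHO columns itself.
UPDATE customer
SET    name              = 'test2',
       last_updated_ts   = CURRENT_TIMESTAMP(),
       last_updated_user = CURRENT_USER()
WHERE  customer_id = 1;
```

If the tables are fully re-created every day, as in the question, every row will carry the timestamp of the latest load unless the script carries the old values over.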

The best option would be to add an audit column to the customer table with current_timestamp as its default value.
Example:
CREATE TABLE CUSTOMER (column1 varchar, insert_date timestamp default current_timestamp());
In this example you can use insert_date to track when each record was inserted. The column is auto-populated whenever you insert a row like this:
INSERT INTO CUSTOMER(column1) VALUES ('test');

Adding a new column as a foreign key to an existing table with data, and handling the existing data

A very simple example. I have a web API with a table in the database:
Employees
---------
Id
Name
and, for example, I have 50 records.
Now I have to implement a feature to add extra info about the department. Because it is a one-to-many relationship, the new database schema adds a department ID:
Employees        Department
------------     ----------
Id               Id
Name             Name
DepartmentId
For this, I run the following query (I use SQL Server):
alter table Employees add constraint fk_employees_departmentid
foreign key (DepartmentId) references Department(Id);
But now I have some issues to handle:
1) Now I have the 50 existing records without a DepartmentId. Must I add this value manually? What is the best practice? For 50 records it is possible, but what about 2000 records or more?
2) When I added the DepartmentId column, I allowed it to have NULL values (is that correct?), but as a foreign key, I don't want to allow NULL values. Can I change it, or how can I handle it?

1) Now I have the 50 existing records without a DepartmentId. Must I add this value manually? What is the best practice? For 50 records it is possible, but what about 2000 records or more?
It depends. You could set up a new department for "unassigned" and assign them all to that, or you could send a spreadsheet to HR saying "the following employees don't have an assigned department; what department are they in? PS: don't remove the EmployeeID column from the sheet before you send it back; I need it to update the DB". It's very much a business-context question, not a technical one. X thousand records is easy to handle; it'll just take a bit of time to work through if you (or someone else) are doing it manually. This information is likely to be available somewhere else; you could perhaps send a list to all department heads saying "are any of these guys yours? Please remove all the names you don't have in your team from this spreadsheet and send it back to me", then update the DB based on what you get back.
As this is a one-time operation, you don't need anything particularly fancy for it. You can just get your Excel sheet back and, in an empty column, put:
="UPDATE emp SET departmentID = 5 WHERE id = " & A1
Fill it down to generate a bunch of UPDATE statements, copy the text into your query tool, and hit go. There's no need to get fancy by loading the sheet into a table, doing update joins, etc. Just hack something together in Excel that writes the SQL for you, then copy, paste, and run. If HR has sent back the sheet with a list of department names, put the department name and ID somewhere else on the sheet and use VLOOKUP or XLOOKUP to turn the name into the department number, then compose your SQL based on that.
2) When I added the DepartmentId column, I allowed it to have NULL values (is that correct?), but as a foreign key, I don't want to allow NULL values. Can I change it, or how can I handle it?
Foreign-keyed columns are allowed to have NULL values. It isn't the FK that imposes a "no NULLs" restriction; it's the nullability of the column (alter the column to DepartmentId INT NOT NULL) that imposes that. A FK references a primary (or unique) key, and the primary key may not be null (or, in some DBs, at most one record can have a [partly] null PK), but you could just leave those department IDs null. If you do alter the column to be NOT NULL, you'll need to correct the NULL values first or the change will fail.
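Put together, the sequence described above might look like the following T-SQL sketch. It assumes that Department.Id is an auto-generated INT, that department names are unique, and that an "Unassigned" placeholder department is acceptable to the business; none of these are stated in the question.

```sql
-- 1. Add the column as nullable so the existing rows remain valid.
ALTER TABLE Employees ADD DepartmentId INT NULL;
GO

-- 2. Backfill. Here every existing employee goes to a hypothetical
--    'Unassigned' department; real values can be corrected later.
INSERT INTO Department (Name) VALUES ('Unassigned');

UPDATE Employees
SET    DepartmentId = (SELECT Id FROM Department WHERE Name = 'Unassigned')
WHERE  DepartmentId IS NULL;
GO

-- 3. Only now forbid NULLs; this would fail if any NULLs were left.
ALTER TABLE Employees ALTER COLUMN DepartmentId INT NOT NULL;
GO

-- 4. Add the foreign key from the question.
ALTER TABLE Employees ADD CONSTRAINT fk_employees_departmentid
    FOREIGN KEY (DepartmentId) REFERENCES Department (Id);
GO
```

The GO after step 1 matters: a statement that references the new column has to be in a later batch than the ALTER TABLE that adds it.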

Only allow current date/time on SQL Server insertion

I need a way to enforce a single permitted value in an inserted field, more precisely a DateTime field, which should always be set to the current date/time at insertion.
I am working on a university exercise and they want all the constraints to be done within the DB. Ordinarily I would just put on a DEFAULT GetDate() and always use DEFAULT on inserts, but the exercise requirements prevent this.
For an integer I can do this (I've omitted the other fields, since they are irrelevant to the issue at hand):
CREATE TABLE tester(
d INTEGER not null DEFAULT 3,
CONSTRAINT chkd CHECK(d = 3)
);
However, what I want is the following:
CREATE TABLE tester(
d DATETIME not null DEFAULT GETDATE(),
CONSTRAINT chkd CHECK(d = ????????)
);
Repeating GetDate() in the CHECK() will trigger an error on inserts, because the fractional seconds will cause a mismatch.
So I guess the first question is: is this possible? And if so (I hope so), how?

Don't track the date/time in the tester table. Instead, have a separate table with a column that references the ID of the tester table as a foreign key constraint. The new table will have one other column, a DateTime column. On insertion into the tester table, a trigger can be fired that will insert a row into the new table containing the ID of the newly-created tester row as well as the current date/time.
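A minimal sketch of that idea, assuming tester has an integer identity key named id (the question omits the other columns). The payload column, the tester_created table, and the trigger name are hypothetical.

```sql
-- Main table: no date column at all, so clients cannot supply one.
CREATE TABLE dbo.tester (
    id      INT IDENTITY(1,1) PRIMARY KEY,
    payload VARCHAR(50) NULL           -- placeholder for the omitted fields
);
GO

-- Side table holding the insertion time of each tester row.
CREATE TABLE dbo.tester_created (
    tester_id  INT      NOT NULL PRIMARY KEY
        REFERENCES dbo.tester (id),
    created_at DATETIME NOT NULL
);
GO

CREATE TRIGGER dbo.trg_tester_stamp
ON dbo.tester
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- One timestamp row per inserted tester row (handles multi-row inserts).
    INSERT INTO dbo.tester_created (tester_id, created_at)
    SELECT id, GETDATE()
    FROM   inserted;
END;
GO
```

Restricting write permissions on the side table would still be needed to stop someone from editing the timestamps directly.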

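Based on Ryan's comment, I arrived at this answer, which works:
CREATE TRIGGER trigger_date ON [dbo].[tester]
FOR INSERT
AS
BEGIN
    -- Overwrite whatever value was supplied with the current date/time.
    -- Relies on tester having an id key column (omitted in the question).
    UPDATE tester SET d = GETDATE() WHERE id IN (SELECT id FROM INSERTED);
END
GO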

Relation between the inserted and deleted tables in a trigger

Let's say I have this table :
Car
----------------------
Name|Date|Color
The primary key is a combination of Name and Date.
On the update, if the initial Color of the updated row is Blue and the new one is Red, I want to keep a trace of this update.
This is what I did :
ALTER TRIGGER TraceTrigger
ON Car
FOR UPDATE
AS
BEGIN
    INSERT INTO TraceTable
    SELECT
        del.Name,
        del.Date,
        del.Color,
        ins.Name,
        ins.Date,
        ins.Color
    FROM deleted AS del
    INNER JOIN inserted AS ins
        ON del.Name = ins.Name AND del.Date = ins.Date
    WHERE del.Color = 'Blue' AND ins.Color = 'Red'
END
This example is pretty simple. It shows that I need to keep a trace of the old value and the new value from the updated row.
But imagine that the Name can be modified (I know we should not modify a PK, but in this situation it is possible). Given that the primary key can change, the join between the inserted and deleted tables will sometimes simply not work.
So, is it possible to keep the relation between the deleted row and the inserted row when the PK can be updated to a different value?

You needn't bother recording both INSERTED and DELETED. Just INSERTED is what I usually do; otherwise you'd end up with two of every bit of information. You'll record it when it's inserted, then you'll record the identical data when it's deleted.
Say you've got a table that just has an ID and a Name field. The trace for that, recording both INSERTED and DELETED, would look like:
OldID  OldName  NewID  NewName
1      Harry    1      Henry
1      Henry    1      James
1      James    1      Thomas
As you can see, you're doubling up data. Each row's old values are identical to the previous row's new values.
In terms of the primary key, if you know you might have to change the PK whilst wanting to maintain a history, I'd strongly recommend adding a surrogate key to the table (e.g. ID) that you NEVER change. That way you are free to alter the Name column as you wish.
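With such a key in place, the trigger from the question could join on it instead of on Name and Date. A sketch, assuming a new identity column with the hypothetical name CarID is added to Car (SQL Server does not allow identity columns to be updated, which keeps it stable):

```sql
-- Add an immutable surrogate key; existing rows are numbered automatically.
ALTER TABLE Car ADD CarID INT IDENTITY(1,1) NOT NULL
    CONSTRAINT UQ_Car_CarID UNIQUE;
GO

ALTER TRIGGER TraceTrigger
ON Car
FOR UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- Pair each old row with its new version through the surrogate key,
    -- so the match survives changes to Name and/or Date.
    INSERT INTO TraceTable
    SELECT del.Name, del.Date, del.Color,
           ins.Name, ins.Date, ins.Color
    FROM   deleted  AS del
    INNER JOIN inserted AS ins
            ON del.CarID = ins.CarID
    WHERE  del.Color = 'Blue' AND ins.Color = 'Red';
END
GO
```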

You never really change a primary key; logically, you actually create a new entity (record / row). It is, in effect, a completely new thing.
There are a number of ways to keep track of this change, but here are two:
1) Create a row identifier like an IDENTITY column. It's not really a surrogate key, because a surrogate key should always be 1-1 with the proper natural key. Use this if name + date is not really the primary key and you can't create one (yuck - you have a database design issue).
2) Update the data in your trace table to match the new value anytime a value in the PK changes. This is the proper solution if your database design is correct. You may be able to implement this with an ON UPDATE CASCADE foreign key constraint.

Will a trigger be able to copy a primary key that's an identity id?

After I INSERT a record into table X, I want to copy that record into another History table immediately.
The table has an identity column as its primary key, so the record won't have a primary key ID until it is actually inserted.
My question is: if I put a trigger on this table, will I get the identity ID for that record, or will it still be blank?

Yes, the identity is available in the trigger, but make sure you get that ID correctly.
@@IDENTITY, SCOPE_IDENTITY etc. are NOT what you want to use in a trigger!
SELECT @id = id FROM inserted
is also a bad idea.
Always write your triggers to expect multiple changes being made simultaneously. The above approaches will all cause subtle but important errors when you insert more than one record into the table at a time.
The correct approach is to insert into your audit table FROM the inserted table, i.e.
INSERT INTO myAuditTable (Id, [Datetime], [user])
SELECT id, GETDATE(), USER_NAME()
FROM inserted
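For context, a complete trigger built around that statement might look like the sketch below. The table X is the one from the question; its SomeValue column, the X_History table, and the trigger name are made up for illustration.

```sql
-- Hypothetical source and history tables.
CREATE TABLE dbo.X (
    id        INT IDENTITY(1,1) PRIMARY KEY,
    SomeValue VARCHAR(100) NULL
);
GO

CREATE TABLE dbo.X_History (
    id        INT          NOT NULL,   -- copy of X.id, not an identity here
    SomeValue VARCHAR(100) NULL,
    LoggedAt  DATETIME     NOT NULL,
    LoggedBy  SYSNAME      NOT NULL
);
GO

CREATE TRIGGER dbo.trg_X_CopyToHistory
ON dbo.X
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- Set-based copy: works whether the INSERT added one row or many,
    -- and the identity values are already assigned at this point.
    INSERT INTO dbo.X_History (id, SomeValue, LoggedAt, LoggedBy)
    SELECT id, SomeValue, GETDATE(), USER_NAME()
    FROM   inserted;
END;
GO

-- A multi-row insert produces one history row per inserted row.
INSERT INTO dbo.X (SomeValue) VALUES ('a'), ('b'), ('c');
```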

If you use an 'after insert' trigger, the record is already there with a value for the identity column.

Just make sure you declare the trigger as an AFTER insert trigger (in SQL Server, FOR is a synonym for AFTER), not INSTEAD OF (guess you wouldn't use that one... ;)
http://msdn.microsoft.com/en-us/library/ms189799.aspx
