How to implement auditing/versioning of table modifications on PostgreSQL

We're implementing a new system using Java/Spring/Hibernate on PostgreSQL. The system needs to make a copy of every record as soon as a modification or deletion is made to the record(s) in the table(s). Later, the audit table(s) will be queried by reports to display the data to the users.
I was planning to implement this auditing/versioning feature by having a trigger on each table which would copy the modified (or deleted) row TO a table called ENTITY_VERSIONS, which would have about 20 columns called col1, col2, col3, col4, etc. to store the columns from the source table(s). However, the problem is: if there is more than one table to be versioned and ONLY one target table (ENTITY_VERSIONS) to store all the tables' versions, how do I design the target table?
Or is it better to have a copy of the VERSION table for each table that needs versioning?
It would be a bonus if some pointers to PostgreSQL trigger (and associated stored procedure) code for implementing the auditing/versioning could be shared.
P.S.: I looked at "Suggestions for implementing audit tables in SQL Server?" and kind of like the answer, except I would not know what type OldValue and NewValue should be.
P.P.S.: If the tables use SOFT deletes (phantom deletes) instead of HARD deletes, does any of your advice change?

I would have a copy of each table to hold the versions of that table you wish to keep. A global versioning table sounds like a bit of a nightmare to maintain and use.
This link in the Postgres documentation shows some audit trigger examples in Postgres.
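As a minimal sketch of the per-table approach (hypothetical names, assuming an audited table called entity; modeled on the trigger examples in those docs):

CREATE TABLE entity_versions (
    operation   char(1)     NOT NULL,   -- 'U' for update, 'D' for delete
    changed_at  timestamptz NOT NULL,
    changed_by  text        NOT NULL,
    LIKE entity                         -- same columns as the audited table
);

CREATE OR REPLACE FUNCTION entity_versions_fn() RETURNS trigger AS $$
BEGIN
    -- Save the pre-change image of the row along with who/when/what
    INSERT INTO entity_versions SELECT left(TG_OP, 1), now(), current_user, OLD.*;
    RETURN OLD;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER entity_versions_tg
AFTER UPDATE OR DELETE ON entity
FOR EACH ROW EXECUTE PROCEDURE entity_versions_fn();

Note that with soft deletes the "delete" arrives as an UPDATE of a flag column, so the same UPDATE trigger still captures the previous state of the row.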

In a global table, all columns can be stored in a single column as the hstore type. I just tried this kind of audit and it works great; I recommend it. This link has an awesome audit table example that tracks all changes in a single table, simply by adding a trigger onto the tables you want to keep audit history on. All changes are stored as hstore values. Works for PostgreSQL 9.1+.
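A minimal sketch of that single-audit-table idea (hypothetical names; hstore(record) comes from the hstore extension):

CREATE EXTENSION IF NOT EXISTS hstore;

CREATE TABLE audit_log (
    audit_id    bigserial   PRIMARY KEY,
    table_name  text        NOT NULL,
    operation   text        NOT NULL,              -- INSERT / UPDATE / DELETE
    changed_at  timestamptz NOT NULL DEFAULT now(),
    changed_by  text        NOT NULL DEFAULT current_user,
    row_data    hstore      NOT NULL               -- the whole row as key/value pairs
);

CREATE OR REPLACE FUNCTION audit_fn() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        INSERT INTO audit_log (table_name, operation, row_data)
        VALUES (TG_TABLE_NAME, TG_OP, hstore(OLD));
        RETURN OLD;
    ELSE
        INSERT INTO audit_log (table_name, operation, row_data)
        VALUES (TG_TABLE_NAME, TG_OP, hstore(NEW));
        RETURN NEW;
    END IF;
END;
$$ LANGUAGE plpgsql;

-- One trigger per audited table, all feeding the same audit_log:
CREATE TRIGGER entity_audit_tg
AFTER INSERT OR UPDATE OR DELETE ON entity
FOR EACH ROW EXECUTE PROCEDURE audit_fn();

Because each row is stored as key/value pairs, the one audit table works for any number of source tables regardless of their column lists.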

Rename table or column in SQL server without breaking existing apps

I have an existing database in MS SQL Server and want to rename some tables and columns because the names currently used aren't accurate to what they represent.
I have multiple web and desktop applications that access the database using Entity Framework (code first). Too many to update in one go, and I cannot afford for all the apps to stop working.
I was thinking it would be nice if SQL Server allowed a 'permanent' alias for tables and columns, but I don't think this feature exists.
Or I was wondering if there is a way in EF to have two names for the same property?
For the tables, you could rename them and then create a synonym with the old name pointing to the new name.
For the columns, changing their name will break your application. You could also create computed columns with the old name that simply return the value of the renamed column (though this seems a little silly).
Note, however, that a computed column cannot reference another computed column, so you would have to duplicate the column in its entirety. That could lead to problems down the line if you don't update the definition of both columns.
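A minimal sketch of both techniques, with hypothetical names (a table Customers renamed to Clients, and a column CustName renamed to ClientName):

-- Rename the table, then keep the old name alive as a synonym
EXEC sp_rename 'dbo.Customers', 'Clients';
CREATE SYNONYM dbo.Customers FOR dbo.Clients;

-- Rename the column, then re-expose the old name as a computed column
EXEC sp_rename 'dbo.Clients.CustName', 'ClientName', 'COLUMN';
ALTER TABLE dbo.Clients ADD CustName AS (ClientName);

Keep in mind that a computed column is read-only, so any app that writes to the old column name will still break; the synonym, by contrast, supports normal reads and writes.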
A view containing a simple select statement acts exactly like a table. You really need to fix this properly across the database and applications. However, if you want to go the view route, I suggest you do this:
Say you have a table called MyTable that you want to rename to TheTable, with a column called MyColumn that you want to rename to TheColumn.
Create a schema, say, new
Move the original table into it with ALTER SCHEMA new TRANSFER dbo.MyTable;
Rename the table and column.
Now you have a table called new.TheTable with a column called TheColumn. Everything is broken.
Lastly, create a view that looks just like the old table
CREATE VIEW dbo.MyTable
AS
SELECT Column1, Column2, Column3, TheColumn AS MyColumn
FROM new.TheTable;
Now everything works again.
All your fixed 'new' tables are in the new schema
However, now everything is extra complicated.
This is basically an illustration that you should just fix it properly across the whole app, one piece at a time, with careful change management. Definitely don't complicate it with triggers.
Since you are using code first with multiple web and desktop applications, you are likely managing database changes from one place through migrations, and ignoring changes in the other applications.
You can create an empty migration and add code that will change the table name and column names to what you want. The migration should then create a view that selects from that table using the original table and column names. When you apply this migration, everything should still work as normal from all applications. There are no model changes, since you didn't touch the model classes. Inserts, updates, and deletes will still happen through the view. There is no need for potentially buggy triggers or synonyms on the table with this option.
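The SQL run by such a migration's Up() method might look like this sketch (hypothetical names, reusing the MyTable/TheTable example from the previous answer; each statement would typically be its own Sql(...) call, since CREATE VIEW has to run in its own batch):

EXEC sp_rename 'dbo.MyTable', 'TheTable';
EXEC sp_rename 'dbo.TheTable.MyColumn', 'TheColumn', 'COLUMN';

-- Re-expose the old name as an updatable single-table view
CREATE VIEW dbo.MyTable
AS
SELECT Column1, Column2, Column3, TheColumn AS MyColumn
FROM dbo.TheTable;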
Now that you have the table changed, you can focus on the application code. If it helps, you can add annotations for the table and column names and start refactoring the code. You need to make sure you don't make model changes that will break the other apps. If the apps ignore model changes, you can get away with adding annotations over the columns and classes in all the apps before refactoring; you can get rid of the view sooner this way.

How to use the pre-copy script from the copy activity to remove records in the sink based on the change tracking table from the source?

I am trying to use change tracking to copy data incrementally from a SQL Server to an Azure SQL Database. I followed the tutorial on Microsoft Azure documentation but I ran into some problems when implementing this for a large number of tables.
In the source part of the copy activity I can use a query that gives me a change table of all the records that were updated, inserted or deleted since the last change tracking version. This table will look something like:
PersonID   Age    Name    SYS_CHANGE_OPERATION
----------------------------------------------
1          12     John    U
2          15     James   U
3          NULL   NULL    D
4          25     Jane    I
with PersonID being the primary key for this table.
The problem is that the copy activity can only append data to the Azure SQL Database, so when a record is updated it fails with a duplicate primary key error. I can deal with this by having the copy activity use a stored procedure that merges the data into the table on the Azure SQL Database, but the problem is that I have a large number of tables.
I would like the pre-copy script to delete the deleted and updated records in the Azure SQL Database, but I can't figure out how to do this. Do I need to create a separate stored procedure and corresponding table type for each table that I want to copy, or is there a way for the pre-copy script to delete records based on the change tracking table?
You have to use a Lookup activity before the Copy activity. With that Lookup activity you can query the database to get the deleted and updated PersonIDs, preferably all in one field, separated by commas (so it's easier to use in the pre-copy script). More information here: https://learn.microsoft.com/en-us/azure/data-factory/control-flow-lookup-activity
Then you can do the following in your pre-copy script:
delete from TableName where PersonID in (@{activity('MyLookUp').output.firstRow.PersonIDs})
This way you will be deleting all the deleted or updated rows before inserting the new ones.
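The Lookup's source query can build that comma-separated list itself. A sketch, assuming change tracking is enabled on a hypothetical dbo.Person table, STRING_AGG is available (SQL Server 2017+ / Azure SQL), and the last synced version comes from wherever you persist it:

DECLARE @last_sync_version bigint = 100;  -- placeholder: read the real value from your watermark store

SELECT STRING_AGG(CAST(CT.PersonID AS varchar(20)), ',') AS PersonIDs
FROM CHANGETABLE(CHANGES dbo.Person, @last_sync_version) AS CT
WHERE CT.SYS_CHANGE_OPERATION IN ('U', 'D');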
Hope this helped!
In the meantime, Azure Data Factory provides the metadata-driven copy task. After going through the dialog-driven setup, a metadata table is created which has one row for each dataset to be synchronized. I solved this UPSERT problem by adding a stored procedure as well as a table type for each dataset to be synchronized. Then I added the relevant information to the metadata table for each row, like this:
{
    "preCopyScript": null,
    "tableOption": "autoCreate",
    "storedProcedure": "schemaname.UPSERT_SHOP_SP",
    "tableType": "schemaname.TABLE_TYPE_SHOP",
    "tableTypeParameterName": "shops"
}
After that you need to adapt the sink properties of the copy task like this (stored procedure, table type, table type parameter name):
@json(item().CopySinkSettings).storedProcedure
@json(item().CopySinkSettings).tableType
@json(item().CopySinkSettings).tableTypeParameterName
If the destination table does not exist yet, you need to run the whole task once before adding the above variables, because auto-creation of tables only works as long as no stored procedure is given in the sink properties.

Change tracking -- simplest scenario

I am coding in ASP.NET C# 4. The database is SQL Server 2012.
I have a table that has 2000 rows and 10 columns. I want to load this table into memory, and if the table is updated or inserted into in any way, I want to refresh the in-memory copy from the DB.
I looked into SQL Server Change Tracking, and while it does what I need, it appears I have to write quite a bit of code to select from the change functions -- more coding than I want to do for a simple scenario that I have.
What is the best (simplest) solution for this problem? Do I go with CacheDependency?
I currently have a similar problem: I'm implementing a REST service that returns a table with 50+ columns, and I want to cache the data on the client to reduce traffic.
I'm thinking about this implementation:
All my tables have these fields:
ID: auto-increment (primary key)
Version: rowversion (a numeric value that is incremented every time the record is updated)
To calculate a "fingerprint" of the table I use:
select count(*), max(id), sum(version) from ...
Deleting records changes the first value, inserting changes the second, and updating changes the third.
So if one of the three values changes, I have to reload the table.
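A concrete sketch against a hypothetical dbo.People table (note that a SQL Server rowversion column has to be cast before it can be summed):

SELECT COUNT(*)                     AS RowCnt,     -- changes when rows are deleted (or inserted)
       MAX(ID)                      AS MaxId,      -- changes when rows are inserted
       SUM(CAST(Version AS bigint)) AS VersionSum  -- changes when rows are updated
FROM dbo.People;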

trigger insertions into same table

I have many tables in my database which are interrelated. I have a table (table one) which has data inserted into it, and its id auto-increments. Once that row has an ID, I want to insert it into another table (table three) together with another set of IDs which come from a form (this data will also be going into a table, so it could come from that table), the same form the data that went into the first table came from.
The two IDs together make the primary key of the third table.
How can I do this? It's to show that more than one ID is joined to a single ID for something else.
Thanks.
You can't do that through a trigger, as the trigger only has available to it the data that you already inserted, not data that is currently only residing in your user interface.
Normally, how you handle this situation is that you write a stored proc that inserts the meeting and returns the id value (using scope_identity() in SQL Server, but I'm sure other databases have a method to return the auto-generated id as well). Then you would use that value to insert into the other table along with the other values you need for that table. You would of course want to wrap the whole thing in a transaction.
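A minimal T-SQL sketch, with hypothetical table and column names:

CREATE PROCEDURE dbo.InsertMeetingWithAttendee
    @MeetingName varchar(100),
    @PersonId    int
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRANSACTION;

    -- Insert the parent row and capture the identity it was given
    INSERT INTO dbo.Meetings (Name) VALUES (@MeetingName);
    DECLARE @MeetingId int = SCOPE_IDENTITY();

    -- Combine that id with the id that came from the form
    INSERT INTO dbo.MeetingAttendees (MeetingId, PersonId)
    VALUES (@MeetingId, @PersonId);

    COMMIT TRANSACTION;
END;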
I think you can probably do what you're describing (just write the INSERTs to table 3 in the table 1 trigger), but you'd have to put the additional info for the table 3 rows into your table 1 row, which isn't very smart.
I can't see why you would do that instead of writing the INSERTs in your code, where someone reading it can see what's happening.
The trouble with triggers is that they make it easy to hide business logic in the database. I think (and I believe I'm in the majority here) that it's easier to understand, manage, maintain and generally all-round deal with an application where all the business rules exist in the same general area.
There are reasons to use triggers (for propagating denormalised values, for example), just as there are reasons for using stored procedures. I'm going to assert that they are largely related to performance-critical areas. Or should be.

Updateable view in mssql with multiple tables and computed values

Huge database in MSSQL 2005 with a big codebase depending on the structure of this database.
I have about 10 similar tables; they all contain either the file name or the full path to the file. The full path is always dependent on the item id, so it doesn't make sense to store it in the database. Getting useful data out of these tables goes a little like this:
SELECT a.item_id
     , a.filename
FROM (
    SELECT id_item AS item_id
         , path AS filename
    FROM xMedia
    UNION ALL
    -- media_path has a different collation
    SELECT item_id AS item_id
         , (media_path COLLATE SQL_Latin1_General_CP1_CI_AS) AS filename
    FROM yMedia
    UNION ALL
    -- fullPath contains more than just the filename
    SELECT itemId AS item_id
         , RIGHT(fullPath, CHARINDEX('/', REVERSE(fullPath))-1) AS filename
    FROM zMedia
    -- real database has over 10 of these tables
) a
I'd like to create a single view of all these tables so that new code using this data-disaster doesn't need to know about all the different media tables. I'd also like to use this view for insert and update statements. Obviously, old code would still rely on the tables being up to date.
After reading the MSDN page about creating views in MSSQL 2005, I don't think a view with SCHEMABINDING would be enough.
How would I create such an updateable view?
Is this the right way to go?
Scroll down on the page you linked and you'll see a paragraph about updatable views. You cannot update a view based on unions, amongst other limitations. The logic behind this is probably simple: how should SQL Server decide which source table/view should receive the update/insert?
You can modify partitioned views, provided they satisfy certain conditions.
These conditions include having the partitioning column as part of the primary key on each table, and having a set of non-overlapping check constraints for the partitioning column.
This doesn't seem to be your case.
In your case, you may do either of the following:
Recreate your tables as views (with computed columns) for your legacy software to work, and refer to the whole table from the new software
Use INSTEAD OF triggers to update the tables.
If a view is based on multiple base tables, an UPDATE statement on the view may or may not work, depending on the statement. If the UPDATE affects multiple base tables, SQL Server throws an error; whereas if the UPDATE affects only one base table in the view, then the UPDATE will work (though not always correctly). The INSERT and DELETE statements will always fail.
INSTEAD OF triggers are used to correctly UPDATE, INSERT and DELETE from a view that is based on multiple base tables. The following links have examples along with a video tutorial on the same; a minimal sketch follows them below.
INSTEAD OF INSERT Trigger
INSTEAD OF UPDATE Trigger
INSTEAD OF DELETE Trigger
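As an illustration, a sketch of an INSTEAD OF INSERT trigger (assuming the question's union query has been wrapped in a view called dbo.AllMedia; the routing rule here is hypothetical):

CREATE TRIGGER trg_AllMedia_Insert ON dbo.AllMedia
INSTEAD OF INSERT
AS
BEGIN
    SET NOCOUNT ON;
    -- Route every new row to xMedia here; a real trigger would choose the
    -- destination table based on whatever rule distinguishes your media tables.
    INSERT INTO dbo.xMedia (id_item, path)
    SELECT item_id, filename
    FROM inserted;
END;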
