PostgreSQL Database Design Questions (Trigger vs Function)

I am building a database for a CMS, and I have reached a point where I am no longer sure which way to go. Note that all of the business logic is in the database layer (we use PostgreSQL 13, and the application is planned to be a SaaS):
1- The application has folders with documents associated with them. If we move a folder (or a group of folders in bulk) from its parent folder to another, the permissions of the folder, as well as of the underlying documents, must follow the permissions of the new location (an update to a permissions table is issued). Is this better enforced via an AFTER statement trigger, or should we force all code to call a single function that moves the folder and its documents and updates their permissions?
2- Wouldn't it make more sense to have an AFTER statement trigger rather than an AFTER row trigger in all cases? They do the same thing, but a statement trigger can process all affected rows in bulk, which is more efficient. So if I were to enforce inserting a record into another table whenever an update or insert takes place, the performance would be similar for a single row, but the statement-level trigger would be a lot faster for 1000 rows (since I can simply do INSERT INTO ... SELECT * FROM new_table).

You need a row-level trigger, or a statement-level trigger with transition tables, so that you know which rows were affected by the statement. To avoid doing the work once per row, the latter might be the better choice.
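As an illustration, here is a minimal sketch of the transition-table variant for the folder-move case (PostgreSQL 10 or later); the folder(id, parent_id) and folder_permission(folder_id, role_id, access) tables are hypothetical stand-ins for your schema, and nested subfolders/documents would need the same treatment:
CREATE FUNCTION sync_moved_permissions() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- moved_rows exposes every row touched by the statement at once,
    -- so the permission sync is one set-based pass, not row-by-row work.
    DELETE FROM folder_permission fp
    USING moved_rows m
    WHERE fp.folder_id = m.id;

    INSERT INTO folder_permission (folder_id, role_id, access)
    SELECT m.id, p.role_id, p.access
    FROM moved_rows m
    JOIN folder_permission p ON p.folder_id = m.parent_id;

    RETURN NULL;
END;
$$;

CREATE TRIGGER folder_moved
AFTER UPDATE OF parent_id ON folder
REFERENCING NEW TABLE AS moved_rows
FOR EACH STATEMENT
EXECUTE FUNCTION sync_moved_permissions();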
Rather than modifying permissions whenever you move an object, you could determine the permissions at query time by recursively following the chain of containment. The question here is whether you prefer to do the extra work when you modify the data or when you query it.
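A hedged sketch of that query-time alternative, walking up the same hypothetical folder tree with a recursive CTE:
WITH RECURSIVE chain AS (
    SELECT id, parent_id, 0 AS depth
    FROM folder
    WHERE id = :folder_id              -- parameter: the folder being checked
    UNION ALL
    SELECT f.id, f.parent_id, c.depth + 1
    FROM folder f
    JOIN chain c ON f.id = c.parent_id
)
SELECT p.role_id, p.access
FROM chain c
JOIN folder_permission p ON p.folder_id = c.id
ORDER BY c.depth;                      -- smallest depth = nearest ancestor with explicit permissions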

Related

Stored procedure to update different columns

I have an API that I'm trying to read that gives me just the updated field. I'm trying to take that and update my tables using a stored procedure. So far the only way I have figured out how to do this is with dynamic SQL, but I would prefer not to do that if there is another way.
If it were just a couple of columns, I'd just write a proc for each, but we are talking about 100 fields, and any of them could be updated together. One ticket might just need a timestamp updated, the next might need a timestamp and who modified it, while the next might just need a note.
Everything I've read and been taught has told me that dynamic SQL is bad, and while I'll write it if I have to, I'd prefer to have a proc.
You can perhaps do something like this:
-- Update OldTable from NewTable for the rows whose values differ
IF EXISTS (SELECT * FROM NewTable AS n
           JOIN OldTable AS o ON o.PrimaryKey = n.PrimaryKey
           WHERE o.OldRecords <> n.NewRecords)
BEGIN
    UPDATE o
    SET o.OldRecords = n.NewRecords
    FROM OldTable AS o
    JOIN NewTable AS n ON o.PrimaryKey = n.PrimaryKey;
END
The best way to solve your problem is using MERGE:
Performs insert, update, or delete operations on a target table based on the results of a join with a source table. For example, you can synchronize two tables by inserting, updating, or deleting rows in one table based on differences found in the other table.
As you can see, your update could be more complex but more efficient as well. Using MERGE requires some proficiency, but once you start using it you'll use it again and again with pleasure.
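For example, a hedged sketch of a MERGE for this situation; the table and column names are stand-ins, not taken from the question:
MERGE dbo.TargetTable AS t
USING dbo.SourceTable AS s
    ON t.PrimaryKey = s.PrimaryKey
WHEN MATCHED THEN
    UPDATE SET t.Col1 = s.Col1,   -- list each column to synchronize
               t.Col2 = s.Col2
WHEN NOT MATCHED BY TARGET THEN
    INSERT (PrimaryKey, Col1, Col2)
    VALUES (s.PrimaryKey, s.Col1, s.Col2);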
I am not sure how your business logic works that determines what columns are updated at what time. If there are separate business functions that require updating different but consistent columns per function, you will probably want to have individual update statements for each function. This will ensure that each process updates only the columns that it needs to update.
On the other hand, if your API is such that you really don't know ahead of time what needs to be updated, then building a dynamic SQL query is a good idea.
Another option is to build a save proc that sets every user-configurable field. As long as the calling process has all of that data, it can call the save procedure and pass every updateable column. There is no harm in an UPDATE MyTable SET MyCol = @MyCol with the same values on each side.
Note that even if all of the values are the same, any rowversion (or timestamp) columns will still be updated, if present.
With our software, the tables that users can edit have a widely varying range of columns. We chose to create a single save procedure for each table that has all of the update-able columns as parameters. The calling processes (our web servers) have all the required columns in memory. They pass all of the columns on every call. This performs fine for our purposes.
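A minimal sketch of such a save procedure, with invented ticket columns standing in for the real hundred:
CREATE PROCEDURE dbo.SaveTicket
    @TicketId   int,
    @Note       nvarchar(max),
    @ModifiedBy nvarchar(50),
    @ModifiedAt datetime2
AS
BEGIN
    -- Every updateable column is set on every call; callers simply pass
    -- the current value for any column they did not change.
    UPDATE dbo.Ticket
    SET Note       = @Note,
        ModifiedBy = @ModifiedBy,
        ModifiedAt = @ModifiedAt
    WHERE TicketId = @TicketId;
END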

Using triggers to maintain data integrity from similar tables in two databases

I have two SQL Server databases.
One is the back-end for a Ruby on Rails system that we are transitioning away from, but it is still in use because of the Ruby apps we are rewriting in ASP.NET MVC.
The databases have similar, but not identical, Users, Roles, and Roles-Users tables.
I want to create some type of trigger that updates the users and roles-users tables in each database whenever a modification is made to the corresponding table in the other database.
I can't just use the users table from the original database, because Ruby uses a different hash function for the passwords, but I want changes on one system to be reflected on the other immediately.
I also want to avoid the obvious problem of an update on one database triggering an update on the other, which triggers an update on the first, and so on, until the server crashes, a deadlock occurs, or something similarly undesirable happens.
I do not want to use database replication.
Is there a somewhat simple way to do this on a transaction per transaction basis?
EDIT
The trigger would be conceptually something like this:
USE Original;
GO
CREATE TRIGGER dbo.user_update
ON dbo.user WITH EXECUTE AS [cross table user identity]
AFTER UPDATE
AS
BEGIN
    UPDATE u
    SET u.column1 = i.column1   -- etc. for the shared columns
    FROM Another.dbo.users AS u
    JOIN inserted AS i ON u.ID = i.ID;
END
The problem I am trying to avoid is a recursive call.
Another.dbo.users will have a similar trigger in place, because the two databases serve different applications (Ruby on Rails on one, ASP.NET MVC on the other) that may be working on data that should be the same in both databases.
I would add a field to both tables if possible. When the application inserts or updates a row, this 'check' field would be set to 0. The trigger looks at the field, and if it is 0 (i.e., the change was generated by an application event), it fires the insert/update into the second table, writing 1 instead of 0 into the check field.
So when the trigger fires on the second table, it will skip the insert back into table one.
This solves the recursion problem.
If for some reason you cannot add the check field, you can use a separate table holding the primary key of the row and the check field. This needs more coding but would also work.
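A hedged T-SQL sketch of the idea, with is_replicated standing in for the hypothetical check field:
CREATE TRIGGER dbo.users_sync ON dbo.users
AFTER UPDATE
AS
BEGIN
    -- Only propagate rows the application changed (check field = 0);
    -- rows written by the other database's trigger arrive with 1 and are skipped.
    UPDATE a
    SET a.column1 = i.column1,   -- etc. for the shared columns
        a.is_replicated = 1      -- mark the mirrored row so its own trigger does not fire back
    FROM Another.dbo.users AS a
    JOIN inserted AS i ON a.ID = i.ID
    WHERE i.is_replicated = 0;
END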

Add DATE column to store when last read

We want to know which rows in a certain table are used frequently, and which are never used. We could add an extra column for this, but then we'd get an UPDATE for every SELECT, which sounds expensive. (The table contains 80k+ rows, some of which are used very often.)
Is there a better and perhaps faster way to do this? We're using an old version of Microsoft SQL Server.
This kind of logging/tracking is classically the application server's task. If you want to build your own tracking architecture, do it in your own layer.
In any case you will need an application server for this. You are not going to update the tracking field in the same transaction as the SELECT, are you? What about rollbacks? So you need some manager that first runs the SELECT and then writes the tracking information. And what is the point of sending the tracking information back to the DB together with the entity data? Save it to a file on the application server.
You could update the column in the table as you suggested, but if it were me I'd log the event to another table: the id of the record, a datetime, a userid (maybe an IP address, browser version, etc.), and just about anything else I could capture that might possibly be relevant. For example, six months from now your manager decides they want to know not only which records were used the most, but also which users use the most records, or what time of day usage peaks.
This type of information can be useful for things you've never even thought of down the road, and if the log starts to grow large you can always roll it up and prune it to a smaller table if performance becomes an issue. When possible, I log everything I can. You may never use some of this information, but you'll never regret having it available, and it is impossible to re-create historically.
In terms of making sure the application doesn't slow down, you may want to SELECT the data from within a stored procedure that also issues the logging command, so that the client doesn't make two round trips (one for the select, one for the update/insert).
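A hedged sketch of such a procedure; all names here are invented for illustration:
CREATE PROCEDURE dbo.GetRecordWithLog
    @RecordId int,
    @UserId   int
AS
BEGIN
    -- Log the read and return the row in a single round trip.
    INSERT INTO dbo.AccessLog (RecordId, UserId, AccessedAt)
    VALUES (@RecordId, @UserId, GETDATE());

    SELECT * FROM dbo.MyTable WHERE Id = @RecordId;
END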
Alternatively, if this is a web application, you could use an async AJAX call to issue the logging action, which wouldn't slow down the user's experience at all.
Adding a new column to track SELECTs is not good practice, because it may hurt database performance, and performance is one of the most critical concerns in database server administration.
Instead, you can use a very good database feature called auditing, which is easy to set up and puts less stress on the database.
Search for 'database auditing for SELECT statements' for more information.
Use another table as a key/value pair with two columns (e.g. id_selected, times) for storing the ids of the records you select in your standard table, and increment the times value by 1 every time a record is selected.
To do this you'd do a mass insert/update of the selected ids from your select query into the counting table. E.g., as a quick example:
SELECT id, stuff1, stuff2 FROM myTable WHERE stuff1 = 'somevalue';

INSERT INTO countTable (id_selected, times)
SELECT id, 1 FROM myTable WHERE stuff1 = 'somevalue'  # or just build a list of ids as values from your last result
ON DUPLICATE KEY UPDATE times = times + 1;
The ON DUPLICATE KEY syntax is off the top of my head and is MySQL. For conditionally inserting or updating in MSSQL you would need to use MERGE instead.
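For reference, a hedged sketch of that MERGE equivalent, reusing the hypothetical countTable/myTable names from above:
MERGE countTable AS t
USING (SELECT id FROM myTable WHERE stuff1 = 'somevalue') AS s
    ON t.id_selected = s.id
WHEN MATCHED THEN
    UPDATE SET t.times = t.times + 1
WHEN NOT MATCHED THEN
    INSERT (id_selected, times) VALUES (s.id, 1);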

What is a good approach to preloading data?

Are there best practices out there for loading data into a database, to be used with a new installation of an application? For example, for application foo to run, it needs some basic data before it can even be started. I've used a couple of options in the past:
TSQL for every row that needs to be preloaded:
IF NOT EXISTS (SELECT * FROM [Master].[Site] WHERE [Name] = @SiteName)
    INSERT INTO [Master].[Site] ([EnterpriseID], [Name], [LastModifiedTime], [LastModifiedUser])
    VALUES (@EnterpriseId, @SiteName, GETDATE(), @LastModifiedUser)
Another option is a spreadsheet. Each tab represents a table, and data is entered into the spreadsheet as we realize we need it. Then, a program can read this spreadsheet and populate the DB.
There are complicating factors, including the relationships between tables. So, it's not as simple as loading tables by themselves. For example, if we create Security.Member rows, then we want to add those members to Security.Role, we need a way of maintaining that relationship.
Another factor is that not all databases will be missing this data. Some locations will already have most of the data, and others (that may be new locations around the world), will start from scratch.
Any ideas are appreciated.
If it's not a lot of data (the bare initialization of configuration data), we typically script it along with any database creation/modification.
With scripts you have a lot of control, so you can insert only missing rows, remove rows which are known to be obsolete, not override certain columns which have been customized, etc.
If it's a lot of data, then you probably want external files; I would avoid a spreadsheet and use plain text files instead (BULK INSERT). You could load these into a staging area and still use the same techniques you might use in a script to ensure you don't clobber any customization in the destination. And because it's under script control, you control the order of operations to ensure referential integrity.
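A hedged sketch of that flow in T-SQL; the file path, staging table, and seed user are assumptions for illustration:
-- Load the plain-text seed file into a staging table.
BULK INSERT Staging.Site
FROM 'C:\seed\site.txt'
WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n');

-- Insert only the rows missing from the destination, preserving any
-- existing customizations there.
INSERT INTO [Master].[Site] ([EnterpriseID], [Name], [LastModifiedTime], [LastModifiedUser])
SELECT s.EnterpriseID, s.Name, GETDATE(), 'seed-script'
FROM Staging.Site AS s
WHERE NOT EXISTS (SELECT 1 FROM [Master].[Site] AS m WHERE m.[Name] = s.Name);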
I'd recommend a combination of the 2 approaches indicated by Cade's answer.
Step 1. Load all the needed data into temp tables (on Sybase, for example, load data for table "db1..table1" into "temp..db1_table1"). To be able to handle large datasets, use the bulk copy mechanism (whichever one your DB server supports) without writing to the transaction log.
Step 2. Run a script whose main step iterates over each table to be loaded, creates indexes on the newly created temp table if needed, compares the data in the temp table to the main table, and inserts/updates/deletes the differences. Then, as needed, the script can do auxiliary tasks like the security role setup you mentioned.

How to track data changes in a database table

What is the best way to track changes in a database table?
Imagine you have an application in which users (in the context of the application, not DB users) are able to change data stored in some database table. What's the best way to keep a history of all those changes, so that you can show which user changed which data, when, and how?
In general, if your application is structured into layers, have the data access tier call a stored procedure on your database server to write a log of the database changes.
In languages that support such a thing, aspect-oriented programming can be a good technique to use for this kind of application. Auditing database table changes is the kind of operation that you'll typically want to log for all operations, so AOP can work very nicely.
Bear in mind that logging database changes will create lots of data and will slow the system down. It may be sensible to use a message-queue solution and a separate database to perform the audit log, depending on the size of the application.
It's also perfectly feasible to use stored procedures to handle this, although there may be a bit of work involved passing user credentials through to the database itself.
You've got a few issues here that don't relate well to each other.
At the basic database level you can track changes by having a separate table that gets an entry added to it via triggers on INSERT/UPDATE/DELETE statements. That's the general way of tracking changes to a database table.
The other thing you want is to know which user made the change. Generally your triggers wouldn't know this. I'm assuming that if you want to know which user changed a piece of data, then it's possible for multiple users to change the same data.
There is no single right way to do this. You'll probably want a separate table that your application code inserts a record into whenever a user updates some data in the other table, including the user, a timestamp, and the id of the changed record.
Make sure to use a transaction, so you don't end up with an update that happens without the audit insert (or, in the opposite order, an audit insert without the update).
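A minimal T-SQL sketch of that transactional pairing; the table and parameter names are hypothetical:
BEGIN TRANSACTION;

UPDATE dbo.Orders
SET Status = @NewStatus
WHERE OrderId = @OrderId;

-- The audit row commits or rolls back together with the data change.
INSERT INTO dbo.OrdersAudit (OrderId, ChangedBy, ChangedAt, NewStatus)
VALUES (@OrderId, @UserId, GETDATE(), @NewStatus);

COMMIT TRANSACTION;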
One method I've seen quite often is to have audit tables. Then you can show just what changed, what it changed from, or whatever your heart desires :) Then you can write a trigger to do the actual logging. Not too painful if done properly...
No matter how you do it, though, it kind of depends on how your users connect to the database. Are they using a single application user via a security context within the app, are they connecting using their own accounts on the domain, or does the app just have everyone connecting with a generic sql-account?
If you aren't able to get the user info from the database connection, it's a little more of a pain. And then you might look at doing the logging within the app, so if you have a process called "CreateOrder" or whatever, you can log to the Order_Audit table or whatever.
Doing it all within the app opens you up a little more to changes made from outside the app, but if you have multiple apps all using the same data and you just want to see what changes were made by yours, maybe that's what you wanted... <shrug>
Good luck to you, though!
--Kevin
In researching this same question, I found a discussion here very useful. It suggests having a parallel table set for tracking changes, where each change-tracking table has the same columns as what it's tracking, plus columns for who changed it, when, and if it's been deleted. (It should be possible to generate the schema for this more-or-less automatically by using a regexed-up version of your pre-existing scripts.)
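A sketch of what one such parallel table might look like, assuming a hypothetical Person source table:
-- Mirrors the tracked table's columns, plus who/when/deleted metadata.
CREATE TABLE dbo.Person_Tracking (
    PersonSid int,
    Name      nvarchar(100),   -- ...repeat the remaining Person columns here
    ChangedBy nvarchar(50),
    ChangedAt datetime2,
    IsDeleted bit
);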
Suppose I have a Person Table with 10 columns which include PersonSid and UpdateDate. Now, I want to keep track of any updates in Person Table.
Here is the simple technique I used:
Create a person_log table
create table person_log(date datetime2, sid int);
Create a trigger on Person table that will insert a row into person_log table whenever Person table gets updated:
create trigger tr on dbo.Person
for update
as
insert into person_log(date, sid) select UpdateDate, PersonSID from inserted;
After any update, query the person_log table and you will see the PersonSid values that were updated.
You can do the same for INSERT and DELETE; a sketch of the DELETE case follows.
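A hedged sketch of the DELETE case, reusing the person_log table from above; the timestamp comes from the server clock, since the row itself is gone:
create trigger tr_delete on dbo.Person
for delete
as
-- 'deleted' holds the removed rows, so log their ids with the deletion time
insert into person_log(date, sid) select getdate(), PersonSID from deleted;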
The above example is for SQL Server; let me know if you have any questions, or use this link:
https://web.archive.org/web/20211020134839/https://www.4guysfromrolla.com/webtech/042507-1.shtml
A trace log in a separate table (with an ID column, possibly with timestamps)?
Are you going to want to undo the changes as well - perhaps pre-create the undo statement (a DELETE for every INSERT, an (un-) UPDATE for every normal UPDATE) and save that in the trace?
Try this open source component:
https://tabledependency.codeplex.com/
TableDependency is a generic C# component used to receive notifications when the content of a specified database table changes.
If all changes come from PHP, you may use a class that logs every INSERT/UPDATE/DELETE before running the query. It would save the action, table, column, new value, old value, date, system (if needed), IP, user agent, and column/operator/value references. Which tables/columns/actions need to be logged would be configurable.
