Schema for tracking SQL Server table updates - sql-server

I have a set of 16 SQL Server tables, which get updated constantly from a Web UI. I need to track every change happening to these tables and call a separate system every 10 minutes sending each inserted or updated row through a Windows service.
I can duplicate the schema and create another set of 16 similar tables to track changes in the original set. There will be triggers that insert a new row into tracking tables (plus ins/upd flag, timestamp etc fields) every time a corresponding source table is modified.
I am wondering is there any better way I can do this using 1 (or few) common tables that can hold data from multiple tables? Something that does not force me to maintain a duplicate set of tracking tables?

If you have Enterprise Edition, Change Data Capture is an option which is available to you.
Other editions have Change Tracking, which doesn't track history but can get you the net changes.
Comparison: https://technet.microsoft.com/en-us/library/Cc280519(v=SQL.105).aspx

Related

Temporal Tables Manually Update Data

Using SQL Server 2019, can I push data (snapshot data) from the Current (Temporal Table) to the History Table only when I want to rather than it happening automatically after every row commit? I understand that Temporal Tables are designed to record all data changes to a row - great for auditing. But what if I don't want to save all changes? What If I only want to 'baseline' data on a set of tables every week, (or when the user wants to) and I don't care what changes are made during the week? I know you can disable and enable the temporal tables, but that is more of a high level control, and the architecture is multi-tenanted,and different tenants will snapshot at different times.
Or perhaps Temporal Tables is the wrong tool for me? My use case is as follows - A user creates a mathematical model altering many parameters, they do this many times over many days, persisting to the database with every change. When they get it right they press 'Baseline' Everything is stored. They then continue with the next changes to the next baseline. At any point they can compare the difference between any two baselines. I only retain the data at the date of 'Baseline'. This would require that I move the data to the temporal history table manually..or let it go automatically and purge everything in between two baselines, seems a waste of DB resources.

"master-slave" table replication in Oracle

Would there be something similar as the master-slave database but at the table level in the database?
For example, I have the following scenario:
I have a table with millions of records and the reason is because the system is more than 15 years old.
I only want to show the records of the last year (2019-2020).
I decided to create a view that only shows the records of that range (1 year) from the information of that table that contains millions of records.
Thanks to the view, the loading time of that system screen is lighter, thanks to the fact that I have less load of records.
The problem: What if the user adds a new record to the table that contains millions of records? how do I make my view modify when the other table are modified ...
I can use triggers to update the view I think, but, is there a functionality in oracle that allows me something similar to what I just asked (master-slave) where the "slave" table is updated as the "master" table suffers changes?
First of all, you misunderstood views. View is not a physical table, and does not store any information. If you insert data into view, you are actually inserting into the source table.
Since the view is not physical, you are just filtering the data. This does not have any performance benefits.
For the big tables, you can use partitioning which drastically improves performance. And if you still need archival you can archive the partitioned data.
Partitioning is generally the best method, because you can typically archive data by simply doing an "exchange" command to archive off old data.
Data doesn't "move" in that scenario, it simply gets 'detached' from the table via data dictionary manipulation.
Would there be something similar as the master-slave database but at the table level in the database
If you are asking about master/slave replication on a table level, then,
I suppose, table/materialized view relationship is appropriate to call as a master-slave. Quote from Oracle Docs:
A materialized view is a database object that contains the results of a query. The FROM clause of the query can name tables, views, and other materialized views. Collectively these objects are called master tables (a replication term)...
When you need to "update" or, more appropriately - refresh the mview, you can use different options:
update mview periodically and refresh it periodically
update mview each time the data in the master table is changed and commited.
update manually calling DBMS_MVIEW.REFRESH or DBMS_SNAPSHOT.REFRESH
Mview could be faster then view because each time you select from a mview you select from a different "table" which was replicated from the original one. Especially if you have complex logic in a sql, you can put the logic to mview definition.
The drawbacks are you need extra disk space for mview, and there will be a delay of refreshing the data.

Find out the recently selected rows from a Oracle table and can I update a LAST_ACCESSED column whenever the table is accessed

I have a database table which have more than 1 million records uniquely identified by a GUID column. I want to find out which of these record or rows was selected or retrieved in the last 5 years. The select query can happen from multiple places. Sometimes the row will be returned as a single row. Sometimes it will be part of a set of rows. there is select query that does the fetching from a jdbc connection from a java code. Also a SQL procedure also fetches data from the table.
My intention is to clean up a database table.I want to delete all rows which was never used( retrieved via select query) in last 5 years.
Does oracle DB have any inbuild meta data which can give me this information.
My alternative solution was to add a column LAST_ACCESSED and update this column whenever I select a row from this table. But this operation is a costly operation for me based on time taken for the whole process. Atleast 1000 - 10000 records will be selected from the table for a single operation. Is there any efficient way to do this rather than updating table after reading it. Mine is a multi threaded application. so update such large data set may result in deadlocks or large waiting period for the next read query.
Any elegant solution to this problem?
Oracle Database 12c introduced a new feature called Automatic Data Optimization that brings you Heat Maps to track table access (modifications as well as read operations). Careful, the feature is currently to be licensed under the Advanced Compression Option or In-Memory Option.
Heat Maps track whenever a database block has been modified or whenever a segment, i.e. a table or table partition, has been accessed. It does not track select operations per individual row, neither per individual block level because the overhead would be too heavy (data is generally often and concurrently read, having to keep a counter for each row would quickly become a very costly operation). However, if you have you data partitioned by date, e.g. create a new partition for every day, you can over time easily determine which days are still read and which ones can be archived or purged. Also Partitioning is an option that needs to be licensed.
Once you have reached that conclusion you can then either use In-Database Archiving to mark rows as archived or just go ahead and purge the rows. If you happen to have the data partitioned you can do easy DROP PARTITION operations to purge one or many partitions rather than having to do conventional DELETE statements.
I couldn't use any inbuild solutions. i tried below solutions
1)DB audit feature for select statements.
2)adding a trigger to update a date column whenever a select query is executed on the table.
Both were discarded. Audit uses up a lot of space and have performance hit. Similary trigger also had performance hit.
Finally i resolved the issue by maintaining a separate table were entries older than 5 years that are still used or selected in a query are inserted. While deleting I cross check this table and avoid deleting entries present in this table.

Add DATE column to store when last read

We want to know what rows in a certain table is used frequently, and which are never used. We could add an extra column for this, but then we'd get an UPDATE for every SELECT, which sounds expensive? (The table contains 80k+ rows, some of which are used very often.)
Is there a better and perhaps faster way to do this? We're using some old version of Microsoft's SQL Server.
This kind of logging/tracking is the classical application server's task. If you want to realize your own architecture (there tracking architecture) do it on your own layer.
And in any case you will need application server there. You are not going to update tracking field it in the same transaction with select, isn't it? what about rollbacks? so you have some manager who first run select than write track information. And what is the point to save tracking information together with entity info sending it back to DB? Save it into application server file.
You could either update the column in the table as you suggested, but if it was me I'd log the event to another table, i.e. id of the record, datetime, userid (maybe ip address etc, browser version etc), just about anything else I could capture and that was even possibly relevant. (For example, 6 months from now your manager decides not only does s/he want to know which records were used the most, s/he wants to know which users are using the most records, or what time of day that usage pattern is etc).
This type of information can be useful for things you've never even thought of down the road, and if it starts to grow large you can always roll-up and prune the table to a smaller one if performance becomes an issue. When possible, I log everything I can. You may never use some of this information, but you'll never wish you didn't have it available down the road and will be impossible to re-create historically.
In terms of making sure the application doesn't slow down, you may want to 'select' the data from within a stored procedure, that also issues the logging command, so that the client is not doing two roundtrips (one for the select, one for the update/insert).
Alternatively, if this is a web application, you could use an async ajax call to issue the logging action which wouldn't slow down the users experience at all.
Adding new column to track SELECT is not a practice, because it may affect database performance, and the database performance is one of major critical issue as per Database Server Administration.
So here you can use one very good feature of database called Auditing, this is very easy and put less stress on Database.
Find more info: Here or From Here
Or Search for Database Auditing For Select Statement
Use another table as a key/value pair with two columns(e.g. id_selected, times) for storing the ids of the records you select in your standard table, and increment the times value by 1 every time the records are selected.
To do this you'd have to do a mass insert/update of the selected ids from your select query in the counting table. E.g. as a quick example:
SELECT id, stuff1, stuff2 FROM myTable WHERE stuff1='somevalue';
INSERT INTO countTable(id_selected, times)
SELECT id, 1 FROM myTable mt WHERE mt.stuff1='somevalue' # or just build a list of ids as values from your last result
ON DUPLICATE KEY
UPDATE times=times+1
The ON DUPLICATE KEY is right from the top of my head in MySQL. For conditionally inserting or updating in MSSQL you would need to use MERGE instead

What are good strategies for updating a live database table?

I have a db table that gets entirely re-populated with fresh data periodically. This data needs to be then pushed into a corresponding live db table, overwriting the previous live data.
As the table size increases, the time required to push the data into the live table also increases, and the app would look like its missing data.
One solution is to push the new data into a live_temp table and then run an SQL RENAME command on this table to rename it as the live table. The rename usually runs in sub-second time. Is this the "right" way to solve this problem?
Are there other strategies or tools to tackle this problem? Thanks.
I don't like messing with schema objects in this way - it can confuse query optimizers and I have no idea what will happen to any transactions that are going on while you execute the rename.
I much prefer to add a version column to the table, and have a separate table to hold the current version.
That way, the client code becomes
select *
from myTable t,
myTable_currentVersion tcv
where t.versionID = tcv.CurrentVersion
This also keeps history around - which may or not be useful; if it's not delete old records after setting the CurrentVersion column.
Create a duplicate table - exact copy.
Create a new table that does nothing more than keep track of the "up to date" table.
MostCurrent (table)
id (column) - holds name of table holding the "up to date" data.
When repopulating, populate the older table and update MostCurrent.id to reflect this table.
Now, in your app where you bind the data to the page, bind the newest table.
Would it be appropriate to only push changes to the live db table? For most applications I have worked with changes have been minimal. You should be able to apply all the changes in a single transaction. Committing the transaction will make them visible with no outage on the table.
If the data does change entirely, then you could configure the database so that you can replace all the data in a single transaction.

Resources