I have a db table that gets entirely re-populated with fresh data periodically. This data then needs to be pushed into a corresponding live db table, overwriting the previous live data.
As the table size increases, the time required to push the data into the live table also increases, and during that window the app looks like it's missing data.
One solution is to push the new data into a live_temp table and then run an SQL RENAME command on this table to rename it as the live table. The rename usually runs in sub-second time. Is this the "right" way to solve this problem?
Are there other strategies or tools to tackle this problem? Thanks.
I don't like messing with schema objects in this way - it can confuse query optimizers and I have no idea what will happen to any transactions that are going on while you execute the rename.
I much prefer to add a version column to the table, and have a separate table to hold the current version.
That way, the client code becomes
select *
from myTable t,
myTable_currentVersion tcv
where t.versionID = tcv.CurrentVersion
This also keeps history around, which may or may not be useful; if it's not, delete the old records after setting the CurrentVersion column.
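A minimal sketch of this pattern, assuming the table and column names from the query above (staging_source and the col1/col2 columns are placeholders):
-- one-row table holding the version that is currently "live"
create table myTable_currentVersion (CurrentVersion int not null);
-- load the fresh data under a new version number...
insert into myTable (versionID, col1, col2)
select 42, col1, col2 from staging_source;
-- ...then flip the pointer; readers immediately see the new version
update myTable_currentVersion set CurrentVersion = 42;
-- optionally purge superseded versions if you don't want the history
delete from myTable where versionID < 42;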
Create a duplicate table - exact copy.
Create a new table that does nothing more than keep track of the "up to date" table.
MostCurrent (table)
id (column) - holds name of table holding the "up to date" data.
When repopulating, populate the older table and update MostCurrent.id to reflect this table.
Now, in your app where you bind the data to the page, bind the newest table.
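A rough sketch of that arrangement (live_A and live_B are assumed names for the pair of duplicate tables):
-- pointer table naming whichever copy currently holds the fresh data
create table MostCurrent (id varchar(128) not null);
-- after repopulating live_B, point the app at it
update MostCurrent set id = 'live_B';
-- the app reads the pointer first, then binds to the named table
select id from MostCurrent;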
Would it be appropriate to only push changes to the live db table? For most applications I have worked with, changes have been minimal. You should be able to apply all the changes in a single transaction. Committing the transaction will make them visible with no outage on the table.
If the data does change entirely, then you could configure the database so that you can replace all the data in a single transaction.
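A sketch of the replace-everything-in-one-transaction approach (live and staging are assumed table names):
begin transaction;
-- remove the old data and load the new data inside one transaction
delete from live;
insert into live (id, col1, col2)
select id, col1, col2 from staging;
commit transaction;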
Related
I have a table named abcTbl whose data is populated from other tables in a different database. Every time I load data into abcTbl, I delete everything in it and load the buffer data into it.
This package runs daily. My question is: how do I avoid losing data from the table abcTbl if we fail to load the new data into it? My first step is deleting all the data in abcTbl, then selecting the data from various sources into a buffer, and then loading the buffer data into abcTbl.
We can encounter issues like failed connections, the package stopping prematurely, supernatural forces trying to stop/break my package from running smoothly, etc., any of which would mean losing all the data in the buffer after I have already deleted the data from abcTbl.
My first intuition was to save the data from abcTbl into a backup table and then delete the data in abcTbl, but my DBAs wouldn't be too thrilled about creating a backup table in every environment just for this package, and giving me the rights to create backup tables on the fly and then delete them again is out of the question too. This data is not business critical and can be repopulated if lost.
But, what is the best approach here? What are the best practices for this issue?
For backing up your table, instead of copying data from one table (original) to another (backup), you can simply rename your original table to a backup name, recreate the original table with the same structure as the backup, and drop the renamed table only once your data load is successful. This avoids the time spent transferring data from one table to another. You may want to test which approach is faster for you, depending on your data and table structure, but it is another way to do it, and if the table holds a lot of data the approach below may be faster:
EXEC sp_rename 'abcTbl', 'abcTbl_bkp';
-- Recreate abcTbl with the same structure as abcTbl_bkp:
CREATE TABLE abcTbl ( /* same column definitions as abcTbl_bkp */ );
-- Load your new data into abcTbl, and only then drop the backup:
DROP TABLE abcTbl_bkp;
Trying to figure this out, but I think what you are asking for is a method to capture the older data before loading the new data. I would agree with your DBAs that a separate table for every reload would be extremely messy and not very usable if you ever need it.
Instead, create a table that copies your load table but adds a single DateTime field (say history_date). On each load you would just flow all the data in your primary table to the backup table. Use a Derived Column task in the Data Flow to add the history_date value to the backup table.
Once the backup table is complete, either truncate or delete the contents of the current table. Then load the new data.
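The equivalent in plain T-SQL looks roughly like this (abcTbl_history and the col1/col2 columns are assumed names):
-- copy the current contents into the history table, stamped with the load time
insert into abcTbl_history (col1, col2, history_date)
select col1, col2, getdate() from abcTbl;
-- then clear the live table and load the new data
truncate table abcTbl;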
Instead of creating additional tables, you can set the package to execute as a single transaction. That way, if any component fails, all the tasks that have already executed will be rolled back and subsequent ones will not run. To do this, set the TransactionOption property to Required on the package, which makes the package begin a transaction. Then set the same property to Supported on all components that you want to succeed or fail together; the Supported level has those tasks join the transaction already in progress in the parent container, the package in this case.

If there are other components in the package that you want to commit or roll back independently of these tasks, place the related objects in a Sequence container and apply the Required level to the Sequence instead. One important thing to note: if anything performs a TRUNCATE, then all other components that access the truncated object will need to have ValidateExternalMetadata set to false to avoid the known blocking issue that results.
I'm using tracing to log all delete or update queries run through the system. The problem is, if I run a query like DELETE FROM [dbo].[Artist] WHERE ArtistId>280, I know how many rows were deleted but I'm unable to find out which rows were deleted (the data they had).
I'm thinking of doing this as a logging system so it would be useful to see which rows were affected and what data they had if at all possible. I don't really want to use triggers for this job but I will if I have to (and if it's feasible).
If you need the original data and are planning on storing all the deleted data in a separate table, why not just logically delete the original data rather than physically deleting it? i.e.
UPDATE dbo.Artist SET Artist_deleted = 1 WHERE ArtistId>280
Then you only need to add one column to your current table rather than creating new tables and scripts to support them. You could then partition the current table on the deleted flag if you are worried about disk space, performance, etc.
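A minimal sketch of the soft-delete approach (the column name comes from the example above; the default and the filtered query are assumptions):
alter table dbo.Artist add Artist_deleted bit not null default 0;
-- a "delete" becomes an update of the flag
update dbo.Artist set Artist_deleted = 1 where ArtistId > 280;
-- live queries then filter on the flag
select * from dbo.Artist where Artist_deleted = 0;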
Is there an automatic way of knowing which rows are the latest to have been added to an OpenEdge table? I am working with a client and have access to their database, but they are not saving ids nor timestamps for the data.
I was wondering if, hopefully, OpenEdge is somehow doing this out of the box. (I doubt it is but it won't hurt to check)
Edit: My Goal
My goal is to be able to import only the new data, i.e. the delta, for a specific table. Without knowing which rows are new, I am forced to import everything because I have no clue what was added.
1) Short answer is No - there's no "in the box" way for you to tell which records were added, or the order they were added.
The only way to tell the order of creation is by applying a sequence or by time-stamping the record. Since your application does neither, you're out of luck.
2) If you're looking for changes without applying schema changes, you can capture updates to the db using session or db triggers and save that activity log somewhere.
3) If you're just looking for a "delta" - you can take a periodic backup of the database, and then use queries to compare the current db with the backup db and get the differences that way.
4) Maintain a db on the customer site with the contents of the last table dump. The next time you want to get deltas from the customer, compare that table's contents with the current table, dump the differences, then update the db table to match the current db's table.
5) Personally, I'd talk to the customer and (a) see if they actually require this functionality, and (b) find out what they think about adding some fields and a bit of code to the system to get an activity log. Adding a few fields and some code to update them shouldn't be that big of a deal.
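Options 3 and 4 boil down to a set-difference query between the current table and the saved copy. A generic sketch in plain SQL (the table and key names are assumptions, and this is not OpenEdge-specific syntax):
-- rows present in the current table but not in the last dump
select cur.*
from Customer cur
where not exists (select 1
                  from Customer_lastdump prev
                  where prev.CustNum = cur.CustNum);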
You could use database triggers to meet this need. In order to do so you will need to be able to write and deploy trigger procedures. And you need to keep in mind that the 4GL and SQL-92 engines do not recognize each other's triggers. So if updates are possible via SQL, 4GL triggers will be blind to those updates. And vice-versa. (If you do not use SQL none of this matters.)
You would probably want to use WRITE triggers to catch both insertions and updates to data. Do you care about deletes?
Simple-minded 4gl WRITE trigger:
TRIGGER PROCEDURE FOR WRITE OF Customer. /* OLD BUFFER oldCustomer. */ /* OLD BUFFER is optional and not needed in this use case ... */
output to "customer.dat" append.
export customer.
output close.
return.
end.
What is the best way to track changes in a database table?
Imagine you have an application in which users (in the context of the application, not DB users) are able to change data stored in some database table. What's the best way to track a history of all changes, so that you can show which user changed which data, when, and how?
In general, if your application is structured into layers, have the data access tier call a stored procedure on your database server to write a log of the database changes.
In languages that support such a thing, aspect-oriented programming can be a good technique to use for this kind of application. Auditing database table changes is the kind of operation that you'll typically want to log for all operations, so AOP can work very nicely.
Bear in mind that logging database changes will create lots of data and will slow the system down. It may be sensible to use a message-queue solution and a separate database to perform the audit log, depending on the size of the application.
It's also perfectly feasible to use stored procedures to handle this, although there may be a bit of work involved passing user credentials through to the database itself.
You've got a few issues here that don't relate well to each other.
At the basic database level you can track changes by having a separate table that gets an entry added to it via triggers on INSERT/UPDATE/DELETE statements. That's the general way of tracking changes to a database table.
The other thing you want is to know which user made the change. Generally your triggers wouldn't know this. I'm assuming that if you want to know which user changed a piece of data, then it's possible that multiple users could change the same data.
There is no single right way to do this; you'll probably want a separate table that your application code inserts a record into whenever a user updates some data in the other table, including the user, a timestamp, and the id of the changed record.
Make sure to use a transaction so you don't end up with the update happening without the audit insert (or, if you do them in the opposite order, the insert without the update).
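A sketch of wrapping the data change and the audit insert in a single transaction (all table and column names here are hypothetical):
begin transaction;
-- the actual data change
update dbo.Orders set Quantity = 5 where OrderId = 42;
-- the audit record: who changed which record, and when
insert into dbo.Orders_Audit (OrderId, ChangedBy, ChangedAt)
values (42, 'jsmith', getdate());
commit transaction;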
One method I've seen quite often is to have audit tables. Then you can show just what's changed, what it changed from, or whatever your heart desires :) Then you could write up a trigger to do the actual logging. Not too painful if done properly...
No matter how you do it, though, it kind of depends on how your users connect to the database. Are they using a single application user via a security context within the app, are they connecting using their own accounts on the domain, or does the app just have everyone connecting with a generic sql-account?
If you aren't able to get the user info from the database connection, it's a little more of a pain. And then you might look at doing the logging within the app, so if you have a process called "CreateOrder" or whatever, you can log to the Order_Audit table or whatever.
Doing it all within the app opens you up a little more to changes made from outside of the app, but if you have multiple apps all using the same data and you just want to see what changes were made by yours, maybe that's what you wanted... <shrug>
Good luck to you, though!
--Kevin
In researching this same question, I found a discussion here very useful. It suggests having a parallel table set for tracking changes, where each change-tracking table has the same columns as what it's tracking, plus columns for who changed it, when, and if it's been deleted. (It should be possible to generate the schema for this more-or-less automatically by using a regexed-up version of your pre-existing scripts.)
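For example, the change-tracking twin of a hypothetical dbo.Customer table might look like this (all names are assumptions):
create table dbo.Customer_Audit (
    CustomerId int          not null,  -- same columns as dbo.Customer...
    Name       varchar(100) null,
    City       varchar(100) null,
    ChangedBy  varchar(50)  not null,  -- ...plus who changed it,
    ChangedAt  datetime     not null,  -- when,
    IsDeleted  bit          not null default 0  -- and whether it was a delete
);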
Suppose I have a Person Table with 10 columns which include PersonSid and UpdateDate. Now, I want to keep track of any updates in Person Table.
Here is the simple technique I used:
Create a person_log table
create table person_log(date datetime2, sid int);
Create a trigger on Person table that will insert a row into person_log table whenever Person table gets updated:
create trigger tr on dbo.Person
for update
as
insert into person_log(date, sid)
select UpdateDate, PersonSid from inserted;
After any updates, query the person_log table and you will be able to see the PersonSid values that got updated.
You can do the same for insert and delete.
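For example, a delete trigger follows the same pattern but reads from the deleted pseudo-table (a sketch reusing the person_log table above):
create trigger tr_del on dbo.Person
for delete
as
insert into person_log(date, sid)
select getdate(), PersonSid from deleted;  -- stamp with the time of deletion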
The example above is for SQL Server; let me know if you have any queries, or use this link:
https://web.archive.org/web/20211020134839/https://www.4guysfromrolla.com/webtech/042507-1.shtml
A trace log in a separate table (with an ID column, possibly with timestamps)?
Are you going to want to undo the changes as well - perhaps pre-create the undo statement (a DELETE for every INSERT, an (un-) UPDATE for every normal UPDATE) and save that in the trace?
You could try this open source component:
https://tabledependency.codeplex.com/
TableDependency is a generic C# component used to receive notifications when the content of a specified database table changes.
If all changes go through PHP, you can use a class that logs every INSERT/UPDATE/DELETE before running the query. It can save the action, table, column, new value, old value, date, system (if needed), IP, user agent, column reference, operator reference, and value reference. The tables, columns, and actions that need to be logged are all configurable.
The table doesn't have a last updated field and I need to know when existing data was updated. So adding a last updated field won't help (as far as I know).
SQL Server 2000 does not keep track of this information for you.
There may be creative / fuzzy ways to guess what this date was depending on your database model. But, if you are talking about 1 table with no relation to other data, then you are out of luck.
You can't check for changes without some sort of audit mechanism. You are looking to extract information that has not been collected. If you just need to know when a record was added or edited, adding a datetime field that gets updated via a trigger when the record is updated would be the simplest choice.
If you also need to track when a record has been deleted, then you'll want to use an audit table and populate it from triggers with a row when a record has been added, edited, or deleted.
You might try a log viewer; this basically just lets you look at the transactions in the transaction log, so you should be able to find the statement that updated the row in question. I wouldn't recommend this as a production-level auditing strategy, but I've found it to be useful in a pinch.
Here's one I've used; it's free and (only) works w/ SQL Server 2000.
http://www.red-gate.com/products/SQL_Log_Rescue/index.htm
You can add a timestamp field to that table and update that timestamp value with an update trigger.
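A sketch of that approach (the table, key, and column names are assumptions):
alter table dbo.MyTable add LastUpdated datetime null;

create trigger trg_MyTable_Update on dbo.MyTable
for update
as
-- assumes recursive triggers are off (the default), otherwise this update would re-fire the trigger
update t
set t.LastUpdated = getdate()
from dbo.MyTable t
join inserted i on i.Id = t.Id;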
OmniAudit is a commercial package which implements auditing across an entire database.
A free method would be to write a trigger for each table which adds entries to an audit table when fired.