Archiving Production DB Insert/Update with SQL Server 2008 - sql-server

I have a production database and an archive database in a second SQL Server instance.
When I insert or update (NOT DELETE) data in the production database, I need to insert or update the same data in the archive database.
What is a good way to do that?
Thanks

If they are in the same db instance, a trigger would be trivial assuming it's not a lot of tables.
If the size of this grows, you'll probably want to look into SQL Server replication. Microsoft has spent a lot of time and money to do it right.

If you are considering using triggers for this, take into account the load on your production database. If it is a very write-intensive database, consider a high-availability solution such as replication, mirroring or log shipping. Depending on your needs, any of those could serve you well.
At the same time, consider how your "cold" recovery procedures would need to change to match whatever you implement.

Replication will replicate your deletions as well. And because you are not propagating deletes, the archive database can run into unique-index problems down the line: a value may be perfectly valid in the production database but rejected in the archive database because it already exists there. If your design means this is not an issue, then a simple trigger on the production table will do this for you:
CREATE TRIGGER TR_MyTable_ToArchive ON MyTable FOR INSERT, UPDATE AS
BEGIN
    SET NOCOUNT ON

    -- First the inserts: rows in inserted with no matching row in deleted
    SET IDENTITY_INSERT ArchiveDB..MyTable ON -- Only if an identity column is used
    INSERT INTO ArchiveDB..MyTable (MyTableKey, Col1, Col2, Col3, ...)
    SELECT MyTableKey, Col1, Col2, Col3, ...
    FROM inserted i
    LEFT JOIN deleted d ON i.MyTableKey = d.MyTableKey
    WHERE d.MyTableKey IS NULL
    SET IDENTITY_INSERT ArchiveDB..MyTable OFF -- Only if an identity column is used

    -- Then the updates: rows present in both inserted and deleted
    UPDATE t
    SET Col1 = i.Col1, Col2 = i.Col2, Col3 = i.Col3, ...
    FROM ArchiveDB..MyTable t
    INNER JOIN inserted i ON t.MyTableKey = i.MyTableKey
    INNER JOIN deleted d ON i.MyTableKey = d.MyTableKey
END
This assumes that your archive database resides on the same server as your production database. If this is not the case, you'll need to create a linked server entry, and then replace ArchiveDB..MyTable with ArchiveServer.ArchiveDB..MyTable, where ArchiveServer is the name of the linked server.
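For instance, a minimal sketch of creating such a linked server entry (the server alias, host and instance names below are placeholders):

-- Hypothetical names: ArchiveServer is the linked server alias,
-- ARCHIVEHOST\ARCHIVE is the second SQL Server instance.
EXEC sp_addlinkedserver
    @server = N'ArchiveServer',
    @srvproduct = N'',
    @provider = N'SQLNCLI10',
    @datasrc = N'ARCHIVEHOST\ARCHIVE';

EXEC sp_addlinkedsrvlogin
    @rmtsrvname = N'ArchiveServer',
    @useself = N'TRUE';

-- The trigger would then write to ArchiveServer.ArchiveDB.dbo.MyTable

Note that a trigger writing through a linked server turns every insert/update into a distributed transaction, so MSDTC needs to be configured between the two instances.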
If there is a lot of load on your production database already, however, bear in mind that this will double it. To circumvent this, you can add an update flag field to each of your tables and run a scheduled task at a time when the database load is at a minimum, such as 1am. Your trigger would then set the field to I for an insert or U for an update in the production database; the scheduled task would perform the corresponding insert or update in the archive database, depending on the value of this field, and then reset the field to NULL once it has finished (see the sketch below).
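A rough sketch of that deferred approach, assuming a nullable ArchiveFlag CHAR(1) column has been added to MyTable (the flag column and the Col1..Col3 names are hypothetical):

-- Trigger only marks the row; it never touches the archive database.
-- Assumes the database's RECURSIVE_TRIGGERS option is OFF (the default),
-- so the UPDATE below does not re-fire this trigger.
CREATE TRIGGER TR_MyTable_MarkForArchive ON MyTable FOR INSERT, UPDATE AS
BEGIN
    SET NOCOUNT ON
    UPDATE t
    SET ArchiveFlag = CASE WHEN d.MyTableKey IS NULL THEN 'I' ELSE 'U' END
    FROM MyTable t
    INNER JOIN inserted i ON t.MyTableKey = i.MyTableKey
    LEFT JOIN deleted d ON i.MyTableKey = d.MyTableKey
END

-- Scheduled task (e.g. a SQL Agent job at 1am): push the flagged rows, then clear the flags.
-- Use SET IDENTITY_INSERT as in the trigger above if an identity column is involved.
INSERT INTO ArchiveDB..MyTable (MyTableKey, Col1, Col2, Col3)
SELECT MyTableKey, Col1, Col2, Col3
FROM MyTable WHERE ArchiveFlag = 'I'

UPDATE a
SET a.Col1 = p.Col1, a.Col2 = p.Col2, a.Col3 = p.Col3
FROM ArchiveDB..MyTable a
INNER JOIN MyTable p ON a.MyTableKey = p.MyTableKey
WHERE p.ArchiveFlag = 'U'

UPDATE MyTable SET ArchiveFlag = NULL WHERE ArchiveFlag IS NOT NULL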

Related

Optimizer stats on a busy table with large inserts and deletes

Environment: Oracle Database 19c
The table in question has a few NUMBER columns and one CLOB column. The table is properly indexed and there is a nightly gather-stats job as well.
Below are the operations on the table-
A PL/SQL batch procedure inserts 4 to 5 million records from a flat file presented as an external table
After the insert operation, another batch process reads the rows and updates some of the columns
A daily purge process deletes rows that are no longer needed
My question is: should gather stats be triggered immediately after the insert and/or delete operations on the table?
Per this Oracle doc Online Statistics Gathering for Bulk Loads, bulk loads only gather online statistics automatically when the object is empty. My process will not benefit from it as the table is not empty when I load data.
But online statistics gathering works for INSERT INTO ... SELECT operations on empty segments using direct path, so next I am going to try the APPEND hint. Any thoughts?
Before Oracle 12c, it was best practice to gather statistics immediately after a bulk load. However, according to Oracle's SQL Tuning Guide, many applications failed to do so, so Oracle automated this for certain operations.
I would recommend having a look at the dictionary views DBA_TAB_STATISTICS, DBA_IND_STATISTICS and DBA_TAB_MODIFICATIONS and seeing how your table behaves:
CREATE TABLE t AS SELECT * FROM all_objects;
CREATE INDEX i ON t(object_name);
SELECT table_name, num_rows, stale_stats
FROM DBA_TAB_STATISTICS WHERE table_name='T'
UNION ALL
SELECT index_name, num_rows, stale_stats
FROM DBA_IND_STATISTICS WHERE table_name='T';
TABLE_NAME NUM_ROWS STALE_STATS
T 67135 NO
I 67135 NO
If you insert data, the statistics are marked as stale:
INSERT INTO t SELECT * FROM all_objects;
TABLE_NAME NUM_ROWS STALE_STATS
T 67138 YES
I 67138 YES
SELECT inserts, updates, deletes
FROM DBA_TAB_MODIFICATIONS
WHERE table_name='T';
INSERTS UPDATES DELETES
67140 0 0
Likewise for updates and deletes:
UPDATE t SET object_id = - object_id WHERE object_type='TABLE';
4,449 rows updated.
DELETE FROM t WHERE object_type = 'SYNONYM';
23,120 rows deleted.
INSERTS UPDATES DELETES
67140 4449 23120
When you gather statistics, STALE_STATS becomes 'NO' again, and DBA_TAB_MODIFICATIONS goes back to zero (or an empty row):
EXEC DBMS_STATS.GATHER_TABLE_STATS(NULL, 'T');
TABLE_NAME NUM_ROWS STALE_STATS
T 111158 NO
I 111158 NO
Please note that INSERT /*+ APPEND */ gathers statistics only if the table (or partition) is empty. The restriction is documented here.
So, in your code, after the inserts, updates and deletes are done, I would recommend checking whether the table(s) appear in USER_TAB_MODIFICATIONS, and gathering statistics if they are stale (a sketch follows).
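A minimal sketch of that check, assuming the table is called T and lives in the current schema (DBMS_STATS and the USER_TAB_STATISTICS view are standard):

-- Flush the in-memory DML monitoring info so *_TAB_MODIFICATIONS / STALE_STATS are current
EXEC DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;

-- Gather statistics only if the optimizer considers them stale
DECLARE
  v_stale VARCHAR2(7);
BEGIN
  SELECT stale_stats INTO v_stale
  FROM   user_tab_statistics
  WHERE  table_name = 'T'
  AND    object_type = 'TABLE';

  IF NVL(v_stale, 'YES') = 'YES' THEN
    DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'T');
  END IF;
END;
/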
I would also look into partitioning. Check if you can insert, update and gather stats in a fresh new partition, which would be a bit faster. And check if you can purge your data by dropping a whole partition, which would be a lot faster.

How to copy a large number of rows from one table to another in the same database?

I have two tables with same column structure in the same database: TableA and TableB.
TableA doesn't have any indexes, but TableB has a non-clustered unique index.
TableA has 290 million rows of data that need to be copied to TableB.
As they both have same structure, I've tried
INSERT INTO TableB
SELECT *
FROM TableA;
It executed for hours and produced a huge log file; the disk ran out of space and the query was killed.
I can shrink the log file, but how can I copy this many rows to the other table efficiently?
First of all, disable the index on TableB before inserting the rows. You can do it using T-SQL:
ALTER INDEX IX_Index_Name ON dbo.TableB DISABLE;
Make sure to disable all the constraints (foreign keys, check constraints, unique indexes) on your destination table.
Re-enable (and rebuild) them after the load is complete.
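For example, using the index name from above:

-- After the load finishes, rebuilding a disabled index re-enables it
ALTER INDEX IX_Index_Name ON dbo.TableB REBUILD;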
Now, there are a couple of approaches to solving the problem:
If you are OK with a slight chance of data loss: use the INSERT INTO ... SELECT ... FROM ... syntax you already have, but switch your database to the Bulk-logged recovery model first (read up on the implications before switching). This won't help if you're already in Bulk-logged or Simple.
With exporting the data first: you can use the BCP utility to export/import the data. It supports loading data in batches. Read more about using the BCP utility here.
Fancy, with exporting the data first: with SQL 2012+ you can try exporting the data into a binary file (using the BCP utility) and loading it with the BULK INSERT statement, setting the ROWS_PER_BATCH option (a rough sketch follows after the batching example below).
Old-school "I don't give a damn" method: to prevent the log from filling up you will need to perform the inserts in batches of rows, not everything at once. If your database is running in Full recovery mode you will need to keep log backups running, maybe even increasing the frequency of the job. To batch-load your rows you will need a WHILE loop (don't use them in day-to-day stuff, just for batch loads); something like the following will work if you have a numeric identifier in the dbo.TableA table:
DECLARE @RowsToLoad BIGINT;
DECLARE @RowsPerBatch INT = 5000;
DECLARE @LeftBoundary BIGINT = 0;
DECLARE @RightBoundary BIGINT = @RowsPerBatch;

SELECT @RowsToLoad = MAX(IdentifierColumn) FROM dbo.TableA;

WHILE @LeftBoundary < @RowsToLoad
BEGIN
    INSERT INTO dbo.TableB (Column1, Column2)
    SELECT
        tA.Column1,
        tA.Column2
    FROM
        dbo.TableA AS tA
    WHERE
        tA.IdentifierColumn > @LeftBoundary
        AND tA.IdentifierColumn <= @RightBoundary;

    SET @LeftBoundary = @LeftBoundary + @RowsPerBatch;
    SET @RightBoundary = @RightBoundary + @RowsPerBatch;
END
For this to work effectively you really want to consider creating an index on dbo.TableA (IdentifierColumn), just for the time you're running the load.
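Returning to the BCP/BULK INSERT option above, a rough sketch of what that could look like (the file path, server name and batch sizes below are placeholders):

-- From a command prompt: export TableA in native format using a trusted connection
-- bcp MyDatabase.dbo.TableA out D:\Export\TableA.dat -n -S MyServer -T

-- Then load it in batches so the transaction log can be reused between batches
BULK INSERT dbo.TableB
FROM 'D:\Export\TableA.dat'
WITH (
    DATAFILETYPE = 'native',
    BATCHSIZE = 1000000,        -- commit every million rows
    ROWS_PER_BATCH = 290000000, -- optimizer hint: total expected rows
    TABLOCK                     -- allows minimal logging under Bulk-logged/Simple
);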

Using Triggers in SQL Server to keep a history

I am using SQL Server 2012
I have a table called AMOUNTS and a table called AMOUNTS_HIST
Both tables have identical columns:
CHANGE_DATE
AMOUNT
COMPANY_ID
EXP_ID
SPOT
UPDATE_DATE [system date]
The Primary Key of AMOUNTS is COMPANY_ID and EXP_ID.
The Primary Key of AMOUNTS_HIST is COMPANY_ID, EXP_ID and CHANGE_DATE.
Whenever I add a row in the AMOUNTS table, I would like to create a copy of it in the AMOUNTS_HIST table. [Theoretically, each time a row is added to 'AMOUNTS', the COMPANY_ID, EXP_ID, CHANGE_DATE will be unique. Practically, if they are not, the relevant row in AMOUNTS_HIST would need to be overridden. The code below does not take the overriding into account.]
I created a trigger as follows:
CREATE TRIGGER [MYDB].[update_history] ON [MYDB].[AMOUNTS]
FOR UPDATE
AS
INSERT MYDB.AMOUNTS_HIST (
    CHANGE_DATE,
    AMOUNT,
    COMPANY_ID,
    EXP_ID,
    SPOT,
    UPDATE_DATE
)
SELECT e.CHANGE_DATE,
    e.AMOUNT,
    e.COMPANY_ID,
    e.EXP_ID,
    e.SPOT,
    e.UPDATE_DATE
FROM MYDB.AMOUNTS e
JOIN inserted ON inserted.company_id = e.company_id
    AND inserted.exp_id = e.exp_id
I don't understand why it does nothing at all in my AMOUNTS_HIST table.
Can anyone help?
Thanks,
Probably because the trigger, the way it's currently written, will only get fired when an Update is done, not an insert.
Try changing it to:
CREATE TRIGGER [MYDB].[update_history] ON [MYDB].[AMOUNTS]
FOR UPDATE, INSERT
I just wanted to chime in: have you looked at CDC (Change Data Capture)?
http://msdn.microsoft.com/en-us/library/bb522489(v=sql.105).aspx
"Change data capture is designed to capture insert, update, and delete activity applied to SQL Server tables, and to make the details of the changes available in an easily consumed relational format. The change tables used by change data capture contain columns that mirror the column structure of a tracked source table, along with the metadata needed to understand the changes that have occurred.
Change data capture is available only on the Enterprise, Developer, and Evaluation editions of SQL Server."
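Enabling CDC is done per database and then per table; a minimal sketch, assuming the AMOUNTS table lives in a schema named MYDB (as in the question) and you are on an edition that supports CDC:

-- Run inside the database that contains the AMOUNTS table
EXEC sys.sp_cdc_enable_db;

EXEC sys.sp_cdc_enable_table
    @source_schema = N'MYDB',    -- schema name assumed from the question
    @source_name   = N'AMOUNTS',
    @role_name     = NULL;       -- NULL = no gating role for querying the change data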
As far as your trigger goes, when you update [MYDB].[AMOUNTS] does the trigger throw any errors?
Also, I believe you can get all your data from the inserted table without needing to join back to MYDB.AMOUNTS, for example:
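A minimal sketch of the same trigger reading straight from inserted (column list taken from the question):

CREATE TRIGGER [MYDB].[update_history] ON [MYDB].[AMOUNTS]
FOR INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO MYDB.AMOUNTS_HIST
        (CHANGE_DATE, AMOUNT, COMPANY_ID, EXP_ID, SPOT, UPDATE_DATE)
    SELECT i.CHANGE_DATE, i.AMOUNT, i.COMPANY_ID, i.EXP_ID, i.SPOT, i.UPDATE_DATE
    FROM inserted i;
END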

order hint for openquery?

I need to execute the following SQL (SQL Server 2008) in a scheduled job periodically. The query plan shows that 53% of the cost is a sort after the data is pulled from the Oracle server, even though I've ordered the data in the OPENQUERY. How can I force the query not to sort when merge joining?
MERGE target AS t
USING (SELECT * FROM OPENQUERY(oracle, '
    select * from t1 where UpdateTime > ''....'' order by k1, k2')
) AS s ON s.k1 = t.k1 AND s.k2 = t.k2 -- the clustered PK of "target" is (k1, k2)
WHEN MATCHED THEN ......
WHEN NOT MATCHED THEN ......
Is there something like BULK INSERT's "WITH (ORDER ( { column [ ASC | DESC ] } [ ,...n ] ))" hint? Would it help improve the query plan of the MERGE statement if it existed?
If the Oracle table already has a PK on (K1, K2), would just using oracle.db.owner.tablename (the four-part linked server name) instead of OPENQUERY be better? (Will SQL Server figure out the index from the Oracle metadata?)
Or is the best I can do to store the Oracle data in a local temp table and create a clustered primary key on (K1, K2)? I am trying to avoid a temp table because the returned OPENQUERY data set can sometimes be large.
I think a table is the best way to go because then you can create whatever indexes you need, but there's no reason why it should be temporary; why not create a permanent staging table? A local join using local indexes will probably be much more efficient than a join on the results of a remote query, although the only way to know for sure is to test it and see.
If you're worried about the large number of rows, you can look into only copying over new or changed rows. If the Oracle table already has columns for row creation and update times, that would be quite easy.
Alternatively, you could consider using SSIS instead of a scheduled job. I understand that if you're not already using SSIS you may not want to invest time in learning it, but it's a very powerful tool and it's designed for moving large amounts of data into MSSQL. You would create a package with the following workflow:
Delete existing rows from the staging table (only if you can't populate it incrementally)
Copy the data from Oracle
Execute the MERGE statement
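As a rough sketch of the staging-table approach (the staging table, its columns and the Col1 name are all hypothetical; the real MERGE actions are elided in the question):

-- One-off setup: a permanent staging table keyed the same way as the target, e.g.
-- CREATE TABLE dbo.Stage_T1 (K1 ... NOT NULL, K2 ... NOT NULL, Col1 ...,
--     CONSTRAINT PK_Stage_T1 PRIMARY KEY CLUSTERED (K1, K2));

TRUNCATE TABLE dbo.Stage_T1;

INSERT INTO dbo.Stage_T1 (K1, K2, Col1)
SELECT K1, K2, Col1
FROM OPENQUERY(oracle, 'select k1, k2, col1 from t1 where UpdateTime > ''....''');

-- Both sides are now clustered on (K1, K2), so the MERGE has a chance of using
-- a merge join without an extra sort step.
MERGE target AS t
USING dbo.Stage_T1 AS s
   ON s.K1 = t.K1 AND s.K2 = t.K2
WHEN MATCHED THEN
    UPDATE SET t.Col1 = s.Col1
WHEN NOT MATCHED THEN
    INSERT (K1, K2, Col1) VALUES (s.K1, s.K2, s.Col1);

Whether the optimizer actually picks a sort-free merge join still needs to be confirmed in the actual query plan.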

Updating redundant/denormalized data automatically in SQL Server

I use a high level of redundant, denormalized data in my DB designs to improve performance. I'll often store data that would normally need to be joined or calculated. For example, if I have a User table and a Task table, I would store the Username and UserDisplayName redundantly in every Task record. Another example of this is storing aggregates, such as storing the TaskCount in the User table.
User
UserID
Username
UserDisplayName
TaskCount
Task
TaskID
TaskName
UserID
UserName
UserDisplayName
This is great for performance since the app has many more reads than inserts, updates or deletes, and since some values like Username change rarely. However, the big drawback is that integrity has to be enforced via application code or triggers. This can be very cumbersome with updates.
My question is: can this be done automatically in SQL Server 2005/2010, maybe via a persisted/permanent view? Would anyone recommend another possible solution or technology? I've heard document-based DBs such as CouchDB and MongoDB can handle denormalized data more effectively.
You might want to first try an Indexed View before moving to a NoSQL solution:
http://msdn.microsoft.com/en-us/library/ms187864.aspx
and:
http://msdn.microsoft.com/en-us/library/ms191432.aspx
Using an Indexed View would allow you to keep your base data in properly normalized tables and maintain data-integrity while giving you the denormalized "view" of that data. I would not recommend this for highly transactional tables, but you said it was heavier on reads than writes so you might want to see if this works for you.
Based on your two example tables, one option is:
1) Add a column to the User table defined as:
TaskCount INT NOT NULL DEFAULT (0)
2) Add a Trigger on the Task table defined as:
CREATE TRIGGER UpdateUserTaskCount
ON dbo.Task
AFTER INSERT, DELETE
AS
;WITH added AS
(
SELECT ins.UserID, COUNT(*) AS [NumTasks]
FROM INSERTED ins
GROUP BY ins.UserID
)
UPDATE usr
SET usr.TaskCount = (usr.TaskCount + added.NumTasks)
FROM dbo.[User] usr
INNER JOIN added
ON added.UserID = usr.UserID
;WITH removed AS
(
SELECT del.UserID, COUNT(*) AS [NumTasks]
FROM DELETED del
GROUP BY del.UserID
)
UPDATE usr
SET usr.TaskCount = (usr.TaskCount - removed.NumTasks)
FROM dbo.[User] usr
INNER JOIN removed
ON removed.UserID = usr.UserID
GO
3) Then do a View that has:
SELECT u.UserID,
u.Username,
u.UserDisplayName,
u.TaskCount,
t.TaskID,
t.TaskName
FROM dbo.[User] u
INNER JOIN dbo.Task t
ON t.UserID = u.UserID
And then follow the recommendations from the links above (WITH SCHEMABINDING, a unique clustered index, etc.) to make it "persisted". Maintaining the denormalized aggregate does add write cost, but this specific case is intended for a workload with far more reads than writes, so the Indexed View keeps the entire structure, including the aggregate, physically stored and each read does not have to recalculate it.
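A minimal sketch of what that could look like (the view and index names are made up; the rest follows the tables above):

CREATE VIEW dbo.UserTaskView
WITH SCHEMABINDING
AS
SELECT u.UserID,
       u.Username,
       u.UserDisplayName,
       u.TaskCount,
       t.TaskID,
       t.TaskName
FROM dbo.[User] u
INNER JOIN dbo.Task t
    ON t.UserID = u.UserID;
GO

-- The unique clustered index is what actually materializes ("persists") the view.
CREATE UNIQUE CLUSTERED INDEX IX_UserTaskView_TaskID
    ON dbo.UserTaskView (TaskID);
GO

Keep in mind that on non-Enterprise editions, queries need the NOEXPAND hint on the view to make use of the materialized data.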
Now, if a LEFT JOIN is needed because some Users do not have any Tasks, then the Indexed View will not work, due to the restrictions on creating them (outer joins are not allowed). In that case, you can create a real table (UserTask) as your denormalized structure and populate it either via a Trigger on just the User table (assuming you keep the Trigger shown above, which updates the User table based on changes in the Task table), or you can skip the TaskCount field in the User table and have Triggers on both tables populate the UserTask table. In the end, this is basically what an Indexed View does, just without you having to write the synchronization Trigger(s).
