Why do a periodic full data refresh? - sql-server

Is there a benefit from doing a periodic full table refresh when you regularly insert/update/delete incrementally?
To clarify, this question is in regards to ETL processes.

If you are 100% certain that your incremental updates are capturing all CRUD operations, there is no reason to flush and fill. If your incrementals have room for error beyond the tolerance of the business rules governing the process, then you should consider periodic flush and fills.
It all depends on your source system, your target system, your ETL process, and your tolerance for error.
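If you do go the flush-and-fill route, a minimal sketch might look like the following; the table and source names are placeholders, and it assumes the target can tolerate being emptied inside the load window:

    -- Hypothetical "flush and fill" refresh of an ETL target table.
    -- dbo.Customer_Target and SourceDB.dbo.Customer are placeholder names.
    BEGIN TRANSACTION;
        TRUNCATE TABLE dbo.Customer_Target;            -- empty the target
        INSERT INTO dbo.Customer_Target (CustomerID, Name, ModifiedDate)
        SELECT CustomerID, Name, ModifiedDate
        FROM SourceDB.dbo.Customer;                    -- full re-read of the source
    COMMIT TRANSACTION;

Wrapping both statements in one transaction means that if the load fails partway through, the TRUNCATE is rolled back along with the inserts and the old data remains.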

I'm not sure what you mean by 'data refresh', so I will take some liberties and assume you mean rebuilding indexes. Good maintenance involves rebuilding indexes periodically in order to eliminate the fragmentation that accumulates on a table's indexes as a result of INSERT/UPDATE/DELETE activity.
For more information, read: https://dba.stackexchange.com/questions/4283/when-should-i-rebuild-indexes
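As a minimal, hedged illustration (the index and table names are placeholders, and the thresholds are only a common rule of thumb):

    -- Reorganize for light fragmentation, rebuild for heavy fragmentation.
    ALTER INDEX IX_Orders_CustomerID ON dbo.Orders REORGANIZE;   -- roughly 5-30% fragmentation
    ALTER INDEX IX_Orders_CustomerID ON dbo.Orders REBUILD;      -- above roughly 30% fragmentation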
If you mean a full backup, the purpose is to create a more recent database backup that you can restore from on its own, without having to restore an older full backup plus all subsequent differential backups and transaction log backups. (Note that a full backup by itself does not truncate the transaction log; only a transaction log backup does.)
For more information, read this: https://learn.microsoft.com/en-us/sql/relational-databases/backup-restore/full-database-backups-sql-server
and this: https://technet.microsoft.com/en-us/library/2009.07.sqlbackup.aspx
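For reference, a routine full backup is a one-liner; the database name and path below are placeholders, and the WITH options are optional:

    -- Hypothetical full database backup.
    BACKUP DATABASE YourDatabase
        TO DISK = N'X:\Backups\YourDatabase_full.bak'
        WITH CHECKSUM, INIT;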

Related

Minimizing Pre Staging DB Transaction Log growth

I have an app that loads data from a csv file into a pre-staging table and then performs some operations on it by calling some SPs before moving to staging:
The data is truncated before inserts are done
I am using a Simple Recovery Mode, since data recovery does not matter at this stage
The entire process is done daily
Two SPs are used before moving to Staging: one that bulk inserts into the table, and another that removes quote marks.
The problem is that the csv file typically has around 1.5 million rows. So the process truncates a 1.5-million-row table using TRUNCATE TABLE and then removes quotes line by line. I'm convinced these two steps are the ones contributing to the transaction log size.
I have been researching ways to do quote removal in our back-end instead of using an SP to do so, so hopefully that will help minimize growth. However, what could I do on our DB that I haven't done already so that the log doesn't increase so dramatically over time?
You may want to consider using In-Memory OLTP for this. You can create memory-optimized tables with SCHEMA_ONLY durability. SCHEMA_ONLY durability avoids both transaction logging and checkpoints, which can significantly reduce I/O and keeps the transaction log from growing.
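A minimal sketch of such a table, assuming SQL Server 2016 or later and a database that already has a MEMORY_OPTIMIZED_DATA filegroup (the table and column names are placeholders):

    -- Hypothetical memory-optimized pre-staging table; rows written to it are
    -- never logged or persisted, so it must be reloadable from the csv at any time.
    CREATE TABLE dbo.PreStaging_Csv
    (
        RowId   INT IDENTITY(1,1) NOT NULL PRIMARY KEY NONCLUSTERED,
        RawLine NVARCHAR(4000) NULL
    )
    WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY);

The trade-off is that the contents vanish on a restart, which should be acceptable here because the table is truncated and reloaded daily anyway.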

Mix transactional replication and log shipping?

I've replicated a large database (close to 1 TB) to three remote servers using push transactional replication. The subscribers are read-only. On one day every month, a lot of data is inserted and updated (from other sources). Replication always fails after that day, and we manually re-initialize it from a backup every month.
Is it possible to switch to log shipping before the inserting day and switch back to transactional replication after the bulk insertions are log shipped? So there is no need to copy the big backup file for re-initialization?
No. Transactional replication is logical while log shipping is physical; you can't switch at will between the two. But if your subscribers are read-only to start with, then transactional replication can be replaced out of the box with log shipping, at the cost of a slight delay in updates and having to disconnect readers on the stand-by sites every time a log is applied (usually this is nowhere near as bad as it sounds). Given how much more efficient and less problematic log shipping is compared to transactional replication, I would not hesitate for a single second to make this replacement permanent.
I question your need to re-initialize on a scheduled basis. I've had replication topologies run for a really long time without needing a re-init, and when we did, it was only because of a schema change that didn't play nice. When you say that the large amount of data fails replication, what does that mean? Replication will gladly deliver large data changes to the subscribers. If you're running afoul of latency limits, you can either increase them or break large transactions into smaller ones at the publisher. You also have the option of setting the MaxCmdsInTran option for the Log Reader Agent to have it break up your transactions for you.
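As a hedged sketch of the MaxCmdsInTran route: the parameter is appended to the Log Reader Agent's command line, for example by editing its SQL Server Agent job step. The job name, step number, and publisher/distributor names below are placeholders; copy your existing command and only append the switch:

    -- Append -MaxCmdsInTran so the Log Reader Agent splits very large publisher
    -- transactions into batches of at most 10000 commands.
    EXEC msdb.dbo.sp_update_jobstep
        @job_name = N'REPL-LogReader-YourPublisherDB-1',   -- placeholder job name
        @step_id  = 2,                                     -- the "Run agent." step
        @command  = N'-Publisher [PUB] -PublisherDB [YourPublisherDB] '
                  + N'-Distributor [DIST] -DistributorSecurityMode 1 '
                  + N'-Continuous -MaxCmdsInTran 10000';

Restart the Log Reader Agent afterwards so it picks up the new command line.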

Huge transaction in Sql Server, are there any problems?

I have a program which does many bulk operations on an SQL Server 2005 or 2008 database (drops and creates indexes, creates columns, full table updates etc), all in one transaction.
Are there any problems to be expected?
I know that the transaction log expands even in Simple recovery mode.
This program is not executed during normal operation of the system, so locking and concurrency is not an issue.
Are there other reasons to split the transaction into smaller steps?
In short,
Using smaller transactions provides more robust recovery from failure.
Long transactions may also unnecessarily hold locks on objects for extended periods, blocking other processes that need access to them.
Consider that if your server experienced a failure at any point between the time the transaction started and finished, then in order to bring the database online SQL Server would have to perform crash recovery, which involves rolling back all uncommitted transactions from the log.
Suppose you developed a data processing solution that is intelligent enough to pick up from where it left off. With a single transaction that option would not be available to you, because you would need to start the process from the beginning once again.
If the transaction generates too many log records (updates), the log can hit what is known as the "high water mark": the point at which the log reaches (about) half of its absolute maximum size, when it must then begin rolling back all the updates (which will consume about the same amount of log space as it took to do the updates).
Not rolling back at this point would mean risking eventually reaching the maximum log size while still not having finished the transaction or hit a rollback command, at which point the database is stuck because there isn't enough log space left to roll back.
It isn't really a problem until you run out of disk space, but you'll find that rollback will take a long time. I'm not saying to plan for failure of course.
However, consider the process rather than the transaction log as such. I'd consider separating the work (a sketch follows at the end of this answer):
DDL in a separate transaction
Bulk loading the staging tables in one transaction
Flushing data from staging to the final tables in another transaction
If something goes wrong, I'd hope that you have rollback scripts and/or a backup.
Is there really a need to do everything atomically?
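A rough sketch of that separation, with placeholder object names; the bulk-load step is shown only as a comment because the exact mechanism (BULK INSERT, bcp, SSIS) varies:

    -- 1. DDL in its own transaction (or simply autocommitted).
    BEGIN TRANSACTION;
        DROP INDEX IX_BigTable_Col1 ON dbo.BigTable;
    COMMIT TRANSACTION;

    -- 2. Bulk load into a staging table in its own transaction.
    BEGIN TRANSACTION;
        -- e.g. BULK INSERT dbo.Staging_BigTable FROM 'X:\load\bigtable.dat' WITH (TABLOCK);
    COMMIT TRANSACTION;

    -- 3. Flush from staging to the final table in another transaction,
    --    then recreate the index outside of it.
    BEGIN TRANSACTION;
        INSERT INTO dbo.BigTable (Col1, Col2)
        SELECT Col1, Col2 FROM dbo.Staging_BigTable;
    COMMIT TRANSACTION;
    CREATE INDEX IX_BigTable_Col1 ON dbo.BigTable (Col1);

Each step can then be retried or rolled back on its own instead of unwinding the whole job.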
Depending on the complexity of your update statements, I'd recommend doing this in a single transaction only on small tables of, say, a few hundred rows, especially if you have only a small amount of main memory available. Otherwise, updates on big tables can take a very long time and even appear to hang, and it is then difficult to figure out what the process (spid) is doing and how long it might take.
I'm not sure whether DROP INDEX is a fully logged operation anyway. See this question on stackoverflow.com.

Restoring two databases to precisely the same time

In SQL Server 2008, I have my parent table in one database and the child table in another database, with the FK relationship maintained by triggers. I cannot change this: I cannot move both tables into one DB and use a regular FK constraint. When I restored both databases from full backups, I had orphans in my child table, because the full backups were not taken at the same time. I also have transaction log backups.
In case of disaster recovery, can I restore both databases to precisely the same moment, so that the two databases are consistent?
Restoring at the same moment in time is possible as long as the databases are in full recovery mode and regular log backups are taken. See How to: Restore to a Point in Time (Transact-SQL).
However, point-in-time recovery will not ensure cross-database transactional consistency on its own; you also need to have used transactions for every operation that logically spanned the database boundary. The triggers have probably ensured this for deletes and updates, because they run in the context of the parent operation and thus implicitly wrap the cross-database operation in a transaction, but for inserts your application usually has to wrap the insert into the parent and the insert into the child in a single transaction.
Consistency of recovery operations is the biggest hurdle with an application split between different databases.
I cannot see a complete solution for your problem, but you can use full backups together with transaction log backups:
first, restore the full backups on both databases WITH NORECOVERY, and then restore the transaction log backups WITH STOPAT='xxxxxxxx' on both databases. That way you get both databases restored to the same point in time.
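A hedged example of that sequence, with placeholder database names, file paths, and timestamp:

    -- Restore one database to a fixed point in time; repeat with the same
    -- STOPAT value for the other database.
    RESTORE DATABASE ParentDB
        FROM DISK = N'X:\Backups\ParentDB_full.bak'
        WITH NORECOVERY;
    RESTORE LOG ParentDB
        FROM DISK = N'X:\Backups\ParentDB_log.trn'
        WITH STOPAT = '2017-06-01T02:00:00', RECOVERY;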
The best way to do this is to fix it at the point you're doing the backup. Most multi-database apps do this:
Prior to the backup, execute a command that writes a marked transaction into the transaction log of each database involved (BEGIN TRANSACTION ... WITH MARK). Then take the backups.
That way, you can later do a RESTORE ... WITH STOPATMARK to get them all to the same point. It's not perfect, but it is much closer than the other methods.
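A hedged sketch of the marked-transaction approach; the transaction name, marker tables, and backup paths are placeholders. The key point is that a single marked transaction that modifies both databases puts the same mark into both transaction logs:

    -- Just before the backups run:
    BEGIN TRANSACTION BackupMark WITH MARK 'Pre-backup sync point';
        UPDATE ParentDB.dbo.BackupMarker SET MarkedAt = GETUTCDATE();
        UPDATE ChildDB.dbo.BackupMarker  SET MarkedAt = GETUTCDATE();
    COMMIT TRANSACTION BackupMark;

    -- Later, restore both databases to that mark (repeat for ChildDB):
    RESTORE DATABASE ParentDB
        FROM DISK = N'X:\Backups\ParentDB_full.bak'
        WITH NORECOVERY;
    RESTORE LOG ParentDB
        FROM DISK = N'X:\Backups\ParentDB_log.trn'
        WITH STOPATMARK = 'BackupMark', RECOVERY;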

Does the Full Recovery Model Generate Additional Transaction Logs?

I read some Books Online topics about recovery/backup, and I have one stupid question: if I use full database backups and the full recovery model, will the backup operation itself generate additional transaction log on the source database server? Will a full restore operation generate additional transaction log on the destination database?
A more useful way to look at this is that the full recovery model prevents the contents of the transaction log from being overwritten until some other action marks them as available to be overwritten.
SQL Server logs most operations (minimally logged operations such as bulk loads aside). When running in the simple recovery model, it effectively discards committed log records at the next checkpoint. When running in the full recovery model, the contents of the transaction log are retained until they are marked as available to be overwritten, and the way to mark them is to perform a transaction log backup (a full backup alone does not truncate the log).
If there is no space left in the transaction log and no log contents are marked as available to be overwritten, SQL Server will attempt to grow the log file.
In practical terms, full recovery requires you to manage your transaction logs, generally by performing a transaction log backup every so often (every hour is a reasonable rule of thumb if you have no SLA or other driver to determine the frequency).
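For completeness, a routine log backup is usually all the 'management' that is needed; the database name and path here are placeholders:

    -- Hypothetical hourly transaction log backup.
    BACKUP LOG YourDatabase
        TO DISK = N'X:\Backups\YourDatabase_log.trn'
        WITH CHECKSUM;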
I'm not sure I completely understand your question, but here goes. Keeping your database in the full recovery model can make your transaction logs grow very large. The trade-off is that you can restore to an arbitrary point in time.
The reason the transaction logs are larger than normal is that ALL operations are fully logged, including operations that would otherwise be minimally logged, such as bulk loads and index creation.
If drive space is not a concern (and with drives being so inexpensive, it shouldn't be), this is the recommended backup approach.
