AWS Database Migration Service causing problem - SQL Server as Source

AWS Database Migration Service causing problem - SQL Server as Source - sql-server

I have a problem using the AWS Database Migration Service for implementing a transactional replication from SQL Server as a source database engine, a help is highly appreciated.
The 'safeguardPolicy' connection attribute defaults to 'RELY_ON_SQL_SERVER_REPLICATION_AGENT'. The tools will start mimicking a transaction in the database for preventing the log to be reused and to be able to read as much changes from the active log.
But what is the intended behavior of these safeguard transaction? Will those sessions be stopped at some point? What is the mechanism to start / run for some time / stop such a transaction?
The production databases I manage are in Full recovery mode, with Log backups on each half an hour. The log grows to an enormous size due to the inability for a valid truncation procedure to succeed and because of those safeguard transactions initiated by the DMS tool.
The only solution to a full transaction log due to LOG_SCAN caused by such behavior of DMS for now is to stop the DMS tasks and run a manual truncation of the log, to release space not used. But it is not a solution at all if we need to stop the replication each time such a problem occurs, knowing that it will occur often.
Please share some internals about the tool if possible.
Thanks

Related

Tempdb transaction log full

Unable to connect to SQL Server because:
'tempdb transaction log was full due to active transaction'.
There was no way to login to SQL Server to troubleshoot. only option was to restart SQL Server.
I want to know ow to avoid this issue in future.
The log file and data files were limited to max size.
But is there a way I can be notified before tempdb log file reaches maximum?
Is there a way i can get alert or monitor tempdb space usage when it reaches 85% full get notified. this issue cause a big impact as well. so any advise how it can avoided in future will be very helpful.

What are atomikos transaction logs used for?

I've inherited an application that uses Atomikos for transaction handling in Spring on top of an Oracle database. In production deployments transaction logging has always been enabled by setting com.atomikos.icatch.enable_logging=true but the truth is I can't find any info on what exactly these logs are used for.
The atomikos site states "this should never be disabled on production or data integrity cannot be guaranteed" and I found a comment in a jta.properties on that site that said there is a "risk of losing data after restart or crash" if it is disabled.
We don't enable this in our development environments and are able to use the application normally. I thought they might be used in the case of the application crashing but if so I'm not sure how they'd be used. Maybe automatically during the next startup or manually in some way? In terms of data integrity I know Oracle enables it's own data recovery but maybe these transaction logs hold data that Oracle hasn't seen yet, e.g. if Spring were to crash.

http://fogbugz.atomikos.com/default.asp?community.6.1950.6 seems to indicate that the transaction logs are used for recovery only and can be disabled if you don't need them for recovery.

These logs maintain transaction information in the latest revision that may not be known yet to your database. without this set, recovery after a crash/restart will probably be incorrect.
HTH
Guy

Before I answer you question you need to read the begining of this post here How would you tune Distributed ( XA ) transaction for performance? to get the therminology.
The Atomikos is acting as Transaction coordinator who coordinates across the participants which are the different databases. As a coordinator it orchestrate the process of transactions accross the different databases. It is essentialy the same work that a Policemen is doing at the middle of a crossroad.
Atomikos writes its log file in order to know where exactyly in the process of the distributed transaction it is. In case of failure it can trace its uncommited transactions progress and attempt from the place it was previously interrupted. As such the transaction log is very important for the transaction recovery process.

SQL Server 2008 Backup Transaction Logs

I understand that the transaction logs keep a record of historical transactions in order to facilitate a restore if needed. However do I need to keep creating transaction log backups for inactive databases that are hanging around on the server? No DDL statements are run against them and they are just used for reference.
I am just a bit worried that I might run out of log space if I get this wrong.

Have you considered changing the recovery model of your databases to the SIMPLE recovery model? Doing so would negate the need to backup the transaction log as it would be automatically re-used in the "unlikely" event that you need it to be.
I would still advise that regular FULL database backups be taken.
Also, if these database are indeed true read only databases then why not consider setting them to be so. This action would have the advantage of immediately highlighting any queries/users that are "still" issuing DML operations when you believe there to be none.
Other options for identifying queries that are performing more than just READ operations include running a Profiler Trace of activity on your database server and also an aggressive option would be to revoke all data modification rights from the relevant database Users.

Transaction logs are actually truncated when they're backed up. So, if these databases are actually inactive, you shouldn't be backing up any transaction logs for them since the logs would be empty.
Also, common practice for "inactive" databases would be to make them READ ONLY with a SIMPLE recovery model.

Is it possible to have secondary server available read-only in a log shipping scenario?

I am looking into using log shipping in a SQL Server 2005 environment. The idea was to set up frequent log shipping to a secondary server. The intent: Use the secondary server to serve report queries, thereby offloading the primary db server.
I came across this on a sqlservercentral forum thread:
When you create the log shipping you have 2 choices. You can configure restore log operation to be done with norecovery or with standby option. If you use the norecovery option, you can not issue select statements on it. If instead of norecovery you use the standby option, you can run select queries on the database.
Bear in mind with the standby option when log file restores occur users will be kicked out without warning by the restore process. Acutely when you configure the log shipping with standby option, you can also select between 2 choices – kill all processes in the secondary database and perform log restore or don’t perform log restore if the database is being used. Of course if you select the second option, the restore operation might never run if someone opens a connection to the database and doesn’t close it, so it is better to use the first option.
So my questions are:
Is the above true? Can you really not use log shipping in the way I intend?
If it is true, could someone explain why you cannot execute SELECT statements to a database while the transaction log is being restored?
EDIT:
First question is duplicate of this serverfault question. But I still would like the second question answered: Why is it not possible to execute SELECT statements while the transaction log is being restored?

could someone explain why you cannot
execute SELECT statements to a
database while the transaction log is
being restored?
Short answer is that RESTORE statement takes an exclusive lock on the database being restored.
For writes, I hope there is no need for me to explain why they are incompatible with a restore. Why does it not allow reads either? First of all, there is no way to know if a session that has a lock on a database is going to do a read or a write. But even if it would be possible, restore (log or backup) is an operation that updates directly the data pages in the database. Since these updates go straight to the physical location (the page) and do not follow the logical hierarchy (metadata-partition-page-row), they would not honor possible intent locks from other data readers, and thus have the possibility to change structures as they are read. A SELECT table scan following the page next-prev pointers would be thrown into disarray, resulting in a corrupted read.

Well yes and no.
You can do exactly what you wish to do, in that you may offload reporting workloads to a secondary server by configuring Log Shipping to a read only copy of a database. I have set this type of architecture up on a number of occasions previously and it works very well indeed.
The caveat is that in order to perform a restore of a Transaction Log Backup file there must be no other connections to the database in question. Hence the two choices being, when the restore process runs it will either fail, thereby prioritising user connections, or it will succeed by disconnecting all user connection in order to perform the restore.
Dependent on your restore frequency this is not necessarily a problem. You simply educate your users to the fact that, say every hour at 10 past the hour, there is a possibility that your report may fail. If this happens simply re-run the report.
EDIT: You may also want to evaluate alternative architeciture solutions to your business need. For example, Transactional Replication or Database Mirroring with a Database Snapshot

If you have enterprise version, you can use database mirroring + snapshot to create read-only copy of the database, available for reporting, etc. Mirroring uses "continuous" log shipping "under the hood". It is frequently used in scenario you have described.

Yes it's true.
I think the following happens:
While the transaction log is being restored, the database is locked, as large portions of it are being updated.
This is for performance reasons more then anything else.
I can see two options:
Use database mirroring.
Schedule the log shipping to only occur when the reporting system is not in use.

Slight confusion in that, the norecovery flag on the restore means your database is not going to be brought out of a recovery state and into an online state - that is why the select statements will not work - the database is offline. The no-recovery flag is there to allow you to restore multiple log files in a row (in a DR type scenario) without bringing the database back online.
If you did not want to log ship / have the disadvantages you could swap to a one way transactional replication, but the overhead / set-up will be more complex overall.

Would peer-to-peer replication work. Then you can run queries on one instance and so save the load on the original instance.

Continuous database backups?

I have the following scenario:
Our system is running a SQL Server Express 2005 database locally (on each users desktop, if you will). The system is storing a lot of production data from a machine. There are high demands on the safety of the data, and doing a backup each night, or even each hour is not enough. We need a backup strategy that will ensure almost instantaneous/continuous backup of the database.
Is there anyone out there that has successfully implemented a system similar to this, and/or has got some ideas of how to accomplish it? The only thing I can think of right now is to have mirrored drives (raid) to hold the data, but that would be complicated and expensive.
I would appreciate any and all thoughts on this, since it is a real issue for me and my company. Thanks in advance!
Update:
I was not clear enough in my description of the scenario. The system is storing data in a vehicle that has no connection to anything. A centralized database is therefor not possible. Neither can we use a standard/enterprise version of SQL Server, since it would be to expensive (each vehicle would need a license). Thanks for your input!

Switch your database into "Full" recovery mode. Do full backup every night and do delta backup after major user action. The delta backups can be done to the flash memory or different hard-drive, and all data can be synchronized with server when online.
Another simple way is to trace all user changes and important data in a text file that stored on a separate drive. If SQL database crashes the user or other operator can repeat steps to restore data.

One way I've seen this done is by using DoubleTake.

I will assume that a central database on a server is not feasible because your systems are running standalone and are not connected to anything. So this is what I would do
Set up RAID on the computer. This insures you against simple disk failure.
Any SQLSever database can be recovered to the point of the last commited transaction if you have a full database backup and a set of transaction logs available. Basically you simply restore the last full backup then apply the transaction logs going forward. See these links.
http://www.enterpriseitplanet.com/storage/features/article.php/11318_3776361_3
https://web.archive.org/web/1/http://blogs.techrepublic%2ecom%2ecom/datacenter/?p=132
So what you need to do is set up a periodic full backup of both the database and transaction logs, and more regular transaction log backups (and ensure that your transaction log can never run out of space).
In the event of failure you restore the last full backup, then apply the transaction logs going forward.
Myself, if these are critical systems, I would be inclined to add an additional drive to the system and make sure that the backups are copied over to that. This is because as good as raid is it does sometimes have issues - raid controllers fail, disks get wiped accidentally in parallel, disk failures go unnoticed so your just running on one disk etc. If you ensure backups are copied to a separate disk then you can always recover to the last transaction log backup. You should also ensure tape backups of course, but they are generally a last resort in the event of trouble.
If for some reason you cannot set up raid then you should still install a second disk, but place the database file on one drive and the transaction log on the other and copy backups to both disks. In the event of failure of the C drive, or some other software issue crashing the database you can still recover to the last commited transaction. Failure on the D drive limits you to the last transaction log backup (Oracle used to allow you to mirror the transaction log from the database, which again would completely cover you, but I don't think this facility exists in SQL Server)
If you are looking for a scheduler for SQL Server Express (which doesn't come with one) then I've been using SQLScheduler quite happily without problems, and it's free.

The most obvious answer would be to ditch SQL Server Express running locally and use a single source for your data (such as a standard SQL server install on a central storage location). Unless your system requires individual back ups of every single person's own individual instance of SQL Server Express.
If your requirements are so stringent as to call for instantaneous backups on every operation, you should definitely think about a different method of storage than local instances of SQL Server Express.

Wouldn't it be easier to just use one centralized SQL Server and back that up every hour or so? If you truly need instantaneous backup, your company (which seems not to want to spend money by installing Express on each machine) will need to spring for two servers and two SQL Server Enterprise licenses to implement Mirroring.

Raid isn't that expensive, but it is also not the best option. If you really want high availability data you should upgrade to sql server standard on a remote server where each user connects to and use transaction based replication to an sql server (express) instance on another machine. Raid doesn't always protect you from dataloss. If the data is that important for you then the costs should not be that much of an issue.
Update in response to the question update.
If you can't use remote servers then there a couple of options:
You write a trigger which initiates a backup script on each insert or update and stores it on a seperate harddrive.
You use raid. But beware that if the raid controller fails that you still got a problem.

RAID is not expensive. Use RAID to protect against hard drive failure. You also need monitoring though. No point in having this if you let both drives fail.
Also, implement hourly incremental backups, then daily incremental backups and finally weekly full backups.
You need all of these strategies working together because they protect against different things. RAID does not protect against human or coding errors destroying data. Hourly and weekly backups don't protect against hard drive failure.