I've inherited an application that uses Atomikos for transaction handling in Spring on top of an Oracle database. In production deployments transaction logging has always been enabled by setting com.atomikos.icatch.enable_logging=true but the truth is I can't find any info on what exactly these logs are used for.
The atomikos site states "this should never be disabled on production or data integrity cannot be guaranteed" and I found a comment in a jta.properties on that site that said there is a "risk of losing data after restart or crash" if it is disabled.
We don't enable this in our development environments and are able to use the application normally. I thought they might be used in the case of the application crashing but if so I'm not sure how they'd be used. Maybe automatically during the next startup or manually in some way? In terms of data integrity I know Oracle enables it's own data recovery but maybe these transaction logs hold data that Oracle hasn't seen yet, e.g. if Spring were to crash.
http://fogbugz.atomikos.com/default.asp?community.6.1950.6 seems to indicate that the transaction logs are used for recovery only and can be disabled if you don't need them for recovery.
These logs maintain transaction information in the latest revision that may not be known yet to your database. without this set, recovery after a crash/restart will probably be incorrect.
HTH
Guy
Before I answer you question you need to read the begining of this post here How would you tune Distributed ( XA ) transaction for performance? to get the therminology.
The Atomikos is acting as Transaction coordinator who coordinates across the participants which are the different databases. As a coordinator it orchestrate the process of transactions accross the different databases. It is essentialy the same work that a Policemen is doing at the middle of a crossroad.
Atomikos writes its log file in order to know where exactyly in the process of the distributed transaction it is. In case of failure it can trace its uncommited transactions progress and attempt from the place it was previously interrupted. As such the transaction log is very important for the transaction recovery process.
Related
TL;DR: Is it possible to basically create a fast, temporary, "fork" of a database (like a snapshot transaction) without any locks given that I know for a fact that the changes will never be committed and always be rolled back.
Details:
I'm currently working with SQL Server and am trying to implement a feature where the user can try all sorts of stuff (in the application) that is never persisted in the database.
My first instinct was to (mis)use snapshot transactions for that to basically "fork" the database into a short lived (under 15min) user-specific context. The rest of the application wouldn't even have to know that all the actions the user performs will later be thrown away (I currently persist the connection across requests - it's a web application).
Problem is that there are situations where the snapshot transaction locks and waits for other transactions to complete. My guess is that this happens because SQL server has to make sure it can merge the data if one of the open transactions commits, but in my case I know for a fact that I will never commit the changes from this transactions and always throw the data away (note that not everything happens in this transactions, there are other things that a user can do that happen on a different connection and are persisted).
Are there other ideas, that don't involve cloning the database (too large/slow) or updating/changing the schema of all tables (I'd like to avoid "poisoning" the schema with the implemenation detail of the "try out" feature).
No. SQL Server has copy-on-write Database Snapshots, but the snapshots are read-only. So where a SNAPSHOT transaction acquires regular exclusive locks when it modifies the database, a Database Snapshot would just give you an error.
There are storage technologies that can a writable copy-on-write storage snapshot, like NetApp. You would run a command to create a new LUN that is a snapshot of an existing LUN, present it to your server as a disk, mount its volume in a folder or drive letter, and attach the files you find there as a database. This is often done for cloning across environments to refresh dev/test with prod data without having to copy all the data. But it seems like way too much infrastructure work for your use case.
I have a problem using the AWS Database Migration Service for implementing a transactional replication from SQL Server as a source database engine, a help is highly appreciated.
The 'safeguardPolicy' connection attribute defaults to 'RELY_ON_SQL_SERVER_REPLICATION_AGENT'. The tools will start mimicking a transaction in the database for preventing the log to be reused and to be able to read as much changes from the active log.
But what is the intended behavior of these safeguard transaction? Will those sessions be stopped at some point? What is the mechanism to start / run for some time / stop such a transaction?
The production databases I manage are in Full recovery mode, with Log backups on each half an hour. The log grows to an enormous size due to the inability for a valid truncation procedure to succeed and because of those safeguard transactions initiated by the DMS tool.
The only solution to a full transaction log due to LOG_SCAN caused by such behavior of DMS for now is to stop the DMS tasks and run a manual truncation of the log, to release space not used. But it is not a solution at all if we need to stop the replication each time such a problem occurs, knowing that it will occur often.
Please share some internals about the tool if possible.
Thanks
I wonder a case. I have a project using a database (Oracle and Mssql). My project has a framework that I manage transactions.
In thread I open a database connection and start a transaction.(In transaction, there are many update and insert queries.) While code is running, somehow connection is closed. Because I have try-catch block, I catch exception and rollback transaction. BUT; if my connection is closed because some reasons, how rollback query can run on database? How can I handle this situation? If I open a new connection and rollback, does it work?
Thanks.
There is a term you should know - ACID compliancy:
Atomicity is an all-or-none proposition;
Consistency guarantees that a transaction never leaves your database in a half-finished state.
Isolation keeps transactions separated from each other until they’re finished.
Durability guarantees that the database will keep track of pending changes in such a way that the server can recover from an abnormal termination.
Concerning MySQL
In order to get this at MySQL, you have to use Transaction Safe Tables (TST). Advantages of Transaction-Safe Tables:
Safer. Even if MySQL crashes or you get hardware problems, you can get your data back, either by automatic recovery or from a backup + the transaction log.
You can combine many statements and accept these all in one go with the COMMIT command.
You can execute ROLLBACK to ignore your changes (if you are not running in auto-commit mode).
If an update fails, all your changes will be restored.
Concerning SQL Server
You should read "Transaction Behavior On Lost Connection" MSDN forum topic.
To understand better what lays behind MS SQL Server transactions, read a good article "Locks and Duration of Transactions in MS SQL Server"
Make sure you are not using any autocommit feature (I think that's enabled by default in some MySQL installations). If you do all your commits "manually", a broken connection will just result in you never committing the transaction and so it won't ever get written.
You cannot reconnect to rollback in most database systems.
I understand that the transaction logs keep a record of historical transactions in order to facilitate a restore if needed. However do I need to keep creating transaction log backups for inactive databases that are hanging around on the server? No DDL statements are run against them and they are just used for reference.
I am just a bit worried that I might run out of log space if I get this wrong.
Have you considered changing the recovery model of your databases to the SIMPLE recovery model? Doing so would negate the need to backup the transaction log as it would be automatically re-used in the "unlikely" event that you need it to be.
I would still advise that regular FULL database backups be taken.
Also, if these database are indeed true read only databases then why not consider setting them to be so. This action would have the advantage of immediately highlighting any queries/users that are "still" issuing DML operations when you believe there to be none.
Other options for identifying queries that are performing more than just READ operations include running a Profiler Trace of activity on your database server and also an aggressive option would be to revoke all data modification rights from the relevant database Users.
Transaction logs are actually truncated when they're backed up. So, if these databases are actually inactive, you shouldn't be backing up any transaction logs for them since the logs would be empty.
Also, common practice for "inactive" databases would be to make them READ ONLY with a SIMPLE recovery model.
I am looking into using log shipping in a SQL Server 2005 environment. The idea was to set up frequent log shipping to a secondary server. The intent: Use the secondary server to serve report queries, thereby offloading the primary db server.
I came across this on a sqlservercentral forum thread:
When you create the log shipping you have 2 choices. You can configure restore log operation to be done with norecovery or with standby option. If you use the norecovery option, you can not issue select statements on it. If instead of norecovery you use the standby option, you can run select queries on the database.
Bear in mind with the standby option when log file restores occur users will be kicked out without warning by the restore process. Acutely when you configure the log shipping with standby option, you can also select between 2 choices – kill all processes in the secondary database and perform log restore or don’t perform log restore if the database is being used. Of course if you select the second option, the restore operation might never run if someone opens a connection to the database and doesn’t close it, so it is better to use the first option.
So my questions are:
Is the above true? Can you really not use log shipping in the way I intend?
If it is true, could someone explain why you cannot execute SELECT statements to a database while the transaction log is being restored?
EDIT:
First question is duplicate of this serverfault question. But I still would like the second question answered: Why is it not possible to execute SELECT statements while the transaction log is being restored?
could someone explain why you cannot
execute SELECT statements to a
database while the transaction log is
being restored?
Short answer is that RESTORE statement takes an exclusive lock on the database being restored.
For writes, I hope there is no need for me to explain why they are incompatible with a restore. Why does it not allow reads either? First of all, there is no way to know if a session that has a lock on a database is going to do a read or a write. But even if it would be possible, restore (log or backup) is an operation that updates directly the data pages in the database. Since these updates go straight to the physical location (the page) and do not follow the logical hierarchy (metadata-partition-page-row), they would not honor possible intent locks from other data readers, and thus have the possibility to change structures as they are read. A SELECT table scan following the page next-prev pointers would be thrown into disarray, resulting in a corrupted read.
Well yes and no.
You can do exactly what you wish to do, in that you may offload reporting workloads to a secondary server by configuring Log Shipping to a read only copy of a database. I have set this type of architecture up on a number of occasions previously and it works very well indeed.
The caveat is that in order to perform a restore of a Transaction Log Backup file there must be no other connections to the database in question. Hence the two choices being, when the restore process runs it will either fail, thereby prioritising user connections, or it will succeed by disconnecting all user connection in order to perform the restore.
Dependent on your restore frequency this is not necessarily a problem. You simply educate your users to the fact that, say every hour at 10 past the hour, there is a possibility that your report may fail. If this happens simply re-run the report.
EDIT: You may also want to evaluate alternative architeciture solutions to your business need. For example, Transactional Replication or Database Mirroring with a Database Snapshot
If you have enterprise version, you can use database mirroring + snapshot to create read-only copy of the database, available for reporting, etc. Mirroring uses "continuous" log shipping "under the hood". It is frequently used in scenario you have described.
Yes it's true.
I think the following happens:
While the transaction log is being restored, the database is locked, as large portions of it are being updated.
This is for performance reasons more then anything else.
I can see two options:
Use database mirroring.
Schedule the log shipping to only occur when the reporting system is not in use.
Slight confusion in that, the norecovery flag on the restore means your database is not going to be brought out of a recovery state and into an online state - that is why the select statements will not work - the database is offline. The no-recovery flag is there to allow you to restore multiple log files in a row (in a DR type scenario) without bringing the database back online.
If you did not want to log ship / have the disadvantages you could swap to a one way transactional replication, but the overhead / set-up will be more complex overall.
Would peer-to-peer replication work. Then you can run queries on one instance and so save the load on the original instance.