Is there any way to determine whether a DML statement was run in the current session, so that I know whether I need to commit or roll back the transaction? Efficiently?
The Problem
We have a Transaction Manager in our application. It does a Commit/Rollback on each request. But sometimes we only query some data, and there's no need to Commit/Rollback. We don't want the overhead of frequent commits when there isn't any change.
Inefficient Solution?
This answer suggests querying dbms_transaction. But I'm not sure whether that has less overhead than doing frequent commits. (Overhead both in our application and in the database.)
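For reference, the check the linked answer describes is a single call (Oracle; a sketch of the idea, not a measured comparison of the overheads):

-- Returns NULL when the current session has no open transaction,
-- so the Transaction Manager could skip the commit/rollback entirely.
SELECT dbms_transaction.local_transaction_id FROM dual;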
Related
I am trying to find an architectural approach to handling the following concurrency issue:
Multiple users may simultaneously submit transactions against the same subset of a table in a relational database, each with a different transaction. In this scenario each transaction will run at isolation level Serializable, ensuring that the transactions are handled as if they occurred one after the other, but this does not solve my specific concurrency issue...
Transaction 1 originates from User 1, who has, through the Application, made a batch of inserts, updates and deletes on a subset of a table.
User 2 may have started editing the data in the Application any time prior to the commit of Transaction 1 and is thus editing stale data. Transaction 2 therefore originates from User 2, who has (unknowingly) made a batch of inserts, updates and deletes on the same subset of the table as User 1, but these are likely to overwrite, and in some instances overlap with, the changes made in Transaction 1. Transaction 2 will not fail based on MVCC, but it is not sensible to perform it.
I need Transaction 2 to fail (ideally not even start) due to the data in the database (after the Transaction 1 commit) not being in the state that it was in when User 2 received his/her initial data to work on.
There must be "standard" architectural patterns to achieve my objective - any pointers in the right direction will be much appreciated.
Transaction 2 will not fail based on MVCC
If the transactions are really done in serializable isolation level, then transaction 2 will fail if the transaction includes both the original read of the data (since changed), and the attempted changes based on it. True serializable isolation was implemented way back in PostgreSQL 9.1.
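A rough sketch of that in PostgreSQL 9.1+ (the table and predicate are made up for illustration): if the read of the subset and the dependent writes happen in the same SERIALIZABLE transaction, a conflicting commit from the other user causes a serialization failure instead of a silent overwrite.

BEGIN ISOLATION LEVEL SERIALIZABLE;
-- Read the subset the user is editing, inside the same transaction as the writes.
SELECT * FROM orders WHERE customer_id = 42;
-- ... the user's batch of inserts/updates/deletes based on that read ...
UPDATE orders SET status = 'shipped' WHERE customer_id = 42;
COMMIT;
-- If a concurrent serializable transaction changed this subset, one of the statements
-- or the COMMIT fails with SQLSTATE 40001 (serialization_failure) and the application
-- must retry or give up.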
I need some light here. I am working with SQL Server 2008.
I have a database for my application. Each table has a trigger that stores all changes in another database (on the same server), in one single table, 'tbSysMasterLog'. Yes, the application's log is stored in another database.
The problem is: before any insert/update/delete command on the application database, a transaction is started, and therefore the table in the log database is locked until the transaction is committed or rolled back. So anyone else who tries to write to any other table of the application will be blocked.
So... is there any way to disable transactions on a particular database or on a particular table?
You cannot turn off the log; everything gets logged. You can set the recovery model to "Simple", which will limit the amount of data saved after the records are committed.
" the table of the log database is locked": why that?
Normally you log changes by inserting records. The insert of records should not lock the complete table, normally there should not be any contention in insertion.
If you do more than inserts, perhaps you should consider changing that. Perhaps you should look at the indices defined on log, perhaps you can avoid some of them.
It sounds from the question as though you have a BEGIN TRANSACTION at the start of your triggers, and that you are logging to the other database prior to the COMMIT TRANSACTION.
Normally you do not need explicit transactions in SQL Server.
If you do need explicit transactions, you can put the data to be logged into variables, commit the transaction, and then insert it into your log table.
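A rough sketch of that ordering, assuming SQL Server 2008; only tbSysMasterLog comes from the question, and its columns and the business table are invented for illustration:

DECLARE @FooId int = 1;                        -- hypothetical key being updated
DECLARE @LogWho sysname = SUSER_SNAME();
DECLARE @LogWhat nvarchar(4000);

BEGIN TRANSACTION;
    UPDATE dbo.Foo SET Amount = Amount + 1 WHERE FooId = @FooId;
    SET @LogWhat = N'Updated Foo ' + CAST(@FooId AS nvarchar(20));
COMMIT TRANSACTION;

-- The log insert now runs outside the explicit transaction, so a lock on
-- tbSysMasterLog no longer holds up writers of the application tables.
INSERT INTO LogDb.dbo.tbSysMasterLog (LoggedAt, LoggedBy, Details)
VALUES (SYSDATETIME(), @LogWho, @LogWhat);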
Normally inserts are fast and can happen in parallel without locking. There are certain things, like identity columns, that require ordering, but identity is a very lightweight structure; it can be avoided by generating GUIDs so that inserts are non-blocking. For something like your log table, though, an identity primary key column gives you a clear sequence, which is probably helpful in working out the order of events.
Obviously, if you log after the transaction, the log entries may not be in the same order in which the transactions occurred, because transactions take different amounts of time to commit.
We normally log into individual tables named after the master table, e.g. FooHistory or AuditFoo.
There are other options. A very lightweight method is to use a trace; this is what is used for performance tuning and will give you a copy of every statement run on the database (including triggers), and you can log it to a different database server. It is a good idea to log to a different server if you are tracing a heavily used server, since the volume of data is massive if you are tracing, say, 1,000 simultaneous sessions.
https://learn.microsoft.com/en-us/sql/tools/sql-server-profiler/save-trace-results-to-a-table-sql-server-profiler?view=sql-server-ver15
You can also trace to a file and then load it into a table (better performance), and script the starting, stopping, and loading of the traces.
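For the load step, something along these lines works once the trace file has been written (the path is just a placeholder):

-- Load the trace file into a table; DEFAULT also picks up any rollover files.
SELECT *
INTO dbo.AppTrace
FROM sys.fn_trace_gettable(N'C:\Traces\AppTrace.trc', DEFAULT);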
The load on the server receiving the trace log is minimal, and I have never had a locking problem on that server, so I am pretty sure that something on your side is causing the locks.
I have 2 sessions; each performs the same tasks, but on different tables, as follows:
begin tran
update...set...
commit tran
checkpoint
Each update is a large batch. The database is in the simple recovery model. To keep the t-log from growing too large, we issue CHECKPOINT so that t-log truncation can happen.
My question is:
If session A commits its transaction and issues a checkpoint while session B is still in the process of updating, will the checkpoint issued by session A wait on session B because of session B's active transaction?
In other words, would a checkpoint have to wait for all active transactions to finish? How likely is it for the two sessions to deadlock?
Also, if two CHECKPOINT commands are issued at the same time, what will happen?
Note that session A updates table_A and session B updates table_B. They never update the same table at the same time.
Also, I know that using INSERT INTO, rename, and drop can achieve a faster update, but I am not allowed to do that. I just want to know about checkpoint concurrency.
Thanks,
A manual CHECKPOINT simply tells SQL Server to write in-memory changes (dirty pages) to disk. It should have no effect on the size of the log.
If Session A commits and checkpoints while Session B is in a transaction on a different table, these are unrelated events - the Session A checkpoint will go ahead, as will the Session B transaction. Since a manual checkpoint simply forces a write of the in-memory data to disk at a time of the programmer's choosing rather than at a time of SQL Server's choosing, the only perceptible consequence should be slightly degraded performance.
Since checkpoints take effect at the database level, concurrent Checkpoints should have the same effect as one Checkpoint.
Checkpoints have absolutely no relation to the data in the database. They do not cause data changes or changes in visibility.
You are likely degrading performance considerably.
Also, it is unlikely that this solves your log problems because SQL Server by default checkpoints regularly anyway. Learn about the log a little more and you'll surely find a better way to address that. Or ask a question about your log problems.
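As a starting point for that, a quick way to see what is (or is not) holding up log truncation (a diagnostic query, not a fix):

-- log_reuse_wait_desc shows why the transaction log cannot be truncated yet
-- (e.g. NOTHING, CHECKPOINT, ACTIVE_TRANSACTION).
SELECT name, recovery_model_desc, log_reuse_wait_desc
FROM sys.databases;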
I'm looking into optimizing throughput in a Java application that frequently (100+ transactions/second) updates data in a Postgresql database. Since I don't mind losing a few transactions if the database crashes, I think that using asynchronous commit could be a good fit.
My only concern is that I don't want a delay after commit until other transactions/queries see my commit. I am using the default isolation level "Read committed".
So my question is: Does using asynchronous commit in Postgresql in any way mean that there will be delays before other transactions see my committed data or before they proceed if they have been waiting for my transaction to complete? (As mentioned before, I don't care if there is a delay before my data is persisted to disk.)
It would seem that this is the behavior you're looking for.
http://www.postgresql.org/docs/current/static/wal-async-commit.html
Selecting asynchronous commit mode means that the server returns success as soon as the transaction is logically completed, before the WAL records it generated have actually made their way to disk. This can provide a significant boost in throughput for small transactions.
The WAL is used to provide on-disk data integrity, and it has nothing to do with table integrity on a running server; it's only important if the server crashes. Since the documentation specifically says "as soon as the transaction is logically completed", it is indicating that this does not affect table behavior.
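For completeness, asynchronous commit is just a setting that can be enabled per session or even per transaction, so nothing has to change server-wide (the table in the sketch is hypothetical):

-- For the whole session:
SET synchronous_commit TO off;

-- Or for a single transaction only:
BEGIN;
SET LOCAL synchronous_commit TO off;
UPDATE counters SET hits = hits + 1 WHERE id = 1;  -- hypothetical table
COMMIT;  -- returns success before the WAL record is flushed to disk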
I'm writing some logging/auditing code that will be running in production (not just when errors are thrown or while developing). After reading Coding Horror's experiences with dead-locking and logging, I decided I should seek advice. (Jeff's solution of "not logging" won't work for me, this is legally mandated security auditing)
Is there a suitable isolation level for minimizing contention and deadlocking? Are there any query hints I can add to the insert statement or the stored procedure?
I care deeply about transactional integrity for everything except the audit table. The idea is that so much will be logged that if a few entries fail, it's not a problem. But if the logging blocks some other transaction, that would be bad.
I can log to a database or a file, although logging to a file is less attractive because I need to be able to display the results somehow. Logging to a file would (almost) guarantee the logging wouldn't interfere with other code though.
A normal (i.e. READ COMMITTED) insert already takes the 'minimal' locks. Insert-intensive applications will not deadlock on the inserts, no matter how the inserts are interleaved with other operations. At worst, an insert-intensive system may cause page-latch contention on the hot spot where the inserts occur, but not deadlocks.
To cause deadlocks as described by Jeff, there has to be more at play, such as any one of the following:
The system is using a higher isolation level (they had it coming then and well deserve it)
They were reading from the log table during the transaction (so it is no longer 'append-only')
The deadlock chain involved application-layer locks (i.e. .NET lock statements in the log4net framework), resulting in undetectable deadlocks (i.e. application hangs). Given that solving the problem involved looking at process dumps, I guess this is the scenario they were in.
So as long as you do insert-only logging in READ COMMITTED isolation level transactions, you are safe. If you expect the same problem I suspect SO had (i.e. deadlocks involving application-layer locks), then no amount of database wizardry can save you, as the problem can still manifest even if you log in a separate transaction or over a separate connection.
If you don't care about consistency on your logging table, why not perform all the logging from a separate thread?
I probably would not wait for transactions to complete before logging, since the log can be pivotal in diagnosing long-running transactions. Also, this enables you to see all the work a transaction that rolled back did.
Grab the stack trace and all of your logging data in the logging thread, chuck it on a queue when there are new logging messages, and flush them to the DB in a single transaction.
Steps to minimize locking:
(KEY) Perform all appends to the logging table outside of the main thread/connection/transaction.
Ensure your logging table has a monotonically increasing clustered index (e.g. int identity) that increases each time you append a log message. This ensures the pages being inserted into are usually in memory and avoids the performance hits you get with heap tables.
Perform multiple appends to the log in one transaction (10 inserts in a transaction are faster than 10 inserts outside a transaction and usually acquire/release fewer locks).
Give it a break. Only perform logging to your DB every N milliseconds. Batch up bits of work.
If you need to report on stuff historically, you can consider partitioning your logging table. Example: you could create a new logging table every month, and at the same time have a log VIEW that is a UNION ALL of all the older logging tables (see the sketch below). Perform the reporting against the most appropriate source.
You will get better performance by flushing multiple logging messages in a single (smallish) transaction, and have the advantage that if 10 threads are doing work and logging stuff, only a single thread is flushing stuff to the logging table. This pipelining actually makes stuff scale better.
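A minimal sketch of the batched flush (the log table and its columns are invented; the identity clustered primary key follows the step above about a monotonically increasing clustered index):

CREATE TABLE dbo.AppLog (
    LogId    int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,  -- monotonically increasing
    LoggedAt datetime2 NOT NULL DEFAULT SYSDATETIME(),
    Message  nvarchar(4000) NOT NULL
);

-- The logging thread flushes a batch of queued messages in one small transaction.
BEGIN TRANSACTION;
    INSERT INTO dbo.AppLog (Message) VALUES (N'message 1 from the queue');
    INSERT INTO dbo.AppLog (Message) VALUES (N'message 2 from the queue');
    INSERT INTO dbo.AppLog (Message) VALUES (N'message 3 from the queue');
COMMIT TRANSACTION;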
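And for the monthly partitioning idea above, a view along these lines (the monthly table names are hypothetical) rolls the pieces up for reporting:

CREATE VIEW dbo.vAppLog
AS
SELECT LogId, LoggedAt, Message FROM dbo.AppLog_2024_01
UNION ALL
SELECT LogId, LoggedAt, Message FROM dbo.AppLog_2024_02
UNION ALL
SELECT LogId, LoggedAt, Message FROM dbo.AppLog_2024_03;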
Since you don't care about the transactional integrity of the audit table, you can obviously perform logging outside of the transaction (i.e. after it completes). That will minimise impact on the transaction.
Also, if you want to minimize locking, you should try to ensure that as much of your query workload as possible is served by covering non-clustered indexes. (In SQL Server 2005 and above, the INCLUDE clause on NC indexes can make a big difference.)
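For example, a covering non-clustered index might look like this (the table and columns are hypothetical, not from the question):

-- The INCLUDE columns let the index cover the query without key lookups.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId
ON dbo.Orders (CustomerId)
INCLUDE (OrderDate, TotalAmount);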
One easy way to prevent your logging from having locking issues with your 'regular' database is to not use the same database. Just create another database for your logging. As a bonus, the rapid growth of your logging database won't result in fragmentation in your main DB. Personally, I usually prefer to log to a file, but then again, I'm used to doing heavy text manipulation in my editor, VIM. Logging to a separate DB should help avoid deadlocking issues.
Just make sure that if you write your own database appender for the logging framework you use, you are very careful about your locks (which I'm guessing is what tripped up Jeff in the blog post you reference). Properly written (see several of the comments on Jeff's post), you shouldn't have locking issues with your logging framework unless it does something odd.