Is there a delay before other transactions see a commit if using asynchronous commit in Postgresql? - database

I'm looking into optimizing throughput in a Java application that frequently (100+ transactions/second) updates data in a Postgresql database. Since I don't mind losing a few transactions if the database crashes, I think that using asynchronous commit could be a good fit.
My only concern is that I don't want a delay after commit until other transactions/queries see my commit. I am using the default isolation level "Read committed".
So my question is: Does using asynchronous commit in Postgresql in any way mean that there will be delays before other transactions see my committed data or before they proceed if they have been waiting for my transaction to complete? (As mentioned before, I don't care if there is a delay before my data is persisted to disk.)

It would seem that this is the behavior you're looking for.
http://www.postgresql.org/docs/current/static/wal-async-commit.html
Selecting asynchronous commit mode means that the server returns success as soon as the transaction is logically completed, before the
WAL records it generated have actually made their way to disk. This
can provide a significant boost in throughput for small transactions.
The WAL is used to provide on-disk data integrity, and has nothing to do about table integrity of a running server; it's only important if the server crashes. Since they specifically mention "as soon as the transaction is logically completed", the documention is indicating that this does not affect table behavior.

Related

Prevent SQL view from being blocked by lengthy delete/insert transaction using snapshot isolation

I am attempting to prevent "splash pages" on our website, which are generated from SQL views, from blocking during the rather lengthy update of underlying tables.
The update stored procedure utilizes snapshot isolation and the resulting row versioning allows transacted select queries from the tables being updated; returning their values prior to the updating transaction beginning. Wonderful stuff.
Unfortunately, my select queries still appear to be blocked if they are from a view constructed from the underlying tables.
UPDATE: Turns out these queries are not blocked in the traditional sense as there are no requests in WAIT state in sys.dm_tran_locks. But are none-the-less stalled waiting for synchronous statistics to complete before compiling.
Thank you, #Charlieface for exposing this essential piece to using SNAPSHOT isolation for data availability during large update transactions!
It seems, after working through various options in the comments, that AUTO_UPDATE_STATISTICS_ASYNC was set to OFF.
This meant that a big enough update was causing a synchronous statistics refresh, which in turn blocked other queries from even compiling.
If AUTO_UPDATE_STATISTICS_ASYNC is set to ON then currently compiling queries do not benefit from the statistics update (a minor disadvantage), however there is no blocking because the statistics are refreshed asynchronously.
One more downside if it is ON is that you cannot access the database in single-user mode, although that case is obviously very rare, and you can just turn it off for that one time.
See more in this great article, there are more very rare circumstances when you want it OFF

SQL Server ROLLBACK transaction took forever. Why?

We have a huge DML script, that opens up a transaction and performs a lot of changes and only then it commits.
So recently, I had triggered this scripts (through an app), and as it was taking quite an amount of time, I had killed the session, which triggered a ROLLBACK.
So the problem is that this ROLLBACK took forever and moreover it was hogging a lot of CPU (100% utilization), and as I was monitoring this session (using exec DMVs), I saw a lot of waits that are IO related (IO_COMPLETION, PAGE_IO_LATCH etc).
So my question is:
1. WHy does a rollback take some much amount of time? Is it because it needs to write every revert change to the LOG file? And the IO waits I saw could be related to IO operation against this LOG file?
2. Are there any online resources that I can find, that explains how ROLLBACK mechanism works?
Thank You
Based on another article on the DBA side of SO, ROLLBACKs are slower for at least two reasons: the original SQL is capable of being multithreaded, where the rollback is single-threaded, and two, a commit confirms work that is already complete, where the rollback not only must identify the log action to reverse, but then target the impacted row.
https://dba.stackexchange.com/questions/5233/is-rollback-a-fast-operation
This is what I have found out about why a ROLLBACK operation in SQL Server could be time-consuming and as to why it could produce a lot of IO.
Background Knowledge (Open Tran/Log mechanism):
When a lot of changes to the DB are being written as part of an open transaction, these changes modify the data pages in memory (dirty pages) and log records (into a structure called LOG BLOCKS) generated are initially written to the buffer pool (In Memory). These dirty pages are flushed to the disk either by a recurring Checkpoint operation or a lazy-write process. In accordance with the write-ahead logging mechanism of the SQL Server, before the dirty pages are flushed the LOG RECORDS describing these changes needs to be flushed to the disk as well.
Keeping this background knowledge in mind, now when a transaction is rolled back, this is almost like a recovery operation, where all the changes that are written to the disk, have to be undone. So, the heavy IO we were experiencing might have happened because of this, as there were lots of data changes that had to be undone.
Information Source: https://app.pluralsight.com/library/courses/sqlserver-logging/table-of-contents
This course has a very deep and detailed explanation of how logging recovery works in SQL Server.

Transaction isolation in JET?

MSDN describes the JET transaction isolation for its OLEDB provider as follows:
Jet supports five levels of nesting in transactions. The only
supported mode for transactions is Read Committed. Setting lesser
levels of transactional separation implies Read Committed. Setting
higher levels will cause StartTransaction to fail.
Jet supports only single-phase commit.
MSDN describes Read Committed as follows:
Specifies that shared locks are held while the data is being read to
avoid dirty reads, but the data can be changed before the end of the
transaction, resulting in nonrepeatable reads or phantom data. This
option is the SQL Server default.
My questions are:
What is single-phase commit? What consequence does this have for transactions and isolation?
Would the Read Committed isolation level as described above be suitable for my requirements here?
What is the best way to achieve a Serializable transaction isolation using Jet?
By question number:
Single-phase commit is used where all of your data is in one database -- the activity of the transaction is committed atomically and you're done. If you have a logical transaction which needs to be spread across multiple storage engines (like a relational database for metadata and some sort of document store for a big blob) you can use a transaction manager to coordinate the activities so that the work is persisted in both or neither, if both products support two phase commit. They are just telling you that they don't support two-phase commit, so the product is not suitable for distributed transactions.
Yes, if you check the condition in the UPDATE statement itself; otherwise you might have problems.
They seem to be suggesting that you can't.
As an aside, I worked for decades as a consultant in quite a variety of environments. More than once I was engaged to migrate people off of Jet because of performance problems. In one case a simple "star" type query was running for two minutes because it was joining on the client rather than letting the database do it. As a direct query against the database it was sub-second. In another case there was a report which took 72 hours to run through Jet, which took 2 minutes when run directly against the database. If it generally works OK for you, you might be able to deal with such situations by using stored procedures where Jet is causing performance pain.

Advice for minimizing locking on an append only table in MS SQL Server?

I'm writing some logging/auditing code that will be running in production (not just when errors are thrown or while developing). After reading Coding Horror's experiences with dead-locking and logging, I decided I should seek advice. (Jeff's solution of "not logging" won't work for me, this is legally mandated security auditing)
Is there an suitable Isolation level for minimizing contention and dead-locking? Any query hints I can add to the insert statement or the stored procedure?
I care deeply about the transactional integrity for everything except the audit table. The idea is that so much will be logged that if a few entries fail, it's not a problem. If the logging stops a some other transaction-- that would be bad.
I can log to a database or a file, although logging to a file is less attractive because I need to be able to display the results somehow. Logging to a file would (almost) guarantee the logging wouldn't interfere with other code though.
A normal transaction (ie. READ COMMITTED) insert already does the 'minimal' locking. Insert intensive applications will not deadlock on the insert, no matter the order of how the insert is mixed with other operations. At worst an intensive insert system may cause page latch contention on the hot spot where insert occurs, but not deadlocks.
To cause deadlocks as described by Jeff there has to be more at play, like any one of the following:
The system is using a higher isolation level (they had it coming then and well deserve it)
They were reading from the log table during the transaction (so is no longer 'append-only')
The deadlock chain involved application layer locks (ie. .Net lock statements in the log4net framework) resulting in undetectable deadlocks (ie. application hangs). Given that solving the problem involved looking at process dumps, I guess this is the scenario they were having.
So as long as you do insert only logging in READ COMMITTED isolation level transactions you are safe. If you expect the same problem I suspect SO had (ie. deadlocks involving application layer locks) then no amount of database wizardry can save you, as the problem can still manifest even if you log on separate transaction or into separate connection.
If you don't care about consistency on your logging table, why not perform all the logging from a separate thread.
I probably would not wait for transactions to complete before logging, since the log can be pivotal in diagnosing long running transactions. Also, this enables you to see all the work a transaction that rolled back did.
Grab the stack trace and all of your logging data in the logging thread, chuck it on a queue when there are new logging messages, flush them to the db in a single transaction.
Steps to minimizing locking:
(KEY) perform all appends to the logging table outside of the main thread/connection/transaction.
Ensure your logging table has a monotonically increasing clustered index (Eg. int identity ) that is increasing each time you append a log message. This ensures the pages being inserted into are usually in memory and avoids the performance hits you get with heap tables.
Perform multiple appends to the log in a transaction (10 inserts in a transaction are faster than 10 inserts out of a transaction and usually acquire/release less locks)
Give it a break. Only perform logging to your db every N milliseconds. Batch up bits of works.
If you need to report on stuff historically, you can consider partitioning your logging table. Example: You could create a new logging table every month, and at the same time have a log VIEW that is a UNION ALL of all the older logging tables. Perform the reporting against the most appropriate source.
You will get better performance by flushing multiple logging messages in a single (smallish) transaction, and have the advantage that if 10 threads are doing work and logging stuff, only a single thread is flushing stuff to the logging table. This pipelining actually makes stuff scale better.
Since you don't care about the transactional integrity of the audit table, you can obviously perform logging outside of the transaction (i.e. after it completes). That will minimise impact on the transaction.
Also, if you want to minimize locking, you should try to ensure that as much of your query workload as possible has covering non-clustered indexes. (SQL Server 2005 and above, the use of the INCLUDE statement in NC indexes can make a big difference)
One easy way to prevent your logging from having locking issues with your 'regular' database is to not use the same database. Just create another database for your logging. As a bonus, the rapid growth of your logging database won't result in fragmentation in your main DB. Personall, I usually prefer to log to a file -- but then again, I'm used to doing heavy text manipulation in my editor - VIM. Logging to a separate DB should help avoid deadlocking issues.
Just make sure that if you try writing your own database appender for the logging framework you use, you be very careful about your locks (which I'm guessing is what tripped up Jeff in the blog post you reference). Properly written (see several of the comments in Jeff's post), you shouldn't have locking issues with your logging framework unless they do something odd.

database autocommit - does it go directly to disk?

So I know that autocommit commits every sql statement, but do updates to the database go directly to the disk or do they remain on cache until flushed?
I realize it's dependent on the database implementation.
Does auto-commit mean
a) every statement is a complete transaction AND it goes straight to disk or
b) every statement is a complete transaction and it may go to cache where it will be flushed later or it may go straight to disk
Clarification would be great.
Auto-commit simply means that each statement is in its own transaction which commits immediately. This is in contrast to the "normal" mode, where you must explicitly BEGIN a transaction and then COMMIT once you are done (usually after several statements).
The phrase "auto-commit" has nothing to do with disk access or caching. As an implementation detail, most databases will write to disk on commit so as to avoid data loss, but this isn't mandatory in the spec.
For ARIES-based protocols, committing a transaction involves logging all modifications made within that transaction. Changes are flushed immediately to logfile, but not necessarily to datafile (that is dependent on the implementation). That is enough to ensure that the changes can be recovered in the event of a failure. So, (b).
Commit provides no guarantee that something has been written to disk, only that your transaction has been completed and the changes are now visible to other users.
Permanent does not necessarily mean written to disk (i.e. durable)... Even if a "commit" waits for the transaction to complete can be configured with some databases.
For example, Oracle 10gR2 has several commit modes, including IMMEDIATE,WAIT,BATCH,NOWAIT. BATCH will queue the buffer the changes and the writer will write the changes to disk at some future time. NOWAIT will return immediately without regard for I/O.
The exact behavior of commmit is very database specific and can often be configured depending on your tolerance for data loss.
It depends on the DBMS you're using. For example, Firebird has it as an option in configuration file. If you turn Forced Writes on, the changes go directly to the disk. Otherwise they are submitted to the filesystem, and the actual write time depends on the operating system caching.
If the database transaction is claimed to be ACID, then the D (durability) mandates that the transaction committed should survive the crash immediately after the successful commit. For single server database, that means it's on the disk (disk commit). For some modern multi-server databases, it can also means that the transaction is sent to one or more servers (network commit, which are typically much faster than disk), under the assumption that the probability of multiple server crash at the same time is much smaller.
It's impossible to guarantee that commits are atomic, so modern databases use two-phase or three phase commit strategies. See Atomic Commit

Resources