How to Decide When to Use Database Transactions

How do you guys decide when you should wrap your SQL in a transaction?
Please shed some light on this.
Cheers!

A transaction should be used when you need a set of changes to be processed completely to consider the operation complete and valid. In other words, if only a portion executes successfully, will that result in incomplete or invalid data being stored in your database?
For example, if you have an insert followed by an update, what happens if the insert succeeds and the update fails? If that would result in incomplete data (in this case, an orphaned record), you should wrap the two statements in a transaction to get them to complete as a "set".
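For illustration, here's a minimal sketch of that case in SQL Server's T-SQL (the table and column names are made up for the example; other databases have equivalent syntax):

BEGIN TRY
    BEGIN TRANSACTION;

    -- insert the new record (hypothetical Orders table)
    INSERT INTO Orders (CustomerId, OrderDate)
    VALUES (42, GETDATE());

    -- update a related record; if this fails, the insert above is undone as well
    UPDATE Customers
    SET LastOrderDate = GETDATE()
    WHERE CustomerId = 42;

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- any failure rolls back both statements, so no orphaned record is left behind
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;
    THROW;
END CATCH;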

If you are executing two or more statements that you expect to be functionally atomic, you should wrap them in a transaction.

If you have more than a single data-modifying statement to execute to complete a task, all of them should be within a transaction.
This way, if the first one succeeds but any of the following ones hits an error, you can roll back (undo) everything as if nothing was ever done.

Whenever you wouldn't want part of the operation to complete while another part fails.

Anytime you want to lock up your database and potentially crash your production application, anytime you want to litter your application with hidden scalability nightmares go ahead and create a transaction. Make it big, slow, and put a loop inside.
Seriously, none of the above answers acknowledge the trade-off and potential problems that come with heavy use of transactions. Be careful, and consider the risk/reward each time.
eBay doesn't use them at all. I'm sure there are many others.
http://www.infoq.com/interviews/dan-pritchett-ebay-architecture

Whenever an operation needs to satisfy the ACID (Atomicity, Consistency, Isolation, Durability) criteria, you should use transactions.
Read this article

When you want to use the atomicity or isolation property of the database for a set of changes.
Atomicity: an atomic transaction is an indivisible and irreducible series of database operations such that either all of them occur or none of them occur (according to Wikipedia).
Isolation: isolation determines how transaction integrity is visible to other users and systems (according to Wikipedia).

Related

Is it good practice to do other things during a DB transaction?

During a database transaction, I want to do a few other things - for example, query ElasticSearch and then write the result to another DB table before committing the transaction. It makes logical sense to want the entire process in the same transaction, so that either all of it succeeds or all of it fails, but I'm not sure whether this is doable or good practice. If not, is there any suggestion for good practice in this scenario? Thanks
I think this is a general question; I'm using PostgreSQL if that helps.
You can use BEGIN and SAVEPOINT to control the transactions.
Here is the documentation:
https://www.postgresql.org/docs/8.3/tutorial-transactions.html
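As a rough sketch of what that can look like in PostgreSQL (the table names are invented for the example):

BEGIN;
-- first piece of work inside the transaction
INSERT INTO accounts (name, balance) VALUES ('alice', 100);

SAVEPOINT before_es_sync;
-- the application queries ElasticSearch here and writes the result back
INSERT INTO search_sync_log (account_name, status) VALUES ('alice', 'indexed');

-- if the ElasticSearch step had failed, you could undo just that part:
-- ROLLBACK TO SAVEPOINT before_es_sync;

COMMIT;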
That's an excellent question; unfortunately there's no straight answer. It all depends. Let me give you some examples of how I'd suggest approaching it:
Fetch extra data during transaction:
a\ save data in DB
b\ fetch some extra data (rollback in case there's fail here)
c\ save extra data in DB
This makes some sense; the whole operation seems like an atomic transaction. But consider the reasons for a failure in c\. If the data is simply not available - ok. But what if the failure was a temporary network issue? Depending on your business, you might prefer to move b\ after the transaction, maybe even into an async job, and fill in the data later.
Saving extra data during transaction:
a\ save some data
b\ save data in an external system
c\ do some more stuff, commit transaction
This is trickier. What if b\ was successful, but there was a failure between b\ and c\? How do you roll back the changes made in b\?
There are probably more, I just wanted to show the way you might think about it.
My gut feeling tells me that the less that happens inside the transaction, the easier it is to reason about what is happening. If you find yourself debugging such weird scenarios, maybe it's a reasonable investment to remodel your app so you're able to make those extra calls outside the transaction.

SQL Server transactions - usage recommendations

I saw this sentence not only in one place:
"A transaction should be kept as short as possible to avoid concurrency issues and to enable maximum number of positive commits."
What does this really mean?
It puzzles me because I want to use transactions for my app, which in normal use will deal with inserting hundreds of rows from many clients, concurrently.
For example, I have a service which exposes a method: AddObjects(List<Objects>), and of course these objects contain other, different nested objects.
I was thinking of starting a transaction for each call from the client, performing the appropriate actions (a bunch of inserts/updates/deletes for each object and its nested objects). EDIT1: I meant a transaction for the entire "AddObjects" call, in order to prevent undefined states/behaviour.
Am I going in the wrong direction? If yes, how would you do it and what are your recommendations?
EDIT2: Also, I understood that transactions are fast for bulk operations, but that somehow contradicts the quoted sentence. What is the conclusion?
Thanks in advance!
A transaction has to cover a business-specific unit of work. It has nothing to do with generic 'objects'; it must always be expressed in domain-specific terms: 'debit of account X and credit of account Y must be in a transaction', 'subtraction of inventory item and sale must be in a transaction', etc. Everything that must either succeed together or fail together must be in a transaction. If you are going down an abstract path of 'adding objects to a list is a transaction' then yes, you are on the wrong path. The fact that all inserts/updates/deletes triggered by an object save are in a transaction is not a purpose, but a side effect. The correct semantics should be 'update of object X and update of object Y must be in a transaction'. Even the degenerate case of a single 'object' being updated should still be regarded in domain-specific terms.
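To make that concrete, a minimal SQL sketch of the debit/credit case (the Accounts table is hypothetical):

BEGIN TRANSACTION;

-- debit account X and credit account Y as one business unit of work
UPDATE Accounts SET Balance = Balance - 100 WHERE AccountId = 'X';
UPDATE Accounts SET Balance = Balance + 100 WHERE AccountId = 'Y';

-- both updates become visible together, or neither does if we roll back instead
COMMIT TRANSACTION;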
That recommendation is best understood as 'do not allow user interaction inside a transaction'. If you need to ask the user during a transaction, roll back, ask, and run again.
Other than that, do use transactions whenever you need to ensure atomicity.
It is not a transactions' problem that they may cause "concurrency issues", it is the fact that the database might need some more thought, a better set of indices or a more standardized data access order.
"A transaction should be kept as short as possible to avoid concurrency issues and to enable maximum number of positive commits."
The longer a transaction is kept open the more likely it will lock resources that are needed by other transactions. This blocking will cause other concurrent transactions to wait for the resources (or fail depending on the design).
SQL Server is usually set up in auto-commit mode. This means that every SQL statement is a distinct transaction. Many times you want to use a multi-statement transaction so you can commit or roll back multiple updates. The longer the updates take, the more likely other transactions will conflict.
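In other words, something like the sketch below: group the related statements explicitly instead of relying on auto-commit, and keep the transaction short so locks are released quickly (the table names are made up):

-- auto-commit mode: each of these statements is its own transaction
UPDATE Inventory SET Quantity = Quantity - 1 WHERE ItemId = 7;
INSERT INTO Sales (ItemId, SoldAt) VALUES (7, GETDATE());

-- explicit multi-statement transaction: the pair commits or rolls back as a set
BEGIN TRANSACTION;
UPDATE Inventory SET Quantity = Quantity - 1 WHERE ItemId = 7;
INSERT INTO Sales (ItemId, SoldAt) VALUES (7, GETDATE());
COMMIT TRANSACTION;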

How to automatically re-run deadlocked transaction? (ASP.NET MVC/SQL Server)

I have a very popular site in ASP.NET MVC/SQL Server, and unfortunately a lot of deadlocks occur. While I'm trying to figure out why they occur via SQL Profiler, I wonder how I can change the default behavior of SQL Server when deadlocks happen.
Is it possible to re-run the transaction(s) that caused problems instead of showing the error screen?
Remus's answer is fundamentally flawed. According to https://stackoverflow.com/a/112256/14731 a consistent locking order does not prevent deadlocks. The best we can do is reduce their frequency.
He is wrong on two points:
The implication that deadlocks can be prevented. You will find both Microsoft and IBM posting articles about reducing the frequency of deadlocks. Nowhere do they claim you can prevent them altogether.
The implication that all deadlocks require you to re-evaluate the state and come to a new decision. It is perfectly correct to retry some actions at the application level, as long as you go far enough back to the decision point.
Side-note: Remus's main point is that the database cannot automatically retry the operation on your behalf, and he is completely right on that count. But this doesn't mean that re-running operations is the wrong response to a deadlock.
You are barking up the wrong tree. You will never succeed in having automated deadlock retries done by the SQL engine; such a concept is fundamentally wrong. The very definition of a deadlock is that the state you based your decision on has changed, therefore you need to read the state again and make a new decision. If your process has deadlocked, by definition another process has won the deadlock, and that means it has changed something you've read.
Your only focus should be on figuring out why the deadlocks occur and eliminating the cause. Invariably, the cause will turn out to be queries that scan more data than they should. While it is true that other types of deadlock can occur, I bet that is not your case. Many problems will be solved by deploying the appropriate indexes. Some problems will send you back to the drawing board and you will have to rethink your requirements.
There are many, many resources out there on how to identify and solve deadlocks:
Detecting and Ending Deadlocks
Minimizing Deadlocks
You may also consider using snapshot isolation, since the lock-free reads involved in snapshot reduce the surface on which deadlocks can occur (ie. only write-write deadlocks can occur). See Using Row Versioning-based Isolation Levels.
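For reference, enabling it looks roughly like this (the database and table names are made up):

-- run once per database to allow transactions to request SNAPSHOT isolation
ALTER DATABASE MyAppDb SET ALLOW_SNAPSHOT_ISOLATION ON;

-- then, in the reading session:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
SELECT OrderId, Status FROM Orders WHERE CustomerId = 42; -- reads row versions, takes no shared locks
COMMIT TRANSACTION;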
A lot of deadlocks is often an indication that you do not have the correct indexes and/or that your statistics are out of date. Do you have regularly scheduled index rebuilds as part of maintenance?
Your save code should automatically retry saves when error 1205 is returned (deadlock occurred). There is a standard pattern that looks like this:
try
{
    // attempt the save here...
}
catch (SqlException ex)
{
    if (ex.Number == 1205)
    {
        // Handle deadlock (error 1205) by retrying the save...
    }
    else
    {
        throw;
    }
}
The other option is to retry within your stored procedures. There is an example of that here: Using TRY...CATCH in Transact-SQL
One more option in addition to those suggested by Mitch and Remus, since your comments suggest you're looking for a quick fix: if you can identify the queries involved in the deadlocks, you can influence which of them is rolled back and which continues by setting DEADLOCK_PRIORITY for each query, batch or stored procedure.
Looking at your example in the comment to Mitch's answer:
Let's say a deadlock occurs on page A, but page B is trying to access the locked data. The error will be displayed on page B, but it doesn't mean that the deadlock occurred on page B. It still occurred on page A.
If you consistently see a deadlock occurring between the queries issued from page A and page B, you can influence which page results in an error and which completes successfully. As the others have said, you cannot automatically force a retry.
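A rough sketch, assuming page B's query is the one you would rather see chosen as the deadlock victim (the queries themselves are invented):

-- on the connection issuing page B's query: prefer this session as the deadlock victim
SET DEADLOCK_PRIORITY LOW;
SELECT OrderId, Status FROM Orders WHERE CustomerId = 42;

-- on the connection issuing page A's query: prefer to let this session survive
SET DEADLOCK_PRIORITY HIGH;
UPDATE Orders SET Status = 'Shipped' WHERE OrderId = 1001;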
Post a question with the problem queries and/or the deadlock trace output and there's a good chance you'll get an explanation as to why it's occurring and how it could be fixed.
In some cases you can do something like the code below. Everything between BEGIN TRAN and COMMIT is all or nothing. So either @errorcount is set to 0 and the loop ends, or, in case of a failure, the counter is decreased by 1 and the work is retried. It may not work if you feed variables into the code from outside the BEGIN TRAN/COMMIT. Just an idea :)
declare @errorcount int = 4 -- number of attempts remaining
while @errorcount > 0
begin
    begin try
        begin tran
        <your code here>
        commit
        set @errorcount = 0 -- success: end the loop
    end try
    begin catch
        if @@trancount > 0 rollback
        set @errorcount = @errorcount - 1 -- failure: roll back and try again
    end catch
end

Why use a READ UNCOMMITTED isolation level?

In plain English, what are the disadvantages and advantages of using
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
in a query for .NET applications and reporting services applications?
This isolation level allows dirty reads. One transaction may see uncommitted changes made by some other transaction.
To maintain the highest level of isolation, a DBMS usually acquires locks on data, which may result in a loss of concurrency and a high locking overhead. This isolation level relaxes this property.
You may want to check out the Wikipedia article on READ UNCOMMITTED for a few examples and further reading.
You may also be interested in checking out Jeff Atwood's blog article on how he and his team tackled a deadlock issue in the early days of Stack Overflow. According to Jeff:
But is nolock dangerous? Could you end up reading invalid data with read uncommitted on? Yes, in theory. You'll find no shortage of database architecture astronauts who start dropping ACID science on you and all but pull the building fire alarm when you tell them you want to try nolock. It's true: the theory is scary. But here's what I think: "In theory there is no difference between theory and practice. In practice there is."
I would never recommend using nolock as a general "good for what ails you" snake oil fix for any database deadlocking problems you may have. You should try to diagnose the source of the problem first.
But in practice adding nolock to queries that you absolutely know are simple, straightforward read-only affairs never seems to lead to problems... As long as you know what you're doing.
One alternative to the READ UNCOMMITTED level that you may want to consider is the READ COMMITTED SNAPSHOT. Quoting Jeff again:
Snapshots rely on an entirely new data change tracking method ... more than just a slight logical change, it requires the server to handle the data physically differently. Once this new data change tracking method is enabled, it creates a copy, or snapshot of every data change. By reading these snapshots rather than live data at times of contention, Shared Locks are no longer needed on reads, and overall database performance may increase.
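Turning that on is a one-time database setting; a minimal sketch (the database name is hypothetical), usually run when nothing else is connected to the database:

-- after this, READ COMMITTED queries read row versions instead of waiting on shared locks
ALTER DATABASE MyAppDb SET READ_COMMITTED_SNAPSHOT ON;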
My favorite use case for READ UNCOMMITTED is to debug something that is happening inside a transaction.
Start your software under a debugger; while you are stepping through the lines of code, it opens a transaction and modifies your database. While the code is stopped, you can open a query window, set the READ UNCOMMITTED isolation level and run queries to see what is going on.
You can also use it to see whether long-running procedures are stuck or are correctly updating your database, using a query with COUNT(*).
It is great if your company loves to make overly complex stored procedures.
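In practice that debugging session looks something like this (the table name is made up):

-- in a separate query window, while the debugged code is paused inside its transaction
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

-- peek at rows the paused transaction has written but not yet committed
SELECT COUNT(*) FROM ImportStaging;
SELECT TOP (20) * FROM ImportStaging ORDER BY ImportStagingId DESC;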
This can be useful to see the progress of long insert queries, make any rough estimates (like COUNT(*) or rough SUM(*)) etc.
In other words, the results the dirty read queries return are fine as long as you treat them as estimates and don't make any critical decisions based upon them.
The advantage is that it can be faster in some situations. The disadvantage is the result can be wrong (data which hasn't been committed yet could be returned) and there is no guarantee that the result is repeatable.
If you care about accuracy, don't use this.
More information is on MSDN:
Implements dirty read, or isolation level 0 locking, which means that no shared locks are issued and no exclusive locks are honored. When this option is set, it is possible to read uncommitted or dirty data; values in the data can be changed and rows can appear or disappear in the data set before the end of the transaction. This option has the same effect as setting NOLOCK on all tables in all SELECT statements in a transaction. This is the least restrictive of the four isolation levels.
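In other words, the two forms below have roughly the same effect for the SELECTs involved (the Sales table is made up):

-- session level: every SELECT in the transaction reads uncommitted data
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT SUM(Amount) FROM Sales;

-- table hint: same effect, scoped to this single table reference
SELECT SUM(Amount) FROM Sales WITH (NOLOCK);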
When is it ok to use READ UNCOMMITTED?
Rule of thumb
Good: Big aggregate reports showing constantly changing totals.
Risky: Nearly everything else.
The good news is that the majority of read-only reports fall in that Good category.
More detail...
Ok to use it:
Nearly all user-facing aggregate reports for current, non-static data e.g. Year to date sales.
It risks a margin of error (maybe < 0.1%) which is much lower than other uncertainty factors such as input errors or just the randomness of when exactly data gets recorded minute to minute.
That covers probably the majority of what a Business Intelligence department would do in, say, SSRS. The exception, of course, is anything with $ signs in front of it. Many people account for money with much more zeal than they apply to the related core metrics required to service the customer and generate that money. (I blame accountants.)
When risky
Any report that goes down to the detail level. If that detail is required it usually implies that every row will be relevant to a decision. In fact, if you can't pull a small subset without blocking it might be for the good reason that it's being currently edited.
Historical data. It rarely makes a practical difference, but whereas users understand that constantly changing data can't be perfect, they don't feel the same way about static data. Dirty reads won't hurt here, but double reads occasionally can. Seeing as you shouldn't have blocks on static data anyway, why risk it?
Nearly anything that feeds an application which also has write capabilities.
When even the OK scenario is not OK.
Are any applications or update processes making use of big single transactions? Ones which remove then re-insert a lot of records you're reporting on? In that case you really can't use NOLOCK on those tables for anything.
Use READ UNCOMMITTED in situations where the source is highly unlikely to change:
When reading historical data, e.g. deployment logs from two days ago.
When reading metadata, e.g. in a metadata-driven application.
Don't use READ UNCOMMITTED when you know the source may change during the fetch operation.
Regarding reporting, we use it on all of our reporting queries to prevent a query from bogging down databases. We can do that because we're pulling historical data, not up-to-the-microsecond data.
This will give you dirty reads, showing you transactions that haven't committed yet. That is the most obvious answer. I don't think it's a good idea to use this just to speed up your reads. There are other ways of doing that if you use a good database design.
It's also interesting to note what's not happening. READ UNCOMMITTED not only ignores other transactions' locks; it also doesn't take any shared locks of its own.
Consider that you are generating a large report, or you are migrating data out of your database using a large and possibly complex SELECT statement. This will cause a shared lock that may be escalated to a shared table lock for the duration of your transaction. Other transactions may read from the table, but updates are impossible. This may be a bad idea if it's a production database, since production may stop completely.
If you are using READ UNCOMMITTED you will not take a shared lock on the table. You may or may not get the results from some new transactions, depending on where in the table the data was inserted and how far your SELECT has read. You may also get the same data twice if, for example, a page split occurs (the data will be copied to another location in the data file).
So, if it's very important to you that data can be inserted while your SELECT is running, READ UNCOMMITTED may make sense. You have to accept that your report may contain some errors, but if it's based on millions of rows and only a few of them are updated while you select the result, this may be "good enough". Your transaction may also fail altogether, since the uniqueness of a row is not guaranteed.
A better way altogether may be to use SNAPSHOT isolation, but your applications may need some adjustments to use it. One example is an application that takes an exclusive lock on a row to prevent others from reading it and going into edit mode in the UI. SNAPSHOT isolation also comes with a considerable performance penalty (especially on disk), but you may be able to overcome that by throwing hardware at the problem. :)
You may also consider restoring a backup of the database to use for reporting or loading data into a data warehouse.
It can be used for a simple table, for example an insert-only audit table where there are no updates to existing rows and no foreign keys to other tables. The insert is a simple insert which has little or no chance of rolling back.
I always use READ UNCOMMITTED now. It's fast, with the fewest issues. When using other isolation levels you will almost always come across some blocking issues.
As long as you use auto-increment fields and pay a little more attention to inserts, you're fine, and you can say goodbye to blocking issues.
You can make errors with READ UNCOMMITTED, but to be honest, it is very easy to make sure your inserts are foolproof. Inserts/updates which use the results from a SELECT are the only thing you need to watch out for. (Use READ COMMITTED there, or ensure that dirty reads aren't going to cause a problem.)
So go with the dirty reads (especially for big reports); your software will run smoother...

Transactional theory

(I have a simple CRUD API in a DAO pattern implementation.)
All operations (save, load, update, delete) have a transaction id that must be supplied.
So e.g. it's possible to do:
...
id = begintransaction();
dao.save(o, id);
dao.update(o2, id);
rollback(id);
All examples excluding load invocations seem intuitive. But as soon as you start to load objects from the database, things "feel" a little bit different. Are load operations, per definition, tied to a transaction? Or should each of my load operations be counted as its own unit of work?
Depends on the transaction isolation level (http://en.wikipedia.org/wiki/Isolation_(database_systems)) you're using, but in general they should be part of the transaction. What if somebody else is just in the middle of updating the data you're trying to read? If the read operation is not transactional, you would get old data, and maybe you're interested in the latest data.
If the database is set to a decent isolation level, uncommitted writes can only be read from within the transaction that created them. For example, in Oracle, if a procedure inserts or updates a row and then (without committing) calls another procedure which uses "pragma autonomous_transaction" to run in a separate transaction, that other procedure does not see the new row. (An excellent way to shoot yourself in the foot, btw.)
For that reason, you should always consider your load operations as tied to the transaction.
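A small SQL illustration of why (the AuditLog table is made up): under the default READ COMMITTED level, a second session simply does not see the first session's uncommitted row.

-- session 1: write inside an open transaction, don't commit yet
BEGIN TRANSACTION;
INSERT INTO AuditLog (Message) VALUES ('pending change');

-- session 2 (separate connection, READ COMMITTED): the uncommitted row is not visible;
-- depending on the database, this either blocks until session 1 finishes or returns without the new row
SELECT COUNT(*) FROM AuditLog;

-- session 1: only now does the row become visible to other transactions
COMMIT TRANSACTION;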
