Does an Interrupted UPDATE still manipulate data? - sql-server

I fired an UPDATE query by mistake, but cancelled the transaction while it was still processing. I want to know whether any data was modified between the start of execution and the cancellation.

What sort of query?
If it was a plain SELECT, then no damage will have been done.
If it was in a transaction, then the transaction should have been rolled back - and any damage should have been undone.
If the operation was not running in a transaction, the behaviour will be DBMS-specific. Most will treat statements as atomic - either it completes or it is as if the statement was never executed. Not all do things that way, though.
It would help if you specified which DBMS you are using - there can be differences in the answer depending on the nuances of the DBMS in question.

"But while processing I cancelled the transaction."
Basically, if it was a transaction, and you canceled it before it finished, then whatever had started would have been undone. What your database looks like now should be the same as it looked before the UPDATE.
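To see the rollback behaviour described above on a small scale, here is a minimal sketch in Python. It uses the standard-library sqlite3 module as a stand-in (the question is about SQL Server, so details such as locking differ), and the accounts table and its values are made up for illustration.

```python
import sqlite3

def interrupted_update():
    """Run a mistaken UPDATE inside a transaction, then roll it back."""
    # isolation_level=None puts sqlite3 in autocommit mode, so the
    # transaction is controlled explicitly with BEGIN / ROLLBACK.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 200)])

    query = "SELECT id, balance FROM accounts ORDER BY id"
    before = conn.execute(query).fetchall()

    conn.execute("BEGIN")
    conn.execute("UPDATE accounts SET balance = 0")  # the mistaken update
    during = conn.execute(query).fetchall()          # visible inside the transaction
    conn.execute("ROLLBACK")                         # cancelled before commit

    after = conn.execute(query).fetchall()
    conn.close()
    return before, during, after
```

The change is visible to the connection that made it while the transaction is open, but after the rollback the table reads exactly as it did before.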

Related

Prioritizing Transactions in Google AppEngine

Let's say I need to perform two different kinds of write operations on a datastore entity that might happen simultaneously, for example:
The client that holds a write-lock on the entry updates the entry's content
The client requests a refresh of the write-lock (updates the lock's expiration time-stamp)
As the content-update operation is only allowed if the client holds the current write-lock, I need to perform the lock-check and the content-write in a transaction (unless there is another way that I am missing?). Also, a lock-refresh must happen in a transaction because the client needs to first be confirmed as the current lock-holder.
The lock-refresh is a very quick operation.
The content-update operation can be quite complex. Think of it as the client sending the server a complicated update-script that the server executes on the content.
Given this, if there is a conflict between those two transactions (should they be executed simultaneously), I would much rather have the lock-refresh operation fail than the complex content-update.
Is there a way that I can "prioritize" the content-update transaction? I don't see anything in the docs and I would imagine that this is not a specific feature, but maybe there is some trick I can use?
For example, what happens if my content-update reads the entry, writes it back with a small modification (without committing the transaction), then performs the lengthy operation and finally writes the result and commits the transaction? Would the first write be applied immediately and cause a simultaneous lock-refresh transaction to fail? Or are all writes kept until the transaction is committed at the end?
Is there such a thing as keeping two transactions open? Or doing an intermediate commit in a transaction?
Clearly, I can just split my content-update into two transactions: The first one sets a "don't mess with this, please!"-flag and the second one (later) writes the changes and clears that flag.
But maybe there is some other trick to achieve this with fewer reads/writes/transactions?
Another thought I had was that there are 3 different "blocks" of data: The current lock-holder (LH), the lock expiration (EX), and the content that is being modified (CO). The lock-refresh operation needs to perform a read of LH and a write to EX in a transaction, while the content-update operation needs to perform a read of LH, a read of CO, and a write of CO in a transaction. Is there a way to break the data apart into three entities and somehow have the transactions span only the needed entities? Since LH is never modified by these two operations, this might help avoid the conflict in the first place?
The datastore uses optimistic concurrency control, which means that a (datastore primitive) transaction waits until it is committed, then succeeds only if someone else hasn't committed first. Typically, the app retries the failed transaction with fresh data. There is no way to modify this first-wins behavior.
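The first-wins behaviour can be illustrated with a small in-memory simulation. This is only a sketch of optimistic concurrency control in general, not the datastore's implementation; VersionedStore, ConflictError and run_with_retries are hypothetical names invented for this example.

```python
class ConflictError(Exception):
    """Raised when another writer committed first."""

class VersionedStore:
    """Hypothetical in-memory entity: one value plus a version counter."""

    def __init__(self, value):
        self.value = value
        self.version = 0

    def read(self):
        return self.value, self.version

    def commit(self, new_value, read_version):
        # The commit succeeds only if nobody has committed since we read.
        if read_version != self.version:
            raise ConflictError("entity changed since it was read")
        self.value = new_value
        self.version += 1

def run_with_retries(store, update, retries=3):
    """Re-read and re-apply `update` until the commit wins or retries run out."""
    for _ in range(retries + 1):
        value, version = store.read()
        try:
            store.commit(update(value), version)
            return True
        except ConflictError:
            continue  # retry with fresh data
    return False
```

A long-running update that read the entity early loses to any quick commit that lands in between, which is why the lengthy content-update is the one more likely to fail and need a retry.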
It might help to know that datastore transactions are strongly consistent, so a client can first commit a lock refresh with a synchronous datastore call, and when that call returns, the client knows for sure whether it obtained or refreshed the lock. The client can then proceed with its update and lock clear. The case you describe where a lock refresh and an update might occur concurrently from the same client sounds avoidable.
I'm assuming you need the lock mechanism to prevent writes from other clients while the lock owner performs multiple datastore primitive transactions. If a client is actually only doing one update before it releases the lock and it can do so within seconds (well before the datastore RPC timeout), you might get by with just a primitive datastore transaction with optimistic concurrency control and retries. But a lock might be a good idea for simple serialization of, say, edits to a record in a user interface, where a user hits an "edit" button in a UI and you want that to guarantee that the user has some time to prepare and submit changes without the record being changed by someone else. (Whether that's the user experience you want is your decision. :) )

Can Lost Update happen in read committed isolation level in PostgreSQL?

I have a query like below in PostgreSQL:
    UPDATE
        queue
    SET
        status = 'PROCESSING'
    WHERE
        queue.status = 'WAITING' AND
        queue.id = (SELECT id FROM queue WHERE status = 'WAITING' LIMIT 1)
    RETURNING
        queue.id
Many workers each try to process one job at a time (that's why the subquery has LIMIT 1). After this update, each worker uses the returned id to fetch the job details and processes the job, but sometimes several workers grab the same job and process it twice or more. The isolation level is Read Committed.
My question is: how can I guarantee that each job is processed only once? I know there are many posts about this, but I have tried most of them and they didn't help:
I tried SELECT FOR UPDATE, but it caused deadlocks.
I tried pg_try_advisory_xact_lock, but it caused an "out of shared memory" error.
I tried adding AND pg_try_advisory_xact_lock(queue.id) to the outer query's WHERE clause, but ... [?]
Any help would be appreciated.
A lost update won't occur in the situation you describe, but it won't work properly either.
What will happen in the example you've given above is that given (say) 10 workers started simultaneously, all 10 of them will execute the subquery and get the same ID. They will all attempt to lock that ID. One of them will succeed; the others will block on the first one's lock. Once the first backend commits or rolls back, the 9 others will race for the lock. One will get it, re-check the WHERE clause and see that the queue.status test no longer matches, and return without modifying any rows. The same will happen with the other 8. So you used 10 queries to do the work of one query.
If you fail to explicitly check the UPDATE result and see that zero rows were updated you might think you were getting lost updates, but you aren't. You just have a concurrency bug in your application caused by a misunderstanding of the order-of-execution and isolation rules. All that's really happening is that you're effectively serializing your backends so that only one at a time actually makes forward progress.
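Checking the UPDATE result can be sketched as follows. This is a sequential simulation using Python's standard-library sqlite3 module, not PostgreSQL, so it shows only the "zero rows updated" outcome and none of the lock waiting; the claim and demo helpers are hypothetical.

```python
import sqlite3

PICK = "SELECT id FROM queue WHERE status = 'WAITING' ORDER BY id LIMIT 1"

def claim(conn, job_id):
    """Try to claim one job; return True only if this call changed the row."""
    cur = conn.execute(
        "UPDATE queue SET status = 'PROCESSING' "
        "WHERE id = ? AND status = 'WAITING'",
        (job_id,),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 means another worker got there first

def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE queue (id INTEGER PRIMARY KEY, status TEXT)")
    conn.executemany("INSERT INTO queue VALUES (?, 'WAITING')", [(1,), (2,)])
    conn.commit()
    # Both "workers" run the subquery before either one updates,
    # so both pick the same id.
    picked_a = conn.execute(PICK).fetchone()[0]
    picked_b = conn.execute(PICK).fetchone()[0]
    result = (picked_a, picked_b, claim(conn, picked_a), claim(conn, picked_b))
    conn.close()
    return result
```

A worker whose claim returns False must not process the job; it should go back and pick another one.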
The only way PostgreSQL could avoid having them all get the same queue item ID would be to serialize them, so it didn't start executing query #2 until query #1 finished. If you want to you can do this by LOCKing the queue table ... but again, you might as well just have one worker then.
You can't get around this with advisory locks, not easily anyway. Hacks where you iterated down the queue using non-blocking lock attempts until you got the first lockable item would work, but would be slow and clumsy.
You are attempting to implement a work queue using the RDBMS. This will not work well. It will be slow, it will be painful, and getting it both correct and fast will be very very hard. Don't roll your own. Instead, use a well established, well tested system for reliable task queueing. Look at RabbitMQ, ZeroMQ, Apache ActiveMQ, Celery, etc. There's also PGQ from Skytools, a PostgreSQL-based solution.
Related:
In PostgreSQL, do multiple UPDATES to different rows in the same table having a locking conflict?
Can multiple threads cause duplicate updates on constrained set?
Why do we need message brokers like rabbitmq over a database like postgres?
SKIP LOCKED can be used to implement a queue in PostgreSQL.
In PostgreSQL, a lost update can happen in READ COMMITTED and READ UNCOMMITTED, but not if you use SELECT FOR UPDATE at those isolation levels.
In addition, a lost update doesn't happen in REPEATABLE READ or SERIALIZABLE, whether or not you use SELECT FOR UPDATE. Instead, an error is raised when a lost-update condition is detected.

Objectify transaction vs. regular load then save

I need only confirmation that I get this right.
Say, for example, I have an entity X with a field x, and when a request is sent I want to do X.x++. If I just use X = ofy().load().type(X.class).id(xId).get(), then do some calculations, then do X.x++ and save it, I'll get unwanted behavior if another request is posted during the calculations. If I instead do all of this in a transaction, the second request won't have access to X until I finish.
Is it so?
Sorry if the question is a bit nooby.
Thanks,
Dan
Yes, you got it right, but when using transactions remember that the first one to complete wins and the rest fail. Look also at @Peter Knego's answer for how they work.
But don't worry about the second request if it fails to read.
You have two options:
Force retries
Use eventual consistency in your transactions
As far as the retries are concerned:
Your transaction function can be called multiple times safely without
undesirable side effects. If this is not possible, you can set
retries=0, but know that the transaction will fail on the first
incident of contention
Example:
    @db.transactional(retries=10)
As far as eventual consistency is concerned:
You can opt out of this protection by specifying a read policy that
requests eventual consistency. With an eventually consistent read of
an entity, your app gets the current known state of the entity being
read, regardless of whether there are still committed changes to be
applied. With an eventually consistent ancestor query, the indexes
used for the query are consistent with the time the indexes are read
from disk. In other words, an eventual consistency read policy causes
gets and queries to behave as if they are not a part of the current
transaction. This may be faster in some cases, since the operations do
not have to wait for committed changes to be written before returning
a result.
Example:
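    from google.appengine.ext import db

    @db.transactional()
    def test():
        # Read with eventual consistency instead of the transaction's default.
        game_version = db.get(
            db.Key.from_path('GameVersion', 1),
            read_policy=db.EVENTUAL_CONSISTENCY)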
No, GAE transactions do not do locking; they use optimistic concurrency control. You will have access to X all the time, but when you try to save it in the second transaction, it will fail with ConcurrentModificationException.

How to automatically re-run deadlocked transaction? (ASP.NET MVC/SQL Server)

I have a very popular site in ASP.NET MVC/SQL Server, and unfortunately a lot of deadlocks occur. While I'm trying to figure out why they occur via the SQL profiler, I wonder how I can change the default behavior of SQL Server when a deadlock occurs.
Is it possible to re-run the transaction(s) that caused problems instead of showing the error screen?
Remus's answer is fundamentally flawed. According to https://stackoverflow.com/a/112256/14731 a consistent locking order does not prevent deadlocks. The best we can do is reduce their frequency.
He is wrong on two points:
The implication that deadlocks can be prevented. You will find that both Microsoft and IBM publish articles about reducing the frequency of deadlocks. Nowhere do they claim you can prevent them altogether.
The implication that all deadlocks require you to re-evaluate the state and come to a new decision. It is perfectly correct to retry some actions at the application level, so long as you travel far back enough to the decision point.
Side-note: Remus's main point is that the database cannot automatically retry the operation on your behalf, and he is completely right on that count. But this doesn't mean that re-running operations is the wrong response to a deadlock.
You are barking up the wrong tree. You will never succeed in having the SQL engine retry deadlocks automatically; the concept is fundamentally wrong. The very definition of a deadlock is that the state you based your decision on has changed, so you need to read the state again and make a new decision. If your process has deadlocked, then by definition another process has won the deadlock, and that means it has changed something you've read.
Your only focus should be on figuring out why the deadlocks occur and eliminating the cause. Invariably, the cause will turn out to be queries that scan more data than they should. While it is true that other types of deadlock can occur, I bet that is not your case. Many problems will be solved by deploying appropriate indexes. Some problems will send you back to the drawing board and you will have to rethink your requirements.
There are many, many resources out there on how to identify and solve deadlocks:
Detecting and Ending Deadlocks
Minimizing Deadlocks
You may also consider using snapshot isolation, since the lock-free reads involved in snapshot reduce the surface on which deadlocks can occur (ie. only write-write deadlocks can occur). See Using Row Versioning-based Isolation Levels.
A lot of deadlocks occurring is often an indication that you either do not have the correct indexes and/or that your statistics are out of date. Do you have regular scheduled index rebuilds as part of maintenance?
Your save code should automatically retry saves when error 1205 is returned (deadlock occurred). There is a standard pattern that looks like this:
    try
    {
        // Perform the save here...
    }
    catch (SqlException ex)
    {
        if (ex.Number == 1205)
        {
            // Handle deadlock by retrying the save...
        }
        else
        {
            throw;
        }
    }
The other option is to retry within your stored procedures. There is an example of that here: Using TRY...CATCH in Transact-SQL
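The application-level retry can be sketched as a loop around the save. The sketch below is in Python rather than C#, and DbError and save_with_retry are hypothetical names; a real driver raises its own exception type, and only the error number 1205 comes from the answer above.

```python
import time

DEADLOCK_ERROR = 1205  # SQL Server's deadlock error number, as in the answer

class DbError(Exception):
    """Hypothetical stand-in for a driver exception carrying an error number."""

    def __init__(self, number):
        super().__init__(f"database error {number}")
        self.number = number

def save_with_retry(save, max_attempts=3, delay=0.0):
    """Call save(); retry only when it fails with the deadlock error."""
    for attempt in range(1, max_attempts + 1):
        try:
            return save()
        except DbError as ex:
            # Re-raise anything that is not a deadlock, and give up
            # once the last attempt has also deadlocked.
            if ex.number != DEADLOCK_ERROR or attempt == max_attempts:
                raise
            time.sleep(delay * attempt)  # simple linear backoff
```

The save callable should restart from the point where the data was read, so that each attempt works from fresh state.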
Here is one more option in addition to those suggested by Mitch and Remus, since your comments suggest you're looking for a fast fix. If you can identify the queries involved in the deadlocks, you can influence which of the queries involved are rolled back and which continue by setting DEADLOCK_PRIORITY for each query, batch or stored procedure.
Looking at your example in the comment to Mitch's answer:
Let's say deadlock occurs on page A,
but page B is trying to access the
locked data. The error will be
displayed on page B, but it doesn't
mean that the deadlock occurred on
page B. It still occurred on page A.
If you consistently see a deadlock occurring between the queries issued from page A and page B, you can influence which page results in an error and which completes successfully. As the others have said, you cannot automatically force a retry.
Post a question with the problem queries and/or the deadlock trace output and there's a good chance you'll get an explanation of why it's occurring and how it could be fixed.
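In some cases you can do the following. Everything between begin tran and commit is all or nothing. So either the block succeeds, @errorcount is set to 0 and the loop ends, or, in case of failure, the counter is decreased by 1 and the block is retried. It may not work if you pass variables into the code from outside the begin tran/commit. Just an idea :)
    declare @errorcount int = 4 -- number of attempts
    while @errorcount > 0
    begin
        begin try
            begin tran
            <your code here>
            commit
            set @errorcount = 0 -- success: end the loop
        end try
        begin catch
            if @@trancount > 0 rollback
            set @errorcount = @errorcount - 1; -- failure: retry
            if @errorcount = 0 throw; -- out of attempts: re-raise the error
        end catch
    end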

How to Decide to use Database Transactions

How do you guys decide that you should be wrapping the sql in a transaction?
Please throw some light on this.
Cheers !!
A transaction should be used when you need a set of changes to be processed completely to consider the operation complete and valid. In other words, if only a portion executes successfully, will that result in incomplete or invalid data being stored in your database?
For example, if you have an insert followed by an update, what happens if the insert succeeds and the update fails? If that would result in incomplete data (in this case, an orphaned record), you should wrap the two statements in a transaction to get them to complete as a "set".
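The insert-then-update case can be tried out with a minimal sketch using Python's standard-library sqlite3 module. The orders and stock tables, the CHECK constraint and the place_order helper are all hypothetical, chosen only so that the second statement can fail after the first has succeeded.

```python
import sqlite3

def place_order(conn, item, qty):
    # "with conn" commits if the block succeeds and rolls back if it raises,
    # so the INSERT and the UPDATE take effect together or not at all.
    with conn:
        conn.execute("INSERT INTO orders (item, qty) VALUES (?, ?)", (item, qty))
        conn.execute("UPDATE stock SET qty = qty - ? WHERE item = ?", (qty, item))

def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (item TEXT, qty INTEGER)")
    conn.execute(
        "CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER CHECK (qty >= 0))"
    )
    conn.execute("INSERT INTO stock VALUES ('widget', 5)")
    conn.commit()

    place_order(conn, "widget", 3)       # both statements succeed
    try:
        place_order(conn, "widget", 10)  # UPDATE violates the CHECK constraint
    except sqlite3.IntegrityError:
        pass                             # the INSERT was rolled back as well

    orders = conn.execute("SELECT item, qty FROM orders").fetchall()
    stock = conn.execute("SELECT qty FROM stock WHERE item = 'widget'").fetchone()[0]
    conn.close()
    return orders, stock
```

Without the transaction, the failed second order would leave an orphaned row in orders with no matching stock change.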
If you are executing two or more statements that you expect to be functionally atomic, you should wrap them in a transaction.
If you have more than a single data-modifying statement to execute to complete a task, all of them should be within a transaction.
This way, if the first one is successful, but any of the following ones has an error, you can rollback (undo) everything as if nothing was ever done.
Whenever you wouldn't like it if part of the operation can complete and part of it doesn't.
Anytime you want to lock up your database and potentially crash your production application, anytime you want to litter your application with hidden scalability nightmares go ahead and create a transaction. Make it big, slow, and put a loop inside.
Seriously, none of the above answers acknowledge the trade-off and potential problems that come with heavy use of transactions. Be careful, and consider the risk/reward each time.
eBay doesn't use them at all. I'm sure there are many others.
http://www.infoq.com/interviews/dan-pritchett-ebay-architecture
Whenever an operation falls under the ACID (Atomicity, Consistency, Isolation, Durability) criteria, you should use transactions.
Read this article
When you want to use the atomicity or isolation properties of the database for a set of changes.
Atomicity: An atomic transaction is an indivisible and irreducible series of database operations such that either all occur, or nothing occurs (according to Wikipedia).
Isolation: Isolation determines how transaction integrity is visible to other users and systems (according to Wikipedia).
