What is the mechanism for transaction rollback in SQL Server?
Every update in the database first writes an entry into the log containing a description of the change. E.g. if you update a column value from A to B, the log will contain a record of the update, something like: in table T the column C was changed from A to B for the record with key K by the transaction with id I. If you roll back the transaction, the engine will scan the log backward looking for records of work done by your transaction and will undo that work: when it finds the record of the update from A to B, it will change the value back to A. An insert is undone by deleting the inserted row; a delete is undone by inserting the row back. This is described in Transaction Log Logical Architecture and Write-Ahead Transaction Log.
This is the high-level explanation; the exact internal details of how this happens are undocumented and not open to your inspection or modification.
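As a minimal sketch of the observable effect (assuming a hypothetical table dbo.T(K int, C char(1)) whose row K = 1 holds the value 'A'):

BEGIN TRANSACTION;

UPDATE dbo.T SET C = 'B' WHERE K = 1;   -- logged: in table T, column C changed from 'A' to 'B'

ROLLBACK TRANSACTION;                   -- the engine scans the log backward and undoes the change

SELECT C FROM dbo.T WHERE K = 1;        -- returns 'A' again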
Have a look at ROLLBACK TRANSACTION (Transact-SQL)
Rolls back an explicit or implicit transaction to the beginning of the transaction, or to a savepoint inside the transaction.
In terms of how it does it: all of the data modifications within the transaction are stored in the transaction log, with additional space also reserved in the log for the undo records, in case it has to roll back.
Each transaction log record has sufficient information within it to reverse the change it has made, so that the change can be undone if required (as well as replayed in a DR scenario).
If we take a simple delete operation as an example (since I've decoded that here as an example of the log contents), the record being deleted is stored inside the LOP_DELETE_ROWS transaction log entry, and with some non-trivial effort you can decode it and demonstrate that the entire row is within the log entry.
If the transaction is to be rolled back, the undo space reserved in the log is used and the row is re-inserted. The reason for reserving undo space is to ensure that the transaction log cannot fill up mid-transaction, leaving no space to complete or roll back.
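You can peek at such log records yourself with fn_dblog. A word of caution: fn_dblog is undocumented and unsupported, so treat this as an exploratory sketch only (dbo.T is a hypothetical table):

DELETE FROM dbo.T WHERE K = 1;

-- List the active log records for delete operations; the deleted row image
-- travels inside the LOP_DELETE_ROWS entry.
SELECT [Current LSN], Operation, [Transaction ID], AllocUnitName
FROM fn_dblog(NULL, NULL)    -- NULL, NULL = scan the whole active log
WHERE Operation = 'LOP_DELETE_ROWS';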
Related
I came across a piece of code in a SQL stored proc in our code base where it was using chunking without a transaction block. I don't see how chunking could be beneficial without a tran block. I've been humbled a few times when I've jumped to conclusions without digging deeper, so what advantage does chunking without a tran block offer? Is there any?
The pseudocode is something like:
Populate the Main temp table (ID, Name, UpdatedFlag). The flag column
indicates whether the record has been updated or not.
While there is a record in MainTable with UpdatedFlag = 0:
    Select the given chunkSize into the ChunkSizeMain temp table (ID, Name)
    from the records that haven't been marked as updated.
    Begin TRY
        Update some other table by joining on ID of ChunkSizeMainTable.
        Update UpdatedFlag = 1 in MainTable.
    End TRY
    Begin CATCH
        // some action
    End CATCH
Every update query in SQL Server runs in a transaction, irrespective of whether it has a BEGIN TRAN next to it (an autocommit transaction, if IMPLICIT_TRANSACTIONS is not on).
"Chunking" is usually done to stop the transaction log needing to increase in size when the database is in simple recovery mode. A single UPDATE statement that affects 1 million rows will need to have all of that logged to the active log. Dividing into batches can allow the log from earlier committed batches to be truncated and reused by later batches.
It may also be done to reduce the effect on concurrent queries by reducing the length of time of each operation and/or potentially reducing the risk of lock escalation by only updating a few thousand rows at a time.
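A minimal sketch of the batching pattern (dbo.Big and its UpdatedFlag column are hypothetical); each UPDATE commits as its own autocommit transaction, so under simple recovery the log space from finished batches can be reused:

DECLARE @rows int = 1;

WHILE @rows > 0
BEGIN
    UPDATE TOP (5000) dbo.Big
    SET UpdatedFlag = 1
    WHERE UpdatedFlag = 0;

    SET @rows = @@ROWCOUNT;   -- 0 once every row has been processed
END;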
If I have a database transaction which goes along the lines of:
DELETE FROM table WHERE id = ANY(ARRAY[id1, id2, id3,...]) RETURNING foo, bar;
if num_rows_returned != num_rows_in_array then
    rollback and return
Do stuff with deleted data...
Commit
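Concretely, a runnable sketch of that pattern in PL/pgSQL (the items table and ids are hypothetical; the raised exception aborts the transaction and undoes the DELETE):

DO $$
DECLARE
    ids int[] := ARRAY[1, 2, 3];
    n   int;
BEGIN
    DELETE FROM items WHERE id = ANY(ids);
    GET DIAGNOSTICS n = ROW_COUNT;

    IF n <> array_length(ids, 1) THEN
        -- aborts the transaction, so none of the rows stay deleted
        RAISE EXCEPTION 'expected % rows, deleted %', array_length(ids, 1), n;
    END IF;

    -- ... do stuff with the deleted data here ...
END
$$;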
My understanding is that the DELETE query will lock those rows until the transaction is committed or rolled back, according to the Postgres 9.1 docs:
An exclusive row-level lock on a specific row is automatically acquired when the row is updated or deleted. The lock is held until the transaction commits or rolls back, just like table-level locks. Row-level locks do not affect data querying; they block only writers to the same row.
I am using the default read committed isolation level in Postgres 9.1.13.
I would take from this that I should be OK, but I want to ensure that this means the following things are true:
Only one transaction may delete and return a row from this table, unless a previous transaction was rolled back.
This means "Do stuff with deleted data" can only be done once per row.
If two transactions try to do the above at once with conflicting rows, one will always succeed (ignoring system failure), and one will always fail.
Concurrent transactions may succeed when there is no crossover of rows.
If a transaction is unable to delete and return all rows, it will roll back and thus not delete any rows. For example, a transaction may try to delete two rows; one row is already deleted by another transaction, but the other is free to be returned. Since one row is already deleted, however, the other must not be deleted and processed: only if all specified ids can be deleted and returned may anything take place.
Under the normal idea of concurrency, processes/transactions do not fail when they are locked out of data; they wait.
The DBMS implements execution in such a way that transactions advance while seeing effects from other transactions only as permitted by the isolation level. (Only in the case of a detected deadlock is a transaction aborted, and even then its execution begins again, and the kill is not evident to its next execution or to other transactions, except as permitted by the isolation level.) Under the SERIALIZABLE isolation level this means the database changes as if all transactions had happened without overlap, in some order. Other levels allow a transaction to see certain effects of the overlapped execution of other transactions.
However, in the case of PostgreSQL under SERIALIZABLE, when a transaction tries to commit and the DBMS sees that committing would give non-serialized behaviour, the transaction is aborted with a notification, but it is not automatically restarted. (Note that this is not a failure caused by attempted access to a locked resource.)
(Prior to 9.1, PostgreSQL SERIALIZABLE did not give SQL-standard (serialized) behaviour: "To retain the legacy Serializable behavior, Repeatable Read should now be requested.")
The locking protocols are how actual execution gets interleaved to maximize throughput while keeping that true. All locking does is prevent overlapped accesses from disturbing the apparent serialized execution.
Explicit locking by transaction code also just causes waiting.
Your question does not reflect this: you seem to think that attempted access to a locked resource aborts a transaction. That is not so.
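A sketch of that waiting behaviour under READ COMMITTED (the table t is hypothetical, with the columns from the question):

-- Session 1:
BEGIN;
DELETE FROM t WHERE id = 1 RETURNING foo, bar;  -- takes an exclusive row lock

-- Session 2, run concurrently:
BEGIN;
DELETE FROM t WHERE id = 1 RETURNING foo, bar;  -- blocks here, waiting...
-- If session 1 COMMITs:   this DELETE returns 0 rows (the row is already gone).
-- If session 1 ROLLBACKs: this DELETE removes the row and returns it.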
Is this possible without restoring the whole database?
I have made changes which I would like to undo, but without putting the DB offline and doing a full restore.
No, SQL Server does not have Ctrl + Z.
You protect yourself from this scenario by wrapping all DML statements in a transaction. So your query windows look like this:
BEGIN TRANSACTION;
UPDATE ...
-- COMMIT TRANSACTION;
-- ROLLBACK TRANSACTION;
When you run the update, verify that you updated the right number of rows, the right rows, the right way, etc. And then highlight either the commit or the rollback, depending on whether you performed the update correctly.
On the flip side, be careful with this, as it can mess you up the other way: begin a transaction, forget to commit or roll back, then go out for lunch, leave for the day, go on vacation, etc.
Unfortunately, that will only help you going forward. In your current scenario, your easiest path is to restore a copy of the database and harvest the data from that copy (you don't need to completely overwrite the current database to restore the data affected by this update).
The short answer is: No.
However, you don't have to take the DB offline to do a partial restore on a table or tables.
You can restore a backup to a separate database and then use T-SQL queries to restore the rows that were negatively impacted by your update. This can take place while the main database is online.
More info on restoring a database to a new location:
http://technet.microsoft.com/en-us/library/ms186390.aspx
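A hedged sketch of that approach (every name and path below is hypothetical, and the logical file names in WITH MOVE must match your backup); the main database stays online throughout:

-- Restore the backup side by side under a new name.
RESTORE DATABASE MyDb_Copy
FROM DISK = N'C:\Backups\MyDb.bak'
WITH MOVE N'MyDb'     TO N'C:\Data\MyDb_Copy.mdf',
     MOVE N'MyDb_log' TO N'C:\Data\MyDb_Copy_log.ldf',
     RECOVERY;

-- Copy back only the rows hit by the bad update.
UPDATE t
SET t.SomeColumn = c.SomeColumn
FROM MyDb.dbo.SomeTable AS t
JOIN MyDb_Copy.dbo.SomeTable AS c ON c.Id = t.Id
WHERE t.Id IN (1, 2, 3);   -- hypothetical: restrict to the affected rows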
For future reference, as per my comment,
It is a good practice to use a TRANSACTION.
-- Execute a transaction statement before doing an update.
BEGIN TRANSACTION
... < your update code >
Then if the update is wrong or produces undesired results, you can ROLLBACK the TRANSACTION
-- Ooops I screwed up! Let's rollback!
--ROLLBACK TRANSACTION -- Commented out; select just this command when needed. This helps avoid an accidental rollback if you press Ctrl+E (or F5 in SSMS 2012).
... and it goes away :)
When all is well you just COMMIT the TRANSACTION.
-- COMMIT TRANSACTION -- commented out, see above
Otherwise you hold locks that block all other users!
So don't forget to commit!
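If you suspect you left a transaction open, a quick check looks like this:

-- How many transactions are open on the current session?
SELECT @@TRANCOUNT AS open_transactions;

-- What is the oldest active transaction in the current database?
DBCC OPENTRAN;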
Yes, besides doing a full restore, there is a viable solution provided by a 3rd-party tool, which reads information from the database transaction log, parses it, and then creates an undo T-SQL script to roll back user actions.
Check out the How to recover SQL Server data from accidental updates without backups online article for more information. The article focuses on the UPDATE operation, but with appropriate settings and filters you can roll back any other database change that's recorded in the transaction log.
Disclaimer: I work as a Product Support Engineer at ApexSQL
It is not possible unless you version your data appropriately or do a restore.
Possible, but it will require a lot of effort.
SQL Server maintains logs for DELETED/UPDATED/INSERTED data in a non-readable format, and to read them you need a capable tool such as Event Log Analyzer.
As a slightly modified version of the answers above, I sometimes like to use an automatically rolled-back transaction in combination with the OUTPUT clause and the inserted pseudo-table, to see what will actually be updated as a result set.
For instance,
BEGIN TRANSACTION;
UPDATE TableA
SET Column1 = @SomeValue
OUTPUT INSERTED.*
WHERE <condition>;
ROLLBACK TRANSACTION;
If the result set looks good to me, then I'll change the last statement to COMMIT TRANSACTION;.
I'm new to SQL. I have a large number of stored procedures in my production database. I planned to write an audit table that would be used by these stored procedures to keep track of changes (these stored procedures would write to this audit table). But the issue is that when a transaction rolls back, the rows inserted into the audit table also get rolled back. Is there any way to create a table that is not affected by transaction rollbacks? Any other idea that satisfies my requirement is welcome!
You can't: once a session starts a transaction, all activity on that session is contained inside the transaction.
What you can do is to open a different session, for instance a CLR procedure that connects as an ordinary client (not using the context connection) and audits from this connection.
But auditing actions that roll back is a bit unusual, since you are auditing things that never occurred from the database's perspective, and the audit record will conflict with the actual database state.
OK, if you want to know what was rolled back, here is what you do:
Let your existing audit process handle successful inserts.
Put the values for the insert into a table variable in your SP. It is important that it is a table variable and not a temp table. Now, in the CATCH block for the transaction, perform the rollback. This will not clear the table variable. Then insert into your audit table the values from the table variable. (Add a field to the audit table so you can mark the records as rolled back, and possibly one for the error message.)
We don't do this specifically for auditing but we have done this to record the errors.
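A sketch of the pattern (dbo.SomeTable and dbo.AuditTable are hypothetical); the table variable keeps its rows through the rollback:

DECLARE @Audit TABLE (Id int, Msg nvarchar(200));

BEGIN TRY
    BEGIN TRANSACTION;

    INSERT INTO dbo.SomeTable (Id) VALUES (1);
    INSERT INTO @Audit VALUES (1, N'Inserted row 1');

    RAISERROR(N'Forced failure for the demo', 16, 1);  -- something goes wrong

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;          -- @Audit is unaffected by this

    INSERT INTO dbo.AuditTable (Id, Msg, RolledBack)
    SELECT Id, Msg, 1
    FROM @Audit;                       -- record what was rolled back
END CATCH;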
In an ASP.NET application I've got this process:
Start a connection
Start a transaction
Insert into a table "LoadData" a lot of values with the SqlBulkCopy class with a column that contains a specific LoadId.
Call a stored procedure that:
reads the table "LoadData" for the specific LoadId;
for each line does a lot of calculations, which implies reading dozens of tables, and writes the results into a temporary (#temp) table (a process that lasts several minutes);
deletes the lines in "LoadData" for the specific LoadId;
once everything is done, writes the result in the result table.
Commit the transaction, or roll back if something fails.
My problem is that if I have 2 users that start the process, the second one has to wait until the previous one has finished (because the insert seems to put an exclusive lock on the table), and my application sometimes falls into a timeout (and the users are not happy to wait :) ).
I'm looking for a way to let the users do everything in parallel, as there is no interaction except for the last step: writing the result. I think that what is blocking me is the inserts/deletes in the "LoadData" table.
I checked the other transaction isolation levels but it seems that nothing could help me.
What would be perfect would be to be able to remove the exclusive lock on the "LoadData" table (is it possible to force SQL Server to only lock rows and not the table?) when the insert is finished, but without ending the transaction.
Any suggestion?
Look up SET TRANSACTION ISOLATION LEVEL SNAPSHOT and the READ_COMMITTED_SNAPSHOT database option in Books Online.
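A hedged sketch of both options (the database name MyLoadDb is hypothetical); with row versioning, readers work from row versions instead of waiting behind the loader's exclusive locks:

-- Option 1: make the READ COMMITTED default use row versioning (RCSI).
-- Note: this ALTER needs the database to be free of other connections.
ALTER DATABASE MyLoadDb SET READ_COMMITTED_SNAPSHOT ON;

-- Option 2: allow snapshot isolation and opt in per session.
ALTER DATABASE MyLoadDb SET ALLOW_SNAPSHOT_ISOLATION ON;
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;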
Transactions should cover small and fast-executing pieces of SQL/code. They have a tendency to be implemented differently on different platforms. They will take locks and then escalate them as the modifications grow, thus locking other users out of querying or updating the same row/page/table.
Why not forget the transaction and handle processing errors in another way? Is your data integrity truly being secured by the transaction, or can you do without it?
If you're sure that there is no issue with concurrent operations except the last part, why not start the transaction just before those last statements (whichever ones actually require isolation) and commit immediately after they succeed? Then all the upfront read operations will not block each other.