SQL Server transaction, where is uncommitted data held? - sql-server

My question relates to how transactions are handled in SQL Server.
Let's say I have a user-defined transaction that contains a lot of data. Where does that data get stored during that process? It's only committed to the database file if the transaction succeeds, but where does it reside beforehand?
Does it stay in the memory of the program creating the transaction?
Is it sent to the memory of the SQL Server process?
Is it written to the database's transaction log?
Is it written to temporary files or some other disk location?

Uncommitted data is written into the table(s) in question; it is simply not marked as committed until the transaction commits. If the transaction is rolled back, the data will be overwritten the next time a write occurs that needs the affected page. Once the transaction commits, the data in the table is committed and cannot be overwritten. The log contains an ongoing record of what is happening in the database so that transactions can be rolled backward or forward after a system crash, or when a transaction is rolled back.

SQL Server uses ARIES write-ahead logging. Details are described in How It Works: Bob Dorr's SQL Server I/O Presentation. Write-ahead logging requires every change made to the data to be described by a log record so that crash recovery can reconstruct the database. To roll back a transaction, all one has to do is walk the log backward and generate a compensating action for every record generated by the transaction. This in effect undoes everything done by the transaction. Two-phase locking ensures that the compensating operation is always free to proceed.
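The backward walk described above can be sketched on a toy model. This is a minimal Python illustration under stated assumptions, not SQL Server's implementation: a dictionary stands in for the data pages, the names (`Database`, `LogRecord`, `rollback`) are hypothetical, locking is not modelled at all, and each compensating action is itself logged as a compensation log record ("clr", the ARIES term).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class LogRecord:
    lsn: int
    txn: str
    kind: str                      # "update", "commit", "clr" or "abort"
    key: Optional[str] = None
    before: Optional[int] = None   # value before the change (used for undo)
    after: Optional[int] = None    # value after the change


class Database:
    """Toy model: `pages` stands in for the data pages, `log` for the log."""

    def __init__(self) -> None:
        self.pages: Dict[str, int] = {}
        self.log: List[LogRecord] = []

    def _append(self, **fields) -> None:
        self.log.append(LogRecord(lsn=len(self.log) + 1, **fields))

    def update(self, txn: str, key: str, value: int) -> None:
        # Write-ahead: describe the change in the log before applying it.
        self._append(txn=txn, kind="update", key=key,
                     before=self.pages.get(key), after=value)
        self.pages[key] = value

    def commit(self, txn: str) -> None:
        self._append(txn=txn, kind="commit")

    def rollback(self, txn: str) -> None:
        # Walk the log backward (over a snapshot, since CLRs are appended as we go)
        # and apply a compensating action for every update made by `txn`.
        for rec in reversed(list(self.log)):
            if rec.txn != txn or rec.kind != "update":
                continue
            # Log the compensation itself, as ARIES does with CLRs.
            self._append(txn=txn, kind="clr", key=rec.key,
                         before=rec.after, after=rec.before)
            if rec.before is None:
                self.pages.pop(rec.key, None)  # the key did not exist before
            else:
                self.pages[rec.key] = rec.before
        self._append(txn=txn, kind="abort")


db = Database()
db.update("T1", "a", 10)
db.commit("T1")
db.update("T2", "a", 20)
db.update("T2", "b", 5)
db.rollback("T2")
print(db.pages)  # {'a': 10}
```

Only T2's changes are compensated; T1's committed value survives because the walk skips records belonging to other transactions.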
See also Inside the SQL Server Transaction Log and, of course, ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging.

Related

What happens if the power is removed at the moment the database transaction takes place?

A transaction is understood as a concept that guarantees that something either happens completely or does not happen at all. What if the system is physically writing to the disk to apply the transaction when a power outage occurs?
As defined by the ACID properties of transactions: if the commit returned successfully before the system went down, the changes are guaranteed to have been persisted. If the commit had not yet returned but was being processed, one of the following holds after the DBMS restarts:
The database is in the state it was in before the transaction being committed was started.
The database is in a state where all changes of the transaction being committed have been persisted and can be found subsequently.
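The all-or-nothing outcome listed above can be checked exhaustively on a toy model. This Python sketch assumes a simplified redo-then-undo recovery over a hypothetical log format (it is not SQL Server's actual recovery code): it enumerates every point at which the log could have stopped reaching disk and every combination of data pages that might already have been flushed, subject to the write-ahead rule, and shows that recovery always ends in one of the two states.

```python
from itertools import product

# Hypothetical log format: ("update", txn, key, before, after) or ("commit", txn).
LOG = [
    ("update", "T1", "x", 0, 1),
    ("update", "T1", "y", 0, 2),
    ("commit", "T1"),
]
BEFORE = {"x": 0, "y": 0}


def recover(disk, durable_log):
    """Redo everything in the durable log, then undo transactions with no commit record."""
    state = dict(disk)
    committed = {r[1] for r in durable_log if r[0] == "commit"}
    for r in durable_log:                      # redo pass (reapplies after-images)
        if r[0] == "update":
            state[r[2]] = r[4]
    for r in reversed(durable_log):            # undo pass, newest change first
        if r[0] == "update" and r[1] not in committed:
            state[r[2]] = r[3]
    return state


def crash_outcomes():
    outcomes = set()
    for n in range(len(LOG) + 1):              # crash after n log records reached disk
        durable = LOG[:n]
        updates = [r for r in durable if r[0] == "update"]
        # Write-ahead logging: a page may be on disk only if its log record is durable.
        for flushed in product([False, True], repeat=len(updates)):
            disk = dict(BEFORE)
            for r, was_flushed in zip(updates, flushed):
                if was_flushed:
                    disk[r[2]] = r[4]
            outcomes.add(tuple(sorted(recover(disk, durable).items())))
    return outcomes


print(sorted(crash_outcomes()))
# [(('x', 0), ('y', 0)), (('x', 1), ('y', 2))]
```

The commit record is the single point of decision: if it is in the durable log, recovery produces the "after" state; if not, it produces the "before" state, regardless of which data pages happened to be written.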

Confusion regarding transaction ROLLBACK on server restart

The following is an excerpt from the SQL Server 461 training kit:
[...] if the database server shuts down unexpectedly before the fact of
successful commit could be written to the log, when SQL Server starts up the database, the transaction will be rolled back and any database changes undone.
A Microsoft page (https://msdn.microsoft.com/en-us/library/jj835093(v=sql.120).aspx#WAL) reads:
The log records must be written to disk before the associated dirty page is removed from the buffer cache and written to disk.
[...] Log records are written to disk when the transactions are committed.
Any transactions that have not been committed would not have their log records flushed to disk, and therefore would not have the dirty pages with their data modifications flushed to disk.
I can only see the need to roll forward changes (from a committed transaction) that have not been flushed to disk yet, but this rollback scenario seems impossible, since there would never be a change on disk (from a non-committed transaction) to begin with.
Which part am I misunderstanding?
Any transactions that have not been COMMITed, would not have its log records flushed to disk
This is not correct. At commit all the log records relating to the transaction must be flushed to disc but this can certainly happen earlier. It doesn't have to wait for all the transactions to commit before persisting that part of the log.
there would never be a change on disk (from an uncommitted transaction) to begin with.
This is not correct.
Changes from uncommitted transactions can be written out to disc as soon as they are permanently recorded in the transaction log.
This happens every checkpoint.
It is, of course, not permissible for these changes to be written to the data files on disc until the transaction log containing them has been flushed to disc, as otherwise there would be no way to recover.
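The rule in the answer above can be sketched as a check made before any data page is written. In this minimal Python model (all names hypothetical; a real buffer manager tracks far more state), each dirty page remembers the LSN of the last log record that changed it, a page write is refused until the log has been hardened at least that far, and a checkpoint hardens the log and then writes every dirty page, including pages changed by a transaction that has not committed.

```python
class BufferPool:
    """Toy buffer pool enforcing the write-ahead logging rule (hypothetical names)."""

    def __init__(self) -> None:
        self.log = []          # log records generated so far (log buffer + file)
        self.flushed_lsn = 0   # highest LSN already hardened to the log file
        self.cache = {}        # dirty pages: page -> (value, page_lsn)
        self.data_file = {}    # page -> value as stored in the data file

    def modify(self, txn, page, value):
        self.log.append((txn, page, value))
        # The page LSN is the LSN of the last log record that changed the page.
        self.cache[page] = (value, len(self.log))

    def flush_log(self):
        self.flushed_lsn = len(self.log)

    def write_page(self, page):
        value, page_lsn = self.cache[page]
        # WAL: the log must be hardened at least up to this page's LSN first.
        if page_lsn > self.flushed_lsn:
            raise RuntimeError(f"WAL violation: {page} needs LSN {page_lsn} on disk")
        self.data_file[page] = value
        del self.cache[page]

    def checkpoint(self):
        # Harden the log, then write every dirty page, committed or not.
        self.flush_log()
        for page in list(self.cache):
            self.write_page(page)


bp = BufferPool()
bp.modify("T1", "p1", 42)      # T1 has not committed
try:
    bp.write_page("p1")        # too early: its log record is still only in memory
except RuntimeError as err:
    print(err)                 # WAL violation: p1 needs LSN 1 on disk
bp.checkpoint()
print(bp.data_file)            # {'p1': 42}  (an uncommitted change, now on disk)
```

After the checkpoint the data file holds a change from an uncommitted transaction, which is exactly why crash recovery must be able to roll it back using the log records that were hardened first.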

In Log-Based Recovery why do we redo committed transactions?

The log is a sequence of log records that maintains information about update activities on the database. Whenever a transaction starts, reads, writes or commits, it registers itself in the log with its particular action. So when recovering from a failure, a transaction needs to be undone if it hasn't committed, and it needs to be redone if it has committed. My question is about the logic behind this. Why do we need to redo committed transactions?
Reference: Slide 19 - http://codex.cs.yale.edu/avi/db-book/db6/slide-dir/PPT-dir/ch16.ppt
The data changes for a committed transaction, stored in the database buffers of the SGA, are not necessarily written immediately to the datafiles by the database writer (DBWn) background process.
Because they are in the SGA, they are visible to other users, but those changes can still be lost after the commit if they are not written to the datafiles immediately.
Reference: https://docs.oracle.com/cd/B19306_01/server.102/b14220/transact.htm
Reference for image: https://docs.oracle.com/cd/E17781_01/server.112/e18804/memory.htm#ADMQS174
It is possible that all of a transaction T1's log records have been output to stable storage while the actual updates to the data are still in main memory. If a failure occurs at this point, redoing the transaction ensures that all the updates that were lost from memory are written to stable storage.
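The scenario just described can be sketched in a few lines of Python. The log format and function name are hypothetical, and this redo-only sketch assumes as a simplification that uncommitted changes never reach the data file; a system that does let them reach disk, as described for SQL Server earlier on this page, also needs an undo pass.

```python
# Hypothetical log format: ("update", txn, key, after_value) or ("commit", txn).
def redo_committed(data_file, durable_log):
    """Reapply the after-images of committed transactions to the data file."""
    committed = {txn for kind, txn, *_ in durable_log if kind == "commit"}
    restored = dict(data_file)
    for kind, txn, *rest in durable_log:
        if kind == "update" and txn in committed:
            key, after = rest
            restored[key] = after
    return restored


# T1 committed: its log records reached stable storage at commit time...
log_on_disk = [("update", "T1", "balance", 150), ("commit", "T1")]
# ...but the dirty page (balance = 150) was still in memory when the failure hit.
data_file_after_crash = {"balance": 100}

print(redo_committed(data_file_after_crash, log_on_disk))  # {'balance': 150}
```

Without the redo step the committed value 150 would be gone, even though the commit was reported as successful.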

Writing to the transaction log when the log reaches its full size

Let's say we have a database whose transaction log has an initial size of 100 MB and a MAXSIZE of UNLIMITED.
SQL Server writes to the log sequentially from start to end. In one book I found the following passage:
When SQL Server reaches the end of the file as defined by the size when it was set up, it will wrap around to the beginning again, looking for free space to use. SQL Server can wrap around without increasing the physical log file size when there is free virtual transaction space. Virtual transaction log space becomes free when SQL Server can write the data from the transaction log into the underlying tables within the database.
The last part is really confusing to me. What does the last sentence mean? Does it mean that SQL Server overwrites old, committed transactions with new ones?
As far as I know, that should not be the case, because all transactions must be preserved until a backup is done.
I don't know if I was clear enough; I will update the post if more explanation is needed.
This only applies to the SIMPLE recovery model:
Virtual transaction log space becomes free when SQL Server can write the data from the transaction log into the underlying tables within the database.
This means that once the transactions have actually been written to the physical tables, they are no longer needed in the transaction log, because at this point a power outage or another catastrophic failure can no longer cause the transactions to be "lost"; they have already been persisted to disk.
There is no need to wait until a backup is done. However, if you need full point-in-time recovery, you would use the FULL recovery model, and in that case log records are not overwritten until they have been captured by a log backup.
A log record is no longer needed in the transaction log if all of the following are true:
The transaction of which it is part has committed.
The database pages it changed have all been written to disk by a checkpoint.
The log record is not needed for a backup (full, differential, or log).
The log record is not needed for any feature that reads the log (such as database mirroring or replication).
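The conditions in the list above boil down to taking a minimum: log space can only be reused below the oldest LSN that something still needs. This Python sketch is a simplified model with hypothetical parameter names; real SQL Server truncates the log in whole virtual log files (VLFs) rather than at individual LSNs.

```python
from typing import Optional


def log_reusable_below(
    checkpoint_lsn: int,
    oldest_active_txn_lsn: Optional[int] = None,
    first_unbacked_up_lsn: Optional[int] = None,
    oldest_unreplicated_lsn: Optional[int] = None,
) -> int:
    """Return the LSN below which log records may be overwritten (toy model).

    checkpoint_lsn          -- start of the last checkpoint (pages before it are on disk)
    oldest_active_txn_lsn   -- first record of the oldest uncommitted transaction
    first_unbacked_up_lsn   -- first record not yet in a log backup (None under SIMPLE)
    oldest_unreplicated_lsn -- oldest record a log reader such as replication still needs
    """
    needed = [checkpoint_lsn, oldest_active_txn_lsn,
              first_unbacked_up_lsn, oldest_unreplicated_lsn]
    return min(lsn for lsn in needed if lsn is not None)


# SIMPLE recovery model: only the open transaction holds the log back.
print(log_reusable_below(checkpoint_lsn=500, oldest_active_txn_lsn=450))  # 450
# FULL recovery model, no log backup since LSN 100: everything from 100 on is kept.
print(log_reusable_below(500, 450, first_unbacked_up_lsn=100))            # 100
```

This is why, under FULL recovery without regular log backups, the log keeps growing instead of wrapping around as the quoted passage describes.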
Further reading:
https://technet.microsoft.com/en-us/magazine/2009.02.logging.aspx
https://technet.microsoft.com/en-us/library/jj835093%28v=sql.110%29.aspx

Question about database transaction log

I read the following statement:
SQL Server doesn’t write data immediately to disk. It is kept in a buffer cache until this cache is full or until SQL Server issues a checkpoint, and then the data is written out. If a power failure occurs while the cache is still filling up, then that data is lost. Once the power comes back, though, SQL Server would start from its last checkpoint state, and any updates after the last checkpoint that were logged as successful transactions will be performed from the transaction log.
And a couple of questions arise:
What if the power failure happens after SQL Server issues a checkpoint but before the buffer cache is actually written to disk? Isn't the content of the buffer cache permanently lost?
The transaction log is also stored as a disk file, which is no different from the actual database file. So how can we guarantee the integrity of the log file?
So, is it true that no real transaction ever exists? It's only a matter of probability.
The statement is correct in that data can be written to cache, but it misses the vital point that SQL Server uses a technique called Write-Ahead Logging (WAL). Log records are not left in cache the way data pages are: a transaction is only considered complete once its log records have been written to the log on disk.
http://msdn.microsoft.com/en-us/library/ms186259.aspx
In the event of a failure, the log is replayed as you mention. It does not matter that some data pages were still in memory and never written to disk, since the log records of their modifications are stored and can be retrieved.
It is not true that there is no real transaction, but if you are operating in the SIMPLE recovery model, you cannot replay log backups to restore to a point in time.
As for the integrity of the log file, the answer is the same as for the data file: a proper backup schedule and a proper restore-testing schedule. Do not just back up data and logs and assume the backups work.
What if the power failure happens after SQL Server issues a checkpoint and before the buffer cache is actually written to disk? Isn't the content in buffer cache permanently missing?
The checkpoint start and end are separate records in the transaction log.
The checkpoint is marked as succeeded only after the end of the checkpoint has been written into the log and the LSN of the oldest living transaction (including the checkpoint itself) is written into the database.
If the checkpoint fails to complete, the database is rolled back to the previous LSN, taking the data from the transaction log as necessary.
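The begin/end distinction described above can be sketched as follows. This is a toy Python model with hypothetical record kinds: it finds the newest completed checkpoint by scanning the log backward, whereas the answer describes SQL Server recording that LSN in the database itself, so treat the scan as an illustration of which checkpoint counts, not of how SQL Server locates it.

```python
from typing import List, NamedTuple, Optional


class Rec(NamedTuple):
    lsn: int
    kind: str                      # "begin_ckpt", "end_ckpt", "update", "commit", ...
    min_lsn: Optional[int] = None  # on "end_ckpt": oldest LSN recovery must read


def recovery_start_lsn(log: List[Rec]) -> Optional[int]:
    """Find where crash recovery should start reading the log (toy model)."""
    # Scan backward for the newest *completed* checkpoint. A "begin_ckpt" with no
    # matching "end_ckpt" (interrupted by the crash) is skipped over.
    for rec in reversed(log):
        if rec.kind == "end_ckpt":
            return rec.min_lsn
    return log[0].lsn if log else None   # no checkpoint yet: read the whole log


log = [
    Rec(1, "update"), Rec(2, "commit"),
    Rec(3, "begin_ckpt"), Rec(4, "end_ckpt", min_lsn=3),
    Rec(5, "update"),
    Rec(6, "begin_ckpt"),            # power fails before this checkpoint ends
]
print(recovery_start_lsn(log))       # 3: the interrupted checkpoint is ignored
```

Because the interrupted checkpoint is never treated as complete, recovery reads the log from the earlier checkpoint onward, so any page that the interrupted checkpoint failed to write is rebuilt from the log.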
The transaction log is also stored as disk file, which is no different from the actual database file. So how could we guarantee the integrity of log file?
We can't. It's just that the data are stored in two places rather than one.
If someone steals your server with both data and log files on it, your transactions are lost.

Resources