The following is an excerpt from the SQL Server 461 training kit:
[...] if the database server shuts down unexpectedly before the fact of
successful commit could be written to the log, when SQL Server starts up the database, the transaction will be rolled back and any database changes undone.
Microsoft page (https://msdn.microsoft.com/en-us/library/jj835093(v=sql.120).aspx#WAL) reads,
The log records must be written to disk before the associated dirty page is removed from the buffer cache and written to disk.
[...] Log records are written to disk when the transactions are committed.
Any transaction that has not been committed would not have its log records flushed to disk, and therefore would not have the dirty pages with its data modifications flushed to disk.
I can only see rolling forward changes that have not been flushed to disk yet (from a committed transaction), but this rollback scenario seems impossible since there would never be a change on disk (from a non-committed transaction) to begin with.
Which part am I misunderstanding?
Any transaction that has not been committed would not have its log records flushed to disk
This is not correct. At commit, all the log records relating to the transaction must be flushed to disc, but this can certainly happen earlier. SQL Server doesn't have to wait for the transaction to commit before persisting that part of the log.
there would never be a change on disk (from an uncommitted transaction) to begin with.
This is not correct.
Changes from uncommitted transactions can be written out to disc as soon as they are permanently recorded in the transaction log.
This happens at every checkpoint.
It is of course not permissible for these changes to be written to the data files on disc until the transaction log with these changes has been flushed to disc as then there would be no way of recovery.
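To make the scenario concrete, here is a minimal Python sketch of the behaviour described above. It is a toy model, not SQL Server's implementation: the class `ToyDatabase`, its method names, and the tuple layout of the log records are all assumptions made for illustration. It shows an uncommitted change reaching the data file at a checkpoint, and recovery undoing it after a crash.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDatabase:
    """Toy write-ahead-logging model (hypothetical, for illustration only)."""
    data_file: dict = field(default_factory=dict)     # pages on disk
    buffer_cache: dict = field(default_factory=dict)  # dirty pages in memory
    log_buffer: list = field(default_factory=list)    # log records in memory
    log_file: list = field(default_factory=list)      # log records on disk

    def read(self, key):
        return self.buffer_cache.get(key, self.data_file.get(key))

    def update(self, txn, key, new_value):
        # Every change is described by a log record holding old and new values.
        self.log_buffer.append(("update", txn, key, self.read(key), new_value))
        self.buffer_cache[key] = new_value

    def flush_log(self):
        self.log_file.extend(self.log_buffer)
        self.log_buffer.clear()

    def commit(self, txn):
        self.log_buffer.append(("commit", txn))
        self.flush_log()  # the commit is durable once its record is on disk

    def checkpoint(self):
        self.flush_log()                          # WAL rule: log first ...
        self.data_file.update(self.buffer_cache)  # ... then the dirty pages,
        self.buffer_cache.clear()                 # committed or not

    def crash(self):
        # Everything held only in memory is lost.
        self.buffer_cache.clear()
        self.log_buffer.clear()

    def recover(self):
        committed = {rec[1] for rec in self.log_file if rec[0] == "commit"}
        # Redo pass: reapply every logged update in log order.
        for rec in self.log_file:
            if rec[0] == "update":
                _, _, key, _, new_value = rec
                self.data_file[key] = new_value
        # Undo pass: walk backward, restoring old values of uncommitted work.
        for rec in reversed(self.log_file):
            if rec[0] == "update" and rec[1] not in committed:
                _, _, key, old_value, _ = rec
                self.data_file[key] = old_value


db = ToyDatabase(data_file={"A": 10, "B": 20})
db.update("T1", "A", 11)
db.commit("T1")            # T1 is committed
db.update("T2", "B", 99)   # T2 never commits
db.checkpoint()            # both dirty pages reach the data file
on_disk_before_crash = dict(db.data_file)
db.crash()
db.recover()
print(on_disk_before_crash)  # the uncommitted B=99 was on disk
print(db.data_file)          # recovery restored B to its old value
```

The point of the sketch is the moment between `checkpoint()` and `crash()`: the data file holds a change from a transaction that never committed, which is exactly the case the rollback at startup exists for.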
Related
The log is a sequence of log records, which maintains information about update activities on the database. Whenever a transaction starts, reads, writes or commits it registers itself in the log with its particular action. So now when recovering from failure a transaction needs to be undone if the transaction hasn't committed and it needs to be redone if it has committed. My doubt is regarding the logic behind doing this. Why do we need to redo committed transactions?
Reference: Slide 19 - http://codex.cs.yale.edu/avi/db-book/db6/slide-dir/PPT-dir/ch16.ppt
The data changes for a committed transaction, stored in the database buffers of the SGA, are not necessarily written immediately to the datafiles by the database writer (DBWn) background process.
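Because they are in the SGA, they are visible to other users, but the in-memory copies can still be lost in a failure after commit if they have not yet been written to the datafiles. Redoing the committed transaction from the log is what brings those changes back.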
Reference: https://docs.oracle.com/cd/B19306_01/server.102/b14220/transact.htm
Reference for image: https://docs.oracle.com/cd/E17781_01/server.112/e18804/memory.htm#ADMQS174
It is possible that all the log records of a transaction T1 have been output to stable storage while the actual updates to the data are still in main memory. If a failure occurs at this point, redoing the transaction ensures that all the updates that were lost from memory get written to stable storage.
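The redo step can be sketched in a few lines of Python. This is an illustrative model only: the function name `redo_committed` and the record layout (`("update", txn, key, new_value)` and `("commit", txn)`) are assumptions, not anything taken from a real DBMS.

```python
def redo_committed(log_on_disk, data_on_disk):
    """Reapply the logged updates of committed transactions to the data."""
    committed = {txn for kind, txn, *_ in log_on_disk if kind == "commit"}
    recovered = dict(data_on_disk)
    for kind, txn, *rest in log_on_disk:
        if kind == "update" and txn in committed:
            key, new_value = rest
            recovered[key] = new_value
    return recovered


# T1 committed, but its update to X never left main memory before the failure.
# T2 logged an update to Y and never committed.
log_on_disk = [
    ("update", "T1", "X", 500),
    ("update", "T2", "Y", 7),
    ("commit", "T1"),
]
data_on_disk = {"X": 100, "Y": 1}

recovered = redo_committed(log_on_disk, data_on_disk)
print(recovered)
```

Here the stable copy of X still holds the old value after the failure, and only the log knows that T1 changed it. Without the redo pass, a committed update would silently disappear.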
Under which conditions are the T-logs flushed from the log cache to the log file on disk?
Does it happen after every commit or after every 3 seconds or only after checkpoint?
And where are the dirty pages stored in SQL Server when memory is not big enough to hold the data in the buffer pool (in tempdb or in the respective databases)? And for how long is uncommitted data preserved in SQL Server, and where?
You are asking two separate questions, about:
1. The transaction log buffer
2. The buffer pool
Under which conditions are the T-logs flushed from the log cache to the log file on disk? Does it happen after every commit or after every 3 seconds or only after checkpoint?
Consider the update statement below:
UPDATE [table] SET id = 1
WHERE id = 2;
First, this modification is written to the transaction log buffer. SQL Server then flushes the log records to disk before the commit is reported as successful. This is called write-ahead logging. These flushes are not periodic: they happen at every commit, which for an autocommit statement like this one means once per statement.
And where are the dirty pages stored in SQL Server when memory is not big enough to hold the data in the buffer pool (in tempdb or in the respective databases)? And for how long is uncommitted data preserved in SQL Server, and where?
Consider the same update transaction, and suppose the update needs to touch three pages, one of which is not in the buffer pool. In this case, SQL Server reads the page from disk, places it in the buffer pool, and modifies it. This page is now called a dirty page.
Pages of this type are flushed to disk when a checkpoint occurs. Checkpoints occur under various conditions, as described at the link below:
https://msdn.microsoft.com/en-us/library/ms189573.aspx
A checkpoint is an internal process that writes all dirty (modified) pages from the buffer cache to physical disk. It also writes the log records from the log buffer to the physical log file.
A checkpoint always writes out all pages that have changed (known as being marked dirty) since the last checkpoint, or since the page was read in from disk. It doesn't matter whether the transaction that changed a page has committed or not – the page is written to disk regardless. The only exception is for tempdb, where data pages are not written to disk as part of a checkpoint.
A checkpoint is only done for tempdb when the tempdb log file reaches 70% full – this is to prevent the tempdb log from growing if at all possible (note that a long-running transaction can still essentially hold the log hostage and prevent it from clearing, just like in a user database).
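Conditions under which transaction log records are flushed to the log file:
The log writer is the process responsible for writing log records from the log cache to the log file.
The conditions under which the log buffer is flushed to disk include:
A session issues a commit or a rollback command.
The log buffer fills up. (The often-quoted "1/3 full" threshold comes from Oracle's log writer; in SQL Server a log block is flushed when it reaches its maximum size of 60 KB.)
A checkpoint occurs.
The log file becomes 70% full, which triggers a checkpoint under the SIMPLE recovery model.
It also depends on the target recovery time, which controls how often automatic checkpoints run.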
Does it happen after every commit or after every 3 seconds or only after checkpoint?
It happens after every commit and after every checkpoint.
When a checkpoint occurs for a user database, all dirty pages for that database are flushed to disk (along with other operations). This does not happen for tempdb. Tempdb is not recovered in the event of a crash, so there is no need to force dirty tempdb pages to disk, except when the lazywriter process (part of the buffer pool) has to make space for pages from other databases. When you issue a manual CHECKPOINT in tempdb, all the dirty pages are flushed, but for automatic checkpoints they're not.
Checkpoints
For how long is uncommitted data preserved in SQL Server, and where?
SQL Server keeps the uncommitted data in the data and log files until the transaction is committed or rolled back.
Let's say we have a database whose transaction log has an initial size of 100 MB and a maxsize of UNLIMITED.
SQL Server writes to the log sequentially from start to end. In one book I found the following passage:
When SQL Server reaches the end of the file as defined by the size
when it was set up, it will wrap around to the beginning again,
looking for free space to use. SQL Server can wrap around without
increasing the physical log file size when there is free virtual
transaction space. Virtual transaction log space becomes free when SQL
Server can write the data from the transaction log into the underlying
tables within the database.
The last part is really confusing to me. What does the last sentence mean? Does it mean that SQL Server overwrites old, committed transactions with new transactions?
As far as I know, that would not be the case, because all transactions must be kept until a backup is done.
I don't know if I was clear enough; I will update the post if more explanation is needed.
This only applies to the SIMPLE recovery model:
Virtual transaction log space becomes free when SQL Server can write the data from the transaction log into the underlying tables within the database.
This means, that once the transactions have actually been written to the physical tables, they are no longer needed in the transaction log. Because at this point, a power outage or another catastrophic failure can no longer cause the transactions to be "lost", as they have already been persisted to the disk.
There is no need to wait until a backup is done. However, if you need full point-in-time recovery, you would use the FULL recovery model, and in that case log records are not overwritten until they have been captured by a log backup.
The log records are no longer needed in the transaction log if all of the following are true:
The transaction of which it is part has committed.
The database pages it changed have all been written to disk by a checkpoint.
The log record is not needed for a backup (full, differential, or log).
The log record is not needed for any feature that reads the log (such as database mirroring or replication).
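The four conditions above amount to a single boolean test. The Python sketch below spells it out; the type `LogRecordState`, its field names, and the function `can_be_reused` are hypothetical names chosen for illustration and do not correspond to anything SQL Server exposes.

```python
from dataclasses import dataclass


@dataclass
class LogRecordState:
    """What we know about one log record (hypothetical model)."""
    transaction_committed: bool
    pages_checkpointed: bool
    needed_for_backup: bool
    needed_by_log_reader: bool  # e.g. database mirroring or replication


def can_be_reused(state):
    """True only when all four conditions from the list above hold."""
    return (
        state.transaction_committed
        and state.pages_checkpointed
        and not state.needed_for_backup
        and not state.needed_by_log_reader
    )


# Committed and checkpointed, but a log backup has not captured it yet.
waiting_for_backup = LogRecordState(True, True, True, False)
# Nothing needs this record any more.
fully_released = LogRecordState(True, True, False, False)

print(can_be_reused(waiting_for_backup))
print(can_be_reused(fully_released))
```

In this model, any single unmet condition is enough to keep the record, which is why a long-running transaction or a missing log backup can stop the log from being reused.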
Further reading:
https://technet.microsoft.com/en-us/magazine/2009.02.logging.aspx
https://technet.microsoft.com/en-us/library/jj835093%28v=sql.110%29.aspx
My question relates to how transactions are dealt with in SQL Server.
Let's say I have a user-defined transaction that contains a lot of data. Where does that data get stored during that process? It's only committed to the database file if the transaction is successful, but where does it reside beforehand?
Does it stay in the memory of the program creating the transaction?
Does it go to the memory of the SQL Server process?
Is it written to the transaction log of the database?
Is it written to temporary files or some other disk location?
Uncommitted data is written into the table(s) in question; it is simply not marked as committed until the transaction commits. If the transaction is rolled back, the data will be overwritten the next time a write occurs that needs the affected page. Once the transaction commits, the data in the table is committed and cannot be overwritten. The log contains an ongoing record of what is happening in the database so that transactions can be rolled backward or forward after a system crash, or if a transaction is rolled back.
SQL Server uses ARIES write-ahead logging. Details are described in How It Works: Bob Dorr's SQL Server I/O Presentation. Write-ahead logging requires every change made to the data to be described by a log record so that crash recovery can reconstruct the database. To roll back a transaction, all that one has to do is walk the log backward and generate a compensating action for every record generated by the transaction. This will in effect undo everything done by the transaction. Two-phase locking ensures that the compensating operation is always free to proceed.
See also Inside the SQL Server Transaction Log and, of course, ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging.
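The backward walk with compensating actions can be sketched as follows. This is a simplified Python illustration, not the ARIES algorithm in full: the helper names `write` and `rollback` and the record layout `(txn, key, old_value, new_value)` are assumptions, and locking is left out entirely.

```python
def write(log, pages, txn, key, new_value):
    """Log the change (with its old value) before applying it to the page."""
    log.append((txn, key, pages[key], new_value))
    pages[key] = new_value


def rollback(txn, log, pages):
    """Undo txn by walking the log backward and applying compensating writes."""
    compensation = []
    for rec_txn, key, old_value, new_value in reversed(list(log)):
        if rec_txn == txn:
            pages[key] = old_value  # compensating action
            compensation.append((txn, key, new_value, old_value))
    # The compensating actions are themselves logged, in the order applied.
    log.extend(compensation)
    return compensation


pages = {"A": 1, "B": 2}
log = []
write(log, pages, "T1", "A", 10)
write(log, pages, "T2", "B", 20)
write(log, pages, "T1", "A", 30)

undone = rollback("T1", log, pages)
print(pages)     # A is back to its value before T1; T2's change is untouched
print(len(log))  # three original records plus two compensation records
```

Walking backward matters when a transaction touches the same item more than once: the later change is undone first, so the item ends up at the value it had before the transaction started.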
I read the following statement:
SQL Server doesn’t write data immediately to disk. It is kept in a
buffer cache until this cache is full or until SQL Server issues a
checkpoint, and then the data is written out. If a power failure
occurs while the cache is still filling up, then that data is lost.
Once the power comes back, though, SQL Server would start from its
last checkpoint state, and any updates after the last checkpoint that
were logged as successful transactions will be performed from the
transaction log.
And a couple of questions arise:
What if the power failure happens after SQL Server issues a
checkpoint and before the buffer cache is actually written to
disk? Isn't the content in buffer cache permanently missing?
The transaction log is also stored as disk file, which is no
different from the actual database file. So how could we guarantee
the integrity of log file?
So, is it true that no real transaction ever exists? It's only a matter of probability.
The statement is correct in that data can be written to cache, but misses the vital point that SQL Server uses a technique called Write Ahead Logging (WAL). The writes to the log are not cached, and a transaction is only considered complete once the transaction records have been written to the log.
http://msdn.microsoft.com/en-us/library/ms186259.aspx
In the event of a failure, the log is replayed as you mention, but the situation regarding the data pages still being in memory and not written to disk does not matter, since the log of their modification is stored and can be retrieved.
It is not true that there is no real transaction. Note, however, that if you are operating under the simple recovery model, the ability to replay the log from backups (point-in-time restore) is not there, although crash recovery still works.
As for the integrity of the log file, the answer is the same as for the data file: a proper backup schedule and a proper restore-testing schedule. Do not just back up data and logs and assume the backups work.
What if the power failure happens after SQL Server issues a checkpoint and before the buffer cache is actually written to disk? Isn't the content in buffer cache permanently missing?
The checkpoint start and end are different records on the transaction log.
The checkpoint is marked as succeeded only after the end of the checkpoint has been written into the log and the LSN of the oldest living transaction (including the checkpoint itself) is written into the database.
If the checkpoint fails to complete, the database is rolled back to the previous LSN, taking the data from the transaction log as necessary.
The transaction log is also stored as disk file, which is no different from the actual database file. So how could we guarantee the integrity of log file?
We couldn't. It's just that the data are stored in two places rather than one.
If someone steals your server with both data and log files on it, your transactions are lost.