Possible to checkpoint a WAL file during a transaction?

We are performing quite large transactions on a SQLite database, which causes the WAL file to grow extremely large (sometimes up to 1 GB for a single transaction). Is there a way to checkpoint the WAL file while in the middle of a transaction? When I try calling sqlite3_wal_checkpoint() or executing the wal_checkpoint PRAGMA statement, both return SQLITE_BUSY.

Not really. This is the whole point of transactions: the WAL (or journal file) holds data that becomes official only once the transaction successfully commits. Until that happens, if anything goes wrong - a program crash, a computer reboot, etc. - the WAL or journal file allows the uncommitted work to be safely rolled back (undone). Moving only part of an uncommitted transaction into the database would defeat that purpose.
Note that the SQLite documentation defines checkpointing as moving transactions from the WAL file back into the database. In other words, a checkpoint moves one or more committed transactions out of the WAL, not part of a huge uncommitted transaction.
There are a few possible solutions to your problem:
Avoid huge transactions - commit in smaller chunks if you can (see the sketch after this list). Of course, this is not always possible, depending on your application.
Use the old journaling mode with PRAGMA journal_mode=DELETE. It is slightly slower than the newer WAL mode (PRAGMA journal_mode=WAL), but in my experience it tends to create much smaller journal files, and they get deleted when the transaction successfully commits. For example, Android 4.x still uses the old journaling mode - it tends to work faster on flash storage and does not create huge temporary or journal files.
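A minimal C sketch of both suggestions against the SQLite C API follows; the database file, table name, and chunk size are placeholders you would tune to your own workload:

    /* Sketch of both suggestions: fall back to journal_mode=DELETE and break
     * one huge transaction into smaller chunks. Table name, chunk size and
     * database file name are placeholders. */
    #include <sqlite3.h>

    int main(void) {
        sqlite3 *db;
        if (sqlite3_open("example.db", &db) != SQLITE_OK) return 1;

        /* Suggestion 2: use the old rollback journal instead of WAL. */
        sqlite3_exec(db, "PRAGMA journal_mode=DELETE;", NULL, NULL, NULL);

        sqlite3_exec(db,
            "CREATE TABLE IF NOT EXISTS t(id INTEGER PRIMARY KEY, v TEXT);",
            NULL, NULL, NULL);

        /* Suggestion 1: commit every 10,000 rows instead of all at once,
         * so the journal/WAL only ever holds one chunk. */
        const int total_rows = 100000, chunk = 10000;
        sqlite3_stmt *ins;
        sqlite3_prepare_v2(db, "INSERT INTO t(v) VALUES(?1);", -1, &ins, NULL);

        for (int i = 0; i < total_rows; i++) {
            if (i % chunk == 0)
                sqlite3_exec(db, "BEGIN;", NULL, NULL, NULL);

            sqlite3_bind_text(ins, 1, "payload", -1, SQLITE_STATIC);
            sqlite3_step(ins);
            sqlite3_reset(ins);

            if ((i + 1) % chunk == 0 || i + 1 == total_rows)
                sqlite3_exec(db, "COMMIT;", NULL, NULL, NULL);
        }

        sqlite3_finalize(ins);
        sqlite3_close(db);
        return 0;
    }

In practice you would pick one of the two: either stay in WAL mode and rely on the intermediate commits to let checkpointing keep the WAL small, or switch to journal_mode=DELETE as shown and accept the slightly slower commits.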

Related

Doesn't batch writing WAL files in databases negate the purpose of WAL files?

I am reading about databases and I can't understand one thing about WAL files. They exist to make sure transactions are reliable and recoverable; however, apparently, to improve performance, WAL files are written in batches instead of immediately. This looks quite contradictory to me and seems to negate the purpose of WAL files. What happens if there's a crash between WAL commits? How does this differ from not having the WAL at all and simply fsync'ing the database itself periodically?
I don't know much about this and just searched for information because it seems interesting to me.
If some ninja finds my explanation incorrect, please correct me. What I understand at this point is that the WAL is written before the commit; once the transaction data is confirmed to be in the WAL, the transaction is confirmed.
What is done in batches is moving this WAL data into the heap and indexes - the real tables.
Write-Ahead Logging (WAL) is a standard method for ensuring data integrity. A detailed description can be found in most (if not all) books about transaction processing. Briefly, WAL's central concept is that changes to data files (where tables and indexes reside) must be written only after those changes have been logged, that is, after log records describing the changes have been flushed to permanent storage. If we follow this procedure, we do not need to flush data pages to disk on every transaction commit, because we know that in the event of a crash we will be able to recover the database using the log: any changes that have not been applied to the data pages can be redone from the log records. (This is roll-forward recovery, also known as REDO.)
https://www.postgresql.org/docs/current/wal-intro.html
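To make the quoted rule concrete, here is a toy C sketch of the write-ahead discipline (not how PostgreSQL or any real engine actually implements it; the file names and record layout are invented): the redo record is appended and fsync'ed to the log before the data file is touched, so a crash at any point can be repaired by replaying the log.

    /* Toy illustration of the write-ahead rule: append the redo record to the
     * log and fsync it BEFORE touching the data file; the data-file write can
     * then be deferred or batched. File names and record layout are invented. */
    #include <fcntl.h>
    #include <unistd.h>

    struct log_record {
        long page_no;          /* which data page is affected */
        char new_bytes[64];    /* the new contents to apply (redo image) */
    };

    int commit_change(int log_fd, int data_fd, const struct log_record *rec) {
        /* 1. Log first: after this fsync the change is durable and can be
         *    replayed by recovery even if we crash immediately afterwards. */
        if (write(log_fd, rec, sizeof *rec) != (ssize_t)sizeof *rec) return -1;
        if (fsync(log_fd) != 0) return -1;

        /* 2. Data file second (and lazily, in a real engine): a crash before
         *    or during this write is repaired by replaying the log. */
        if (lseek(data_fd, rec->page_no * (long)sizeof rec->new_bytes, SEEK_SET) < 0)
            return -1;
        if (write(data_fd, rec->new_bytes, sizeof rec->new_bytes) < 0) return -1;
        return 0;
    }

    int main(void) {
        int log_fd  = open("wal.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
        int data_fd = open("data.db", O_RDWR | O_CREAT, 0644);
        if (log_fd < 0 || data_fd < 0) return 1;

        struct log_record rec = { .page_no = 3, .new_bytes = "hello" };
        return commit_change(log_fd, data_fd, &rec) == 0 ? 0 : 1;
    }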

SQL Server ROLLBACK transaction took forever. Why?

We have a huge DML script that opens a transaction, performs a lot of changes, and only then commits.
Recently I triggered this script (through an app), and since it was taking quite a long time, I killed the session, which triggered a ROLLBACK.
The problem is that this ROLLBACK took forever and was hogging a lot of CPU (100% utilization). As I was monitoring the session (using the exec DMVs), I saw a lot of IO-related waits (IO_COMPLETION, PAGEIOLATCH_*, etc.).
So my question is:
1. Why does a rollback take so much time? Is it because it needs to write every reverting change to the LOG file? And could the IO waits I saw be related to IO operations against this LOG file?
2. Are there any online resources that explain how the ROLLBACK mechanism works?
Thank You
Based on another answer on the DBA side of Stack Exchange, ROLLBACKs are slower for at least two reasons: one, the original SQL can run multithreaded while the rollback is single-threaded; and two, a commit merely confirms work that is already complete, whereas the rollback must not only identify each log action to reverse but also locate and modify the affected rows.
https://dba.stackexchange.com/questions/5233/is-rollback-a-fast-operation
This is what I have found out about why a ROLLBACK operation in SQL Server can be time-consuming and why it can produce a lot of IO.
Background Knowledge (Open Tran/Log mechanism):
When a lot of changes are written to the DB as part of an open transaction, those changes modify data pages in memory (dirty pages), and the generated log records (grouped into structures called log blocks) are initially written to the buffer pool (in memory). The dirty pages are flushed to disk either by a recurring CHECKPOINT operation or by the lazy-writer process. In accordance with SQL Server's write-ahead logging mechanism, before the dirty pages are flushed, the log records describing those changes must be flushed to disk as well.
With that background in mind, rolling back a transaction is almost like a recovery operation in which all the changes that were already written to disk have to be undone. The heavy IO we were experiencing likely came from this, since there were lots of data changes to undo (a toy sketch of this undo pattern follows after the links below).
Information Source: https://app.pluralsight.com/library/courses/sqlserver-logging/table-of-contents
This course has a very deep and detailed explanation of how logging and recovery work in SQL Server.
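As a rough illustration of why undo is so much more expensive than the forward work, here is a toy C sketch of an undo log (a simplification, not SQL Server's actual ARIES-style mechanism): applying a change is one log append plus one in-place write, while rolling back means re-reading the log backwards, restoring every before-image, and logging the compensation on top.

    /* Toy undo log. The forward path does one log append plus one in-place
     * change per row; rollback has to walk the log backwards, re-read every
     * record, restore the before-image and account for the compensation as
     * well - roughly doubling the work. Not SQL Server's actual algorithm. */
    #include <stdio.h>

    #define MAX_LOG 1024

    struct log_record {
        int row_id;
        int old_value;    /* before-image, needed for UNDO */
        int new_value;    /* after-image, needed for REDO */
    };

    static struct log_record log_buf[MAX_LOG];
    static int log_len = 0;
    static int table[100];    /* stand-in for data pages */

    static void update_row(int row_id, int new_value) {
        /* forward work: one log append + one data change */
        log_buf[log_len++] = (struct log_record){ row_id, table[row_id], new_value };
        table[row_id] = new_value;
    }

    static void rollback(void) {
        /* undo work: reverse scan of the log, one restore per record,
         * plus a compensation record a real engine would also have to flush */
        while (log_len > 0) {
            struct log_record r = log_buf[--log_len];
            table[r.row_id] = r.old_value;
            printf("compensating: row %d set back to %d\n", r.row_id, r.old_value);
        }
    }

    int main(void) {
        for (int i = 0; i < 10; i++) update_row(i, i * i);
        rollback();    /* what happens when the session is killed */
        return 0;
    }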

How in-memory databases persist data

I was looking into the concept of in-memory databases. Articles about it say:
An in-memory database system is a database management system that stores data entirely in main memory.
and they discuss advantages and disadvantages of this concept.
My question is: if these database management systems store data entirely in main memory, does all the data vanish after a power failure, or are there ways to protect it?
Most in-memory database systems offer persistence, at least as an option. This is implemented through transaction logging. On normal shutdown, an image of the in-memory database is saved. When it is next re-opened, the previously saved image is loaded, and thereafter every transaction committed to the in-memory database is also appended to a transaction log file. If the system terminates abnormally, the database can be recovered by re-loading the original database image and replaying the transactions from the transaction log file.
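A minimal C sketch of that snapshot-plus-log scheme is below; the file names and record layout are invented, and a real product would fsync the log when logging synchronously:

    /* Minimal snapshot-plus-log sketch: load the last saved image, replay any
     * transactions appended since, then keep appending new commits to the log.
     * File names and the record layout are invented. */
    #include <stdio.h>

    #define N_SLOTS 100

    struct txn { int slot; int value; };    /* one committed change */

    static int db[N_SLOTS];                 /* the in-memory "database" */

    static void recover(const char *image_path, const char *log_path) {
        FILE *img = fopen(image_path, "rb");
        if (img) {                          /* 1. load the last clean snapshot */
            fread(db, sizeof db, 1, img);
            fclose(img);
        }
        FILE *log = fopen(log_path, "rb");
        if (log) {                          /* 2. replay transactions logged since */
            struct txn t;
            while (fread(&t, sizeof t, 1, log) == 1)
                db[t.slot] = t.value;
            fclose(log);
        }
    }

    static void commit(FILE *log, int slot, int value) {
        struct txn t = { slot, value };
        db[slot] = value;                   /* apply in memory */
        fwrite(&t, sizeof t, 1, log);       /* append to the transaction log */
        fflush(log);                        /* synchronous logging; async would skip this */
    }

    int main(void) {
        recover("image.bin", "txn.log");
        FILE *log = fopen("txn.log", "ab");
        if (!log) return 1;
        commit(log, 7, 42);
        fclose(log);
        return 0;
    }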
The database is still all in-memory, and therefore there must be enough available system memory to store the entire database, which makes it different from a persistent database for which only a portion is cached in memory. Therefore, the unpredictability of a cache-hit or cache-miss is eliminated.
Appending the transaction to the log file can usually be done synchronously or asynchronously, which will have very different performance characteristics. Asynchronous transaction logging still risks losing committed transactions if they were not flushed from the file system buffers when the system shuts down unexpectedly (e.g. a kernel panic).
In-memory database transaction logging is guaranteed to only ever incur one file I/O to append the transaction to the log file. It doesn't matter if the transaction is large or small, it's still just one write to the persistent media. Further, the writes are always sequential (always appending to the log file), so even on spinning media the performance hit is as small as it can be.
Different media will have greater or lesser impact on performance. HDD will have the greatest, followed by SSD, then memory-tier FLASH (e.g. FusionIO PCIExpress cards) and the least impact coming from NVDIMM memory.
NVDIMM memory can be used to store the in-memory database, or to store the transaction log for recovery. Maximum NVDIMM memory size is less than conventional memory size (and more expensive), but if your in-memory database is some gigabytes in size, this option can retain 100% of the performance of an in-memory database while also providing the same persistence as a conventional database on persistent media.
There are performance comparisons of an in-memory database with transaction logging to HDD, SSD and FusionIO in this whitepaper: http://www.automation.com/pdf_articles/mcobject/McObject_Fast_Durable_Data_Management.pdf
And with NVDIMM in this paper: http://www.odbms.org/wp-content/uploads/2014/06/IMDS-NVDIMM-paper.pdf
The papers were written by us (McObject), but are vendor-neutral.

Does the transaction log drive need to be as fast as the database drive?

We are telling our client to put the SQL Server database file (mdf) on a different physical drive than the transaction log file (ldf). The tech company (hired by our client) wanted to put the transaction log on a slower (i.e. cheaper) drive than the database drive, because with transaction logs you are just sequentially writing to the log file.
I told them that I thought the log needed to be on a fast drive (actually a RAID configuration) as well, because every data-changing call to the database needs to be saved there, as well as to the database itself.
After saying that though, I realized I was not entirely sure about that. Does the speed of the transaction log drive make a significant difference in performance... if the drive with the database is fast?
The speed of the log drive is the most critical factor for a write-intensive database. No updates can occur faster than the log can be written, so your drive must support the maximum update rate you experience at a spike. And all updates generate log records. Database file (MDF/NDF) updates can afford slower write rates because of two factors:
data updates are written out lazily and flushed on checkpoint. This means that an update spike can be amortized over the average drive throughput
multiple updates can accumulate on a single page and thus need only a single write
So you are right that the log throughput is critical.
But at the same time, log writes have a specific pattern: the log is always appended at the end, so the writes are sequential. All mechanical drives have much higher throughput for sequential operations, for both reads and writes, since they involve less physical movement of the disk heads. So what your ops guys say is also true: a slower drive can in fact offer sufficient throughput (a crude benchmark of this log-write ceiling is sketched after the warnings below).
But all these come with some big warnings:
the slower drive (or RAID combination) must truly offer high sequential throughput
the drive must see log writes from one and only one database, and nothing else. Any other operation that could interfere with the current disk head position will damage your write throughput and result in slower database performance
the log must only be written, never read. Keep in mind that certain components need to read from the log, and they will move the disk heads to other positions to read back previously written log records:
transactional replication
database mirroring
log backup
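As promised above, here is a crude C micro-benchmark of that ceiling: each simulated commit is a small sequential append followed by an fsync, so the measured rate approximates the best sustained commit rate the log device can offer. The file name, record size, and iteration count are arbitrary.

    /* Crude measurement of the log-flush ceiling: each simulated commit is a
     * small sequential append followed by fsync. Record size, count and file
     * name are arbitrary; results depend entirely on the log device. */
    #define _POSIX_C_SOURCE 199309L
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <time.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("test.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
        if (fd < 0) return 1;

        char record[512];
        memset(record, 'x', sizeof record);    /* pretend log record */

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);

        const int commits = 1000;
        for (int i = 0; i < commits; i++) {
            write(fd, record, sizeof record);  /* sequential append */
            fsync(fd);                         /* the wait every commit pays */
        }

        clock_gettime(CLOCK_MONOTONIC, &t1);
        double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("%d synchronous appends in %.2f s => ~%.0f commits/s ceiling\n",
               commits, secs, commits / secs);

        close(fd);
        return 0;
    }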
In simplistic terms, if you are talking about an OLTP database, your throughput is determined by the speed of your writes to the transaction log. Once this performance ceiling is hit, all other dependent actions must wait for the commit to the log to complete.
This is a VERY simplistic take on the internals of the Transaction Log, to which entire books are dedicated, but the rudimentary point remains.
Now, if the storage system you are working with can provide the IOPS you require to support both your transaction log and database data files together, then a shared drive/LUN would adequately meet your needs.
To provide you with a specific recommended course of action I would need to know more about your database workload and the performance you require your database server to deliver.
Get your hands on the title SQL Server 2008 Internals for a thorough look into the internals of the SQL Server transaction log; it's one of the best SQL Server titles out there and will pay for itself in minutes from the value you gain from reading it.
Well, the transaction log is the main structure that provides ACID guarantees. It can be a big performance bottleneck, and if you take backups regularly its required space has an upper limit, so I would put it on a safe, fast drive with just enough space plus a bit of margin.
The transaction log should be on the fastest drives: if the write to the log can complete, the rest of the transaction can be handled in memory and hit disk later.

database autocommit - does it go directly to disk?

So I know that autocommit commits every SQL statement, but do updates to the database go directly to the disk, or do they remain in cache until flushed?
I realize it's dependent on the database implementation.
Does auto-commit mean
a) every statement is a complete transaction AND it goes straight to disk or
b) every statement is a complete transaction and it may go to cache where it will be flushed later or it may go straight to disk
Clarification would be great.
Auto-commit simply means that each statement is in its own transaction which commits immediately. This is in contrast to the "normal" mode, where you must explicitly BEGIN a transaction and then COMMIT once you are done (usually after several statements).
The phrase "auto-commit" has nothing to do with disk access or caching. As an implementation detail, most databases will write to disk on commit so as to avoid data loss, but this isn't mandatory in the spec.
For ARIES-based protocols, committing a transaction involves logging all modifications made within that transaction. Changes are flushed immediately to the log file, but not necessarily to the data file (that is implementation dependent). That is enough to ensure that the changes can be recovered in the event of a failure. So, (b).
Commit provides no guarantee that something has been written to disk, only that your transaction has been completed and the changes are now visible to other users.
Permanent does not necessarily mean written to disk (i.e. durable)... Even whether a "commit" waits for the transaction's changes to reach disk can be configured in some databases.
For example, Oracle 10gR2 has several commit modes, including IMMEDIATE, WAIT, BATCH, and NOWAIT. BATCH will queue and buffer the changes, and the writer will write them to disk at some future time. NOWAIT will return immediately without regard for the I/O.
The exact behavior of commit is very database-specific and can often be configured depending on your tolerance for data loss.
It depends on the DBMS you're using. For example, Firebird has this as an option in its configuration file: if you turn Forced Writes on, the changes go directly to the disk; otherwise they are submitted to the filesystem, and the actual write time depends on the operating system's caching.
If the database transaction is claimed to be ACID, then the D (durability) mandates that a committed transaction should survive a crash immediately after the successful commit. For a single-server database, that means it is on the disk (disk commit). For some modern multi-server databases, it can also mean that the transaction is sent to one or more servers (network commit, which is typically much faster than disk commit), under the assumption that the probability of multiple servers crashing at the same time is much smaller.
It's impossible to guarantee that commits are atomic, so modern databases use two-phase or three-phase commit strategies. See Atomic Commit.

Resources