How in-memory databases persist data - database

I was looking in to the concept of in-memory databases. Articles about that says,
An in-memory database system is a database management system that stores data entirely in main memory.
and they discuss advantages and disadvantages of this concept.
My problem is if these database managements system that stores data entirely in main memory,
do all the data vanish after a power failure???
or are there ways to protect the data ???

Most in-memory database systems offer persistence, at least as an option. This is implemented through transaction logging. On normal shutdown, an in-memory database image is saved. When next re-opened, the previous saved image is loaded and thereafter, every transaction committed to the in-memory database is also appended to a transaction log file. If the system terminates abnormally, the database can be recovered by re-loading the original database image and replaying the transactions from the transaction log file.
The database is still all in-memory, and therefore there must be enough available system memory to store the entire database, which makes it different from a persistent database for which only a portion is cached in memory. Therefore, the unpredictability of a cache-hit or cache-miss is eliminated.
Appending the transaction to the log file can usually be done synchronously or asynchronously, which will have very different performance characteristics. Asynchronous transaction logging will still risk the possibility of losing committed transactions if they were not flushed from the file system buffers and the system is shutdown unexpectedly (i.e. a kernel panic).
In-memory database transaction logging is guaranteed to only ever incur one file I/O to append the transaction to the log file. It doesn't matter if the transaction is large or small, it's still just one write to the persistent media. Further, the writes are always sequential (always appending to the log file), so even on spinning media the performance hit is as small as it can be.
Different media will have greater or lesser impact on performance. HDD will have the greatest, followed by SSD, then memory-tier FLASH (e.g. FusionIO PCIExpress cards) and the least impact coming from NVDIMM memory.
NVDIMM memory can be used to store the in-memory database, or to store the transaction log for recovery. Maximum NVDIMM memory size is less than conventional memory size (and more expensive), but if your in-memory database is some gigabytes in size, this option can retain 100% of the performance of an in-memory database while also providing the same persistence as a conventional database on persistent media.
There are performance comparisons of an in-memory database with transaction logging to HDD, SSD and FusionIO in this whitepaper: http://www.automation.com/pdf_articles/mcobject/McObject_Fast_Durable_Data_Management.pdf
And with NVDIMM in this paper: http://www.odbms.org/wp-content/uploads/2014/06/IMDS-NVDIMM-paper.pdf
The papers were written by us (McObject), but are vendor-neutral.

Related

Are there any databases really durable?

I was reading http://www.h2database.com/html/advanced.html#durability_problems and i found
Some databases claim they can guarantee durability, but such claims are wrong. A durability test was run against H2, HSQLDB, PostgreSQL, and Derby. All of those databases sometimes lose committed transactions. The test is included in the H2 download, see org.h2.test.poweroff.Test
Also it says
Where losing transactions is not acceptable, a laptop or UPS (uninterruptible power supply) should be used.
So is there any database that is durable. The document says about fsync() command and most hard drives do not obey fsync(). It also talks about no reliable way to flush hard drive buffers
So, is there a time after which a committed transaction becomes durable, so we can buy ups that gives minimum that much backup of power supply.
Also is there a way to know that a transaction committed is durable. Suppose we don't buy ups and after knowing that a transaction is durable we can show success message.
The problem depends on whether or not you can instruct the HDD/SDD to commit transactions to durable media. If the mass storage device does not have the facility to flush to durable media, then no data storage system on top of it can be said to be truely durable.
There are plenty of NAS devices with built in UPS however and these seem to fit the requirement for durable media - if the database on a seperate server commits data to that device and does a checkpoint then the commits are flushed to the media. So long as the media survives a power outage then you can say its durable. The UPS on the NAS should be capable of issuing a controlled shutdown to its associated disk pack, guaranteeing permenance.
Alternatively you could use something like SQL Azure which writes commits to multiple (3) seperate database storage instances on different servers. Although we have no idea if those writes ever reach a permenant storage media, it doesnt actually matter - the measurement of durability is read-repeatability; and this seems to meet that requirement.

Possible to checkpoint a WAL file during a transaction?

We are performing quite large transactions on a SQLite database that is causing the WAL file to grow extremely large. (Sometimes up to 1GB for large transactions.) Is there a way to checkpoint the WAL file while in the middle of a transaction? When I try calling sqlite3_wal_checkpoint() or executing the WAL checkpoint PRAGMA statement, both return SQLITE_BUSY.
Not really. This is whole point of transactions: WAL (or journal file) keeps data that would become official once successfully committed. Until that happens, if anything goes wrong - program crash, computer reboot, etc, WAL or journal file allow to safely rollback (undo) uncommitted action. Moving only part of this uncommitted transaction would defeat the purpose.
Note that SQLite documentations defines check-pointing as moving the WAL file transactions back into the database. In other words, checkpointing moves one or more transactions from WAL, but not part of huge uncommitted transaction.
There are few possible solutions to your problem:
Avoid huge transactions - commit in smaller chunks if you can. Of course, this is not always possible, depending on your application.
Use old journaling mode with PRAGMA journal_mode=DELETE. It is slightly slower than new WAL mode (with PRAGMA journal_mode=WAL), but in my experience it tends to create much smaller journal files, and they get deleted when transaction successfully commits. For example, Android 4.x is still using old journaling mode - it tends to work faster on flash and does not create huge temporary or journal files.

Does the transaction log drive need to be as fast as the database drive?

We are telling our client to put a SQL Server database file (mdf), on a different physical drive than the transaction log file (ldf). The tech company (hired by our client) wanted to put the transaction log on a slower (e.g. cheaper) drive than the database drive, because with transaction logs, you are just sequencially writing to the log file.
I told them that I thought that the drive (actually a RAID configuration) needed to be on a fast drive as well, because every data changing call to the database, needs be saved there, as well as to the database itself.
After saying that though, I realized I was not entirely sure about that. Does the speed of the transaction log drive make a significant difference in performance... if the drive with the database is fast?
The speed of the log drive is the most critical factor for a write intensive database. No updates can occur faster than the log can be written, so your drive must support your maximum update rate experienced at a spike. And all updates generate log. Database file (MDF/NDF) updates can afford slower rates of write because of two factors
data updates are written out lazily and flushed on checkpoint. This means that an update spike can be amortized over the average drive throughput
multiple updates can accumulate on a single page and thus will need one single write
So you are right that the log throughput is critical.
But at the same time, log writes have a specific pattern of sequential writes: log is always appended at the end. All mechanical drives have a much higher throughput, for both reads and writes, for sequential operations, since they involve less physical movement of the disk heads. So is also true what your ops guys say that a slower drive can offer in fact sufficient throughput.
But all these come with some big warnings:
the slower drive (or RAID combination) must truly offer high sequential throughput
the drive must see log writes from one and only one database, and nothing else. Any other operation that could interfere with the current disk head position will damage your write throughput and result in slower database performance
the log must be only write, and not read. Keep in mind that certain components need to read from the log, and thus they will move the disk mechanics to other positions so they can read back the previously written log:
transactional replication
database mirroring
log backup
In simplistic terms, if you are talking about an OLTP database, your throughput is determined by the speed of your writes to the Transaction Log. Once this performance ceiling is hit, all other dependant actions must wait on the commit to log to complete.
This is a VERY simplistic take on the internals of the Transaction Log, to which entire books are dedicated, but the rudimentary point remains.
Now if the storage system you are working with can provide the IOPS that you require to support both your Transaction Log and Database data files together then a shared drive/LUN would provide adequately for your needs.
To provide you with a specific recommended course of action I would need to know more about your database workload and the performance you require your database server to deliver.
Get your hands on the title SQL Server 2008 Internals to get a thorough look into the internals of the SQL Server transaction log, it's one of the best SQL Server titles out there and it will pay for itself in minutes from the value you gain from reading.
Well, the transaction log is the main structure that provides ACID, can be a big bottleneck for performance, and if you do backups regularly its required space has an upper limit, so i would put it in a safe, fast drive with just space enough + a bit of margin.
The Transaction log should be on the fastest drives, if it just can complete the write to the log it can do the rest of the transaction in memory and let it hit disk later.

How does a large transaction log affect performance?

I have seen that our database has the full recovery model and has a 3GB transaction log.
As the log gets larger, how will this affect the performance of the database and the performance of the applications that accesses the database?
The recommended best practice is to assign a SQL Server Transaction log file its’ very own disk or LUN.
This is to avoid fragmentation of the transaction log file on disk, as other posters have mentioned, and to also avoid/minimise disk contention.
The ideal scenario is to have your DBA allocate sufficient log space for your database environment ahead of time i.e. to allocate say x GB of data in one go. On a dedicated disk this will create a contiguous allocation, thereby avoiding fragmentation.
If you need to grow your transaction log, again you should endeavour to do so in sizeable chunks in order to endeavour to allocate contiguously.
You should also look to NOT shrink your transaction log file as, repeated shrinking and auto growth can lead to fragmentation of the data file on disk.
I find it best to think of the autogrowth database property as a failsafe i.e. your DBA should proactively monitor transaction log space (perhaps by setting up alerts) so that they can increase the transaction log file size accordingly to support your database usage requirements but the autogrowth property can be in place to ensure that your database can continue to operate normally should unexpected growth occur.
A larger transaction log file in itself if not detrimental to performance as SQL server writes to the log sequentially, so provided you are managing your overall log size and allocation of additional space appropriately you should not be concerned.
In a couple of ways.
If your system is configured to auto-grow the transaction log, as the file gets bigger, your SQL server will need to do more work and you will potentially get slowed down. When you finally do run out of space, you're out of luck and your database will stop taking new transactions.
You need to get with your DBA (or maybe you are the DBA?) and perform frequent, periodic log backups. Save them off your server onto another dedicated backup system. As you back up the log, the space in your existing log file will be reclaimed, preventing the log from getting much bigger. Backing up the transaction log will also allow you to restore your database to a specific point in time after your last full or differential backup, which significantly cuts your data losses in the event of a server failure.
If the log gets fragmented or needs to grow it will slow down the application.
If you don't clear the transaction log periodically by performing backups, the log will get full and consume the entire available disk space.
If the log file growths larger in small steps you will end up with a lot of virtual log files
these virtual log files will slow down the database startups, restore and backup operations.
here is an article that shows how to fix this:
http://www.sqlserveroptimizer.com/2013/02/how-to-speed-up-sql-server-log-file-by-reducing-number-of-virtual-log-file/
For increasing performance it's nice to separate SQL server log file from sql server data file in order to optimize I/O efficiency.
The writing method to the data file is Random but SQL Server writes to the transaction log sequentially.
With Sequential I/O ,SQL Server can read/write data without re-positioning the disk head.

database autocommit - does it go directly to disk?

So I know that autocommit commits every sql statement, but do updates to the database go directly to the disk or do they remain on cache until flushed?
I realize it's dependent on the database implementation.
Does auto-commit mean
a) every statement is a complete transaction AND it goes straight to disk or
b) every statement is a complete transaction and it may go to cache where it will be flushed later or it may go straight to disk
Clarification would be great.
Auto-commit simply means that each statement is in its own transaction which commits immediately. This is in contrast to the "normal" mode, where you must explicitly BEGIN a transaction and then COMMIT once you are done (usually after several statements).
The phrase "auto-commit" has nothing to do with disk access or caching. As an implementation detail, most databases will write to disk on commit so as to avoid data loss, but this isn't mandatory in the spec.
For ARIES-based protocols, committing a transaction involves logging all modifications made within that transaction. Changes are flushed immediately to logfile, but not necessarily to datafile (that is dependent on the implementation). That is enough to ensure that the changes can be recovered in the event of a failure. So, (b).
Commit provides no guarantee that something has been written to disk, only that your transaction has been completed and the changes are now visible to other users.
Permanent does not necessarily mean written to disk (i.e. durable)... Even if a "commit" waits for the transaction to complete can be configured with some databases.
For example, Oracle 10gR2 has several commit modes, including IMMEDIATE,WAIT,BATCH,NOWAIT. BATCH will queue the buffer the changes and the writer will write the changes to disk at some future time. NOWAIT will return immediately without regard for I/O.
The exact behavior of commmit is very database specific and can often be configured depending on your tolerance for data loss.
It depends on the DBMS you're using. For example, Firebird has it as an option in configuration file. If you turn Forced Writes on, the changes go directly to the disk. Otherwise they are submitted to the filesystem, and the actual write time depends on the operating system caching.
If the database transaction is claimed to be ACID, then the D (durability) mandates that the transaction committed should survive the crash immediately after the successful commit. For single server database, that means it's on the disk (disk commit). For some modern multi-server databases, it can also means that the transaction is sent to one or more servers (network commit, which are typically much faster than disk), under the assumption that the probability of multiple server crash at the same time is much smaller.
It's impossible to guarantee that commits are atomic, so modern databases use two-phase or three phase commit strategies. See Atomic Commit

Resources