How do in-memory databases avoid use of virtual memory?

Since memory is managed by the OS, how does an in-memory database process keep its pages in physical memory from being swapped out to virtual memory on disk?

On some systems, it is possible to pin pages in memory, but this is discouraged - you are defeating the operating system's virtual memory manager, which might benefit the IMDS but be detrimental to overall system performance.
Our (McObject) recommendation is to ensure that you have enough physical memory so that the operating system does not swap in-memory database pages to the swap space.
If it's not possible to ensure that you have enough physical memory, then you're better off creating a conventional persistent database and creating as large a database cache with the DBMS' facility as you can (again, within the constraints of physical memory), and allowing the DBMS to move pages into and out of its own cache. It will do so more intelligently than the operating system.
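For reference, on Linux/POSIX systems the pinning mentioned above is typically done with mlock()/mlockall(); a minimal sketch (a generic illustration, not McObject's or any particular IMDS' API; the arena size is made up):

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>

    int main(void) {
        /* Lock all current and future pages of this process into physical RAM.
           On Linux this needs CAP_IPC_LOCK or a sufficient RLIMIT_MEMLOCK. */
        if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
            perror("mlockall");
            return EXIT_FAILURE;
        }

        /* Memory touched from here on (e.g. a hypothetical in-memory database
           arena) can no longer be swapped out by the OS. */
        void *db_arena = malloc(64 * 1024 * 1024);
        if (db_arena == NULL)
            return EXIT_FAILURE;

        /* ... run the database ... */

        free(db_arena);
        munlockall(); /* undo the pinning when done */
        return EXIT_SUCCESS;
    }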


Are the contents of an LMDB always stored BOTH on disk AND in memory?

I want to use the Rust implementation of LMDB, but I can't seem to find whether it always maintains a copy of what's in memory on disk as well, and vice versa.
My reasoning would be that the DB will have some cache and any overflows would be written to disk but since I can't find it in the documentation I'm not sure.
The other case would be that LMDB maps its contents on disk to memory and that would work for small quantities but not for what I have in mind.
Does anyone have an idea of how LMDB works in this regard?
If you are worried about not being able to operate on a dataset that does not fit in memory, you are OK - LMDB does handle that. Your dataset is not limited to the size of RAM.
LMDB is memory-mapped, a technique that allows developers to read and write data on disk 'like' it is in memory - the OS does all the heavy lifting required.
LMDB always stored BOTH on disk AND in memory?
The entire dataset is on disk. Some parts of it are in memory. When the parts that are not in memory are needed, the OS fetches them from disk and gives them to the application by putting them in the process's memory.
The other case would be that LMDB maps its contents on disk to memory and that would work for small quantities but not for what I have in mind.
Yes, that is it.
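For readers unfamiliar with memory mapping, here is a minimal C sketch of the general technique the answers describe - illustrative only, not LMDB's internal code; 'data.db' is a hypothetical file standing in for a database:

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("data.db", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) != 0) { perror("fstat"); return 1; }

        /* Map the whole file into the process's address space. Nothing is
           read from disk yet; pages are faulted in by the OS on first access. */
        const char *base = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        if (base == MAP_FAILED) { perror("mmap"); return 1; }

        /* Touching a byte that is not resident triggers a page fault; the OS
           transparently fetches that page from disk into physical memory. */
        if (st.st_size > 0)
            printf("first byte: 0x%02x\n", (unsigned char)base[0]);

        munmap((void *)base, st.st_size);
        close(fd);
        return 0;
    }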
I read that RocksDB supports the usage you are looking for while offering similar guarantees.
"Small quantities" depends on the available RAM. Major vendors highly recommend having as much memory as the working dataset; MongoDB with the WiredTiger backend comes to mind, but not only - so does PostgreSQL.
At the opposite end, you can find database systems, such as Rust full-text search engines, that use cold (offline?) object storage.

Cassandra Vs ScyllaDB Memory Usage

I am doing performance comparisons of ScyllaDB and Cassandra, specifically looking at the impact of memory. The machines I am using each have 16GB and 8 cores.
Based on the docs, Cassandra will default to 4GB Xmx and use the remaining 12GB as file system cache.
https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsTuneJVM.html
ScyllaDB instead will use all 16GB for itself.
http://docs.scylladb.com/faq/#scylla-is-using-all-of-my-memory-why-is-that-what-if-the-server-runs-out-of-memory
What I'm wondering is whether this is a fair comparison setup (4GB Xmx for Cassandra vs 16GB for Scylla)? I realize this is what each recommends, but would a fairer test be 8GB Xmx for Cassandra and --memory 8G for ScyllaDB? My workload is mostly write-intensive, and I don't expect file system caching to always be able to help Cassandra. It's odd to me that ScyllaDB expects almost no file system caching, compared to Cassandra's huge reliance on it.
Cassandra will always use all of the system memory; the heap size (-Xmx) setting just determines how much is used by the heap and how much by other memory consumers (off-heap structures and the page cache). So if you limit Scylla's memory usage, it will be at a disadvantage compared to Cassandra.
Scylla will use ~1/2 of the memory for MemTable, and the other half for Key/Partition caching.
If your workload is mostly writes, more memory will have less of an effect on performance, which should be bounded by either I/O or CPU.
I would recommend reading http://www.scylladb.com/2017/10/05/io-access-methods-scylla/ to understand the way Scylla writes information, and http://www.scylladb.com/2016/12/15/sswc-part1/ to understand the way Scylla balances I/O workloads.

Storage capacity of in-memory database?

Is the storage capacity of an in-memory database limited to the size of RAM? If yes, are there any ways to increase its capacity other than increasing RAM size? If no, please give some explanation.
As previously mentioned, in-memory storage capacity is limited by the addressable memory, not by the amount of physical memory in the system. Simon was also correct that the OS will swap memory to the page file, but you really want to avoid that. In the context of the DBMS, the OS will do a worse job of it than if you simply used a persistent database with as large a cache as you have physical memory to support. In other words, the DBMS will manage its cache more intelligently than the OS would manage paged memory containing in-memory database content.
On a 32-bit system, each process is limited to roughly 2-3GB of addressable RAM, whether you have that much physically or only 512MB. If you have more data (including the in-mem DB) and code than will fit into physical RAM, then the page file on disk is used to swap out memory that is currently not being used. Swapping does slow everything down, though. There are some tricks you can use for extending that: memory-mapped files, the /3GB switch; but these are not easy to implement.
On 64-bit machines, a process's memory limit is huge - I forget exactly what it is, but it's up in the TB range.
VoltDB is an in-memory SQL database that runs on a cluster of 64-bit Linux servers. It has high performance durability to disk for recovery purposes, but tables, indexes and materialized views are stored 100% in-memory. A VoltDB cluster can be expanded on the fly to increase the overall available RAM and throughput capacity without any down time. In a high-availability configuration, individual nodes can also be stopped to perform maintenance such as increasing the server's RAM, and then rejoined to the cluster without any down time.
The design of VoltDB, led by Michael Stonebraker, was for a no-compromise approach to performance and scalability of OLTP transaction processing workloads with full ACID guarantees. Today these workloads are often described as Fast Data. By using main memory, and single-threaded SQL execution code distributed for parallel processing by core, the data can be accessed as fast as possible in order to minimize the execution time of transactions.
There are in-memory solutions that can work with data sets larger than RAM. Of course, this is accomplished by adding some operations on disk. Tarantool's Vinyl, for example, can work with data sets that are 10 to 1000 times the size of available RAM. Like other databases of recent vintage such as RocksDB and Bigtable, Vinyl's write algorithm uses LSM trees instead of B trees, which helps with its speed.
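As a rough illustration of that LSM write path (a toy C sketch, not Vinyl's or RocksDB's actual code; all names and sizes here are made up): writes accumulate in a small sorted in-memory buffer and are flushed sequentially to disk as immutable sorted runs, which is why the on-disk dataset can far exceed RAM.

    #include <stdio.h>
    #include <stdlib.h>

    #define MEMTABLE_CAP 4  /* tiny on purpose, to force flushes */

    typedef struct { int key; int value; } entry;

    static entry memtable[MEMTABLE_CAP];
    static int memtable_len = 0;
    static int run_counter = 0;

    static int cmp_entry(const void *a, const void *b) {
        return ((const entry *)a)->key - ((const entry *)b)->key;
    }

    /* Flush the sorted memtable to disk as one sequential write
       ("sorted run"). Real engines add WALs, compaction, bloom filters. */
    static void flush_memtable(void) {
        char name[32];
        snprintf(name, sizeof name, "run-%d.dat", run_counter++);
        qsort(memtable, memtable_len, sizeof(entry), cmp_entry);
        FILE *f = fopen(name, "wb");
        if (f == NULL) return;
        fwrite(memtable, sizeof(entry), memtable_len, f);
        fclose(f);
        memtable_len = 0;
    }

    static void put(int key, int value) {
        memtable[memtable_len].key = key;
        memtable[memtable_len].value = value;
        if (++memtable_len == MEMTABLE_CAP)
            flush_memtable();  /* on-disk data can far exceed RAM */
    }

    int main(void) {
        for (int i = 0; i < 10; i++)
            put(rand() % 100, i);
        if (memtable_len > 0)
            flush_memtable();
        return 0;
    }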

How do in-memory databases provide durability?

More specifically, are there any databases that don't require secondary storage (e.g. HDD) to provide durability?
Note: This is a follow-up to my earlier question.
If you want persistence of transactions, writing to persistent storage is the only real option (you perhaps do not want to build many clusters with independent power supplies in independent data centers and still pray that they never fail simultaneously). On the other hand, it depends on how valuable your data is. If it is dispensable, then a pure in-memory DB with sufficient replication may be appropriate. BTW, even an HDD may fail after you have stored your data on it, so there is no ideal solution. You may look at http://www.julianbrowne.com/article/viewer/brewers-cap-theorem to choose replication tradeoffs.
Prevayler (http://prevayler.org/) is an example of an in-memory system backed up by persistent storage (and the code is extremely simple, BTW). Durability is provided via transaction logs that are persisted on an appropriate device (e.g. HDD or SSD). Each transaction that modifies data is written into the log, and the log is used to restore DB state after a power failure or database/system restart. Aside from Prevayler, I have seen a similar scheme used to persist message queues.
This is indeed similar to how a "classic" RDBMS works, except that the logs are the only data written to underlying storage. The logs can also be used for replication, so you may send one copy of the log to a live replica and the other to the HDD. Various combinations are possible, of course.
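A minimal C sketch of that transaction-log scheme, assuming a POSIX system (a generic illustration, not Prevayler's code; the file name and record format are made up): the change record is appended and forced to stable storage before the transaction is acknowledged.

    #include <stdio.h>
    #include <unistd.h>

    /* Append one change record to the log and force it to stable storage.
       On restart, replaying the log in order rebuilds the in-memory state. */
    static int log_transaction(FILE *log, const char *record) {
        if (fputs(record, log) == EOF || fputc('\n', log) == EOF)
            return -1;
        if (fflush(log) != 0)         /* push from stdio to the kernel */
            return -1;
        if (fsync(fileno(log)) != 0)  /* push from the kernel to the device */
            return -1;
        return 0;  /* only now is the change durable */
    }

    int main(void) {
        FILE *log = fopen("tx.log", "a");  /* hypothetical journal file */
        if (log == NULL) return 1;
        log_transaction(log, "SET balance:42 100");
        fclose(log);
        return 0;
    }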
All databases require non-volatile storage to ensure durability. The memory image does not provide a durable storage medium. Very shortly after you lose power, your memory image becomes invalid. Likewise, as soon as the database process terminates, the operating system will release the memory containing the in-memory image. In either case, you lose your database contents.
Until any changes have been written to non-volatile memory, they are not truly durable. This may consist of either writing all the data changes to disk or writing a journal of the changes being made.
In space- or size-critical instances, non-volatile memory such as flash could be substituted for an HDD. However, flash is reported to have issues with the number of write cycles that can be written.
Having reviewed your previous post, multi-server replication would work as long as you can keep that last server running. As soon as it goes down, you lose your queue. However, there are a number of alternatives to Oracle which could be considered.
PDAs often use battery-backed memory to store their databases. These databases are non-durable once the battery runs down. Backups are important.
In-memory means all the data is stored in memory for it to be accessed. When data is read, it can either be read from disk or from memory. In the case of in-memory databases, it's always retrieved from memory. However, if the server is turned off suddenly, the data will be lost. Hence, in-memory databases are said to lack support for the durability part of ACID. However, many databases implement different techniques to achieve durability. These techniques are listed below.
Snapshotting - Record the state of the database at a given moment in time (see the sketch after this list). In the case of Redis, the data is persisted to disk every two seconds for durability.
Transaction Logging - Changes to the database are recorded in a journal file, which facilitates automatic recovery.
Use of NVRAM, usually in the form of static RAM backed up by battery power. In this case, data can be recovered after a reboot from its last consistent state.
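A minimal C sketch of the snapshotting technique, assuming a POSIX system (a generic illustration; file names and the state buffer are hypothetical): the whole in-memory image is written to a temporary file, forced to disk, and atomically renamed over the previous snapshot, so a crash always leaves one complete snapshot behind.

    #include <stdio.h>
    #include <unistd.h>

    /* Dump the entire in-memory state to a temp file, force it to disk,
       then atomically replace the previous snapshot. `state` and `len`
       stand in for the database image. */
    static int snapshot(const void *state, size_t len) {
        FILE *f = fopen("snapshot.tmp", "wb");
        if (f == NULL) return -1;
        if (fwrite(state, 1, len, f) != len) { fclose(f); return -1; }
        if (fflush(f) != 0 || fsync(fileno(f)) != 0) { fclose(f); return -1; }
        fclose(f);
        return rename("snapshot.tmp", "snapshot.db");  /* atomic on POSIX */
    }

    int main(void) {
        char state[] = "key=value";  /* toy database image */
        return snapshot(state, sizeof state) == 0 ? 0 : 1;
    }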
A classic in-memory database can't provide classic durability, but depending on what your requirements are, you can:
use memcached (or similar) to store data in memory across enough nodes that it's unlikely the data is lost
store your Oracle database on a SAN-based filesystem; you can give it enough RAM (say 3GB) that the whole database is in RAM, so disk seek access never slows your application down. The SAN then takes care of delayed writeback of the cache contents to disk. This is a very expensive option, but it is common in places where high performance and high availability are needed and they can afford it.
if you can't afford a SAN, mount a RAM disk and install your database on there, then use DB-level replication (like log shipping) to provide failover.
Any reason why you don't want to use persistent storage?

Performance scenario: RAM disk vs in-memory database (IMDB)?

I was just wondering: we have in-memory databases (IMDB), and we also have a way to put a database on a RAM disk. So which would be faster? Your valuable comments and experiences are appreciated.
Wikipedia - Computer Data Storage
Latency
The time it takes to access a particular location in storage. The relevant unit of measurement is typically nanoseconds for primary storage and milliseconds for secondary storage.
It really depends on the hardware architecture. However, internal memory is almost always the fastest way of storing and retrieving data, unless you have a specialized main board.
