Configure Redis to use disk automatically - database

I need to set up a caching system that can cache some GBs of data without consuming too much RAM. Is there a way to configure Redis to use disk storage automatically when it reaches X RAM usage? If not, is there any in-memory database where this is possible?

If you need a caching system, why do you want to spill over to disk rather than just remove (probably old and unused) entries from the cache?
Redis has various strategies for when memory reaches a configured limit...
Please read the Redis documentation about memory configuration and eviction policies.
Bottom line: in redis.conf it's possible to configure a memory limit and an eviction policy (volatile-lru or allkeys-lru will probably help here).
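For example, a minimal sketch of those two settings, assuming a local Redis server and the redis-py client (the 2 GB cap and the localhost address are placeholders); the same values can be put in redis.conf to make them permanent:

    import redis

    r = redis.Redis(host="localhost", port=6379)

    # Cap memory at ~2 GB and evict the least-recently-used keys when the cap is hit.
    # Equivalent redis.conf lines:  maxmemory 2gb  /  maxmemory-policy allkeys-lru
    r.config_set("maxmemory", "2gb")
    r.config_set("maxmemory-policy", "allkeys-lru")

    print(r.config_get("maxmemory"), r.config_get("maxmemory-policy"))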

No, you can't.
Redis stores all data in memory; the disk storage options are only for persistence/restore.
Maybe you can try another k-v store, e.g. RocksDB or LevelDB.
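If a disk-backed store is acceptable, a rough sketch with the plyvel LevelDB binding could look like this (the path and key are invented; the bulk of the data lives on disk and only a small block cache stays in RAM):

    import plyvel

    # Open (or create) an on-disk LevelDB database.
    db = plyvel.DB("/tmp/cache-db", create_if_missing=True)

    db.put(b"session:42", b"some cached payload")
    print(db.get(b"session:42"))

    db.close()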

Related

Redis MAXMEMORY management volatile-lru vs allkeys-lru

I am using Redis as a datastore rather than a cache, but there is a maxmemory limit set. In my understanding, maxmemory specifies the RAM that Redis can use; shouldn't it swap the data out to disk once the memory limit is reached?
I have a mixture of keys: some have an expiry set and others don't.
I have tried both volatile-lru and allkeys-lru; as specified in the documentation, both remove old keys based on that property.
What configuration should I use to avoid data loss? Should I set an expiry on all keys and use volatile-lru? What am I missing?
Swapping memory out to disk (Redis virtual memory) was deprecated in Redis 2.4 and removed in 2.6. Most likely, you are not using such an old version.
You control what Redis does when memory is exhausted with maxmemory and maxmemory-policy. Both are settings in redis.conf. Take a look. Swapping memory out to disk is not an option in recent Redis versions.
If Redis can't remove keys according to the policy, or if the policy is
set to 'noeviction', Redis will start to reply with errors to commands
that would use more memory, like SET, LPUSH, and so on, and will continue
to reply to read-only commands like GET.
If maxmemory is reached, you lose data only if the eviction policy set in maxmemory-policy tells Redis to evict some keys and how to select them (volatile or all keys; lfu/lru/ttl/random). Otherwise, Redis starts rejecting write commands to preserve the data already in memory, while read commands continue to be served.
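To illustrate that behaviour, here is a throwaway sketch with redis-py against a disposable local instance (the tiny limit is deliberate, and the exact error text varies by version):

    import redis

    r = redis.Redis()
    r.config_set("maxmemory", "1mb")              # deliberately tiny, demo only
    r.config_set("maxmemory-policy", "noeviction")

    try:
        for i in range(100_000):
            r.set(f"k:{i}", "x" * 1024)           # writes fail once the limit is hit
    except redis.exceptions.ResponseError as err:
        print("write rejected:", err)             # e.g. "OOM command not allowed ..."

    print(r.get("k:0"))                           # read-only commands are still served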
You can run Redis without a maxmemory setting (default), so it will continue using up memory until the OS memory is exhausted.
If your operating system has virtual memory enabled, and the maxmemory setting allows Redis to go over the physical memory available, then your OS (not Redis) starts to swap out memory to disk. You can expect a performance drop then.
In general as a rule of thumb:
Use the allkeys-lru policy when you expect a power-law distribution in
the popularity of your requests, that is, you expect that a subset of
elements will be accessed far more often than the rest. This is a good
pick if you are unsure.
Use the allkeys-random if you have a cyclic
access where all the keys are scanned continuously, or when you expect
the distribution to be uniform (all elements likely accessed with the
same probability).
Use the volatile-ttl if you want to be able to
provide hints to Redis about what are good candidates for expiration by
using different TTL values when you create your cache objects.
The volatile-lru and volatile-random policies are mainly useful when you want to use a single instance for both caching and to have a set of persistent keys. However it is usually a better idea to run two Redis instances to solve such a problem.
As given in the documentation
Using Redis as an LRU cache
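For the mixed-keys setup described in the question, one possible sketch (redis-py, key names invented): keep maxmemory-policy at volatile-lru and give only the expendable keys a TTL, so the TTL-less datastore keys are never eviction candidates:

    import redis

    r = redis.Redis()
    r.config_set("maxmemory", "2gb")
    r.config_set("maxmemory-policy", "volatile-lru")     # only keys with a TTL can be evicted

    r.set("user:1:profile", "durable datastore value")               # no TTL -> never evicted
    r.set("page:home:rendered", "regenerable cache value", ex=3600)  # TTL -> eviction candidate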
Do not set that parameter if you're using Redis as a datastore; it is meant for the cache scenario.

Explain analyze buffers - Does it give OS Cache as well

When we run explain (analyze, buffers) on a query, the results show how much data comes from the cache and how much comes from disk.
But there are two caching layers in Postgres: the OS cache and shared_buffers itself. Does the query plan show hits from shared_buffers, from the OS cache, or both?
There are extensions to see them individually, i.e. pgfincore and pg_buffercache, but what data do I see in the query plan? Does it belong to shared_buffers, the OS cache, or both of them combined?
Postgres only controls and knows about its own cache. It can't know about the cache management of the operating system.
Does it belong to shared_buffers/OS cache or both of them just combined?
Those figures only relate to shared_buffers, not the cache of the operating system.
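A quick way to see those counters, as a sketch (assuming psycopg2; the connection string and table name are made up):

    import psycopg2

    conn = psycopg2.connect("dbname=test")
    cur = conn.cursor()
    cur.execute("EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM some_table")
    for (line,) in cur.fetchall():
        print(line)
    # "Buffers: shared hit=N read=M": "hit" pages came from shared_buffers,
    # "read" pages were requested from the kernel -- those may have been served
    # from the OS page cache or from disk, and Postgres cannot tell which.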

High performance persistent key value store for huge amount of records

The scenario is about 1 billion records. Each record is about 1 KB in size and is stored on SSD.
Which KV store can provide the best random read performance? It needs to reduce disk access to only one read per query, and all of the data index will be stored in memory.
Redis is fast, but it's too expensive to store 1 TB of data in memory.
LevelDB reads the disk several times per query.
The closest one I found is fatcache but it's not persistent. It's an SSD-backed memcached.
Any suggestions?
RocksDB might be the choice for you: it is optimized for fast storage like memory and flash disk, and it's highly customizable. If your application is read-only after an initial bulk load, you can configure RocksDB to compact everything into one single big file. That way, reads are guaranteed to need at most a single I/O. However, if your application handles both reads and writes, then in order to have at most one I/O per read you will have to sacrifice write performance, because RocksDB has to be configured to compact very often.
A tuning guide for RocksDB can also be found here.
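As a rough, untuned illustration with the third-party python-rocksdb binding (the cache size, path and key are placeholders):

    import rocksdb

    opts = rocksdb.Options(create_if_missing=True)
    # Bloom filters plus a large block cache keep index/filter lookups in RAM,
    # so a point read usually needs at most one data-block I/O.
    opts.table_factory = rocksdb.BlockBasedTableFactory(
        filter_policy=rocksdb.BloomFilterPolicy(10),
        block_cache=rocksdb.LRUCache(2 * 1024 ** 3))

    db = rocksdb.DB("records.db", opts)
    db.put(b"record:1", b"x" * 1024)      # ~1 KB value, as in the question
    print(db.get(b"record:1")[:16])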
You may want to try RocksDB; it's a Facebook library optimized for SSD storage. You can also try Ardb, a Redis-protocol-compatible NoSQL DB built on RocksDB/LevelDB/LMDB.
Have you looked at Aerospike? I haven't used it, but they claim to have good performance on SSD.
LMDB is faster than RocksDB and uses about a third as much memory. Also, LMDB requires no tuning; RocksDB requires careful tuning of over 40 parameters to get performance that approaches LMDB's.
http://www.lmdb.tech/bench/inmem/scaling.html
Also, LMDB is fully transactional and 100% crash-proof; RocksDB is neither.
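For completeness, a minimal sketch with the py-lmdb binding (pip install lmdb); the path and map_size are assumptions, and map_size must be set large enough for the whole dataset up front:

    import lmdb

    # ~1.5 TB map for 1e9 x 1 KB records; on most filesystems the file is sparse,
    # so space is only consumed as data is actually written.
    env = lmdb.open("records.lmdb", map_size=1_500_000_000_000)

    with env.begin(write=True) as txn:    # a single ACID write transaction
        txn.put(b"record:1", b"x" * 1024)

    with env.begin() as txn:              # read-only transactions are cheap
        print(txn.get(b"record:1")[:16])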

How to cap memory usage by Extensible Storage Engine (JetBlue)?

I have an app that every so often hits an ESE database quite hard and then stops for a long time. After hitting the database, memory usage goes way up (over 150 MB) and stays high. I'm assuming ESE has lots of cached data.
Is there a way to cap the memory usage of ESE? I'm happy to suffer any perf hit.
The only way I've seen to drop the memory usage is to close the DB.
You can control the database cache size by setting the database cache size system parameter (JET_paramCacheSize). That number can be changed on-the-fly.
You might not need to set it though: by default ESENT will manage its cache size automatically by looking at available system memory, system paging and database I/O load. If you have hundreds of MB of free memory then ESENT won't see any reason to reduce the cache size. On the other hand, if you start using the memory on your system you should find that ESENT will automatically reduce the size of the database cache in your application. You can set the limits for automatic cache sizing with the JET_paramCacheSizeMin and JET_paramCacheSizeMax parameters.
Documentation link for the system parameters: http://msdn.microsoft.com/en-us/library/ms683044.aspx

What happens to distributed in-memory cloud databases such as Hazelcast and Scalaris if there is more data to store than RAM in the cluster?

What happens to distributed in-memory cloud databases such as
Hazelcast
Scalaris
if there is more data to store than RAM in the cluster?
Are they going to swap? What if the swap space is full?
I can't see a disaster recovery strategy for either database! Maybe all data is lost if memory is full?
Is there an option to write things to the hard disk when memory runs out?
Are there other databases out there that offer the same functionality as Hazelcast or Scalaris, but with backup features / HDD storage / disaster recovery?
I don't know what the state of affairs was when the accepted answer by Martin K. was published, but the Scalaris FAQ now claims that this is supported.
Can I store more data in Scalaris than ram+swapspace is available in the cluster?
Yes. We have several database
backends, e.g. src/db_ets.erl (ets)
and src/db_tcerl (tokyocabinet). The
former uses the main memory for
storing data, while the latter uses
tokyocabinet for storing data on disk.
With tokyocabinet, only your local
disks should limit the total size of
your database. Note however, that this
still does not provide persistency.
For instructions on switching the
database backend to tokyocabinet see
Tokyocabinet.
According to the Hazelcast and Scalaris teams, both say that writing more data than there is RAM available isn't supported.
The Hazelcast team is going to write a flat-file store in the near future.
