Redis: Database Size to Memory Ratio?

What is Redis's database size to memory ratio?
For instance, if I have an 80MB database, how much RAM will Redis use (when used with a normal web app)?

Redis will use a bit more RAM than disk. The dumpfile format is probably a bit more densely packed. These are some numbers from a real production system (a 64-bit EC2 large instance running Redis 2.0.4 on Ubuntu 10.04):
$ redis-cli info | grep used_memory_human
$ du -sh /mnt/data/redis/dump.rdb
950M /mnt/data/redis/dump.rdb
As you can see, the dumpfile is a few hundred megs smaller than the memory usage.
In the end it depends on what you store in the database. I have mainly hashes in mine, with only a few (perhaps less than 1%) sets. None of the keys contain very large objects; the average object size is 889 bytes.

Redis databases are stored in memory, so an 80 MB database would take up about 80 MB of RAM.
Redis has a very small memory footprint, as you can see from this example from the website: "1 Million keys with the key being the natural numbers from 0 to 999999 and the string "Hello World" as value uses 100MB [of Ram]". My Redis app uses around 300 KB to 500 KB of RAM, so you would need a lot of data to reach a database of 80 MB. Redis also saves snapshots of the database to disk, so expect about 80 MB in RAM and 80 MB on the hard drive.
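As a rough sanity check on the figure quoted above, the per-key cost can be worked out in a few lines of Python. This is only back-of-the-envelope arithmetic: it assumes "100MB" means 100 * 10^6 bytes and counts the key digits plus the 11-character value as payload, treating everything else as overhead.

```python
# Back-of-the-envelope check of the quoted "1 million keys uses 100MB" figure.
# Assumption: "100MB" is taken as 100 * 10**6 bytes.
KEYS = 1_000_000
TOTAL_BYTES = 100 * 10**6
VALUE = "Hello World"

# Keys are the decimal strings "0" .. "999999".
avg_key_len = sum(len(str(n)) for n in range(KEYS)) / KEYS
payload_per_key = avg_key_len + len(VALUE)
bytes_per_key = TOTAL_BYTES / KEYS
overhead_per_key = bytes_per_key - payload_per_key

print(f"average key length : {avg_key_len:.2f} bytes")
print(f"payload per key    : {payload_per_key:.2f} bytes")
print(f"memory per key     : {bytes_per_key:.0f} bytes")
print(f"overhead per key   : {overhead_per_key:.2f} bytes")
```

Under these assumptions each key carries roughly 17 bytes of payload but costs about 100 bytes of memory, so for many small values the in-memory size is dominated by per-key overhead rather than by the data itself.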


Scaling database storage capacity horizontally

Here's my question:
Databases usually save data to disk, whether they are SQL or NoSQL. However, in a cloud environment, machines typically ship with a limited amount of storage, which gets used up as the application runs and eventually becomes a problem. Storage can be scaled vertically (adding more disks, e.g. mounts), but I understand that in my case vertical scaling is not a long-term solution.
What is the best scale out solution for databases?
For example, when ordering cloud machines you typically get just enough disk for each machine, 50GB for instance.
So if we're targeting a minimum capacity of 1TB, will we need to run something like 20 machines? And ten times more machines for 10TB?
How do you use a scalable database from day one without worrying about running out of disk space? (I mean, given that you can spin up more machines if needed, using a dashboard from a cloud provider.)
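The machine-count arithmetic in the question can be sketched as follows. This is a minimal sketch, assuming 1TB = 1000GB and that all of each machine's disk is usable for data; `machines_needed` and its `replication_factor` parameter are hypothetical names added for illustration, since a scaled-out database usually stores more than one copy of each record.

```python
import math


def machines_needed(target_gb, disk_per_machine_gb, replication_factor=1):
    """Number of machines required to hold target_gb of data.

    replication_factor is a hypothetical knob: the number of copies of
    each record kept across the cluster (1 means no redundancy).
    """
    raw_gb = target_gb * replication_factor
    return math.ceil(raw_gb / disk_per_machine_gb)


print(machines_needed(1_000, 50))                         # 1TB, single copy
print(machines_needed(10_000, 50))                        # 10TB, single copy
print(machines_needed(1_000, 50, replication_factor=3))   # 1TB, three copies
```

With a single copy this gives 20 machines for 1TB and 200 for 10TB, matching the estimate in the question; keeping three copies triples the count.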

rclone slow transfer from bucket to filesystem

I'm using rclone to transfer data between a minio bucket and a shared storage. I'm migrating a store, and the amount of data is around 200GB of product pictures. Every single picture has its own folder/path, so there are a lot of folders that need to be created too. Rclone is installed on the new server and the storage is connected to the server via SAN. The transfer has been running for over a week and we are at 170GB right now. Everything works fine, but it is really slow in my opinion. Is it normal that a transfer out of a bucket into a classic filesystem is that slow?
(Doing the math, the speed is only 2.3Mbps. I am honestly not going to pay anything for that speed.)
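The estimate above can be reproduced with a short calculation. This sketch assumes decimal units (1GB = 10^9 bytes) and exactly seven days of continuous transfer; `average_mbps` is a name made up for the example.

```python
def average_mbps(gigabytes, days):
    """Average throughput in megabits per second (decimal units)."""
    bits = gigabytes * 10**9 * 8
    seconds = days * 24 * 60 * 60
    return bits / seconds / 10**6


rate = average_mbps(170, 7)
print(f"{rate:.2f} Mbit/s")
print(f"{rate / 8:.2f} MB/s")
```

The result is a little over 2 Mbit/s, which is in line with the figure quoted above and far below what a SAN-attached filesystem should sustain.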
Perhaps you should break down the issue and diagnose it part by part. Below are several common places to look at when a transfer is slow (generally speaking, for any file transfer):
First of all, networks and file systems usually do not perform well with lots of small files, so to isolate the issue, upload a bigger file to minio first (1GB+). For each step, test with the big file first.
Is the speed of the source fast enough? Try copying the files from minio to a local storage or Ramdisk (/tmp is usually tmpfs and in turn stored in RAM, use mount to check).
Is the speed of the destination fast enough? Try dd or another disk performance testing utility.
Is the network latency to source high? Try pinging or curling the API (with timing)
Is the I/O latency to the destination high? Try iostat
Maybe the CPU is the bottleneck, as encoding and decoding take quite a lot of computing power? Try top while a copy is running.
Again, try these steps with the big file and with the many small files separately. There is quite a chance that the small files are the issue. If that is the case, I would look for a concurrency option in rclone.
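One way to compare the "one big file" and "many small files" cases on the destination is a small timing script like the one below. It is a rough sketch, not a replacement for dd or a proper benchmark: `write_throughput` is a made-up helper, the sizes are arbitrary, and the temporary directories in the demo are placeholders that should be replaced with a path on the mount being tested.

```python
import os
import tempfile
import time


def write_throughput(directory, file_count, file_size):
    """Write file_count files of file_size bytes each and return MB/s.

    Each file goes into its own sub-directory, mirroring the
    one-folder-per-picture layout described in the question.
    """
    payload = os.urandom(file_size)
    start = time.perf_counter()
    for i in range(file_count):
        sub = os.path.join(directory, f"item{i}")
        os.makedirs(sub)
        with open(os.path.join(sub, "data.bin"), "wb") as fh:
            fh.write(payload)
            fh.flush()
            os.fsync(fh.fileno())  # make sure the data really reaches the device
    elapsed = max(time.perf_counter() - start, 1e-9)
    return file_count * file_size / elapsed / 1e6


if __name__ == "__main__":
    # Same total volume (about 8 MB) written as one file and as 200 files.
    # Placeholder location: pass dir="/path/on/destination" to test a real mount.
    with tempfile.TemporaryDirectory() as big, tempfile.TemporaryDirectory() as small:
        print(f"1 x 8 MB    : {write_throughput(big, 1, 8_000_000):.1f} MB/s")
        print(f"200 x 40 KB : {write_throughput(small, 200, 40_000):.1f} MB/s")
```

If the second number is much lower than the first on the real destination, per-file overhead (directory creation, metadata updates, sync) is the likely bottleneck rather than raw bandwidth.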
I had the same problem copying hundreds of thousands of small files from an S3-compatible storage to a local storage. Originally I was using s3fs+rsync. It was very (very) slow, and it was getting stuck on the largest folders. Then I discovered rclone, and finished the migration within a few hours with these parameters:
rclone copy source:/bucket /destination/folder --checkers 256 --transfers 256 --fast-list --size-only --progress
Explanation of the options:
--checkers 256 Number of checkers to run in parallel (default 8)
--transfers 256 Number of file transfers to run in parallel (default 4)
--fast-list Use recursive list if available; uses more memory but fewer transactions
--size-only Skip based on size only, not mod-time or checksum (wouldn't apply in your case if copying to an empty destination)
--progress Show progress during transfer
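To illustrate why raising the number of parallel transfers helps with many small files, here is a minimal Python sketch of the general idea: several copies in flight at once, so that per-file latency overlaps instead of adding up. It is not how rclone is implemented, and `parallel_copy` is a hypothetical helper that works on local directories only.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def parallel_copy(src_root, dst_root, workers=8):
    """Copy every file under src_root to dst_root using a thread pool."""
    src_root, dst_root = Path(src_root), Path(dst_root)
    files = [p for p in src_root.rglob("*") if p.is_file()]

    def copy_one(src):
        dst = dst_root / src.relative_to(src_root)
        dst.parent.mkdir(parents=True, exist_ok=True)  # one folder per file is fine
        shutil.copy2(src, dst)
        return dst

    # Threads suit this workload because each copy spends most of its time
    # waiting on I/O rather than using the CPU.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(copy_one, files))
```

Calling `parallel_copy("/some/source", "/some/destination", workers=32)` (both paths are placeholders) returns the list of destination paths once every copy has finished.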

Whole Oracle database in memory

Suppose I have an Oracle database whose data files are 256 GB in size. Is it a good idea to use a server with, say, 384 GB RAM in order to host the entire database in RAM?
Is there any difference if you only have, say, 128 GB RAM?
I'm talking about caching and Oracle inner workings, not memory based filesystem. Suppose OLTP, and a 100 GB working set.
Assuming you are talking about Oracle using the memory for caching and other processes and not a memory based filesystem (which is an awful idea)... more memory is almost always better than less memory.
The real world answer is it depends. If your working set of data is a few GB or less then the extra memory wouldn't help as much.
How much memory you need, and when extra memory stops helping, depends on your application and the style of DB (OLTP, DSS); there is no simple yes/no answer.
Use the views V$SGA_TARGET_ADVICE and V$PGA_TARGET_ADVICE to predict the performance improvement of additional memory.
Oracle records many statistics about physical (disk) and logical (total) I/O requests. People used to obsess over the buffer cache hit ratio. It can be helpful but that number doesn't tell the whole story. If the ratio is 99% then your cache is probably sufficient and adding more memory won't help. If it's low then you might benefit from more memory, or perhaps the processes that use disk aren't time critical.
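For reference, the buffer cache hit ratio mentioned above is usually computed as one minus the share of logical reads that had to go to disk. The sketch below shows only the arithmetic; the function name and the sample counts are made up, and the exact statistics to plug in depend on the Oracle version and on which views you query.

```python
def buffer_cache_hit_ratio(logical_reads, physical_reads):
    """Fraction of logical (total) reads satisfied without a disk read."""
    if logical_reads <= 0:
        raise ValueError("logical_reads must be positive")
    return 1.0 - physical_reads / logical_reads


# Made-up sample counts: 1,000,000 logical reads, 10,000 of them from disk.
print(f"{buffer_cache_hit_ratio(1_000_000, 10_000):.2%}")
```

A high ratio in this calculation only says that most reads were served from memory; it says nothing about whether the remaining disk reads sit on a time-critical path, which is why the number does not tell the whole story.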
Be careful before you request more memory. I've seen a lot of memory wasted because some people assume more memory will solve everything. Oracle has many I/O features to help reduce memory requirements. The "in-memory database" fad is mostly hype.

Storage capacity of in-memory database?

Is the storage capacity of an in-memory database limited to the size of RAM? If yes, are there any ways to increase its capacity other than increasing the RAM size? If no, please give some explanation.
As previously mentioned, in-memory storage capacity is limited by the addressable memory, not by the amount of physical memory in the system. Simon was also correct that the OS will swap memory to the page file, but you really want to avoid that. In the context of the DBMS, the OS will do a worse job of it than if you simply used a persistent database with as large of a cache as you have physical memory to support. IOW, the DBMS will manage its cache more intelligently than the OS would manage paged memory containing in-memory database content.
On a 32-bit system, each process has a 4 GB address space, of which typically only 2 GB to 3 GB is available to the application, whether you have 3 GB physically or 512 MB. If you have more data (including the in-memory DB) and code than will fit into physical RAM, then the page file on disk is used to swap out memory that is currently not being used. Swapping does slow everything down, though. There are some tricks you can use for extending that: memory-mapped files, the /3GB switch; but these are not easy to implement.
On 64-bit machines, a process's memory limit is huge. I forget what it is, but it's up in the TB range.
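The address-space limits behind these numbers follow directly from the pointer width. The sketch below just does that arithmetic; the 48-bit row reflects the virtual address width that many current x86-64 processors implement, and the operating system reserves part of each range for itself, so the usable figure per process is lower.

```python
UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]


def human(n):
    """Format a byte count that is an exact power of two."""
    i = 0
    while n >= 1024 and i < len(UNITS) - 1:
        n //= 1024
        i += 1
    return f"{n} {UNITS[i]}"


for bits in (32, 48, 64):
    print(f"{bits}-bit addresses: {human(2 ** bits)}")
```

This prints 4 GiB for 32 bits, 256 TiB for 48 bits and 16 EiB for 64 bits, which is why the 64-bit limit is, in practice, set by the hardware and the operating system rather than by the pointer size.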
VoltDB is an in-memory SQL database that runs on a cluster of 64-bit Linux servers. It has high performance durability to disk for recovery purposes, but tables, indexes and materialized views are stored 100% in-memory. A VoltDB cluster can be expanded on the fly to increase the overall available RAM and throughput capacity without any down time. In a high-availability configuration, individual nodes can also be stopped to perform maintenance such as increasing the server's RAM, and then rejoined to the cluster without any down time.
The design of VoltDB, led by Michael Stonebraker, was for a no-compromise approach to performance and scalability of OLTP transaction processing workloads with full ACID guarantees. Today these workloads are often described as Fast Data. By using main memory, and single-threaded SQL execution code distributed for parallel processing by core, the data can be accessed as fast as possible in order to minimize the execution time of transactions.
There are in-memory solutions that can work with data sets larger than RAM. Of course, this is accomplished by adding some operations on disk. Tarantool's Vinyl, for example, can work with data sets that are 10 to 1000 times the size of available RAM. Like other databases of recent vintage such as RocksDB and Bigtable, Vinyl's write algorithm uses LSM trees instead of B trees, which helps with its speed.
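To give an idea of what an LSM tree does, here is a toy sketch in Python. It is not Vinyl's or RocksDB's implementation: the class name and the `memtable_limit` parameter are invented for the example, the sorted runs are kept in a list instead of being written to disk, and there is no compaction or deletion. The point is only the write path: writes go to a small in-memory table, which is flushed as an immutable sorted run when it fills up, and reads check the newest data first.

```python
import bisect


class TinyLSM:
    """Toy log-structured merge store (illustration only)."""

    def __init__(self, memtable_limit=4):
        self.memtable_limit = memtable_limit
        self.memtable = {}   # recent writes, held in memory
        self.runs = []       # flushed sorted runs, oldest first (stand-in for disk files)

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # A real engine would write this run sequentially to disk.
        self.runs.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key, default=None):
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):            # newest run wins
            i = bisect.bisect_left(run, (key,))    # binary search on the key
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return default
```

Because every flush is a single sequential write of already-sorted data, writes stay cheap even when the data set is much larger than memory; the cost moves to reads, which may have to consult several runs.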

Will performance of a SQL server degrade if the DB can't fit in the memory?

Will the performance of a SQL server drastically degrade if the database is bigger than the RAM? Or does only the index have to fit in the memory? I know this is complex, but as a rule of thumb?
Only the working set or common data or currently used data needs to fit into the buffer cache (aka data cache). This includes indexes too.
There is also the plan cache, network buffers and other stuff too. MS have put a lot of work into memory management on SQL Server and it works well, IMHO.
Generally, more RAM will help but it's not essential.
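To get a feel for how the hit rate of a cache changes once the data being touched no longer fits, here is a small simulation. It is a deliberately simplified sketch: pages are accessed uniformly at random, eviction is plain LRU, and `lru_hit_ratio` is a made-up helper. SQL Server's real buffer manager is more sophisticated, and real workloads are usually skewed towards a smaller hot set, which makes things look better than this.

```python
import random
from collections import OrderedDict


def lru_hit_ratio(total_pages, cache_pages, accesses=50_000, seed=0):
    """Simulate uniform random page reads through an LRU cache."""
    rng = random.Random(seed)
    cache = OrderedDict()
    hits = 0
    for _ in range(accesses):
        page = rng.randrange(total_pages)
        if page in cache:
            hits += 1
            cache.move_to_end(page)        # mark as most recently used
        else:
            cache[page] = True             # "read from disk" and cache it
            if len(cache) > cache_pages:
                cache.popitem(last=False)  # evict the least recently used page
    return hits / accesses


for cache_pages in (1000, 500, 100):
    ratio = lru_hit_ratio(total_pages=1000, cache_pages=cache_pages)
    print(f"cache holds {cache_pages:4d} of 1000 pages: hit ratio {ratio:.2f}")
```

Under these assumptions the hit ratio tracks the fraction of touched pages that fit in the cache, so what matters is the size of the working set relative to RAM, not the size of the whole database.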
Yes, when indexes can't fit in memory or when doing full table scans. Running aggregate functions over data that is not in memory will also require many (and maybe random) disk reads.
For some benchmarks:
Query time will depend significantly on whether the affected data currently resides in memory or disk access is required. For disk intensive operations, the characteristics of the disk sequential and random I/O performance are also important.
Therefore, don't expect the same performance if your DB size > RAM size.
Edit: is full of examples like:
Once the database doesn't fit in RAM you hit a wall.
Or here:
Even if the DB size is just 10% bigger than RAM size this test shows a 2.6 times drop in performance.
Remember, though, that this is for hot data: data that you want to query and can't cache. If you can cache it, you can easily live with significantly less memory.
All DB operations have to be backed by writes to disk anyway, so having more RAM is helpful but not essential.
Loading the whole database into RAM is not practical. Databases can be up to terabytes in size these days, and there is little chance that anyone would buy that much RAM. I think performance will be optimal even if the available RAM is one tenth of the size of the database.
