Performance scenario: RAM disk vs. in-memory database (IMDB)?

I was just wondering: we have in-memory databases (IMDB), and we also have a way to put a database on a RAM disk. So which would be faster? Your valuable comments and experiences are appreciated.

Wikipedia - Computer Data Storage
Latency
The time it takes to access a particular location in storage. The relevant unit of measurement is typically nanosecond for primary storage, millisecond for secondary storage
It really depends on the hardware architecture. However, internal memory is almost always the fastest way of storing and retrieving data, unless you have a specialized main board. Also keep in mind that an in-memory database avoids the filesystem and block-device overhead that a database running on a RAM disk still pays, so on the same hardware the IMDB usually has the edge.

Related

Are the contents of an LMDB always stored BOTH on disk AND in memory?

I want to use the Rust implementation of LMDB, but I can't seem to find whether it always maintains a copy of what's in memory on disk as well, and vice versa.
My reasoning would be that the DB has some cache and any overflow is written to disk, but since I can't find this in the documentation, I'm not sure.
The other possibility is that LMDB maps its on-disk contents into memory, which would work for small quantities but not for what I have in mind.
Does anyone have an idea of how LMDB works in this regard?
Does anyone have an idea of how LMDB works in this regard?
If you are worried about not being able to operate on a dataset that does not fit in memory, you are OK - LMDB does handle that. Your dataset is not limited to the size of RAM.
LMDB is memory-mapped, a technique that allows developers to read/write data on disk 'like' it is in memory - the OS does all the heavy lifting required.
LMDB always stored BOTH on disk AND in memory?
The entire dataset is on disk. Some parts of it are in memory. When parts that are not in memory are needed, the OS fetches them from disk and gives them to the application by putting them in the process's memory.
The other possibility is that LMDB maps its on-disk contents into memory, which would work for small quantities but not for what I have in mind.
Yes, that is it.
I read that RocksDB supports the usage you are looking for while offering similar guarantees.
What counts as "small quantities" depends on the available RAM. Major vendors highly recommend having as much memory as the working dataset; MongoDB with the WiredTiger backend comes to mind, but also PostgreSQL, among others.
At the other extreme, you can find database systems, such as Rust full-text search engines, that use cold (offline?) object storage.

What influences Avg. Disk Sec/Read beyond hardware?

I have been monitoring the performance of an OLTP database (approx. 150GB); the average disk sec/read and average disk sec/write values are exceeding 20 ms over a 24hr period.
I need to arrive at a clear explanation as to why the business application has no influence over the 'less-than-stellar' performance on these counters. I also need to exert some pressure to have the storage folk re-examine their configuration as it applies to the placement of the mdf, ldf, and tempdb files on their SAN. At present, my argument is shaky, but I am pressing my point with people who don't understand the difference between IOPS and disk latency.
Beyond the limitations of physical hardware and the placement of data files across physical disks, is there anything else that would influence these counter values? For instance: the number of transactions per second, the size of the query, poorly written queries or missing indexes? My readings say 'no' but I need a voice of authority in this debate.
There are "a lot" of factors that can affect the overall latency. To truly rule the SAN in or out, you will want to look at the "Avg. Disk sec/Read" and "Avg. Disk sec/Write" counters that you mentioned. Just make sure you are looking at the "Physical Disk" object, not the "Logical Disk" object. The logical disk counters include file system overhead and may differ, depending on various factors.
Once you have the counters for the physical disks, you will want to compare them to the latency counters for the storage unit the server is connected to. You mentioned "storage folk", so I'm going to assume that is a different team; hopefully they will be nice and provide the info to you.
If it is a storage unit issue, then both sets of counters should match up pretty well, which indicates the storage unit is truly running slow. If the storage unit counters look significantly better, then the problem is something in between. Depending on the type of storage network you are using, this would be the HBAs/NICs/switches that connect the server and the storage. Or, if it's a VM, the host machine's stats would be useful as well.
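If you want a second data point from inside SQL Server itself, a query along these lines against sys.dm_io_virtual_file_stats reports per-file latency as the database engine sees it. This is only a sketch to adapt to your environment; the figures are cumulative since the instance started, not a 24-hour window.

```sql
-- Average read/write latency per database file as SQL Server itself measures it.
-- Figures are cumulative since the instance last started, so sample twice and
-- diff the values if you want a recent window rather than an all-time average.
SELECT  DB_NAME(vfs.database_id)                                   AS database_name,
        mf.physical_name,
        vfs.num_of_reads,
        vfs.io_stall_read_ms  * 1.0 / NULLIF(vfs.num_of_reads, 0)  AS avg_read_ms,
        vfs.num_of_writes,
        vfs.io_stall_write_ms * 1.0 / NULLIF(vfs.num_of_writes, 0) AS avg_write_ms
FROM    sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN    sys.master_files AS mf
        ON  mf.database_id = vfs.database_id
        AND mf.file_id     = vfs.file_id
ORDER BY avg_read_ms DESC;
```

If these numbers roughly agree with the PerfMon counters but the SAN-side counters look fine, that points at the path in between rather than the storage unit itself.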
Apart from obvious reasons such as "not enough memory for buffer pool", latency mostly depends on how your storage is actually implemented.
If your server has an external SAN, the usual problem is that it might give you stellar throughput, but it will rarely give you stellar latency. It's just the way things are, and it can become a real headache for heavily loaded OLTP systems.
So, if you are out to squeeze every last microsecond from your storage, you will most probably need local drives. That, and your RAID 10 should have enough spindles to cope with the load.

Whole Oracle database in memory

Suppose I have an Oracle database whose data files are 256 GB in size. Is it a good idea to use a server with, say, 384 GB RAM in order to host the entire database in RAM?
Is there any difference if you only have, say, 128 GB RAM?
I'm talking about caching and Oracle's inner workings, not a memory-based filesystem. Assume OLTP and a 100 GB working set.
Assuming you are talking about Oracle using the memory for caching and other processes, and not a memory-based filesystem (which is an awful idea)... more memory is almost always better than less memory.
The real-world answer is: it depends. If your working set of data is a few GB or less, then the extra memory wouldn't help as much.
How much memory you need, and when extra memory stops helping, depends on your application and the style of DB (OLTP, DSS); there is no simple yes/no answer.
Use the views V$SGA_TARGET_ADVICE and V$PGA_TARGET_ADVICE to predict the performance improvement of additional memory.
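For example, queries along these lines (a sketch; you need SELECT access to the V$ views, and the advice data is only populated when the relevant statistics are enabled):

```sql
-- Estimated DB time and physical reads at different SGA sizes;
-- SGA_SIZE_FACTOR = 1 is the current setting.
SELECT sga_size, sga_size_factor, estd_db_time, estd_physical_reads
FROM   v$sga_target_advice
ORDER  BY sga_size_factor;

-- The same idea for the PGA: estimated cache hit percentage and extra bytes
-- read/written at different PGA_AGGREGATE_TARGET values.
SELECT pga_target_for_estimate, pga_target_factor,
       estd_pga_cache_hit_percentage, estd_extra_bytes_rw
FROM   v$pga_target_advice
ORDER  BY pga_target_factor;
```

If the rows with a larger size factor show little improvement in estimated DB time or physical reads, extra memory is unlikely to buy you much.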
Oracle records many statistics about physical (disk) and logical (total) I/O requests. People used to obsess over the buffer cache hit ratio. It can be helpful but that number doesn't tell the whole story. If the ratio is 99% then your cache is probably sufficient and adding more memory won't help. If it's low then you might benefit from more memory, or perhaps the processes that use disk aren't time critical.
Be careful before you request more memory. I've seen a lot of memory wasted because some people assume more memory will solve everything. Oracle has many I/O features to help reduce memory requirements. The "in-memory database" fad is mostly hype.

Storage capacity of in-memory database?

Is the storage capacity of an in-memory database limited to the size of RAM? If yes, is there any way to increase its capacity other than increasing RAM? If no, please give some explanation.
As previously mentioned, in-memory storage capacity is limited by the addressable memory, not by the amount of physical memory in the system. Simon was also correct that the OS will swap memory to the page file, but you really want to avoid that. In the context of a DBMS, the OS will do a worse job of it than if you simply used a persistent database with as large a cache as you have physical memory to support. In other words, the DBMS will manage its cache more intelligently than the OS would manage paged memory containing in-memory database content.
On a 32-bit system, each process is limited to 2 GB of user address space by default (3 GB with the /3GB switch), regardless of how much RAM is physically installed. If you have more data (including the in-memory DB) and code than will fit into physical RAM, then the page file on disk is used to swap out memory that is not currently being used. Swapping slows everything down, though. There are some tricks you can use for extending that limit, such as memory-mapped files and the /3GB switch, but they are not easy to implement.
On 64-bit machines, a process's memory limit is huge - I forget exactly what it is, but it's up in the TB range.
VoltDB is an in-memory SQL database that runs on a cluster of 64-bit Linux servers. It has high performance durability to disk for recovery purposes, but tables, indexes and materialized views are stored 100% in-memory. A VoltDB cluster can be expanded on the fly to increase the overall available RAM and throughput capacity without any down time. In a high-availability configuration, individual nodes can also be stopped to perform maintenance such as increasing the server's RAM, and then rejoined to the cluster without any down time.
The design of VoltDB, led by Michael Stonebraker, takes a no-compromise approach to the performance and scalability of OLTP transaction-processing workloads with full ACID guarantees. Today these workloads are often described as Fast Data. By keeping data in main memory and distributing single-threaded SQL execution across cores for parallel processing, data can be accessed as fast as possible in order to minimize the execution time of transactions.
There are in-memory solutions that can work with data sets larger than RAM. Of course, this is accomplished by adding some operations on disk. Tarantool's Vinyl, for example, can work with data sets that are 10 to 1000 times the size of available RAM. Like other databases of recent vintage such as RocksDB and Bigtable, Vinyl's write algorithm uses LSM trees instead of B trees, which helps with its speed.

Will performance of SQL Server degrade if the DB can't fit in memory?

Will the performance of SQL Server drastically degrade if the database is bigger than the RAM? Or does only the index have to fit in memory? I know this is complex, but as a rule of thumb?
Only the working set (the common or currently used data) needs to fit into the buffer cache (aka the data cache). This includes indexes, too.
There is also the plan cache, network buffers, and other stuff too. MS have put a lot of work into memory management in SQL Server and it works well, IMHO.
Generally, more RAM will help, but it's not essential.
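To get a feel for how well the working set fits on an existing instance, a rough check like the sketch below against sys.dm_os_buffer_descriptors shows how much of each database is currently held in the buffer cache. It can take a while on servers with very large amounts of RAM, so treat it as a diagnostic, not something to run constantly.

```sql
-- Rough view of how much of each database currently sits in the buffer pool.
-- Pages are 8 KB, so COUNT(*) * 8 / 1024 gives MB.
SELECT  DB_NAME(database_id)  AS database_name,
        COUNT(*) * 8 / 1024   AS cached_mb
FROM    sys.dm_os_buffer_descriptors
GROUP BY database_id
ORDER BY cached_mb DESC;
```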
Yes: when indexes can't fit in memory, or when doing full table scans. Aggregate functions over data that is not in memory will also require many (and possibly random) disk reads.
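For an individual query, SET STATISTICS IO is a quick way to see whether the data came from the cache or from disk. A sketch follows; dbo.Orders and TotalAmount are hypothetical names standing in for one of your own large tables.

```sql
SET STATISTICS IO ON;

-- Hypothetical aggregate over a large table. On a cold cache the Messages
-- output shows high "physical reads" / "read-ahead reads" (pages fetched from
-- disk); on a warm cache you see mostly "logical reads" (pages already in memory).
SELECT COUNT(*), AVG(TotalAmount)
FROM   dbo.Orders;

SET STATISTICS IO OFF;
```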
For some benchmarks:
Query time will depend significantly on whether the affected data currently resides in memory or disk access is required. For disk intensive operations, the characteristics of the disk sequential and random I/O performance are also important.
http://www.sql-server-performance.com/articles/per/large_data_operations_p7.aspx
Therefore, don't expect the same performance if your DB size > RAM size.
Edit:
http://highscalability.com/ is full of examples like:
Once the database doesn't fit in RAM you hit a wall.
http://highscalability.com/blog/2010/5/3/mocospace-architecture-3-billion-mobile-page-views-a-month.html
Or here:
Even if the DB size is just 10% bigger than RAM size this test shows a 2.6 times drop in performance.
http://www.mysqlperformanceblog.com/2010/04/08/fast-ssd-or-more-memory/
Remember, though, that this applies to hot data - data that you want to query and cannot cache. If you can cache it, you can easily live with significantly less memory.
All DB write operations have to be backed by writes to disk anyway; having more RAM is helpful, but not essential.
Loading the whole database into RAM is not practical. Databases can run to terabytes these days, and there is little chance that anyone would buy that much RAM. I think performance will still be fine even if the available RAM is only one tenth of the size of the database.
