Why do logical reads reduce performance in SQL Server?

Logical Reads - Reading the data from the data cache
Physical Reads - Reading the data from Disk
Usually we use caches for better performance: they reduce the time needed to access data on disk.
But in SQL Server, why does reducing logical reads improve performance?
If the data cache hurts performance, why do we need a data cache in the middle at all?
I'm new to SQL tuning, please clarify.
Thanks.

A logical read is a request to read a block. It may be served from cache or it may be served from disk. If it is served from disk, it is also a physical read. All physical reads are also logical reads.
What SQL Server (or any database) calls a physical read may or may not actually require going to spinning disk. The operating system might have a cache, the SAN array might have a cache, etc. A physical read from SQL Server simply means that SQL Server didn't have the block in cache.
Normally, when we're tuning a query, we focus on logical reads because that is a relatively stable value: a query that does less logical I/O will generally be faster than one that does more. What fraction of our logical reads are actually physical reads, whether we're tuning the query or running it from our application, is somewhat up to chance. If you're working on tuning a query, the data you're relying on is very likely to get cached quickly, so your physical I/O will go down if you just keep running the query. When the query runs in production, you might get lucky and find that most of the blocks are in cache and there is very little I/O, or you might get unlucky and find that very few of the blocks are in cache.

If you focus on physical I/O when you're tuning, you'll be chasing a constantly moving target. And since SQL Server can't differentiate between a physical read served from the operating system cache, one served from the SAN cache, one served from an SSD in the SAN, and one served by going to actual physical spinning disk, you're mixing together a bunch of things with very different performance profiles.

If you run the query 100 times, you'll get roughly the same number of logical I/Os every time. You'll get wildly different numbers of physical I/Os, and those physical I/Os are likely to have wildly different performance characteristics, because some will be hitting physical disk and some will just be hitting the operating system cache.
As a very rough first approximation for most OLTP systems, the odds that a block you want is in cache are going to be roughly constant (most queries read relatively recent rows in tables that are mostly cached). If your system keeps 95% of the blocks you're reading in cache, you can reasonably guess that a query doing 1,000 logical I/Os per execution will, on average, do 50 physical I/Os per execution. Sometimes you'll get lucky and it'll do 0 physical I/Os, sometimes you'll get unlucky and it'll do 250, but on average you'll get 50. If you reduce the logical I/O, you'll probably reduce the physical I/O by the same fraction.
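The arithmetic above can be written out as a quick back-of-the-envelope calculation (the 95% hit ratio and 1,000 logical reads are the hypothetical figures from the paragraph, not measurements):

```python
# Back-of-the-envelope: on average, only cache misses become physical
# reads. Any single run may see far more or far fewer.

def expected_physical_reads(logical_reads, cache_hit_ratio):
    """Average physical reads per execution for a given hit ratio."""
    return logical_reads * (1.0 - cache_hit_ratio)

print(expected_physical_reads(1000, 0.95))  # about 50 per execution on average
```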
Of course, this is a very rough approximation. If you've got a poorly performing query that does a table scan of a multi-million row table, the odds that all the ten-year-old rows are cached are a lot lower than the odds that the recent blocks you actually mostly want to read are cached. If you get rid of the full scan's logical I/O, you'll probably get a much larger percentage-wise reduction in physical I/O, because you're focusing the query on the recent blocks that SQL Server is really good at caching.
And, of course, sometimes we're focused on things other than logical I/O. Sometimes our systems are CPU-bound not I/O-bound and we want to focus on how much CPU a query is using. Sometimes our systems are under memory pressure and we want to tune how much memory a query uses. But normally database systems are I/O bound and starting with a focus on logical I/O is normally reasonable when we tune a query.


Why is a distributed in-memory cache faster than a database query?

https://medium.com/#i.gorton/six-rules-of-thumb-for-scaling-software-architectures-a831960414f9 says this about distributed caches:
Better, why query the database if you don’t need to? For data that is frequently read and changes rarely, your processing logic can be modified to first check a distributed cache, such as a memcached server. This requires a remote call, but if the data you need is in cache, on a fast network this is far less expensive than querying the database instance.
The claim is that a distributed in-memory cache is faster than querying the database. Looking at Latency Numbers Every Programmer Should Know, it shows that the latencies of the operations compare like this: Main memory reference <<< Round trip within same datacenter < Read 1 MB sequentially from SSD <<< Send packet CA->Netherlands->CA.
I interpret a network call to access the distributed cache as "Send packet CA->Netherlands->CA" since the cached data may not be in the same datacenter. Am I wrong? Should I assume that replication factor is high such that cached data should be available in all datacenters and instead the comparison between a distributed cache and a database is more like "Round trip within same datacenter" vs "Read 1 MB sequentially from SSD"?
Databases typically require accessing data from disk, which is slow. Although most will cache some data in memory, which makes frequently run queries faster, there are other overheads such as:
query parsing and syntax checking
database permission/authorisation checking
data access plan creation by the optimizer
a quite chatty protocol, especially when multiple rows are returned
All of which add latency.
Caches have none of these overheads. In the general case, caches see more reads than writes, and a value is always available in memory (unless it's a cold hit). Writing to the cache doesn't stop reads of the current value; synchronised writes just mean a slight delay between the write request and the new value being available (everywhere).
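The cache-aside pattern described in the quoted passage can be sketched in a few lines. This is a minimal illustration, not memcached's actual client API: a plain dict stands in for the cache server, and `query_database` is a hypothetical stand-in for the expensive database round trip.

```python
# Cache-aside sketch: check the cache first, fall back to the database
# on a miss, then populate the cache. A dict stands in for memcached;
# query_database is a hypothetical stand-in for a real DB call.

cache = {}
db_calls = 0

def query_database(key):
    global db_calls
    db_calls += 1                # count the expensive round trips
    return f"row-for-{key}"

def get(key):
    if key in cache:             # fast path: no parsing, planning, or auth
        return cache[key]
    value = query_database(key)  # slow path: full query overhead
    cache[key] = value
    return value

get("user:42")   # miss -> hits the database
get("user:42")   # hit  -> served from memory
print(db_calls)  # 1
```

Note that only the first read pays the database overheads listed above; every subsequent read is a memory lookup (plus, in a real deployment, one network round trip to the cache server).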

MaximumInsertCommitSize optimization

I have a simple SSIS ETL moving data from one server to another; a couple of the tables are 150 GB+. For these larger tables, how do I optimize the MaximumInsertCommitSize? On the server being loaded, I see that RAM utilization is near 100% (64 GB) for the duration of the load for these tables. I am also getting PAGEIOLATCH_EX suspensions on the server being loaded.
This leads me to believe that the MaximumInsertCommitSize needs to be brought down.
My first question is whether you agree with this.
My other questions are more open-ended:
How do I optimize it by trial and error when it takes an hour to load the table (and table size matters for this operation, so I would have to load all of it)?
Would network speed ever play into this optimization, due to the increased bandwidth with multiple ?partitions? of data?
Would hard drive speed affect it, as this is the server's cheapest component (facepalm)? My thinking here is that the page IO latch waits indicate the number of disk operations is minimized, but since the RAM is over-utilized, sending tasks to the drive instead of waiting would be better.
Finally, is there a calculator for this? I feel that even a vague approximation of MaximumInsertCommitSize would be great (just based on network, disk, RAM, and file size, assuming the destination is partitioned and has ample disk space).
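As far as I know there is no official calculator for MaximumInsertCommitSize. As a starting point for trial and error, one could sketch a rough heuristic like the following; the 5% memory fraction and the heuristic itself are assumptions to make the search less blind, not anything from Microsoft documentation.

```python
# Very rough sizing sketch for a batch commit size: keep one
# committing batch's in-flight bytes well under the memory you can
# afford to dedicate to it. The memory_fraction default is a pure
# assumption; tune it against observed RAM pressure.

def suggest_commit_size(avg_row_bytes, usable_ram_bytes, memory_fraction=0.05):
    """Return a batch row count whose size stays within the budget."""
    budget = int(usable_ram_bytes * memory_fraction)
    return max(1, budget // avg_row_bytes)

# e.g. a 64 GB server loading ~200-byte rows
print(suggest_commit_size(200, 64 * 1024**3))
```

If RAM utilization stays pinned at 100% during the load, halving the suggested value and re-measuring is probably a faster search than fine-grained tuning, given the hour-long feedback loop.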

In memory databases with LMDB

I have a project which uses BerkeleyDB as a key-value store for up to hundreds of millions of small records.
The way it's used is all the values are inserted into the database, and then they are iterated over using both sequential and random access, all from a single thread.
With BerkeleyDB, I can create in-memory databases that are "never intended to be preserved on disk". If the database is small enough to fit in the BerkeleyDB cache, it will never be written to disk. If it is bigger than the cache, then a temporary file will be created to hold the overflow. This option can speed things up significantly, as it prevents my application from writing gigabytes of dead data to disk when closing the database.
I have found that BerkeleyDB's write performance is too poor, even on an SSD, so I would like to switch to LMDB. However, based on the documentation, it doesn't seem like there is an option for creating a non-persistent database.
What configuration/combination of options should I use to get the best performance out of LMDB if I don't care about persistence or concurrent access at all? i.e. to make it act like an "in-memory database" with temporary backing disk storage?
Just use MDB_NOSYNC and never call mdb_env_sync() yourself. You could also use MDB_WRITEMAP in addition. The OS will still eventually flush dirty pages to disk; you can play with /proc/sys/vm/dirty_ratio etc. to control that behavior.
From this post: https://lonesysadmin.net/2013/12/22/better-linux-disk-caching-performance-vm-dirty_ratio/
vm.dirty_ratio is the absolute maximum amount of system memory that can be filled with dirty pages before everything must get committed to disk. When the system gets to this point all new I/O blocks until dirty pages have been written to disk.
If the dirty ratio is too small, then you will see frequent synchronous disk writes.

Storage capacity of in-memory database?

Is the storage capacity of an in-memory database limited to the size of RAM? If yes, are there any ways to increase its capacity other than increasing RAM size? If no, please explain.
As previously mentioned, in-memory storage capacity is limited by the addressable memory, not by the amount of physical memory in the system. Simon was also correct that the OS will swap memory to the page file, but you really want to avoid that. In the context of the DBMS, the OS will do a worse job of it than if you simply used a persistent database with as large of a cache as you have physical memory to support. IOW, the DBMS will manage its cache more intelligently than the OS would manage paged memory containing in-memory database content.
On a 32-bit system, each process is limited to a total of 3 GB of address space, whether you have 3 GB of physical RAM or 512 MB. If you have more data (including the in-mem DB) and code than will fit into physical RAM, then the page file on disk is used to swap out memory that is currently not being used. Swapping does slow everything down, though. There are some tricks you can use to extend that: memory-mapped files, the /3GB switch; but these are not easy to implement.
On 64-bit machines, a process's memory limitation is huge - I forget what it is, but it's up in the TB range.
VoltDB is an in-memory SQL database that runs on a cluster of 64-bit Linux servers. It has high performance durability to disk for recovery purposes, but tables, indexes and materialized views are stored 100% in-memory. A VoltDB cluster can be expanded on the fly to increase the overall available RAM and throughput capacity without any down time. In a high-availability configuration, individual nodes can also be stopped to perform maintenance such as increasing the server's RAM, and then rejoined to the cluster without any down time.
The design of VoltDB, led by Michael Stonebraker, was for a no-compromise approach to performance and scalability of OLTP transaction processing workloads with full ACID guarantees. Today these workloads are often described as Fast Data. By using main memory, and single-threaded SQL execution code distributed for parallel processing by core, the data can be accessed as fast as possible in order to minimize the execution time of transactions.
There are in-memory solutions that can work with data sets larger than RAM. Of course, this is accomplished by adding some operations on disk. Tarantool's Vinyl, for example, can work with data sets that are 10 to 1000 times the size of available RAM. Like other databases of recent vintage such as RocksDB and Bigtable, Vinyl's write algorithm uses LSM trees instead of B trees, which helps with its speed.
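The LSM-tree idea mentioned above can be sketched in a toy form: writes land in an in-memory memtable, which is flushed as an immutable sorted run when full; reads check the memtable first, then the runs newest-first. This is an illustration of the concept only, not Vinyl's or RocksDB's actual implementation (real LSM trees also compact runs and index them on disk).

```python
# Toy LSM sketch: O(1) in-memory writes, sequential flushes to
# immutable sorted runs ("disk" is just a list here), reads newest-first.

class ToyLSM:
    def __init__(self, memtable_limit=4):
        self.memtable = {}
        self.runs = []                 # each run is a sorted list of (k, v)
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value     # fast in-memory write
        if len(self.memtable) >= self.memtable_limit:
            # flush: one sequential "write" of sorted data
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:       # newest data first
            return self.memtable[key]
        for run in reversed(self.runs):  # then runs, newest to oldest
            for k, v in run:
                if k == key:
                    return v
        return None

db = ToyLSM()
for i in range(10):
    db.put(f"k{i}", i)
print(db.get("k3"), len(db.runs))  # 3 2
```

The point of the structure is that writes never touch old data in place: they append sorted batches, which is why LSM-based engines handle write-heavy and larger-than-RAM workloads well.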

Estimating IOPS requirements of a production SQL Server system

We're working on an application that's going to serve thousands of users daily (90% of them will be active during the working hours, using the system constantly during their workday). The main purpose of the system is to query multiple databases and combine the information from the databases into a single response to the user. Depending on the user input, our query load could be around 500 queries per second for a system with 1000 users. 80% of those queries are read queries.
Now, I did some profiling using the SQL Server Profiler tool and I get on average ~300 logical reads for the read queries (I did not bother with the write queries yet). That would amount to 150k logical reads per second for 1k users. Full production system is expected to have ~10k users.
How do I estimate actual read requirement on the storage for those databases? I am pretty sure that actual physical reads will amount to much less than that, but how do I estimate that? Of course, I can't do an actual run in the production environment as the production environment is not there yet, and I need to tell the hardware guys how much IOPS we're going to need for the system so that they know what to buy.
I tried the HP sizing tool suggested in the previous answers, but it only suggests HP products, without actual performance estimates. Any insight is appreciated.
EDIT: The main read-only dataset (where most of the queries will go) is a couple of gigs (order of magnitude 4 GB) on disk. This will probably significantly affect the logical vs. physical reads ratio. Any insight on how to get this ratio?
Disk I/O demand varies tremendously based on many factors, including:
How much data is already in RAM
Structure of your schema (indexes, row width, data types, triggers, etc)
Nature of your queries (joins, multiple single-row vs. row range, etc)
Data access methodology (ORM vs. set-oriented, single command vs. batching)
Ratio of reads vs. writes
Disk (database, table, index) fragmentation status
Use of SSDs vs. rotating media
For those reasons, the best way to estimate production disk load is usually by building a small prototype and benchmarking it. Use a copy of production data if you can; otherwise, use a data generation tool to build a similarly sized DB.
With the sample data in place, build a simple benchmark app that produces a mix of the types of queries you're expecting. Scale memory size if you need to.
Measure the results with Windows performance counters. The most useful stats are for the Physical Disk: time per transfer, transfers per second, queue depth, etc.
You can then apply some heuristics (also known as "experience") to those results and extrapolate them to a first-cut estimate for production I/O requirements.
If you absolutely can't build a prototype, then it's possible to make some educated guesses based on initial measurements, but it still takes work. For starters, turn on statistics:
SET STATISTICS IO ON
Before you run a test query, clear the RAM cache:
CHECKPOINT
DBCC DROPCLEANBUFFERS
Then, run your query, and look at physical reads + read-ahead reads to see the physical disk I/O demand. Repeat in some mix without clearing the RAM cache first to get an idea of how much caching will help.
Having said that, I would recommend against using IOPS alone as a target. I realize that SAN vendors and IT managers seem to love IOPS, but they are a very misleading measure of disk subsystem performance. As an example, there can be a 40:1 difference in deliverable IOPS when you switch from sequential I/O to random.
You certainly cannot derive your estimates from logical reads. That counter is not very helpful, because it is often unclear how much of it is physical, and the CPU cost of each of these accesses is unknown. I do not look at this number at all.
You need to gather virtual file stats which will show you the physical IO. For example: http://sqlserverio.com/2011/02/08/gather-virtual-file-statistics-using-t-sql-tsql2sday-15/
Google for "virtual file stats sql server".
Please note that you can only extrapolate IOs from the user count if you assume that cache hit ratio of the buffer pool will stay the same. Estimating this is much harder. Basically you need to estimate the working set of pages you will have under full load.
If you can ensure that your buffer pool can always take all hot data you can basically live without any reads. Then you only have to scale writes (for example with an SSD drive).
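To make the caveat concrete, the extrapolation being warned about looks like this, using the question's figures (500 queries/s at ~300 logical reads each). The buffer pool hit ratios below are pure assumptions, and that is exactly the hard-to-estimate part:

```python
# Naive IOPS extrapolation: physical reads/s from query rate, logical
# reads per query, and an *assumed* buffer pool hit ratio. A small
# change in the assumed ratio swings the answer by an order of magnitude.

def estimated_read_iops(queries_per_sec, logical_reads_per_query, buffer_hit_ratio):
    return queries_per_sec * logical_reads_per_query * (1 - buffer_hit_ratio)

print(estimated_read_iops(500, 300, 0.99))   # ~1500 physical reads/s
print(estimated_read_iops(500, 300, 0.999))  # ~150 -- a tenfold swing
```

Given that the question's hot dataset is only ~4 GB, a server with even modest RAM could plausibly hold nearly all of it in the buffer pool, which is why measuring virtual file stats on a prototype beats any formula.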
