From a performance tuning perspective which one is more important?
Say a query reports 30 scans and 148 logical reads on a table with about 2 million records.
A modified version of the same query reports 1 scan with 1,400 logical reads. The second query takes about 40 ms less CPU time to execute. Is the second query better?
I think so and this is my thesis:
In the first case, we have a high number of scans on a very large table. This is costly on CPU and server memory, since all the rows in the table have to be loaded into memory. Executing such a query thousands of times will be taxing on server resources.
In the second case, we have fewer scans even though we are accumulating a higher number of logical reads. Since logical reads effectively correspond to the number of pages being read from cache, the bottleneck here will be network bandwidth in getting the results back to the client. The actual work SQL Server has to do in this case is less.
What are your thoughts?
The logical read metrics are mostly irrelevant. You care about time elapsed, CPU time spent and disk resources used. Why would you care about logical reads? They are accounted for by looking at CPU time.
If you want your query to go faster, measure wall-clock time. If you want to use fewer resources, measure CPU and physical IO.
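For example, a minimal sketch of capturing those numbers in SQL Server (the query itself is a placeholder; STATISTICS TIME reports CPU and elapsed time, STATISTICS IO reports logical and physical reads per table):
SET STATISTICS TIME ON
SET STATISTICS IO ON
GO
-- Run the query you are comparing here
SET STATISTICS TIME OFF
SET STATISTICS IO OFF
GO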
Related
I have a relatively large sqlite db (~2 GB). I'm running a simple SELECT query on it and it roughly takes 2 min to run. Any similar query after first execution takes less than a second.
I did some googling, and apparently this condition is known as 'cold cache/warm cache' behaviour: on the first attempt, each byte is physically read from the hard disk (cold cache), which is slow. On subsequent attempts, the data is simply retrieved from RAM buffers (warm cache).
However, my SELECT statement is just selecting from a few tables with certain conditions. Does sqlite need to load ALL of the database into RAM before running any query? Is there any way around it, or is that just the way it works?
Thanks for any comment
P.S. I tried VACUUM and ANALYZE but didn't improve execution time.
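In case it helps to see where the time goes, here is a small sketch (table and column names are invented; substitute your own schema) of asking SQLite how it plans to execute the statement:
-- Hypothetical table and column
EXPLAIN QUERY PLAN
SELECT * FROM readings WHERE sensor_id = 42;
-- If the plan shows a full scan of the table rather than a search using an index,
-- an index on the filtered column lets SQLite read only the relevant pages:
CREATE INDEX IF NOT EXISTS idx_readings_sensor ON readings(sensor_id);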
There are two kinds of queries that I ran,
1. A purposely introduced query that performs sorting (ORDER BY) on about 10 columns. This uses CPU since sorting is a CPU-intensive operation.
The scenario involved running a query that took 30 seconds, with about 100 of these running over simultaneous connections against 100 different tables. CPU usage on a 32-core machine was about 85% on all 32 cores, and all 100 queries ran in parallel.
2. Inserting a million rows into a table.
I don't understand why this would consume CPU, since it is purely disk I/O. But I inserted 1 million rows into a single table using 100 simultaneous connections/threads, and no indexes were there on those tables. Now, INSERT is not the fastest way to load data (a COPY sketch is included below for comparison), but the point here is that it consumed about 32% CPU on about 10 cores. This is far less than the case above, but I am still curious.
I could be wrong because WAL archiving was on and the query log was on - do these contribute to CPU? I am assuming not, since those are also disk IO.
There was no other process/application running/installed on this machine other than postgres.
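As an aside on "INSERT is not the fastest way to load data": the usual faster path in PostgreSQL is COPY. A minimal sketch, with a made-up table and file path (columns must match the CSV layout):
COPY measurements (device_id, recorded_at, value)
FROM '/tmp/measurements.csv'
WITH (FORMAT csv, HEADER true);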
Many different things:
CPU time for query planning and the logic in the executor for query execution
Transforming text representations of tuples into their on-disk format. Parsing dates, and so on.
Log output
Processing the transaction logs
Writing to shared_buffers when inserting pages to write, scanning shared_buffers for pages to write out
Interprocess communication for lock management
Scanning through in-memory cached copies of indexes when checking uniqueness, inserting new keys in an index, etc
....
If you really want to know the juicy details, fire up perf with stack traces enabled to see where CPU time is spent.
If your table has a primary key, then it has an implicit index.
It may also be true that if the table had a primary key, then it would be stored as a b-tree and not a simple flat table; I'm not clear on this point since my postgres-fu has weakened over the years, but many DBMSes use the primary key as a default clustering key for a b-tree and just store everything in the b-tree. Managing that b-tree requires plenty of CPU.
Additionally, if you're inserting from 100 threads and connections, then postgres has to perform locking in order to keep internal data structures consistent. Fighting for locks can consume a ton of CPU, and is especially difficult to do efficiently on machines with many CPUs - acquiring a single mutex requires the cooperation of every CPU in the system ala cache coherency protocol.
You may want to experiment with different numbers of threads, while measuring overall runtime and cpu usage - you may find that with, say, 8 threads, the total CPU utilized is 1/10th of your current usage, but still gets the job done within 110-150% of the original time. This would be a sure sign that lock contention is killing your CPU usage.
We are running a daily batch and sometimes see runtime differences of a factor of 20.
Analyzing a trace that recorded both fast and slow performance timeframes, I isolated a select statement returning a single row from a clustered index, which logs a duration of 1101 micros (3 logical reads) in the "fast" timeframe.
A few minutes later the same select with the same plan lasted 28'275 micros (3 logical reads).
Both timeframes (fast/slow) are in prework time and there is almost no other activity on the server.
It is an AlwaysOn cluster running SQL Server 2012, with CPU usage always below 30% and, thanks to lots of RAM, low IO activity.
To us the trace does not reveal a reason for the long duration. Any suggestions what we could trace for to gain more insight?
Thanks
Juerg
Addition:
Added tracing for some of the action and found another strange thing. The app is requesting the same data from the same table with different PKs via dynamic SQL commands (select * from t1 where OID='...'). It does this 4 times in a row, and the exec plan is the same (1 index seek and 1 Key Lookup) for all 4 selects. Each select triggers 8 logical reads. 3 out of the 4 selects log 0 ms CPU time in the trace, and 1 logs 15 ms?
Am I right that even a physical read (can't see that in the trace but we got lots of RAM and I doubt that a physical read happens) should not increase the CPU count? What could cause that counter to be so high in comparison to the other reads?
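One additional data point that might help narrow this down (a sketch using SQL Server's plan-cache DMVs, which are available in 2012; the TOP and ORDER BY choices are only illustrative) is to compare maximum versus total CPU and duration per cached statement:
SELECT TOP (20)
    qs.execution_count,
    qs.total_worker_time AS total_cpu_microsec,
    qs.max_worker_time AS max_cpu_microsec,
    qs.total_elapsed_time AS total_elapsed_microsec,
    qs.max_elapsed_time AS max_elapsed_microsec,
    qs.total_logical_reads,
    st.text AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.max_elapsed_time DESC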
Theoretical SQL Server 2008 question:
If a table-scan is performed on SQL Server with a significant amount of 'free' memory, will the results of that table scan be held in memory, thereby negating the efficiencies that may be introduced by an index on the table?
Update 1: The tables in question contain reference data with approx. 100 - 200 records per table (I do not know the average size of each row), so we are not talking about massive tables here.
I have spoken to the client about introducing a memcached / AppFabric Cache solution for this reference data, however that is out of scope at the moment and they are looking for a 'quick win' that is minimal risk.
Every page read in the scan will be read into the buffer pool and only released under memory pressure as per the cache eviction policy.
Not sure why you think that would negate the efficiencies that may be introduced by an index on the table though.
An index likely means that many fewer pages need to be read. Even if all pages are already in cache, so no physical reads are required, reducing the number of logical reads is still a good thing: logical reads are not free. They still carry overhead for locking and reading the pages.
Besides the performance problem (even when all pages are in memory a scan is still going to be many many times slower than an index seek on any table of significant size) there is an additional issue: contention.
The problem with scans is that any operation will have to visit every row. This means that any select will block behind any insert/update/delete (since it is guaranteed to visit the rows locked by those operations). The effect is basically a serialization of operations, which adds huge latency, as SELECTs now have to wait for DML to commit every time. Even under mild concurrency the effect is an overall sluggish and slow-to-respond table. With indexes present, operations only look at rows in the ranges of interest, and this, by virtue of simple probabilities, reduces the chances of conflict. The result is a much livelier, more responsive, low-latency system.
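As a small illustration of that difference (all object names here are invented), an index on the predicate column lets each query seek only the narrow range it needs instead of visiting every row:
-- Hypothetical table and column
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId ON dbo.Orders (CustomerId)
GO
-- This now seeks the handful of rows for one customer rather than scanning
-- (and potentially blocking behind DML touching) the whole table:
SELECT OrderId, OrderDate FROM dbo.Orders WHERE CustomerId = 12345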
Full Table Scans also are not scalable as the data grows. It’s very simple. As more data is added to a table, full table scans must process more data to complete and therefore they will take longer. Also, they will produce more Disk and Memory requests, further putting strain on your equipment.
Consider a 1,000,000-row table that a full table scan is performed on. SQL Server reads data in the form of 8 KB data pages. Although the amount of data stored within each page can vary, let's assume that on average 50 rows of data fit in each of these 8 KB pages for our example. To perform a full scan and read every row, 20,000 page reads are required (1,000,000 rows / 50 rows per page), and 20,000 pages x 8 KB is roughly 156 MB of data that has to be processed, just for this one query. Unless you have a really fast disk subsystem, it might take a while to retrieve and process all of that data. Now assume that this table doubles in size each year: next year, the same query must read roughly 312 MB of data just to complete.
Please refer to this link - http://www.datasprings.com/resources/articles-information/key-sql-performance-situations-full-table-scan
I'm trying to squeeze some extra performance out of searching through a table with many rows.
My current reasoning is that if I can throw away some of the seldom-used columns from the searched table, thereby reducing the row size, the number of page splits and hence IO should drop, giving a benefit once data starts to spill out of memory.
Any good resource detailing such effects?
Any experiences?
Thanks.
Tuning the size of a row is only a major issue if the RDBMS is performing a full table scan. If your query can select the rows using only indexes, then the row size is less important (unless you are returning a very large number of rows, where the IO of returning the actual result is significant).
If you are doing a full table scan, or partial scans of large numbers of rows because you have predicates that are not using indexes, then row size can be a major factor. One example I remember (a rough sketch follows below): on a table of the order of 100,000,000 rows, splitting the largish 'data' columns into a different table from the columns used for querying resulted in an order-of-magnitude performance improvement on some queries.
I would only expect this to be a major factor in a relatively small number of situations.
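A rough sketch of that kind of split (every name below is invented, and a real migration obviously needs data movement and join changes on the query side):
-- Narrow table keeps only the columns used in search predicates
CREATE TABLE dbo.EventSearch (
    EventId BIGINT NOT NULL PRIMARY KEY,
    EventDate DATETIME2 NOT NULL,
    EventType INT NOT NULL
)
-- Wide, rarely-used payload moves to a companion table, joined only when needed
CREATE TABLE dbo.EventPayload (
    EventId BIGINT NOT NULL PRIMARY KEY REFERENCES dbo.EventSearch (EventId),
    RawMessage NVARCHAR(MAX) NULL,
    Notes NVARCHAR(MAX) NULL
)
-- Scans and range queries against the narrow table now touch far fewer pages
SELECT EventId, EventDate FROM dbo.EventSearch
WHERE EventDate >= '20240101' AND EventType = 3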
I don't know what else you have tried to increase performance; this seems like grasping at straws to me. That doesn't mean it isn't a valid approach. From my experience the benefit can be significant. It's just that it's usually dwarfed by other kinds of optimization.
However, what you are looking for are IO statistics. There are several methods to gather them; a quite good introduction can be found here.
The SQL Server query optimizer is a very complex algorithm, and the decision of which index to use, or what type of scan to perform, depends on many factors such as the query's output columns, the indexes available, the statistics available, the statistical distribution of the data values in the columns, the row count, and the row size.
So the only valid answer to your question is: It depends :)
Give some more information, such as what kind of optimization you have already done, what the query plan looks like, etc.
Of course, when SQL Server decides to do a table scan (clustered index scan if available), you can reduce IO by downsizing the row size. But in that case you would increase performance far more dramatically by creating an adequate index (which is de facto a separate table with a smaller row size).
If the application is transactional then look at the indexes in use on the table. Table partitioning is unlikely to be much help in this situation.
If you have something like a data warehouse and are doing aggregate queries over a lot of data then you might get some mileage from partitioning.
If you are doing a join between two large tables that are not in a 1:M relationship the query optimiser may have to resolve the predicates on each table separately and then combine relatively large intermediate result sets or run a slow operator like nested loops matching one side of the join. In this case you may get a benefit from a trigger-maintained denormalised table to do the searches. I've seen good results obtained from denormalised search tables for complex screens on a couple of large applications.
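A very rough sketch of such a trigger-maintained search table (every object name is hypothetical, and a complete version would also need UPDATE and DELETE triggers):
-- Denormalised table holding just the columns the search screen filters on
CREATE TABLE dbo.OrderSearch (
    OrderId BIGINT NOT NULL PRIMARY KEY,
    CustomerName NVARCHAR(200) NOT NULL,
    OrderDate DATE NOT NULL,
    OrderTotal DECIMAL(18,2) NOT NULL
)
CREATE INDEX IX_OrderSearch_Name_Date ON dbo.OrderSearch (CustomerName, OrderDate)
GO
-- Keep it in sync when new orders arrive
CREATE TRIGGER dbo.trg_Orders_SyncSearch ON dbo.Orders AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON
    INSERT INTO dbo.OrderSearch (OrderId, CustomerName, OrderDate, OrderTotal)
    SELECT i.OrderId, c.Name, i.OrderDate, i.Total
    FROM inserted AS i
    JOIN dbo.Customers AS c ON c.CustomerId = i.CustomerId
END
GO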
If you're interested in minimizing IO when reading data, you need to check whether or not your indexes cover the query. To minimize IO you should select columns that are included in the index, or use indexes that cover all columns used in the query; that way the optimizer will read data from the indexes and never from the actual table rows.
If you're looking into this kind of detail, maybe you should also consider upgrading hardware, changing controllers, or adding more disks to have more spindles available to the query processor, allowing SQL Server to read more data at the same time.
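For instance, a covering index for a hypothetical query might look like this (table, columns and parameter are placeholders):
-- Query to cover: SELECT OrderDate, Status FROM dbo.Invoices WHERE CustomerId = @id
CREATE NONCLUSTERED INDEX IX_Invoices_CustomerId_Covering
ON dbo.Invoices (CustomerId)
INCLUDE (OrderDate, Status)
-- The seek is now satisfied entirely from the index pages, with no lookups into the base table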
SQL Server disk I/O is frequently the cause of bottlenecks in most systems. The I/O subsystem includes disks, disk controller cards, and the system bus. If disk I/O is consistently high, consider:
Move some database files to an additional disk or server.
Use a faster disk drive or a redundant array of inexpensive disks (RAID) device.
Add additional disks to a RAID array, if one already is being used.
Tune your application or database to reduce disk access operations.
Consider index coverage, better indexes, and/or normalization.
Microsoft SQL Server uses Microsoft Windows I/O calls to perform disk reads and writes. SQL Server manages when and how disk I/O is performed, but the Windows operating system performs the underlying I/O operations. Applications and systems that are I/O-bound may keep the disk constantly active.
Different disk controllers and drivers use different amounts of CPU time to perform disk I/O. Efficient controllers and drivers use less time, leaving more processing time available for user applications and increasing overall throughput.
The first thing I would do is ensure that your indexes have been rebuilt; if you are dealing with a huge amount of data and an index rebuild is not possible (although from SQL Server 2005 onwards you can perform online rebuilds without locking everyone out), then ensure that your statistics are up to date (more on this later).
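In concrete terms, that might look something like this (object names are placeholders; ONLINE rebuilds need an edition that supports them):
-- Rebuild one index online so users are not locked out (SQL Server 2005 onwards)
ALTER INDEX IX_MyTable_MyColumn ON dbo.MyTable REBUILD WITH (ONLINE = ON)
GO
-- Or, if a rebuild is too heavy, just refresh the statistics
UPDATE STATISTICS dbo.MyTable WITH FULLSCAN
GO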
If your database contains representative data, then you can perform a simple measurement of the number of reads (logical and physical) that your query is using by doing the following:
SET STATISTICS IO ON
GO
-- Execute your query here
SET STATISTICS IO OFF
GO
On a well set up database server, there should be few or no physical reads (a high number of physical reads often indicates that your server needs more RAM). How many logical reads are you doing? If this number is high, then you will need to look at creating indexes. The next step is to run the query with the estimated execution plan turned on, then rerun it (clearing the cache first) displaying the actual execution plan. If these differ, then your statistics are out of date.
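On a test server (never in production), "clearing the cache first" might look like this:
CHECKPOINT               -- flush dirty pages so clean buffers can be dropped
DBCC DROPCLEANBUFFERS    -- empty the buffer pool, forcing physical reads on the next run
DBCC FREEPROCCACHE       -- discard cached plans so the query is recompiled
GO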
I think you're going to be farther ahead using standard optimization techniques first -- check your execution plan, profiler trace, etc. and see whether you need to adjust your indexes, create statistics etc. -- before looking at the physical structure of your table.