Table Spool in SQL Server execution plan

Is it really bad to get a 'Table Spool' in a SQL Server execution plan? If not, how is it advantageous? Should we really be trying to get rid of Table Spools?

According to MSDN:
The Lazy Spool logical operator stores each row from its input in a hidden temporary object stored in the tempdb database. If the operator is rewound (for example, by a Nested Loops operator) but no rebinding is needed, the spooled data is used instead of rescanning the input. If rebinding is needed, the spooled data is discarded and the spool object is rebuilt by rescanning the (rebound) input.
All else being equal, it's better to have no extra operator at all. The advantage is described above: the spooled input is not rescanned. The disadvantage is that the rows must be stored in tempdb (the spool usually fits in memory, which keeps access fast).
Usually this operator is not a problem, as long as the spooled data fits in memory. You should share the execution plan/query for a more detailed explanation and possible tweaks.
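For reference, one way to confirm that a Table Spool really appears in the plan, and how many rows flow through it, is to capture the actual plan as text (the query below is just a stand-in for the one you are investigating):

    -- Returns one row per plan operator alongside the results,
    -- so you can look for "Table Spool" / "Lazy Spool" and its row counts.
    SET STATISTICS PROFILE ON;

    SELECT o.name, c.name
    FROM sys.objects AS o
    JOIN sys.columns AS c ON c.object_id = o.object_id;   -- substitute your query here

    SET STATISTICS PROFILE OFF;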


SQL Server Profiler - Evaluating Reads. What is considered 'good' or 'bad'?

I'm profiling (SQL Server 2008) some of our views and queries to determine their efficiency with regards to CPU usage and Reads. I understand Reads are the number of logical disk reads in 8KB pages. But I'm having a hard time determining what I should be satisfied with.
For example, when I query one of our views, which in turn joins with another view and has three OUTER APPLYs with table-valued UDFs, I get a Reads value of 321 with a CPU value of 0. My first thought is that I should be happy with this. But how do I evaluate the value of 321? This tells me 2,629,632 bytes of data were logically read to satisfy the query (which returned a single row and 30 columns).
How would some of you go about determining if this is good enough, or requires more fine tuning? What criteria would you use?
Also, I'm curious what is included in the 2,629,632 bytes of logical data read. Does this include all the data contained in the 30 columns in the single row returned?
The 2.5MB includes all data in the 321 pages, including the other rows in the same pages as those retrieved for your query, as well as the index pages retrieved to find your data. Note that these are logical reads, not physical reads; a read from a cached page is much 'cheaper'. Take CPU and the profiler cost indicators into account as well when optimising.
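If it helps, logical and physical reads per table can also be seen directly when running the query, without the profiler (a minimal sketch; replace the sample query with the view being profiled):

    -- Prints logical reads (from cache) and physical reads (from disk) per table
    -- to the Messages tab, together with CPU and elapsed time.
    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    SELECT TOP (1) * FROM sys.objects;   -- replace with your view/query

    SET STATISTICS IO OFF;
    SET STATISTICS TIME OFF;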
Regarding how to determine an optimum 'target' for reads:
FWIW I compare the actual reads with an optimum value, which I think of as the minimum number of pages needed to return the data in your query in a 'perfect' world.
e.g. if you calculate roughly 5 rows per page from table x, and your query returns 20 rows, the 'perfect' number of reads would be 4, plus some overhead of navigating indexes (assuming of course that the rows are clustered 'perfectly' for your query) - so utopia would be around say 5-10 pages.
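As a rough sketch of how you could measure the actual rows-per-page density of a clustered index (assumes SQL Server 2005 or later; dbo.x stands in for your table and index_id 1 for its clustered index):

    -- Leaf level of the clustered index: pages, rows, and average rows per page.
    SELECT page_count,
           record_count,
           record_count * 1.0 / NULLIF(page_count, 0) AS avg_rows_per_page
    FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.x'), 1, NULL, 'SAMPLED')
    WHERE index_level = 0;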
For a performance critical query, you can use the actual reads vs 'utopian' reads to micro-optimise, e.g.:
Whether you can fit more rows per page in the cluster (table), e.g. by replacing non-searched char columns with varchar, using varchar instead of nvarchar, or using smaller integer types, etc.
Whether the clustered index could be changed such that fewer pages would need to be fetched (e.g. if the 20 rows for the above query were scattered across different pages, then reads would be > 4)
Failing that (since you can only have one clustered index), whether covering indexes could remove the need to go to the table data (cluster) at all, since a covering index that fits your query will have a higher 'row' density (see the sketch after this list)
And for the indexes themselves, density improvements such as higher fill factors or narrower index keys can mean fewer index reads
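A hedged sketch of those last two points, with invented table and column names (INCLUDE columns require SQL Server 2005 or later):

    -- Covering index: the query is answered from the index alone,
    -- so no lookups into the wider clustered index are needed.
    CREATE NONCLUSTERED INDEX IX_Orders_Customer_Covering
        ON dbo.Orders (CustomerId, OrderDate)
        INCLUDE (OrderTotal, Status)
        WITH (FILLFACTOR = 90);   -- higher fill factor = denser pages (fewer reads), at the cost of more page splits on inserts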
You might find this article useful
HTH!
321 reads with a CPU value of 0 sounds pretty good, but it all depends.
How often is this query run? Why are table-returning UDFs used instead of just doing joins? What is the context of database use (how many users, number of transactions per second, database size, is it OLTP or data warehousing)?
The extra data reads come from:
All the other data in the pages needed to satisfy the reads done in the execution plan. Note this includes clustered and nonclustered indexes. Examining the execution plan will give you a better idea of what exactly is being read. You'll see references to all sorts of indexes and tables, and whether a seek or scan was required. Note that a scan means every page in the whole index or table was read. That is why seeks are desirable over scans.
All the related data in tables that the views INNER JOIN to, regardless of whether those JOINs are needed to give correct results for the query you're performing, since the optimizer doesn't know whether those INNER JOINs will exclude or include rows until it performs them.
If you provide the queries and execution plans, as requested, I would probably be able to give you better advice. Since you're using table-valued UDFs, I would also need to see the UDFs themselves, or at least the execution plan of the UDFs (which is only possible by tearing out their contents and running them outside a function context, or by converting them to stored procedures).

How do databases perform on dense data?

Suppose you have a dense table with an integer primary key, where you know the table will contain 99% of all values from 0 to 1,000,000.
A super-efficient way to implement such a table is an array (or a flat file on disk), assuming a fixed record size.
Is there a way to achieve similar efficiency using a database?
Clarification - When stored in a simple table/array, access to an entry is O(1) - just a memory read (or a read from disk). As I understand it, all databases store their nodes in trees, so they cannot achieve identical performance - access to an average node will take a few hops.
Perhaps I don't understand your question, but a database is designed to handle data. I work with databases all day long that have millions of rows. They are efficient enough.
I don't know what your definition of "achieve similar efficiency using a database" means. In a database (from my experience), what exactly you are trying to do matters for performance.
If you simply need a single record based on a primary key, the database should naturally be efficient enough, assuming it is properly structured (for example, 3NF).
Again, you need to design your database to be efficient for what you need. Furthermore, consider how you will write queries against the database in a given structure.
In my work, I've been able to cut query execution time from >15 minutes to 1 or 2 seconds simply by optimizing my joins, the where clause and overall query structure. Proper indexing, obviously, is also important.
Also, consider the database engine you are going to use. I've been assuming SQL Server or MySQL, but those may not be right. I've heard (but have never tested the idea) that SQLite is very quick - faster than either of the aforementioned. There are also many other options, I'm sure.
Update: Based on your explanation in the comments, I'd say no -- you can't. You are asking about mechanisms designed for two completely different things. A database persists data over a long period of time and is usually optimized for many connections and data reads/writes. In your description, the data in an array in memory is for a single program to access, and that program owns the memory. It's not (usually) shared. I do not see how you could achieve the same performance.
Another thought: The absolute closest thing you could get to this, in SQL Server specifically, is using a table variable. A table variable (in theory) is held in memory only. I've heard people refer to table variables as SQL Server's "array". Any regular table write or create statement prompts the RDBMS to write to disk (I think first to the log and then to the data files). And large data reads can also cause the DB to write to private temp tables to store data for later, or what have you.
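For illustration, a minimal table variable sketch (names invented); note that table variables are still backed by tempdb, but small ones are typically served entirely from memory:

    -- Declared like a variable; scoped to the batch, no explicit DROP needed.
    DECLARE @Lookup TABLE (
        Id  INT NOT NULL PRIMARY KEY,
        Val INT NOT NULL
    );

    INSERT INTO @Lookup (Id, Val)
    SELECT 1, 10 UNION ALL
    SELECT 2, 20;

    SELECT Val FROM @Lookup WHERE Id = 2;   -- point lookup on the key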
There is not much you can do to specify how data will be physically stored in a database. The most you can do is specify whether data and indices will be stored separately or whether the data will be stored in one index tree (a clustered index, as Brian described).
But in your case this does not matter at all, because:
All databases make heavy use of caching. 1,000,000 records can hardly exceed 1 GB of memory, so your complete database will quickly end up in the database cache.
If you are reading a single record at a time, the main overhead you will see is accessing data over the database protocol. The process goes something like this:
connect to database - open communication channel
send SQL text from application to database
database analyzes SQL (parse SQL, checks if SQL command is previously compiled, compiles command if it is first time issued, ...)
database executes the SQL. After a few executions the data from your example will be cached in memory, so execution will be very fast.
database packs fetched records for transport to application
data is sent over communication channel
database component in application unpacks received data into some dataset representation (e.g. ADO.Net dataset)
In your scenario, executing the SQL and finding the records takes very little time compared to the total time needed to get the data from the database to the application. Even if you could force the database to store its data in an array, there would be no visible gain.
If you've got a decent amount of records in a DB (and 1MM is decent, not really that big), then indexes are your friend.
You're talking about old fixed record length flat files. And yes, they are super-efficient compared to databases, but like structure/value arrays vs. classes, they just do not have the kind of features that we typically expect today.
Things like:
searching on different columns/combinations
variable length columns
nullable columns
editability
restructuring
concurrency control
transaction control
etc., etc.
Create a table with an ID column and a bit column. Use a clustered index for the ID column (the ID column is your primary key). Insert all 1,000,000 elements (do so in order or it will be slow). This is kind of inefficient in terms of space (you're using n lg n space instead of n space).
I don't claim this is efficient, but it will be stored in a similar manner to how an array would have been stored.
Note that the ID column can be marked as a counter in most DB systems, in which case you can just insert 1,000,000 items and it will do the counting for you. (I am not sure whether such a DB avoids explicitly storing the counter's value, but if it does then you'd only end up using n space.)
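A minimal T-SQL sketch of that approach (the cross join of sys.all_columns is just a convenient row generator and assumes it yields at least 1,000,000 rows on your instance):

    CREATE TABLE dbo.DenseTable (
        Id  INT IDENTITY(0, 1) NOT NULL PRIMARY KEY CLUSTERED,   -- the "counter" column
        Val BIT NOT NULL
    );

    -- Insert 1,000,000 rows; IDENTITY hands out 0, 1, 2, ... in order,
    -- so the clustered index is filled sequentially (no page splits).
    INSERT INTO dbo.DenseTable (Val)
    SELECT TOP (1000000) CAST(0 AS BIT)
    FROM sys.all_columns AS a
    CROSS JOIN sys.all_columns AS b;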
When your primary key is an integer sequence, it can be a good idea to have a reverse index. This makes sure that the contiguous values are spread apart in the index tree.
However, there is a catch - with reverse indexes you will not be able to do range searching.
The big question is: efficient for what?
For Oracle, ideas might include:
read access by id: index-organized table (this might be what you are looking for - see the sketch after this list)
insert only, no update: no indexes, no spare space
read access full table scan: compressed
high concurrent write when id comes from a sequence: reverse index
for the actual question, precisely as asked: write all rows into a single BLOB (the table contains one column and one row). You might be able to access this like an array, but I am not sure, since I don't know what operations are possible on BLOBs. Even if it works, I don't think this approach would be useful in any realistic scenario.
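For the index-organized table idea, a small Oracle sketch with invented names (the rows live inside the primary key B-tree itself, so a lookup by id touches only that one structure):

    -- Oracle: no separate heap; the table data is stored in the primary key index.
    CREATE TABLE dense_values (
        id  NUMBER PRIMARY KEY,
        val NUMBER NOT NULL
    )
    ORGANIZATION INDEX;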

Caching Function Results in SQL Server 2000

I want to memoize function results for performance, i.e. lazily populate a cache indexed on the function arguments. The first time I call a function, the cache won't have anything for the input arguments, so it will calculate it and store it before returning it. Subsequent calls just use the cache.
However, it seems that SQL Server 2000 has a stupid arbitrary rule about functions being "deterministic". INSERTs, UPDATEs, and regular stored procedure calls are forbidden. However, extended stored procedures are allowed. How is this deterministic? If another session modifies the database state, the function output will change anyways.
I'm steaming mad. I had thought I could make caching transparent to the user. Is this possible? I don't have the permissions to deploy extended stored procedures.
EDIT:
This limitation is still in 2008. You can't call RAND, for God's sake!
The cache would be implemented by me in the DB. A cache is any data store used for caching...
EDIT:
There are no cases where the same arguments to a function will yield different results, outside of changes to the underlying data. This is a BI platform, and the only changes come from scheduled ETL, at which time I would TRUNCATE the cache table.
These are I/O intensive time series calculations, on the order of O(n^4). I don't have the mandate to change the underlying table or indexes. Also, a lot of these functions use the same intermediate functions, and caching allows those to be used.
UDFs are not truly deterministic, unless they account for changes in database state. What's the point? Is SQL Server caching? (Ironic.) If SQL Server is caching, then it must be expiring on changes to tables that are schema bound. If they're schema bound, then why not bind tables that the function modifies? I can see why procs aren't allowed, although that's just sloppy; just schema bind procs. And, BTW, why allow extended stored procs? You can't possibly track what those do to ensure determinism!!! Argh!!!
EDIT:
My question is: Is there any way to lazily cache function results in a way that can be used in a view?
Deterministic means that the same inputs return the same output independent of time and database.
SQL Server (any version) does no caching of UDFs - I believe it will avoid calling the UDF twice on a single row, but that's it.
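Incidentally, you can check how SQL Server has classified a given function (the function name below is hypothetical):

    -- 1 = SQL Server considers the UDF deterministic, 0 = it does not.
    SELECT OBJECTPROPERTY(OBJECT_ID('dbo.fn_Calc'), 'IsDeterministic') AS IsDeterministic;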
One trick I've used (I think I posted it here on SO) is to:
Refactor the UDF, if you can, so that it effectively returns a usable discrete set of values for a given set of inputs. For numerical calculations, one can sometimes refactor the logic to return a factor or rate which is multiplied outside the UDF, instead of passing a value in and multiplying inside the UDF.
Call the UDF over the DISTINCT rowset and cache the results to a temporary table. If you are only calling the UDF with 100,000 tuples of parameters over a 17,000,000 row set, this is very much more efficient.
JOIN to the temporary table (basically converting from code-based logic to table-based logic) to get values.
This table can be re-used as necessary or even kept.
Additions to the table can be done by first LEFT JOINing to find missing cached entries.
This works for both single-row table-valued UDFs and scalar UDFs. I'm mainly using it for table-valued UDFs. There is a hotfix for SQL Server 2005 which is supposed to address UDF performance - I'm waiting on the DBAs to test it before deploying to production.
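A hedged sketch of the pattern, with all object names invented (dbo.fn_Rate stands in for the expensive UDF and dbo.BigFactTable for the large row set):

    -- 1. Collect the DISTINCT parameter values the UDF will be called with.
    SELECT DISTINCT f.RateKey
    INTO #DistinctKeys
    FROM dbo.BigFactTable AS f;

    -- 2. Call the UDF once per distinct value and cache the results.
    SELECT d.RateKey,
           dbo.fn_Rate(d.RateKey) AS Rate
    INTO #RateCache
    FROM #DistinctKeys AS d;

    -- 3. JOIN to the cache instead of calling the UDF on every fact row.
    SELECT f.*, c.Rate
    FROM dbo.BigFactTable AS f
    JOIN #RateCache AS c ON c.RateKey = f.RateKey;

    -- 4. Later, top up the cache by LEFT JOINing to find missing entries.
    INSERT INTO #RateCache (RateKey, Rate)
    SELECT d.RateKey, dbo.fn_Rate(d.RateKey)
    FROM (SELECT DISTINCT RateKey FROM dbo.BigFactTable) AS d
    LEFT JOIN #RateCache AS c ON c.RateKey = d.RateKey
    WHERE c.RateKey IS NULL;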

Does the order of columns in a WHERE clause matter?

Does the order of the columns in a WHERE clause affect performance?
e.g.
Say I put a column that has a higher potential for uniqueness first, or vice versa?
With a decent query optimiser: it shouldn't.
But in practice, I suspect it might.
You can only tell for your cases by measuring. And the measurements will likely change as the distribution of data changes in the database.
For Transact-SQL there is a defined precedence for operators in the condition of the WHERE clause. The optimizer may re-order this evaluation, so you shouldn't rely on short-circuiting behavior for correctness. The order is generally left to right, but selectivity/availability of indexes probably also matters. Simplifying your search condition should improve the ability of the optimizer to handle it.
Ex:
WHERE (a OR b) AND (b OR c)
could be simplified to
WHERE b OR (a AND c)
Clearly, in this case, if the query can be constructed to check whether b holds first, it may be able to skip the evaluation of a and c and thus run faster. Whether the optimizer can do this simple transformation I can't answer (it may be able to), but the point is that it probably can't do arbitrarily complex transformations, and you may be able to affect query performance by rearranging your condition. If b is more selective or has an index, the optimizer would likely be able to construct a query using it first.
EDIT: With regard to your question about ordering based on uniqueness, I would assume that any hints you can provide to the optimizer based on your knowledge (actual, not assumed) of the data couldn't hurt. Pretend that it won't do any optimization and construct your query as if you needed to define it from most to least selective, but don't obsess about it until performance is actually a problem.
Quoting from the reference above:
The order of precedence for the logical operators is NOT (highest), followed by AND, followed by OR. Parentheses can be used to override this precedence in a search condition. The order of evaluation of logical operators can vary depending on choices made by the query optimizer.
For SQL Server 2000 / 2005 / 2008, the query optimizer usually will give you identical results no matter how you arrange the columns in the WHERE clause. Having said this, over the years of writing thousands of T-SQL commands I have found a few corner cases where the order altered the performance. Here are some characteristics of the queries that appeared to be subject to this problem:
If you have a large number of tables in your query (10 or more).
If you have several EXISTS, IN, NOT EXISTS, or NOT IN statements in your WHERE clause
If you are using nested CTEs (common table expressions) or a large number of CTEs.
If you have a large number of sub-queries in your FROM clause.
Here are some tips on trying to evaluate the best way to resolve the performance issue quickly:
If the problem is related to 1 or 2, then try reordering the WHERE clause and compare the sub-tree cost of the queries in the estimated query plans.
If the problem is related to 3 or 4, then try moving the sub-queries and CTEs out of the query and have them load temporary tables. The query plan optimizer is FAR more efficient at estimating query plans if you reduce the number of complex joins and sub-queries in the body of the T-SQL statement.
If you are using temporary tables, then make certain you have specified primary keys for the temporary tables. This means avoiding SELECT INTO ... FROM to generate the table. Instead, explicitly create the table and specify a PRIMARY KEY before using an INSERT INTO ... SELECT statement (see the sketch after this list).
If you are using temporary tables and MANY processes on the server use temporary tables as well, then you may want to make a more permanent staging table that is truncated and reloaded during the query process. You are more likely to encounter disk contention issues if you are using the TempDB to store your working / staging tables.
Move the statements in the WHERE clause that will filter the most data to the beginning of the WHERE clause. Please note that if this is your solution to the problem, then you will probably have poor performance again down the line when the query plan gets confused again about generating and picking the best execution plan. You are BEST off finding a way to reduce the complexity of the query so that the order of the WHERE clause is no longer relevant.
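To illustrate the temp table point above, a minimal sketch with invented names:

    -- Instead of: SELECT CustomerId, SUM(OrderTotal) AS OrderTotal INTO #Staging FROM ...
    CREATE TABLE #Staging (
        CustomerId INT   NOT NULL PRIMARY KEY,   -- explicit key gives the optimizer uniqueness info and an index
        OrderTotal MONEY NOT NULL
    );

    INSERT INTO #Staging (CustomerId, OrderTotal)
    SELECT CustomerId, SUM(OrderTotal)
    FROM dbo.Orders
    GROUP BY CustomerId;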
I hope you find this information helpful. Good luck!
It all depends on the DBMS, query optimizer and rules, but generally it does affect performance.
If a where clause is ordered such that the first condition reduces the resultset significantly, the remaining conditions will only need to be evaluated for a smaller set. Following that logic, you can optimize a query based on condition order in a where clause.
In theory any two queries that are equivalent should produce identical query plans. As the order of WHERE clauses has no effect on the logical meaning of the query, this should mean that the order of the WHERE clause should have no effect.
This is because of the way that the query optimiser works. In a vastly simplified overview:
First SQL Server parses the query and constructs a tree of logical operators (e.g. JOIN or SELECT).
Then it translates these logical operators into a "tree of physical operations" (e.g. "Nested Loops" or "Index Scan", i.e. an execution plan).
Next it permutes through the set of equivalent "trees of physical operations" (i.e. execution plans) by swapping out equivalent operations, estimating the cost of each plan until it finds the optimal one.
The second step is done in a completely naive way - it simply chooses the first / most obvious physical tree that it can. However, in the third step the query optimiser is able to look through all equivalent physical trees (i.e. execution plans), and so as long as the queries are actually equivalent it doesn't matter what initial plan we get in step 2 - the set of plans to be considered in step 3 is the same.
(I can't remember the real names for the logical / physical trees, they are in a book but unfortunately the book is the other side of the world from me right now)
See the following series of blog articles for more detail: Inside the Optimizer: Constructing a Plan - Part 1.
In reality, however, the query optimiser often doesn't have the chance to consider all equivalent trees in step 3 (for complex queries there can be a massive number of possible plans), and so after a certain cutoff time step 3 is cut short and the query optimiser has to choose the best plan that it has found so far - in this case not all plans will be considered.
There is a lot of behind-the-scenes magic that goes on to ensure that the query optimiser selectively and intelligently chooses plans to consider, and so most of the time the plan chosen is "good enough" - even if it's not the absolute fastest plan, it's probably not that much slower than the theoretical fastest.
What this means, however, is that if we have a different starting plan in step 2 (which might happen if we write our query differently), a different subset of plans may be considered in step 3, and so in theory SQL Server can come up with different query plans for equivalent queries depending on the way they were written.
In reality, however, 99% of the time you aren't going to notice the difference (for many simple plans there won't be any difference, as the optimiser will actually consider all plans). Also, you can't predict how any of this is going to work, and so things that might seem sensible (like putting the WHERE clauses in a certain order) might not have anything like the expected effect.
In the vast majority of cases the query optimizer will determine the most efficient way to select the data you have requested, irrespective of the ordering of the SARGS defined in the WHERE clause.
The ordering is determined by factors such as the selectivity of the column in question (which SQL Server knows based on statistics) and whether or not indexes can be used.
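If you want to see the statistics those selectivity estimates are based on, you can inspect them directly (table and index names here are placeholders):

    -- Header, density vector and histogram for one index / statistics object.
    DBCC SHOW_STATISTICS ('dbo.Orders', IX_Orders_CustomerId);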
If you are ANDing conditions, the first one that is not true makes the whole condition false, so order can affect performance.

Ways to avoid eager spool operations on SQL Server

I have an ETL process that involves a stored procedure that makes heavy use of SELECT INTO statements (minimally logged and therefore faster, as they generate less log traffic). Of the batch of work that takes place in one particular stored procedure, several of the most expensive operations are eager spools that appear to just buffer the query results and then copy them into the table being created.
The MSDN documentation on eager spools is quite sparse. Does anyone have a deeper insight into whether these are really necessary (and under what circumstances)? I have a few theories that may or may not make sense, but no success in eliminating these from the queries.
The .sqlplan files are quite large (160kb) so I guess it's probably not reasonable to post them directly to a forum.
So, here are some theories that may be amenable to specific answers:
The query uses some UDFs for data transformation, such as parsing formatted dates. Does this data transformation necessitate the use of eager spools to allocate sensible types (e.g. varchar lengths) to the table before it constructs it?
As an extension of the question above, does anyone have a deeper view of what does or does not drive this operation in a query?
My understanding of spooling is that it's a bit of a red herring on your execution plan. Yes, it accounts for a lot of your query cost, but it's actually an optimization that SQL Server undertakes automatically so that it can avoid costly rescanning. If you were to avoid spooling, the cost of the execution tree it sits on would go up, and almost certainly the cost of the whole query would increase. I don't have any particular insight into what might cause the database's query optimizer to build the plan that way, especially without seeing the SQL code, but you're probably better off trusting its behavior.
However, that doesn't mean your execution plan can't be optimized, depending on exactly what you're up to and how volatile your source data is. When you're doing a SELECT INTO, you'll often see spooling items on your execution plan, and it can be related to read isolation. If it's appropriate for your particular situation, you might try just lowering the transaction isolation level to something less costly, and/or using the NOLOCK hint. I've found in complicated performance-critical queries that NOLOCK, if safe and appropriate for your data, can vastly increase the speed of query execution even when there doesn't seem to be any reason it should.
In this situation, if you try READ UNCOMMITTED or the NOLOCK hint, you may be able to eliminate some of the Spools. (Obviously you don't want to do this if it's likely to land you in an inconsistent state, but everyone's data isolation requirements are different). The TOP operator and the OR operator can occasionally cause spooling, but I doubt you're doing any of those in an ETL process...
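For example (object names invented; only appropriate if dirty reads are acceptable for your ETL source):

    -- Option 1: lower the isolation level for the whole batch.
    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

    SELECT s.Id, s.RawDate
    INTO   #Staging1
    FROM   dbo.SourceTable AS s;

    -- Option 2: hint individual tables instead.
    SELECT s.Id, s.RawDate
    INTO   #Staging2
    FROM   dbo.SourceTable AS s WITH (NOLOCK);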
You're right in saying that your UDFs could also be the culprit. If you're only using each UDF once, it would be an interesting experiment to try putting them inline to see if you get a large performance benefit. (And if you can't figure out a way to write them inline with the query, that's probably why they might be causing spooling).
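As a hedged example of what inlining can look like, assuming a UDF that merely wraps a date parse (the function name and date format are invented):

    -- Before: scalar UDF evaluated row by row.
    SELECT dbo.fn_ParseDate(s.RawDate) AS LoadDate
    INTO   #Out1
    FROM   dbo.SourceTable AS s;

    -- After: the same transformation written inline as a plain expression.
    SELECT CONVERT(datetime, s.RawDate, 112) AS LoadDate   -- style 112 = 'yyyymmdd'; adjust to the real format
    INTO   #Out2
    FROM   dbo.SourceTable AS s;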
One last thing I would look at: if you're doing any joins that can be re-ordered, try using a hint to force the join order to happen in what you know to be the most selective order. That's a bit of a reach, but it doesn't hurt to try if you're already stuck optimizing.
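A minimal sketch of forcing the join order (invented tables; the FROM clause is written from most to least selective):

    SELECT b.*
    FROM dbo.SmallSelectiveTable AS s
    JOIN dbo.BigTable AS b ON b.SourceId = s.Id
    OPTION (FORCE ORDER);   -- joins are performed in the order written in the FROM clause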
