Why use temporary tables in stored procedures that return large result sets? How does this help performance? Is there an example out there of, say, a join of several tables returning a large set of data, and how a temporary table might improve the performance of that query in a stored procedure?
In my experience they may be helpful in limited situations, when a query is so complex that the query optimizer is struggling to come up with a decent plan. Breaking such a query apart and storing intermediate results in temp tables may help if done right. I use this strategy as a last resort because temp tables have a cost of their own, and for large result sets that cost can be significant.
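A minimal sketch of that pattern, using made-up procedure, table and column names, might look like this:

CREATE PROCEDURE dbo.usp_CustomerOrderSummary   -- hypothetical procedure
AS
BEGIN
    SET NOCOUNT ON;

    -- Stage the expensive multi-table join once; the temp table gives the
    -- optimizer real statistics on the intermediate result.
    SELECT o.CustomerID, o.OrderID, o.OrderDate, d.LineTotal
    INTO #OrderDetail
    FROM dbo.Orders AS o
    JOIN dbo.OrderLines AS d
        ON d.OrderID = o.OrderID
    WHERE o.OrderDate >= DATEADD(YEAR, -1, GETDATE());

    -- Optional: index the temp table if the follow-up query benefits from it.
    CREATE CLUSTERED INDEX cx_OrderDetail ON #OrderDetail (CustomerID);

    -- The final query is now much simpler for the optimizer to plan.
    SELECT c.CustomerID, c.CustomerName, SUM(t.LineTotal) AS TotalSpend
    FROM #OrderDetail AS t
    JOIN dbo.Customers AS c
        ON c.CustomerID = t.CustomerID
    GROUP BY c.CustomerID, c.CustomerName;
END;

Whether this beats the single large query depends entirely on the data and on the plan the optimizer was producing, so it is worth measuring both versions.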
I found this excellent article quite useful in answering this question:
Paul White - temporary tables in stored procedures
Just to underline some concepts from the article:
Temporary tables can be very useful as a way of breaking a large query into smaller parts, giving the optimizer a better chance of finding good execution plans and providing statistical information about intermediate result sets
Temporary objects may be cached across executions, despite explicit CREATE and DROP statements
Statistics associated with a cached temporary object are also cached
For my part, I would just add that if the stored procedure accesses the same data on another server more than once over a slow connection, it may be useful to bring that data into a temporary table first. Of course, a table variable would also be valid for this purpose.
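A hedged sketch of that remote-data case (the linked server, database and table names are invented):

-- Pull the remote rows across the slow link once...
SELECT r.RemoteID, r.SomeColumn
INTO #RemoteData
FROM REMOTESRV.RemoteDb.dbo.RemoteTable AS r   -- hypothetical linked server
WHERE r.SomeColumn IS NOT NULL;

-- ...then reuse the local copy as many times as the procedure needs.
SELECT l.LocalID, r.SomeColumn
FROM dbo.LocalTable AS l
JOIN #RemoteData AS r
    ON r.RemoteID = l.RemoteID;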
Related
While I was attempting to improve the performance of a stored procedure, the execution plan reported some missing indexes (obvious wins). Looking at the table now, I see awful indexes - some duplicated, some overlapping, some missing columns. I expect to drop some indexes entirely, update/consolidate others, and I might get to add one or two new ones (though I doubt it).
I've tuned indexes in the past, but on tables with relatively few sp's. This table has been identified as a problem but nobody's clear how to effectively test hundreds of dependent sp's. I believe I'll have to run every stored procedure, repeatedly, both before and after indexing, to demonstrate that any change is useful.
I've seen load-testing tools, and that inspired my first plan of attack. Is there an open-source tool that analyses the code / table and provides meaningful parameters, then executes hundreds of sp's in independent loops, multi-threaded? I hope not to have to hand-curate the parameter values. The server is rebooted weekly so historical patterns take a while to collect.
Second, is this the best approach? I've tuned indexes where only a few stored procedures were impacted, never anything at this scope - is there a better approach?
Thanks!
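Not a full answer, but one low-effort way to get a before/after baseline without hand-running everything is to snapshot the cached execution statistics of the procedures that reference the table (dbo.ProblemTable is a placeholder name). These DMVs only reflect executions since the plans were cached, so with a weekly reboot you would want to capture the snapshot just before the maintenance window:

-- Cumulative stats for procedures that reference the problem table;
-- run before and after the index changes and compare.
SELECT re.referencing_entity_name AS procedure_name,
       ps.execution_count,
       ps.total_logical_reads,
       ps.total_worker_time,
       ps.total_elapsed_time
FROM sys.dm_sql_referencing_entities('dbo.ProblemTable', 'OBJECT') AS re
LEFT JOIN sys.dm_exec_procedure_stats AS ps
       ON ps.object_id   = re.referencing_id
      AND ps.database_id = DB_ID();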
I need to load a large amount of data from multiple data sources using SQL Server 2014.
My ETL scripts are in T-SQL and they are taking a long time to execute because TempDB keeps filling up.
In your opinion, what is the best way to deal with this:
Using Commit Transactions?
Clean TempDB?
etc.
The only way to answer this question is with a fairly high-level, general response.
You have a few options:
Simply allocate more space to TempDB (a sizing sketch follows this list).
Optimize your ETL queries and tune your indexes.
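For the first option, a minimal sketch (tempdev is the default logical name of the primary tempdb data file, but check your instance; the target size here is only an example):

-- Current tempdb files; size is reported in 8 KB pages.
SELECT name, type_desc, size * 8 / 1024 AS size_mb, growth, is_percent_growth
FROM tempdb.sys.database_files;

-- Pre-grow the primary data file so the ETL does not wait on autogrowth.
ALTER DATABASE tempdb
MODIFY FILE (NAME = tempdev, SIZE = 16384MB);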
Option 2 is often the better approach. Excessive use of TempDB indicates that inefficient sorts or joins are occurring. To resolve this, you need to analyze the actual execution plans of your ETL code. Look for the following:
Exclamation-mark warnings in your query plan. These often indicate that a join or sort operation had to spill over to TempDB because the optimizer underestimated the amount of memory required. You might have statistics that need to be updated.
Look for large differences between the estimated and actual number of rows. This can also indicate out-of-date statistics or parameter-sniffing issues.
Look for sort operations. It is often possible to remove these by adding indexes to your tables.
Look for inefficient access methods. These can often be resolved by adding covering indexes, e.g. a table scan when you only need a small number of rows from a large table. Just note that table scans are often the best approach when loading data warehouses. (A sketch of the statistics and covering-index fixes follows this list.)
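A hedged sketch of those fixes, with invented table and column names:

-- Refresh statistics when estimated and actual row counts are far apart.
UPDATE STATISTICS dbo.StagingOrders WITH FULLSCAN;

-- A covering index that returns rows already ordered and carries the needed
-- columns, so the sort (and its TempDB spill) can disappear from the plan.
CREATE NONCLUSTERED INDEX IX_StagingOrders_Customer_Date
ON dbo.StagingOrders (CustomerID, OrderDate)
INCLUDE (OrderTotal);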
Hope this was helpful.
Marius
We have a query in our system that has been a problem in the amount of logical reads it is using. The query is run often enough (a few times a day), but it is reporting in nature (i.e. it gathers data; it is not transactional).
After having a couple of people look at it we are mulling over a few different options.
Using OPTION (FORCE ORDER) and a few MERGE JOIN hints to get the optimizer to process the data more efficiently (at least on the data that has been tested); a sketch of this form appears below.
Using temp tables to break up the query so the optimizer isn't dealing with one very large query, which allows it to process each piece more efficiently.
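To make the first option concrete, a hedged sketch of the hinted form (the tables and join order are placeholders, not the real query):

SELECT f.OrderID, d.CustomerName, f.Amount
FROM dbo.FactOrders AS f
INNER MERGE JOIN dbo.DimCustomer AS d
    ON d.CustomerKey = f.CustomerKey
OPTION (FORCE ORDER);   -- the join order in the FROM clause is taken literally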
We do not really have the option of doing a major schema change or anything, tuning the query is kind of the rallying point for this issue.
The query hints option is performing a little better than the other option, but both options are acceptable in terms of performance at this point.
So the question is, which would you prefer? The query hints are viewed as slightly dangerous because we are overriding the optimizer, etc. The temp table solution needs to write out to tempdb, etc.
In the past we have been able to see large performance gains using temp tables on our larger reporting queries but that has generally been for queries that are run less frequently than this query.
If you have exhausted optimizing via indexes and removed non-SARGable SQL, then I recommend going for the temp tables option:
temp tables provide repeatable performance, provided they do not put excessive pressure on tempdb in terms of size and throughput - you will need to monitor that (a starter monitoring query follows this list)
sql hints may stop being effective because of other table/index changes in the future
remember to clean up temp tables when you are finished.
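For the monitoring point, a simple starting query against the tempdb space-usage DMV:

-- How tempdb space is being used right now, in MB.
SELECT SUM(user_object_reserved_page_count)     * 8 / 1024 AS user_objects_mb,     -- temp tables, table variables
       SUM(internal_object_reserved_page_count) * 8 / 1024 AS internal_objects_mb, -- sort/hash spills, worktables
       SUM(version_store_reserved_page_count)   * 8 / 1024 AS version_store_mb,
       SUM(unallocated_extent_page_count)       * 8 / 1024 AS free_mb
FROM tempdb.sys.dm_db_file_space_usage;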
Can anyone break down in plain English the performance difference between using temp tables vs. CTEs vs. table variables in MSSQL? I have used temporary tables quite frequently and have started using CTEs just because of the clear syntax, but I have found them to be slower. I think that temp tables use system memory, and that is why they seem fast but may become a bottleneck when trying to do multiple jobs. Table variables I have used sparingly and do not know a great deal about. Looking for some advice from the gurus out there!
This question is well covered in Books Online, MSDN and this site.
For temp tables and table variables, you can read here: What's the difference between a temp table and table variable in SQL Server?.
There you will find that in many cases temp tables cause recompilation of a procedure, which is their main disadvantage.
CTEs are well described here http://blogs.msdn.com/b/craigfr/archive/2007/10/18/ctes-common-table-expressions.aspx
CTEs are performance-neutral. They simplify a query for the developer by abstracting out SQL statements - usually complicated JOINs or built-in functions applied to fields. The database engine just in-lines the CTE into the query that uses it. So, the CTE itself isn't "slow", but you may find you are having better performance with temp tables because the database engine is creating better query plans on the queries using the temp tables.
This question was answered here and here.
Briefly, these are different tools for different tasks; a side-by-side sketch follows the points below.
Table variables can lead to fewer stored procedure recompilations than temporary tables
A temp table is good for re-use or to perform multiple processing passes on a set of data
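As a hedged illustration with invented table and column names, the same intermediate result expressed three ways:

-- 1) CTE: just an inline name for the subquery; no separate statistics.
WITH RecentOrders AS (
    SELECT CustomerID, OrderID
    FROM dbo.Orders
    WHERE OrderDate >= '20240101'
)
SELECT CustomerID, COUNT(*) AS OrderCount
FROM RecentOrders
GROUP BY CustomerID;

-- 2) Temp table: materialised in tempdb, has statistics, can be indexed and reused.
SELECT CustomerID, OrderID
INTO #RecentOrders
FROM dbo.Orders
WHERE OrderDate >= '20240101';

SELECT CustomerID, COUNT(*) AS OrderCount
FROM #RecentOrders
GROUP BY CustomerID;

-- 3) Table variable: also materialised, but with minimal statistics.
DECLARE @RecentOrders TABLE (CustomerID int, OrderID int);

INSERT INTO @RecentOrders (CustomerID, OrderID)
SELECT CustomerID, OrderID
FROM dbo.Orders
WHERE OrderDate >= '20240101';

SELECT CustomerID, COUNT(*) AS OrderCount
FROM @RecentOrders
GROUP BY CustomerID;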
I have to write a query with a lot of computations.
Is it a good idea to create an indexed view with these computed columns instead of writing a stored proc?
It depends!
If you create an indexed view, you'll be trading increased costs in terms of greater storage space requirements and slower inserts, updates & deletes for increased speed when accessing these computed values. If you only need these values once or occasionally, you might be better off computing them on demand in a stored procedure, but, like I said, it depends!
There are other factors to consider too, including: over how many records do these computations need to be executed? If it's just a few, the indexed view approach may not be appropriate, because it may affect all rows unless you limit it with suitable WHERE/HAVING clauses - remember that an indexed view isn't parameterised.
I would only suggest an indexed view when you are dealing with a very large number of records, as found in places like data warehouses or data marts. There you will be doing much more reading of the data than writing to it, so the index will help you more than it hurts you.
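As a hedged illustration of the indexed-view route (the table, the columns, and the assumption that Quantity and UnitPrice are non-nullable, precise types are all invented):

-- The view must be schema-bound, use two-part names and, because it
-- aggregates, include COUNT_BIG(*).
CREATE VIEW dbo.vSalesByProduct
WITH SCHEMABINDING
AS
SELECT ProductID,
       SUM(Quantity * UnitPrice) AS TotalAmount,
       COUNT_BIG(*)              AS RowCnt
FROM dbo.SalesDetail
GROUP BY ProductID;
GO

-- The unique clustered index is what actually materialises the view.
CREATE UNIQUE CLUSTERED INDEX IX_vSalesByProduct
ON dbo.vSalesByProduct (ProductID);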