SQL Server stored procedure intermediate tables - sql-server

In SQL Server 2005, I have a query that involves a bunch of large-ish joins (each table is on the order of a few thousand to a few million rows, and the tables average roughly the equivalent of 10-15 columns of integers and datetimes).
To make the query faster, I am thinking about splitting up the one big query into a stored procedure that does a couple of the joins, stores that result in some temporary table, and then joins that temporary table with another temporary table that was also the result of a few joins.
I am currently using table variables to store the intermediate results, and the one-off performance is noticeably better. But in production, tempdb seems to have an I/O bottleneck.
Is there a better way to think about solving such a problem? I mean, is using table variables way off base here?

Table Variables can take up quite a lot of memory in TempDB.
In large production environments, I have seen better SQL coders than I utilize standard tables for this purpose. They are essentially temp tables, but they are created as regular tables with a special prefix or suffix. This has the added benefit (as with temp tables) of being able to use indexes to speed up execution.
If you can use a standard table or utilize a temp table that is accessed by all steps of your complex execution, you may be able to resolve your memory problem.
Think of it as a place to cache data. In fact, you can update this "cache" each time your main stored procedure runs, just make sure to utilize appropriate transactions and locking.
Imagine the alternative -- If you use an enormous table variable in a stored procedure, and the stored procedure is executed 10 or 20 times simultaneously... that table variable may no longer be living just in memory.
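A minimal sketch of the "regular table with a special prefix" approach described above. The wrk_ prefix, table, and column names are all illustrative, not from the original post:

```sql
-- Hypothetical permanent work table acting as a cache for intermediate join results
IF OBJECT_ID('dbo.wrk_OrderSummary') IS NOT NULL
    DROP TABLE dbo.wrk_OrderSummary;

CREATE TABLE dbo.wrk_OrderSummary
(
    OrderId    INT      NOT NULL,
    CustomerId INT      NOT NULL,
    OrderDate  DATETIME NOT NULL
);

-- Unlike a table variable, this table can carry whatever indexes help the later joins
CREATE CLUSTERED INDEX IX_wrk_OrderSummary_OrderId
    ON dbo.wrk_OrderSummary (OrderId);

-- Refresh the "cache" inside a transaction each time the main procedure runs
BEGIN TRANSACTION;
    TRUNCATE TABLE dbo.wrk_OrderSummary;
    INSERT INTO dbo.wrk_OrderSummary (OrderId, CustomerId, OrderDate)
    SELECT o.OrderId, o.CustomerId, o.OrderDate
    FROM dbo.Orders AS o
    JOIN dbo.Customers AS c ON c.CustomerId = o.CustomerId;
COMMIT TRANSACTION;
```

Because it is a regular table, the data also survives procedure recompiles and can be inspected after the fact for debugging.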

Related

use same table multiple times in SP performance

I have a stored procedure with very many lines of code that performs poorly. It requires me to validate data with multiple SELECT statements against the same table, multiple times.
Is it a good idea to dump data from physical table into temp table first or is it ok to reference it multiple times in multiple select statement within the same SP?
Per your description, you would like to improve performance. Could you please show us the script of your SP and its execution plan, so that we have the right direction and can make some tests?
There are some simple yet useful tips and optimizations to improve stored procedure performance.
Use SET NOCOUNT ON
Use fully qualified procedure names
Use sp_executesql instead of EXECUTE for dynamic queries
Use IF EXISTS (SELECT 1 ...) instead of SELECT *
Avoid naming user stored procedures with the sp_ prefix
Use set-based queries wherever possible
Keep transactions short and crisp
For more details , you can refer to it : https://www.sqlservergeeks.com/improve-stored-procedure-performance-in-sql-server/
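A short sketch combining two of the tips above (SET NOCOUNT ON, and sp_executesql for dynamic SQL). The procedure and table names are invented for illustration:

```sql
-- Hypothetical procedure showing SET NOCOUNT ON and parameterised dynamic SQL
CREATE PROCEDURE dbo.GetOrdersByCustomer
    @CustomerId INT
AS
BEGIN
    SET NOCOUNT ON;  -- suppress "n rows affected" messages

    -- sp_executesql with parameters allows plan reuse,
    -- unlike EXEC of a string with values concatenated in
    DECLARE @sql NVARCHAR(200) =
        N'SELECT OrderId, OrderDate FROM dbo.Orders WHERE CustomerId = @cid;';
    EXEC sp_executesql @sql, N'@cid INT', @cid = @CustomerId;
END
```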
If you think this does not satisfy your requirement, please share more information with us.
Best Regards,
Rachel
Is it a good idea to dump data from physical table into temp table first or is it ok to reference it multiple times in multiple select statement within the same SP?
If it is a local temp table, each session using this stored procedure will create a separate temp table for itself. Although this reduces the load on the original table, it increases the usage of memory and tempdb.
If it is a global temp table, only one can be created for all sessions; it then has to be created manually before anyone uses it and dropped once it is no longer needed.
For me, I would use an indexed view: https://learn.microsoft.com/en-us/sql/relational-databases/views/create-indexed-views?view=sql-server-2017
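A minimal sketch of an indexed view; the view, table, and column names are placeholders (and the Amount column is assumed NOT NULL, which indexed views require for SUM):

```sql
-- Indexed views need SCHEMABINDING, two-part names, and COUNT_BIG(*) when grouping
CREATE VIEW dbo.vOrderTotals
WITH SCHEMABINDING
AS
SELECT o.CustomerId,
       COUNT_BIG(*)  AS OrderCount,
       SUM(o.Amount) AS TotalAmount
FROM dbo.Orders AS o
GROUP BY o.CustomerId;
GO
-- The unique clustered index materialises the view's result set
CREATE UNIQUE CLUSTERED INDEX IX_vOrderTotals
    ON dbo.vOrderTotals (CustomerId);
```

After this, the aggregation is maintained incrementally as the base table changes, instead of being recomputed on every SELECT.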
It's hard to answer without the details. However, with such a large SP and such a small table, it is likely that a particular SELECT or join is slow, rather than the repeated hits on the table (SQL Server is perfectly happy to cache parts of tables or indexes in memory).
If possible, can you get the execution plan of each part of the SP, log some timings, or run each part with statistics on?
That will tell us which part is slow, and we can help you fix it.

What is it that makes temp tables more efficient than table variables when working with large data?

In SQL Server, the performance of temp tables is much better (in terms of time) than table variables when working with large data, say inserting or updating 100,000 rows (reference: SQL Server Temp Table vs Table Variable Performance Testing).
I've seen many articles comparing temp table and table variable, but still don't get what exactly makes temp tables more efficient when working with large data? Is it just how they are designed to behave or anything else?
Table variables don't have statistics, so the cardinality estimate for a table variable is 1 row.
You can force at least a correct table cardinality estimate using the RECOMPILE option, but there is no way to produce column statistics, i.e. no data distribution of column values such as exists for temporary tables.
The consequence is evident: every query that uses a table variable risks underestimation.
Another con is this one:
Queries that insert into (or otherwise modify) @table_variables cannot have a parallel plan; #temp_tables are not restricted in this manner.
You can read more on it here:
Parallelism with temp table but not table variable?
The answer in that thread has another link to additional reading that is very helpful.
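The cardinality point above can be seen directly by comparing estimated row counts in the execution plans, a small sketch:

```sql
-- Without OPTION (RECOMPILE), the plan for a query over @T is built
-- with a 1-row estimate regardless of the table variable's actual contents.
DECLARE @T TABLE (id INT PRIMARY KEY);

INSERT INTO @T
SELECT TOP (100000) ROW_NUMBER() OVER (ORDER BY @@SPID)
FROM master..spt_values v1, master..spt_values v2;

SELECT COUNT(*) FROM @T;                      -- estimated rows for the scan of @T: 1
SELECT COUNT(*) FROM @T OPTION (RECOMPILE);   -- estimated rows: 100000
```

Even with RECOMPILE, only the table cardinality is corrected; there are still no column statistics for the optimizer to use.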

When should I use a table variable vs temporary table in sql server?

I'm learning more details about table variables. It is said that temp tables are always on disk while table variables are in memory; that is to say, table variables perform better than temp tables because they use fewer I/O operations.
But sometimes, if there are too many records in a table variable to be held in memory, the table variable is spilled to disk like a temp table.
But I don't know what "too many records" means. 100,000 records? 1,000,000 records? How can I know whether a table variable I'm using is in memory or on disk? Is there any function or tool in SQL Server 2005 to measure the size of a table variable, or to tell me when a table variable is moved from memory to disk?
Your question shows you have succumbed to some of the common misconceptions surrounding table variables and temporary tables.
I have written quite an extensive answer on the DBA site looking at the differences between the two object types. This also addresses your question about disk vs memory (I didn't see any significant difference in behaviour between the two).
Regarding the question in the title, though, as to when to use a table variable vs a local temporary table: you don't always have a choice. In functions, for example, it is only possible to use a table variable, and if you need to write to the table in a child scope then only a #temp table will do (table-valued parameters allow read-only access).
Where you do have a choice some suggestions are below (though the most reliable method is to simply test both with your specific workload).
If you need an index that cannot be created on a table variable then you will of course need a #temporary table. The details of this are version-dependent, however. For SQL Server 2012 and below, the only indexes that could be created on table variables were those implicitly created through a UNIQUE or PRIMARY KEY constraint. SQL Server 2014 introduced inline index syntax for a subset of the options available in CREATE INDEX. This has since been extended to allow filtered index conditions. Indexes with INCLUDE-d columns or columnstore indexes still cannot be created on table variables, however.
If you will be repeatedly adding and deleting large numbers of rows from the table then use a #temporary table. That supports TRUNCATE (which is more efficient than DELETE for large tables) and additionally subsequent inserts following a TRUNCATE can have better performance than those following a DELETE as illustrated here.
If you will be deleting or updating a large number of rows then the temp table may well perform much better than a table variable - if it is able to use rowset sharing (see "Effects of rowset sharing" below for an example).
If the optimal plan using the table will vary dependent on data then use a #temporary table. That supports creation of statistics which allows the plan to be dynamically recompiled according to the data (though for cached temporary tables in stored procedures the recompilation behaviour needs to be understood separately).
If the optimal plan for the query using the table is unlikely to ever change then you may consider a table variable to skip the overhead of statistics creation and recompiles (would possibly require hints to fix the plan you want).
If the source for the data inserted to the table is from a potentially expensive SELECT statement then consider that using a table variable will block the possibility of this using a parallel plan.
If you need the data in the table to survive a rollback of an outer user transaction then use a table variable. A possible use case for this might be logging the progress of different steps in a long SQL batch.
When using a #temp table within a user transaction locks can be held longer than for table variables (potentially until the end of transaction vs end of statement dependent on the type of lock and isolation level) and also it can prevent truncation of the tempdb transaction log until the user transaction ends. So this might favour the use of table variables.
Within stored routines, both table variables and temporary tables can be cached. The metadata maintenance for cached table variables is less than that for #temporary tables. Bob Ward points out in his tempdb presentation that this can cause additional contention on system tables under conditions of high concurrency. Additionally, when dealing with small quantities of data this can make a measurable difference to performance.
Effects of rowset sharing
DECLARE @T TABLE(id INT PRIMARY KEY, Flag BIT);
CREATE TABLE #T (id INT PRIMARY KEY, Flag BIT);
INSERT INTO @T
OUTPUT INSERTED.* INTO #T
SELECT TOP 1000000 ROW_NUMBER() OVER (ORDER BY @@SPID), 0
FROM master..spt_values v1, master..spt_values v2
SET STATISTICS TIME ON
/*CPU time = 7016 ms, elapsed time = 7860 ms.*/
UPDATE @T SET Flag=1;
/*CPU time = 6234 ms, elapsed time = 7236 ms.*/
DELETE FROM @T
/* CPU time = 828 ms, elapsed time = 1120 ms.*/
UPDATE #T SET Flag=1;
/*CPU time = 672 ms, elapsed time = 980 ms.*/
DELETE FROM #T
DROP TABLE #T
Use a table variable for a very small quantity of data (thousands of bytes).
Use a temporary table for a lot of data.
Another way to think about it: if you think you might benefit from an index, automated statistics, or any SQL optimizer goodness, then your data set is probably too large for a table variable.
In my example, I just wanted to put about 20 rows into a table and modify them as a group before using them to UPDATE / INSERT a permanent table. So a table variable is perfect.
But I am also running SQL to back-fill thousands of rows at a time, and I can definitely say that the temporary tables perform much better than table variables.
This is not unlike how CTEs are a concern for a similar size reason: if the data in the CTE is very small, I find a CTE performs as well as or better than what the optimizer otherwise comes up with, but if it is quite large then it hurts you badly.
My understanding is mostly based on http://www.developerfusion.com/article/84397/table-variables-v-temporary-tables-in-sql-server/, which has a lot more detail.
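A rough sketch of the "about 20 rows" scenario above; the table and column names are made up for illustration:

```sql
-- Small batch of rows staged in a table variable, modified as a group,
-- then applied to a (hypothetical) permanent table
DECLARE @Staging TABLE (ProductId INT PRIMARY KEY, NewPrice MONEY);

INSERT INTO @Staging (ProductId, NewPrice)
VALUES (1, 9.99), (2, 14.50), (3, 4.25);      -- ...around 20 rows in practice

-- Modify the whole group in one place
UPDATE @Staging SET NewPrice = NewPrice * 1.10;

-- Then apply to the permanent table in a single set-based statement
UPDATE p
SET    p.Price = s.NewPrice
FROM   dbo.Products AS p
JOIN   @Staging     AS s ON s.ProductId = p.ProductId;
```

At this row count the missing statistics on the table variable do not matter; the join is trivial either way.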
Microsoft says here
Table variables do not have distribution statistics and will not trigger recompiles. Therefore, in many cases, the optimizer will build a query plan on the assumption that the table variable has no rows. For this reason, you should be cautious about using a table variable if you expect a larger number of rows (greater than 100). Temp tables may be a better solution in this case.
I totally agree with Abacus (sorry - don't have enough points to comment).
Also, keep in mind it doesn't necessarily come down to how many records you have, but to the size of your records.
For instance, have you considered the performance difference between 1,000 records with 50 columns each vs 100,000 records with only 5 columns each?
Lastly, maybe you're querying or storing more data than you need? Here's a good read on SQL optimization strategies. Limit the amount of data you're pulling, especially if you're not using it all (some SQL programmers get lazy and just select everything even though they only use a tiny subset). Don't forget the SQL Query Analyzer may also become your best friend.
A table variable is available only to the current batch and scope. For example, if you need to EXEC another stored procedure from the current one, you have to pass the table as a table-valued parameter, and of course this affects the performance; with temporary tables you can do this by just referencing the temporary table's name.
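A sketch of the table-valued-parameter route mentioned above; all names here are invented for illustration:

```sql
-- A table type must exist before a TVP can be declared or passed
CREATE TYPE dbo.IdList AS TABLE (Id INT PRIMARY KEY);
GO
CREATE PROCEDURE dbo.ProcessIds
    @Ids dbo.IdList READONLY        -- TVPs must be READONLY in the callee
AS
    SELECT Id FROM @Ids;
GO
DECLARE @Ids dbo.IdList;
INSERT INTO @Ids VALUES (1), (2), (3);
EXEC dbo.ProcessIds @Ids;
-- With a #temp table there is no parameter at all: the callee simply
-- references the caller's #SomeTable by name.
```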
To test a temporary table:
Open a Management Studio query editor
Create a temporary table
Start a new batch (GO) in the same session
Select from the table: available (a local #temp table lives for the whole session)
To test a table variable:
Open a Management Studio query editor
Declare a table variable
Start a new batch (GO) in the same session
Select from the table: not available (a table variable dies with its batch)
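The same-session scope difference can be sketched as one script (run as a whole in a single query window):

```sql
-- Batch 1: create both objects
CREATE TABLE #t (i INT);
DECLARE @t TABLE (i INT);
GO
-- Batch 2, same session:
SELECT * FROM #t;   -- works: #temp tables survive until the session ends
SELECT * FROM @t;   -- fails: "Must declare the table variable @t" --
                    -- the table variable did not outlive its batch
```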
Something else I have experienced: if your schema doesn't have the GRANT privilege to create tables, then use table variables.
Writing data into tables declared with DECLARE @tb and then joining them with other tables, I found the response time much higher compared to temporary tables (tempdb..#tb).
When I join with the table variable @tb, the result takes much longer to return; with the temp table #tb, the return is almost instantaneous.
I did tests with a 10,000-row join against 5 other tables.

SQL Server Multi-statement UDF - way to store data temporarily required

I have a relatively complex query, with several self joins, which works on a rather large table.
For that query to perform faster, I thus need to only work with a subset of the data.
Said subset of data can range between 12 000 and 120 000 rows depending on the parameters passed.
More details can be found here: SQL Server CTE referred in self joins slow
As you can see, I was using a CTE to return the data subset before, which caused performance problems: SQL Server was re-running the SELECT statement in the CTE for every join instead of running it once and reusing its result set.
The alternative, using temporary tables worked much faster (while testing the query in a separate window outside the UDF body).
However, when I tried to implement this in a multi-statement UDF, I was harshly reminded by SQL Server that multi-statement UDFs do not support temporary tables for some reason...
UDFs do allow table variables, however, so I tried that, but the performance is absolutely horrible: it takes 1 minute 40 seconds for my query to complete, whereas the CTE version only took 40 seconds.
I believe the table variables is slow for reasons listed in this thread: Table variable poor performance on insert in SQL Server Stored Procedure
The temporary table version takes around 1 second, but I can't make it into a function due to the SQL Server restriction, and I have to return a table to the caller.
Considering that the CTE and table variable versions are both too slow, and that temporary tables are rejected in UDFs, what are my options for making my UDF perform quickly?
Thanks a lot in advance.
In many such cases, all we need to do is declare primary keys for those table variables, and it is fast again.
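A small sketch of what that looks like; the column names are placeholders:

```sql
-- The PRIMARY KEY gives the table variable an implicit unique clustered index,
-- which is often what the self-joins need
DECLARE @Subset TABLE
(
    RowId INT NOT NULL PRIMARY KEY,
    Value INT NOT NULL,
    UNIQUE (Value, RowId)   -- a second index, via a UNIQUE constraint
);
```

Constraint-backed indexes like these are the only kind a table variable supports on SQL Server 2012 and earlier.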
Set up and use a process-keyed table. See the article How to Share Data Between Stored Procedures by Erland Sommarskog.
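A minimal process-keyed table in the style Sommarskog describes; the table name, columns, and dbo.SourceTable are placeholders:

```sql
-- Permanent table shared by all callers; each session owns the rows
-- tagged with its own @@SPID
CREATE TABLE dbo.ProcessKeyed_Work
(
    spid  INT NOT NULL,
    RowId INT NOT NULL,
    Value INT NOT NULL,
    PRIMARY KEY (spid, RowId)
);
GO
-- A caller clears its slice, fills it, uses it, then clears it again
DELETE dbo.ProcessKeyed_Work WHERE spid = @@SPID;
INSERT dbo.ProcessKeyed_Work (spid, RowId, Value)
SELECT @@SPID, RowId, Value FROM dbo.SourceTable;
-- ... run the complex joins against dbo.ProcessKeyed_Work ...
DELETE dbo.ProcessKeyed_Work WHERE spid = @@SPID;
```

Being a permanent table, it supports any index and full statistics, while the spid key keeps concurrent sessions out of each other's data.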
One kludgey work-around I've used involves code like so (pseudo-code follows):
-- Calling batch: create the temp table, then call the procedure
CREATE TABLE #foo (...)
EXECUTE MyStoredProcedure
SELECT *
from #foo
GO
-- Stored procedure definition
CREATE PROCEDURE MyStoredProcedure
AS
INSERT #foo VALUES (whatever)
RETURN
GO
In short, the stored procedure references and uses a temp table created by the calling procedure (or batch). This works, but it can be confusing for others to follow what's going on if you don't document it clearly, and you will get recompiles, statistics recalculations, and other oddities that may consume unwanted clock cycles.

SQL Server 2000 temp table vs table variable

What would be more efficient for storing some temp data (50k rows in one table and 50k in another) to perform some calculations? I'll be doing this process once, nightly.
How do you check the efficiency when comparing something like this?
The results will vary depending on where it is easier to store the data: on disk (#temp) or in memory (@temp).
A few excerpts from the references below
A temporary table is created and populated on disk, in the system database tempdb.
A table variable is created in memory, and so performs slightly better than #temp tables (also because there is even less locking and logging in a table variable). A table variable might still perform I/O to tempdb (which is where the performance issues of #temp tables make themselves apparent), though the documentation is not very explicit about this.
Table variables result in fewer recompilations of a stored procedure as compared to temporary tables.
[Y]ou can create indexes on the temporary table to increase query performance.
Regarding your specific case with 50k rows:
As your data size gets larger, and/or the repeated use of the temporary data increases, you will find that the use of #temp tables makes more sense
References:
Should I use a #temp table or a @table variable?
MSKB 305977 - SQL Server 2000 - Table Variables
There can be a big performance difference between using table variables and temporary tables. In most cases, temporary tables are faster than table variables. I took the following tip from the private SQL Server MVP newsgroup and received permission from Microsoft to share it with you. One MVP noticed that although queries using table variables didn't generate parallel query plans on a large SMP box, similar queries using temporary tables (local or global) and running under the same circumstances did generate parallel plans.
More from SQL Mag (subscription required unfortunately, I'll try and find more resources momentarily)
EDIT: Here is some more in depth information from CodeProject
