What would be more efficient for storing some temp data (50k rows in one table and 50k in another) to perform some calculation? I'll be doing this process once, nightly.
How do you check the efficiency when comparing something like this?
The results will vary depending on whether it is better to store the data on disk (a #temp table) or in memory (a @table variable).
A few excerpts from the references below
A temporary table is created and populated on disk, in the system database tempdb.
A table variable is created in memory, and so performs slightly better than #temp tables (also because there is even less locking and logging in a table variable). A table variable might still perform I/O to tempdb (which is where the performance issues of #temp tables make themselves apparent), though the documentation is not very explicit about this.
Table variables result in fewer recompilations of a stored procedure as compared to temporary tables.
[Y]ou can create indexes on the temporary table to increase query performance.
Regarding your specific case with 50k rows:
As your data size gets larger, and/or the repeated use of the temporary data increases, you will find that the use of #temp tables makes more sense.
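As a rough illustration for a nightly job of this size, the two options look like this (the table and column names are made up for illustration; the most reliable comparison is to test both against your actual workload):

-- #temp table: lives in tempdb, supports statistics and CREATE INDEX
CREATE TABLE #StagingA (RowID INT PRIMARY KEY, Amount DECIMAL(18,2));
CREATE NONCLUSTERED INDEX IX_StagingA_Amount ON #StagingA (Amount);

-- Table variable: same basic shape, but no CREATE INDEX and no statistics
DECLARE @StagingB TABLE (RowID INT PRIMARY KEY, Amount DECIMAL(18,2));

-- ...populate both from your source, run the calculation, then:
DROP TABLE #StagingA;   -- the table variable simply goes out of scope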
References:
Should I use a #temp table or a @table variable?
MSKB 305977 - SQL Server 2000 - Table Variables
There can be a big performance difference between using table variables and temporary tables. In most cases, temporary tables are faster than table variables. I took the following tip from the private SQL Server MVP newsgroup and received permission from Microsoft to share it with you. One MVP noticed that although queries using table variables didn't generate parallel query plans on a large SMP box, similar queries using temporary tables (local or global) and running under the same circumstances did generate parallel plans.
More from SQL Mag (subscription required unfortunately, I'll try and find more resources momentarily)
EDIT: Here is some more in-depth information from CodeProject
Related
In SQL Server, the performance of temp tables is much better (in terms of time) than that of table variables when working with large data sets (say, inserting or updating 100,000 rows) (reference: SQL Server Temp Table vs Table Variable Performance Testing)
I've seen many articles comparing temp table and table variable, but still don't get what exactly makes temp tables more efficient when working with large data? Is it just how they are designed to behave or anything else?
Table variables don't have statistics, so the cardinality estimate for a table variable is 1.
You can at least force a correct cardinality estimate using the RECOMPILE option, but there is no way to produce column statistics, i.e. the column-value distribution statistics that exist for temporary tables simply aren't available for table variables.
The consequence is evident: every query that uses a table variable will suffer from cardinality underestimation.
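For example (a minimal sketch with made-up names), OPTION (RECOMPILE) at least corrects the row-count estimate, though it still gives you no column histogram:

DECLARE @Ids TABLE (Id INT PRIMARY KEY);

INSERT INTO @Ids (Id)
SELECT object_id FROM sys.all_objects;   -- a few thousand rows

-- Without the hint the optimizer assumes @Ids has 1 row;
-- with it, the actual row count is used (but still no data distribution).
SELECT o.name
FROM sys.all_objects AS o
JOIN @Ids AS i ON i.Id = o.object_id
OPTION (RECOMPILE);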
Another con is this one:
Queries that insert into (or otherwise modify) @table_variables cannot have a parallel plan; #temp_tables are not restricted in this manner.
You can read more on it here:
Parallelism with temp table but not table variable?
The answer in that topic has another link to additional reading that is very helpful.
There are many threads about temporary tables vs table variables.
However, my question is: what makes a temporary table so much faster than a normal table in Microsoft SQL Server?
My stored procedure runs 5 times as fast if I insert into a temp table and then just move the data out of it into the normal table.
The most obvious answer would be keys/indexes, but the normal table has no keys at all.
What else would make it run faster?
Thanks
Several possibilities:
TempDB could be on faster I/O
Query could be parallel and taking advantage of multiple tempdb data files if you've configured it to do so
TempDB could have room for the inserts while the user database has to wait for an autogrow (a diagnostic sketch follows below)
User database could have constraints on logging as well for similar reasons
User database might have to "wake up" - e.g. if it has AutoClose set to ON
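A hedged diagnostic sketch for the tempdb-related points above (assumes you have permission to query sys.master_files; substitute your own database name for the placeholder):

-- Compare file counts, sizes, and autogrow settings for tempdb vs your database
SELECT DB_NAME(database_id) AS database_name,
       name                 AS logical_file,
       type_desc,
       size * 8 / 1024      AS size_mb,
       CASE WHEN is_percent_growth = 1
            THEN CAST(growth AS VARCHAR(10)) + ' %'
            ELSE CAST(growth * 8 / 1024 AS VARCHAR(10)) + ' MB'
       END                  AS autogrow
FROM sys.master_files
WHERE database_id IN (DB_ID('tempdb'), DB_ID('YourUserDatabase'));

-- AutoClose / AutoShrink settings (relevant to the last point above)
SELECT name, is_auto_close_on, is_auto_shrink_on
FROM sys.databases
WHERE name IN ('tempdb', 'YourUserDatabase');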
I'm learning more about table variables. What I've read says that temp tables are always on disk while table variables are in memory, that is to say, a table variable supposedly performs better than a temp table because it uses fewer I/O operations.
But sometimes, if there are too many records in a table variable to be held in memory, the table variable will be spilled to disk just like a temp table.
But I don't know what "too many records" means. 100,000 records? Or 1,000,000 records? How can I know whether a table variable I'm using is in memory or on disk? Is there any function or tool in SQL Server 2005 to measure the size of the table variable, or to tell me when the table variable is moved from memory to disk?
Your question shows you have succumbed to some of the common misconceptions surrounding table variables and temporary tables.
I have written quite an extensive answer on the DBA site looking at the differences between the two object types. This also addresses your question about disk vs memory (I didn't see any significant difference in behaviour between the two).
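If you want to see this for yourself, one hedged approach (SQL Server 2005 and later) is to watch the session's tempdb page allocations while populating a table variable; rising counts show it is backed by tempdb pages rather than living purely in memory:

DECLARE @T TABLE (id INT PRIMARY KEY, filler CHAR(500));

INSERT INTO @T (id, filler)
SELECT TOP (10000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), 'x'
FROM sys.all_objects AS a CROSS JOIN sys.all_objects AS b;

-- Pages allocated in tempdb by the current session (user objects include
-- both table variables and #temp tables)
SELECT user_objects_alloc_page_count, user_objects_dealloc_page_count
FROM sys.dm_db_session_space_usage
WHERE session_id = @@SPID;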
Regarding the question in the title, though, as to when to use a table variable vs a local temporary table: you don't always have a choice. In functions, for example, it is only possible to use a table variable, and if you need to write to the table in a child scope then only a #temp table will do (table-valued parameters allow read-only access).
Where you do have a choice some suggestions are below (though the most reliable method is to simply test both with your specific workload).
If you need an index that cannot be created on a table variable then you will of course need a #temporary table. The details of this are version dependent, however. For SQL Server 2012 and below, the only indexes that could be created on table variables were those implicitly created through a UNIQUE or PRIMARY KEY constraint. SQL Server 2014 introduced inline index syntax for a subset of the options available in CREATE INDEX; this has since been extended to allow filtered index conditions. Indexes with INCLUDE-d columns, or columnstore indexes, are still not possible to create on table variables however (a short sketch follows these suggestions).
If you will be repeatedly adding and deleting large numbers of rows from the table then use a #temporary table. That supports TRUNCATE (which is more efficient than DELETE for large tables) and additionally subsequent inserts following a TRUNCATE can have better performance than those following a DELETE as illustrated here.
If you will be deleting or updating a large number of rows then the temp table may well perform much better than a table variable - if it is able to use rowset sharing (see "Effects of rowset sharing" below for an example).
If the optimal plan using the table will vary dependent on data then use a #temporary table. That supports creation of statistics which allows the plan to be dynamically recompiled according to the data (though for cached temporary tables in stored procedures the recompilation behaviour needs to be understood separately).
If the optimal plan for the query using the table is unlikely to ever change then you may consider a table variable to skip the overhead of statistics creation and recompiles (would possibly require hints to fix the plan you want).
If the source for the data inserted to the table is from a potentially expensive SELECT statement then consider that using a table variable will block the possibility of this using a parallel plan.
If you need the data in the table to survive a rollback of an outer user transaction then use a table variable. A possible use case for this might be logging the progress of different steps in a long SQL batch.
When using a #temp table within a user transaction locks can be held longer than for table variables (potentially until the end of transaction vs end of statement dependent on the type of lock and isolation level) and also it can prevent truncation of the tempdb transaction log until the user transaction ends. So this might favour the use of table variables.
Within stored routines, both table variables and temporary tables can be cached. The metadata maintenance for cached table variables is less than that for #temporary tables. Bob Ward points out in his tempdb presentation that this can cause additional contention on system tables under conditions of high concurrency. Additionally, when dealing with small quantities of data this can make a measurable difference to performance.
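A short sketch of the index point above (the inline syntax needs SQL Server 2014 or later; the object names are made up):

DECLARE @Orders TABLE
(
    OrderID INT PRIMARY KEY,                 -- implicit unique clustered index
    Status  TINYINT NOT NULL,
    INDEX IX_Status NONCLUSTERED (Status)    -- inline index, SQL Server 2014+
);

CREATE TABLE #Orders (OrderID INT PRIMARY KEY, Status TINYINT NOT NULL);

-- Only the #temp table accepts CREATE INDEX afterwards, including
-- filtered or INCLUDE-d indexes, which table variables still cannot have.
CREATE NONCLUSTERED INDEX IX_Open ON #Orders (Status) WHERE Status = 0;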
Effects of rowset sharing
DECLARE @T TABLE(id INT PRIMARY KEY, Flag BIT);
CREATE TABLE #T (id INT PRIMARY KEY, Flag BIT);
INSERT INTO @T
OUTPUT inserted.* INTO #T
SELECT TOP 1000000 ROW_NUMBER() OVER (ORDER BY @@SPID), 0
FROM master..spt_values v1, master..spt_values v2
SET STATISTICS TIME ON
/* Table variable (no rowset sharing): CPU time = 7016 ms, elapsed time = 7860 ms.*/
UPDATE @T SET Flag=1;
/* CPU time = 6234 ms, elapsed time = 7236 ms.*/
DELETE FROM @T
/* #temp table (rowset sharing): CPU time = 828 ms, elapsed time = 1120 ms.*/
UPDATE #T SET Flag=1;
/* CPU time = 672 ms, elapsed time = 980 ms.*/
DELETE FROM #T
DROP TABLE #T
Use a table variable for a very small quantity of data (thousands of bytes).
Use a temporary table for a lot of data.
Another way to think about it: if you think you might benefit from an index, automated statistics, or any SQL optimizer goodness, then your data set is probably too large for a table variable.
In my example, I just wanted to put about 20 rows into a format and modify them as a group, before using them to UPDATE / INSERT a permanent table. So a table variable is perfect.
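For instance, something like this hedged sketch (the target table and columns are hypothetical) is the kind of small, single-use workload where a table variable fits well:

DECLARE @Updates TABLE (CustomerID INT PRIMARY KEY, NewStatus TINYINT NOT NULL);

INSERT INTO @Updates (CustomerID, NewStatus)
VALUES (101, 2), (102, 3), (103, 2);   -- roughly 20 rows in practice

-- Update existing rows in the permanent table...
UPDATE c
SET    c.Status = u.NewStatus
FROM   dbo.Customers AS c
JOIN   @Updates AS u ON u.CustomerID = c.CustomerID;

-- ...and insert the ones that don't exist yet.
INSERT INTO dbo.Customers (CustomerID, Status)
SELECT u.CustomerID, u.NewStatus
FROM   @Updates AS u
WHERE  NOT EXISTS (SELECT 1 FROM dbo.Customers AS c WHERE c.CustomerID = u.CustomerID);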
But I am also running SQL to back-fill thousands of rows at a time, and I can definitely say that the temporary tables perform much better than table variables.
This is not unlike how CTEs are a concern for a similar size-related reason: if the data in the CTE is very small, I find a CTE performs as well as or better than what the optimizer comes up with otherwise, but if it is quite large then it hurts you badly.
My understanding is mostly based on http://www.developerfusion.com/article/84397/table-variables-v-temporary-tables-in-sql-server/, which has a lot more detail.
Microsoft says here
Table variables do not have distribution statistics, and they will not trigger recompiles. Therefore, in many cases, the optimizer will build a query plan on the assumption that the table variable has no rows. For this reason, you should be cautious about using a table variable if you expect a larger number of rows (greater than 100). Temp tables may be a better solution in this case.
I totally agree with Abacus (sorry - don't have enough points to comment).
Also, keep in mind it doesn't necessarily come down to how many records you have, but the size of your records.
For instance, have you considered the performance difference between 1,000 records with 50 columns each vs 100,000 records with only 5 columns each?
Lastly, maybe you're querying/storing more data than you need? Here's a good read on SQL optimization strategies. Limit the amount of data you're pulling, especially if you're not using it all (some SQL programmers do get lazy and just select everything even though they only use a tiny subset). Don't forget the SQL query analyzer may also become your best friend.
A table variable is available only to the current batch or procedure; for example, if you need to EXEC another stored procedure from within the current one, you will have to pass the data as a table-valued parameter, and of course this will affect performance, whereas with a temporary table the child procedure can reference it simply by the temporary table name (a sketch of this scoping difference follows the test steps below).
To test a temporary table:
Open a Management Studio query editor
Create a temporary table
From a child scope (a nested stored procedure or dynamic SQL), select from the table: "Available"
To test a table variable:
Open a Management Studio query editor
Declare a table variable
From a child scope (a nested stored procedure or dynamic SQL), select from it: "Not Available"
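A minimal sketch of that scoping difference, run as one batch (the names are made up; a nested stored procedure behaves the same way as the dynamic SQL here):

CREATE TABLE #Scratch (id INT);
INSERT INTO #Scratch VALUES (1);

DECLARE @Scratch TABLE (id INT);
INSERT INTO @Scratch VALUES (1);

-- A child scope can see the caller's #temp table...
EXEC ('SELECT COUNT(*) FROM #Scratch;');   -- succeeds

-- ...but not the table variable, which is local to the declaring batch.
EXEC ('SELECT COUNT(*) FROM @Scratch;');   -- fails: must declare the table variable "@Scratch"

DROP TABLE #Scratch;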
Something else I have experienced: if your schema doesn't have the GRANT privilege to create tables, then use table variables.
Writing data into tables declared with DECLARE @tb and then joining them with other tables, I realized that the response time is much higher compared to temporary tables (tempdb..#tb).
When I join them with @tb, it takes much longer to return the result; with #tb, by contrast, the return is almost instantaneous.
I did tests with a 10,000-row join against 5 other tables.
In SQL Server 2005, I have a query that involves a bunch of large-ish joins (each table is on the order of a few thousand rows to a few million rows, and the tables average probably the equivalent of 10-15 columns of integers and datetimes).
To make the query faster, I am thinking about splitting up the one big query into a stored procedure that does a couple of the joins, stores that result in some temporary table, and then joins that temporary table with another temporary table that was also the result of a few joins.
I am currently using table variables to store the intermediate tables, and the one-off performance is noticeably better. But in production, tempdb seems to be having an IO bottleneck.
Is there a better way to think about solving such a problem? I mean, is using table variables way off base here?
Table Variables can take up quite a lot of memory in TempDB.
In large production environments, I have seen better SQL coders than I am utilize standard tables for this purpose; they are essentially temp tables, but they are created as regular tables and given a special prefix or suffix. This has the added benefit (as with temp tables) of being able to utilize indexes to help with execution.
If you can use a standard table or utilize a temp table that is accessed by all steps of your complex execution, you may be able to resolve your memory problem.
Think of it as a place to cache data. In fact, you can update this "cache" each time your main stored procedure runs, just make sure to utilize appropriate transactions and locking.
Imagine the alternative -- If you use an enormous table variable in a stored procedure, and the stored procedure is executed 10 or 20 times simultaneously... that table variable may no longer be living just in memory.
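A hedged sketch of that permanent staging-table pattern (all object names are illustrative):

CREATE TABLE dbo.stg_IntermediateJoin
(
    CustomerID INT   NOT NULL,
    Total      MONEY NOT NULL
);
CREATE CLUSTERED INDEX CIX_stg_IntermediateJoin ON dbo.stg_IntermediateJoin (CustomerID);

-- Inside the main stored procedure, refresh the "cache" under a transaction
BEGIN TRAN;
    TRUNCATE TABLE dbo.stg_IntermediateJoin;

    INSERT INTO dbo.stg_IntermediateJoin (CustomerID, Total)
    SELECT o.CustomerID, SUM(o.Amount)
    FROM dbo.Orders AS o              -- hypothetical source of the expensive joins
    GROUP BY o.CustomerID;
COMMIT;

-- Subsequent steps join against dbo.stg_IntermediateJoin instead of a table variable.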
What is the difference between a #myTable and a DECLARE @myTable TABLE?
In a stored procedure, you often have a need for storing a set of data within the procedure, without necessarily needing that data to persist beyond the scope of the procedure. If you actually need a table structure, there are basically four ways you can "store" this data: local temporary tables (#table_name), global temporary tables (##table_name), permanent tables (table_name), and table variables (@table_name).
Should I use a #temp table or a @table variable?
Both local and global temporary tables are physical tables within the tempdb database, and indexes can be created on them. Because temp tables are physical tables, you can also create a primary key on them via the CREATE TABLE command or via the ALTER TABLE command. You can use the ALTER TABLE command to add any defaults, new columns, or constraints that you need to within your code.
Unlike local and global temporary tables, table variables cannot have indexes created on them. The exception is that table variables can have a primary key defined upon creation using the DECLARE @variable TABLE command. This will then create a clustered or non-clustered index on the table variable. The CREATE INDEX command does not recognize table variables. Therefore, the only index available to you is the index that accompanies the primary key and is created upon table variable declaration. Also, transaction logs are not recorded for table variables; hence, they are out of scope of the transaction mechanism.
Please see:
TempDB:: Table variable vs local temporary table
Table Variables In T-SQL
It is often said that @table variables are kept in memory as opposed to tempdb; this is not necessarily correct.
Table variables do not have statistics, and this can affect performance in certain situations.
Table variables are all well and good when dealing with relatively small datasets, but be aware that they do not scale well. In particular, a change in behaviour between SQL Server 2000 and SQL Server 2005 resulted in performance dropping through the floor with large datasets.
This was a particularly nasty gotcha for me on one particular occasion with a very complex stored procedure on SQL Server 2000. Research and testing indicated that using table variables was the more performant approach. However, following an upgrade to SQL Server 2008 performance degraded considerably. It took a while to cotton on to the use of table variables as the culprit because all the prior testing etc had ruled out temp tables as being any faster. However, due to this change between SQL Server versions, the opposite was now true and following a significant refactoring, what was taking well into double digit hours to complete started completing in a couple of minutes!
So be aware that there is no definitive answer as to which is best - you need to assess your circumstances, carry out your own testing, and make your decision based on your findings. And always re-evaluate following a server upgrade.
Read this article for more detailed information and sample timings - http://www.sql-server-performance.com/articles/per/temp_tables_vs_variables_p1.aspx
Update: On a separate note, be aware that there is also a third type of temporary table - ##xyz. These are global and visible to all SQL Server connections, not scoped to the current connection like regular temporary tables. They are only dropped when the final connection accessing them is closed.
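A tiny sketch (the table name is made up):

CREATE TABLE ##NightlyStats (id INT, total MONEY);
-- Any other connection can now SELECT FROM ##NightlyStats, until the creating
-- session ends and no other session is still referencing the table.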
#myTable is a temporary table; it benefits from being able to have constraints, indexes, etc., and uses more resources.
@myTable is a table variable that you define as having one or more columns. Table variables use fewer resources and are scoped to the procedure or batch you use them in.
In most cases where a temp table is used, a table variable could be used instead, which may offer performance benefits.
The table #table1 is a local temporary table stored in tempdb.
##table1 is a global temporary table stored in tempdb.
And @table is a table variable.
Check the link for their differences.