I am wondering, when selecting rows and inserting them into a temp table, is the data actually copied or just referenced?
For example:
SELECT * INTO #Temp FROM SomeTable
If the table is very large, is this going to be a costly operation?
From my tests it seems to execute about as fast as a simple SELECT, but I'd like a better insight about how it actually works.
Cheers.
Temporary tables are allocated in tempdb. SQL server will generally try to keep the tempdb pages in memory, but large tables may end up written out to disk.
And yes, the data is always copied. So, for instance, if an UPDATE occurs on another connection between selecting into your temporary table and a later usage, the temporary table will contain the old value(s)
Related
Does doing a
SELECT * FROM (SELECT foo.id, bar.name FROM foo LEFT JOIN bar ON bar.foo_id = foo.id)
\--------------- Will this be a temp table? --------------------------/
create the same type of temporary table as declaring a table with # does in a stored procedure? Or does it create a view or perhaps some other magic?
A quick search on temporary tables only showed them being used in stored procedures.
it does yes, but internally only.
you wouldn't have access to it after the whole query is executed.
Temp table will be created in memory then as said, lost. For complex sub queries this will obviously put some strain on memory, but as with most things, memory is faster than disk.
You could do nested queries as individual ones, using a temp table (on disk) to then perform another query, eventually dropping the table.
In your example, the outer query does nothing. So the query optimizer will optimize it away.
You can see the flow of result sets in the query execution plan. For example, a loop join results in a temporary structure that contains the resulting rows. The temporary structure could be stored in tempdb much like a temporary table is. But it won't be visible to you, and it will be deallocated after the query completes.
I'm learning more details in table variable. It says that temp tables are always on disk, and table variables are in memory, that is to say, the performance of table variable is better than temp table because table variable uses less IO operations than temp table.
But sometimes, if there are too many records in a table variable that can not be contained in memory, the table variable will be put on disk like the temp table.
But I don't know what the "too many records" is. 100,000 records? or 1000,000 records? How can I know if a table variable I'm using is in memory or is on disk? Is there any function or tool in SQL Server 2005 to measure the scale of the table variable or letting me know when the table variable is put on disk from memory?
Your question shows you have succumbed to some of the common misconceptions surrounding table variables and temporary tables.
I have written quite an extensive answer on the DBA site looking at the differences between the two object types. This also addresses your question about disk vs memory (I didn't see any significant difference in behaviour between the two).
Regarding the question in the title though as to when to use a table variable vs a local temporary table you don't always have a choice. In functions, for example, it is only possible to use a table variable and if you need to write to the table in a child scope then only a #temp table will do
(table-valued parameters allow readonly access).
Where you do have a choice some suggestions are below (though the most reliable method is to simply test both with your specific workload).
If you need an index that cannot be created on a table variable then you will of course need a #temporary table. The details of this are version dependant however. For SQL Server 2012 and below the only indexes that could be created on table variables were those implicitly created through a UNIQUE or PRIMARY KEY constraint. SQL Server 2014 introduced inline index syntax for a subset of the options available in CREATE INDEX. This has been extended since to allow filtered index conditions. Indexes with INCLUDE-d columns or columnstore indexes are still not possible to create on table variables however.
If you will be repeatedly adding and deleting large numbers of rows from the table then use a #temporary table. That supports TRUNCATE (which is more efficient than DELETE for large tables) and additionally subsequent inserts following a TRUNCATE can have better performance than those following a DELETE as illustrated here.
If you will be deleting or updating a large number of rows then the temp table may well perform much better than a table variable - if it is able to use rowset sharing (see "Effects of rowset sharing" below for an example).
If the optimal plan using the table will vary dependent on data then use a #temporary table. That supports creation of statistics which allows the plan to be dynamically recompiled according to the data (though for cached temporary tables in stored procedures the recompilation behaviour needs to be understood separately).
If the optimal plan for the query using the table is unlikely to ever change then you may consider a table variable to skip the overhead of statistics creation and recompiles (would possibly require hints to fix the plan you want).
If the source for the data inserted to the table is from a potentially expensive SELECT statement then consider that using a table variable will block the possibility of this using a parallel plan.
If you need the data in the table to survive a rollback of an outer user transaction then use a table variable. A possible use case for this might be logging the progress of different steps in a long SQL batch.
When using a #temp table within a user transaction locks can be held longer than for table variables (potentially until the end of transaction vs end of statement dependent on the type of lock and isolation level) and also it can prevent truncation of the tempdb transaction log until the user transaction ends. So this might favour the use of table variables.
Within stored routines, both table variables and temporary tables can be cached. The metadata maintenance for cached table variables is less than that for #temporary tables. Bob Ward points out in his tempdb presentation that this can cause additional contention on system tables under conditions of high concurrency. Additionally, when dealing with small quantities of data this can make a measurable difference to performance.
Effects of rowset sharing
DECLARE #T TABLE(id INT PRIMARY KEY, Flag BIT);
CREATE TABLE #T (id INT PRIMARY KEY, Flag BIT);
INSERT INTO #T
output inserted.* into #T
SELECT TOP 1000000 ROW_NUMBER() OVER (ORDER BY ##SPID), 0
FROM master..spt_values v1, master..spt_values v2
SET STATISTICS TIME ON
/*CPU time = 7016 ms, elapsed time = 7860 ms.*/
UPDATE #T SET Flag=1;
/*CPU time = 6234 ms, elapsed time = 7236 ms.*/
DELETE FROM #T
/* CPU time = 828 ms, elapsed time = 1120 ms.*/
UPDATE #T SET Flag=1;
/*CPU time = 672 ms, elapsed time = 980 ms.*/
DELETE FROM #T
DROP TABLE #T
Use a table variable if for a very small quantity of data (thousands of bytes)
Use a temporary table for a lot of data
Another way to think about it: if you think you might benefit from an index, automated statistics, or any SQL optimizer goodness, then your data set is probably too large for a table variable.
In my example, I just wanted to put about 20 rows into a format and modify them as a group, before using them to UPDATE / INSERT a permanent table. So a table variable is perfect.
But I am also running SQL to back-fill thousands of rows at a time, and I can definitely say that the temporary tables perform much better than table variables.
This is not unlike how CTE's are a concern for a similar size reason - if the data in the CTE is very small, I find a CTE performs as good as or better than what the optimizer comes up with, but if it is quite large then it hurts you bad.
My understanding is mostly based on http://www.developerfusion.com/article/84397/table-variables-v-temporary-tables-in-sql-server/, which has a lot more detail.
Microsoft says here
Table variables does not have distribution statistics, they will not trigger recompiles. Therefore, in many cases, the optimizer will build a query plan on the assumption that the table variable has no rows. For this reason, you should be cautious about using a table variable if you expect a larger number of rows (greater than 100). Temp tables may be a better solution in this case.
I totally agree with Abacus (sorry - don't have enough points to comment).
Also, keep in mind it doesn't necessarily come down to how many records you have, but the size of your records.
For instance, have you considered the performance difference between 1,000 records with 50 columns each vs 100,000 records with only 5 columns each?
Lastly, maybe you're querying/storing more data than you need? Here's a good read on SQL optimization strategies. Limit the amount of data you're pulling, especially if you're not using it all (some SQL programmers do get lazy and just select everything even though they only use a tiny subset). Don't forget the SQL query analyzer may also become your best friend.
Variable table is available only to the current session, for example, if you need to EXEC another stored procedure within the current one you will have to pass the table as Table Valued Parameter and of course this will affect the performance, with temporary tables you can do this with only passing the temporary table name
To test a Temporary table:
Open management studio query editor
Create a temporary table
Open another query editor window
Select from this table "Available"
To test a Variable table:
Open management studio query editor
Create a Variable table
Open another query editor window
Select from this table "Not Available"
something else I have experienced is: If your schema doesn't have GRANT privilege to create tables then use variable tables.
writing data in tables declared declare #tb and after joining with other tables, I realized that the response time compared to temporary tables tempdb .. # tb is much higher.
When I join them with #tb the time is much longer to return the result, unlike #tm, the return is almost instantaneous.
I did tests with a 10,000 rows join and join with 5 other tables
What is best practice for handling the dropping of a temp table. I have read that you should explicitly handle the drop and also that sql server should handle the drop....what is the correct method? I was always under the impression that you should do your own clean up of the temp tables you create in a sproc, etc. But, then I found other bits that suggest otherwise.
Any insight would be greatly appreciated. I am just concerned I am not following best practice with the temp tables I create.
Thanks,
S
My view is, first see if you really need a temp table - or - can you make do with a Common Table Expression (CTE). Second, I would always drop my temp tables. Sometimes you need to have a temp table scoped to the connection (e.g. ##temp), so if you run the query a second time, and you have explicit code to create the temp table, you'll get an error that says the table already exists. Cleaning up after yourself is ALWAYS a good software practice.
EDIT: 03-Nov-2021
Another alternative is a TABLE variable, which will fall out of scope once the query completes:
DECLARE #MyTable AS TABLE (
MyID INT,
MyText NVARCHAR(256)
)
INSERT INTO
#MyTable
VALUES
(1, 'One'),
(2, 'Two'),
(3, 'Three')
SELECT
*
FROM
#MyTable
CREATE TABLE (Transact-SQL)
Temporary tables are automatically dropped when they go out of scope, unless explicitly dropped by using DROP TABLE:
A local temporary table created in a stored procedure is dropped automatically when the stored procedure is finished. The table can be referenced by any nested stored procedures executed by the stored procedure that created the table. The table cannot be referenced by the process that called the stored procedure that created the table.
All other local temporary tables are dropped automatically at the end of the current session.
Global temporary tables are automatically dropped when the session that created the table ends and all other tasks have stopped referencing them. The association between a task and a table is maintained only for the life of a single Transact-SQL statement. This means that a global temporary table is dropped at the completion of the last Transact-SQL statement that was actively referencing the table when the creating session ended.
I used to fall into the crowd of letting the objects get cleaned up by background server processes, however, recently having issues with extreme TempDB log file growth has changed my opinion. I'm not sure if this has always been the case with every version of SQL Server, but since moving to SQL 2016 and putting the drives on a PureStorage SSD array, things run a bit differently. Processes are typically CPU bound rather than I/O bound, and explicitly dropping the temp objects results in no issues with log growth. While I haven't dug in too deeply as to why, I suspect it's not unlike garbage collection in the .NET world where it's synchronous when called explicitly and asynchronous when left to the system. This would matter because the explicit drop would release the storage in the log file, and make it available at the next log backup, whereas this appears to not be the case when not explicitly dropping the object. On most systems this is likely not a big issue, but on a system supporting a high volume ERP and web storefront with many concurrent transactions, and heavy TempDB use, it has had a big impact. As for why to create the TempDB objects in the first place, with the amount of data in most of the queries, it would spill over into TempDB storage anyway, so it's usually more efficient to create the object with the necessary indexes rather than let the system handle it automatically.
In a multi-threaded scenario where each thread creates its own set of tables and the number of threads is throttled, not dropping your own tables means that the governor will consider your thread done and spawn more threads... however the temp tables are still around (and thus the connections to the server) thus you'll exceed the limits of your governor. if you manually drop the temp tables then the thread doesn't finish until they've been dropped and no new threads are spawned, thus maintaining the governor's ability to keep from overwhelming the SQL engine
As per my view. No need to drop temp tables explicitly. SQL server will handle to drop temp tables stored in temp db in case of shorage of space to process query.
I have a doubt Why Should we use temporary table is there any special thing in temporary table and where should we use the temporary tables. Can you please explain me or any reference thank you.
When writing T-SQL code, you often need a table in which to store data temporarily when it comes time to execute that code. You have four table options: normal tables, local temporary tables, global temporary tables and table variables. Each of the four table options has its own purpose and use, and each has its benefits and issues:
* Normal tables are exactly that, physical tables defined in your database.
* Local temporary tables are temporary tables that are available only to the session that created them. These tables are automatically destroyed at the termination of the procedure or session that created them.
* Global temporary tables are temporary tables that are available to all sessions and all users. They are dropped automatically when the last session using the temporary table has completed. Both local temporary tables and global temporary tables are physical tables created within the tempdb database.
* Table variables are stored within memory but are laid out like a table. Table variables are partially stored on disk and partially stored in memory. It's a common misconception that table variables are stored only in memory. Because they are partially stored in memory, the access time for a table variable can be faster than the time it takes to access a temporary table.
Which one to use:
* If you have less than 100 rows generally use a table variable. Otherwise use a temporary table. This is because SQL Server won't create statistics on table variables.
* If you need to create indexes on it then you must use a temporary table.
* When using temporary tables always create them and create any indexes and then use them. This will help reduce recompilations. The impact of this is reduced starting in SQL Server 2005 but it's still a good idea.
Please refer this page
http://www.sqlteam.com/article/temporary-tables
There are many uses for temporary tables. They can be very useful in handling data in complex queries. Your question is vague, and does not really have an answer, but I am linking to some temporary table documentation.
They have most of the same capabilities of table, including constraints and indexing. They can be global or limited to current scope. They can also be inefficient, so be cautious as always.
http://www.sqlservercentral.com/articles/T-SQL/temptablesinsqlserver/1279/
http://msdn.microsoft.com/en-us/library/aa258255%28SQL.80%29.aspx
Temporary tables can be created at runtime and can do the all kinds of operations that one normal table can do. But, based on the table types, the scope is limited. These tables are created inside tempdb database.
SQL Server provides two types of temp tables based on the behavior and scope of the table. These are:
•Local Temp Table
•Global Temp Table
Local Temp Table
Local temp tables are only available to the current connection for the user; and they are automatically deleted when the user disconnects from instances. Local temporary table name is stared with hash ("#") sign.
Global Temp Table
Global Temporary tables name starts with a double hash ("##"). Once this table has been created by a connection, like a permanent table it is then available to any user by any connection. It can only be deleted once all connections have been closed.
When to Use Temporary Tables?
•When we are doing large number of row manipulation in stored procedures.
•This is useful to replace the cursor. We can store the result set data into a temp table, then we can manipulate the data from there.
•When we are having a complex join operation.
Points to Remember Before Using Temporary Tables -
•Temporary table created on tempdb of SQL Server. This is a separate database. So, this is an additional overhead and can causes performance issues.
•Number of rows and columns need to be as minimum as needed.
•Tables need to be deleted when they are done with their work.
Alternative Approach: Table Variable-
Alternative of Temporary table is the Table variable which can do all kinds of operations that we can perform in Temp table. Below is the syntax for using Table variable.
When to Use Table Variable Over Temp Table -
Tablevariable is always useful for less data. If the result set returns a large number of records, we need to go for temp table.
What would be more efficient in storing some temp data (50k rows in one and 50k in another) to perform come calculation. I'll be doing this process once, nightly.
How do you check the efficiency when comparing something like this?
The results will vary on which will be easier to store the data, in disk (#temp) or in memory (#temp).
A few excerpts from the references below
A temporary table is created and populated on disk, in the system database tempdb.
A table variable is created in memory, and so performs slightly better than #temp tables (also because there is even less locking and logging in a table variable). A table variable might still perform I/O to tempdb (which is where the performance issues of #temp tables make themselves apparent), though the documentation is not very explicit about this.
Table variables result in fewer recompilations of a stored procedure as compared to temporary tables.
[Y]ou can create indexes on the temporary table to increase query performance.
Regarding your specific case with 50k rows:
As your data size gets larger, and/or the repeated use of the temporary data increases, you will find that the use of #temp tables makes more sense
References:
Should I use a #temp table or a #table variable?
MSKB 305977 - SQL Server 2000 - Table Variables
There can be a big performance difference between using table variables and temporary tables. In most cases, temporary tables are faster than table variables. I took the following tip from the private SQL Server MVP newsgroup and received permission from Microsoft to share it with you. One MVP noticed that although queries using table variables didn't generate parallel query plans on a large SMP box, similar queries using temporary tables (local or global) and running under the same circumstances did generate parallel plans.
More from SQL Mag (subscription required unfortunately, I'll try and find more resources momentarily)
EDIT: Here is some more in depth information from CodeProject