Temp Tables need Unique Index Names - sql-server

I'm working on updating a legacy stored procedure (which calls several other child stored procedures). Within a transaction, it manipulates data in about a dozen tables and performs lots of calculations in the process, sometimes triggering lock escalation up to a table lock. This process can take 20 minutes or more to complete in some cases. Obviously, locking tables for that long is a big no-no. So I'm working on a two-phase plan: reduce the blocking caused by this sproc in phase 1, and completely rewrite it in phase 2 to be more efficient and not take an inordinate amount of time.
In order to reduce the blocking, wherever there is manipulation of the database tables, I plan to move that manipulation into a temporary table. By doing all of the work in temporary tables and then updating the real tables with the final results at the very end of the process, I should be able to significantly reduce the time spent blocking other users. (That's the "quick fix" for phase 1.)
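Roughly, the pattern looks like this (a minimal sketch; the table and column names are made up):

-- Stage a copy of the data; SELECT INTO keeps #Staging's layout in
-- sync with the real table automatically.
SELECT *
INTO #Staging
FROM dbo.BigTable;

-- ... long-running calculations against #Staging; no locks are held
-- on dbo.BigTable while they run ...

-- Apply the final results in one short transaction at the very end.
BEGIN TRAN;
UPDATE t
SET    t.Result = s.Result
FROM   dbo.BigTable AS t
JOIN   #Staging   AS s ON s.Id = t.Id;
COMMIT;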
Here's my issue: some of these temp tables might have 100,000 rows or more in them while I use them for various calculations. Because of this, I would like to create indexes on the temp tables to keep performance up. And since these are temp tables created within a stored procedure, they need to have unique names to avoid errors if multiple users execute the sproc at the same time. I know that I can manually declare the temp tables using CREATE TABLE statements, and if I do that I can specify an index without a name and let SQL Server create the name for me. What I'm hoping to be able to do is use SELECT * INTO to generate the temp table and find another way to get SQL Server to auto-generate index names.

I'm sure you're asking "Why?" My company has several changes in store for the system I'm working with. If I can manage to use the SELECT INTO method, then if a column gets added or resized or whatever, there won't be an issue with developers needing to know that they have to go back into these stored procedures and change their temp table definitions to match. Using SELECT INTO will automatically keep the temp tables matching the layout of the "real" tables.
So, does anyone know of a way to get SQL Server to auto-generate the name for an index on a temp table (aside from doing it as part of the CREATE TABLE syntax)?
Thank you!

And since these are temp tables that are created within a stored procedure, they need to have unique names to avoid errors if multiple users execute the sproc at the same time.
No they don't. Each session gets its own temp tables, and they are automatically cleaned up.
And indexes don't have global name scope, so each temp table can have the same index names, e.g.
create procedure TempTest
as
begin
    -- SELECT INTO creates #t with the same layout as the source table
    select * into #t from sys.objects
    -- the index name "foo" is scoped to this session's #t, so
    -- concurrent executions do not collide
    create index foo on #t(name)
    -- pause so that concurrent executions overlap
    waitfor delay '00:00:10'
    select * from #t
end
And you can run
exec temptest
go 10
from multiple sessions.

Related

use same table multiple times in SP performance

I have a very long stored procedure that performs poorly. The SP requires me to validate data with multiple SELECT statements against the same table, multiple times.
Is it a good idea to dump data from the physical table into a temp table first, or is it OK to reference it multiple times in multiple SELECT statements within the same SP?
From your description, you would like to improve performance. Could you please show us the script of your SP and its execution plan, so that we have the right direction and can run some tests?
There are some simple yet useful tips and optimization to improve stored procedure performance.
Use SET NOCOUNT ON
Use fully qualified procedure names
Use sp_executesql instead of EXECUTE for dynamic queries
Use IF EXISTS (SELECT 1 ...) instead of IF EXISTS (SELECT * ...)
Avoid naming user stored procedures sp_procedurename
Use set-based queries wherever possible
Keep transactions short and crisp
For more details, you can refer to this: https://www.sqlservergeeks.com/improve-stored-procedure-performance-in-sql-server/
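As a quick sketch of the sp_executesql tip (the table and parameter names here are made up):

-- Parameterized dynamic SQL: the plan can be reused and the parameter
-- value cannot inject SQL, unlike string concatenation with EXECUTE.
DECLARE @sql NVARCHAR(MAX) =
    N'SELECT * FROM dbo.Orders WHERE CustomerId = @CustomerId;';

EXEC sys.sp_executesql
    @sql,
    N'@CustomerId INT',
    @CustomerId = 42;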
If you think this does not satisfy your requirement, please share more information with us.
Best Regards,
Rachel
Is it a good idea to dump data from the physical table into a temp table first, or is it OK to reference it multiple times in multiple SELECT statements within the same SP?
If it is a local temp table, each session using this stored procedure will create a separate temp table for itself. While this reduces the load on the original table, it increases the usage of memory and tempdb.
If it is a global temp table, only one can be created for all sessions, so it must be created manually before anyone uses it and then dropped when it is no longer needed.
For me, I would use indexed views: https://learn.microsoft.com/en-us/sql/relational-databases/views/create-indexed-views?view=sql-server-2017
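For example, a minimal indexed-view sketch (dbo.Orders and its columns are made up; the view must be schema-bound, the SUM column must be NOT NULL, and COUNT_BIG(*) is required with GROUP BY):

CREATE VIEW dbo.vOrderTotals
WITH SCHEMABINDING
AS
SELECT o.CustomerId,
       COUNT_BIG(*)  AS OrderCount,   -- required in an indexed view with GROUP BY
       SUM(o.Amount) AS TotalAmount   -- Amount assumed NOT NULL
FROM dbo.Orders AS o
GROUP BY o.CustomerId;
GO
-- The first index must be unique and clustered; after that, the
-- aggregates are maintained automatically as dbo.Orders changes.
CREATE UNIQUE CLUSTERED INDEX IX_vOrderTotals ON dbo.vOrderTotals (CustomerId);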
It's hard to answer without the detail. However, with such a large SP and such a small table, it is likely that a particular select or join is slow, rather than just the repeated hitting of the table (SQL Server is perfectly happy to cache bits of tables or indexes in memory).
If possible, can you get the execution plan of each part of the SP? Or log some timings? Or run each bit with statistics on?
That will tell you which bit is slow and we can help you fix it.

When should I use a table variable vs temporary table in sql server?

I'm learning more about table variables. It's said that temp tables are always on disk while table variables are in memory; that is to say, table variables supposedly perform better than temp tables because they use fewer IO operations.
But sometimes, if there are too many records in a table variable to be held in memory, the table variable will be put on disk like a temp table.
But I don't know what "too many records" means. 100,000 records? 1,000,000 records? How can I know whether a table variable I'm using is in memory or on disk? Is there any function or tool in SQL Server 2005 to measure the size of a table variable, or to tell me when a table variable has been moved from memory to disk?
Your question shows you have succumbed to some of the common misconceptions surrounding table variables and temporary tables.
I have written quite an extensive answer on the DBA site looking at the differences between the two object types. This also addresses your question about disk vs memory (I didn't see any significant difference in behaviour between the two).
Regarding the question in the title, though, as to when to use a table variable vs a local temporary table: you don't always have a choice. In functions, for example, it is only possible to use a table variable, and if you need to write to the table in a child scope then only a #temp table will do (table-valued parameters allow readonly access).
Where you do have a choice some suggestions are below (though the most reliable method is to simply test both with your specific workload).
If you need an index that cannot be created on a table variable then you will of course need a #temporary table. The details of this are version dependent, however. For SQL Server 2012 and below, the only indexes that could be created on table variables were those implicitly created through a UNIQUE or PRIMARY KEY constraint. SQL Server 2014 introduced inline index syntax for a subset of the options available in CREATE INDEX. This has since been extended to allow filtered index conditions. Indexes with INCLUDE-d columns or columnstore indexes still cannot be created on table variables, however (see the first sketch after these suggestions).
If you will be repeatedly adding and deleting large numbers of rows from the table then use a #temporary table. That supports TRUNCATE (which is more efficient than DELETE for large tables) and additionally subsequent inserts following a TRUNCATE can have better performance than those following a DELETE as illustrated here.
If you will be deleting or updating a large number of rows then the temp table may well perform much better than a table variable - if it is able to use rowset sharing (see "Effects of rowset sharing" below for an example).
If the optimal plan using the table will vary dependent on data then use a #temporary table. That supports creation of statistics which allows the plan to be dynamically recompiled according to the data (though for cached temporary tables in stored procedures the recompilation behaviour needs to be understood separately).
If the optimal plan for the query using the table is unlikely to ever change then you may consider a table variable to skip the overhead of statistics creation and recompiles (would possibly require hints to fix the plan you want).
If the source for the data inserted to the table is from a potentially expensive SELECT statement then consider that using a table variable will block the possibility of this using a parallel plan.
If you need the data in the table to survive a rollback of an outer user transaction then use a table variable (see the second sketch after these suggestions). A possible use case for this might be logging the progress of different steps in a long SQL batch.
When using a #temp table within a user transaction locks can be held longer than for table variables (potentially until the end of transaction vs end of statement dependent on the type of lock and isolation level) and also it can prevent truncation of the tempdb transaction log until the user transaction ends. So this might favour the use of table variables.
Within stored routines, both table variables and temporary tables can be cached. The metadata maintenance for cached table variables is less than that for #temporary tables. Bob Ward points out in his tempdb presentation that this can cause additional contention on system tables under conditions of high concurrency. Additionally, when dealing with small quantities of data this can make a measurable difference to performance.
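Two of the suggestions above can be shown with minimal sketches. First, the version-dependent index options (the inline index syntax assumes SQL Server 2014+; all names are illustrative):

DECLARE @T TABLE
(
    id   INT PRIMARY KEY,  -- implicitly creates a unique clustered index
    name SYSNAME,
    INDEX IX_name (name)   -- inline nonclustered index, 2014+ only
);

CREATE TABLE #T (id INT PRIMARY KEY, name SYSNAME, flag BIT);
CREATE INDEX IX_name ON #T (name) INCLUDE (flag); -- INCLUDE: #temp tables only
DROP TABLE #T;

And second, a table variable's rows surviving a rollback while the #temp table's rows do not:

DECLARE @v TABLE (i INT);
CREATE TABLE #t (i INT);
BEGIN TRAN;
INSERT @v VALUES (1);
INSERT #t VALUES (1);
ROLLBACK;
SELECT (SELECT COUNT(*) FROM @v) AS VariableRows,  -- 1: unaffected by the rollback
       (SELECT COUNT(*) FROM #t) AS TempTableRows; -- 0: rolled back
DROP TABLE #t;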
Effects of rowset sharing
DECLARE @T TABLE(id INT PRIMARY KEY, Flag BIT);

CREATE TABLE #T (id INT PRIMARY KEY, Flag BIT);

INSERT INTO #T
output inserted.* into @T
SELECT TOP 1000000 ROW_NUMBER() OVER (ORDER BY @@SPID), 0
FROM master..spt_values v1, master..spt_values v2

SET STATISTICS TIME ON

/*CPU time = 7016 ms, elapsed time = 7860 ms.*/
UPDATE @T SET Flag=1;

/*CPU time = 6234 ms, elapsed time = 7236 ms.*/
DELETE FROM @T

/* CPU time = 828 ms, elapsed time = 1120 ms.*/
UPDATE #T SET Flag=1;

/*CPU time = 672 ms, elapsed time = 980 ms.*/
DELETE FROM #T

DROP TABLE #T
Use a table variable for a very small quantity of data (thousands of bytes)
Use a temporary table for a lot of data
Another way to think about it: if you think you might benefit from an index, automated statistics, or any SQL optimizer goodness, then your data set is probably too large for a table variable.
In my example, I just wanted to put about 20 rows into a format and modify them as a group, before using them to UPDATE / INSERT a permanent table. So a table variable is perfect.
But I am also running SQL to back-fill thousands of rows at a time, and I can definitely say that the temporary tables perform much better than table variables.
This is not unlike how CTEs are a concern for a similar size reason: if the data in the CTE is very small, I find a CTE performs as well as or better than what the optimizer comes up with, but if it is quite large then it hurts you badly.
My understanding is mostly based on http://www.developerfusion.com/article/84397/table-variables-v-temporary-tables-in-sql-server/, which has a lot more detail.
Microsoft says here
Table variables do not have distribution statistics, and they will not trigger recompiles. Therefore, in many cases, the optimizer will build a query plan on the assumption that the table variable has no rows. For this reason, you should be cautious about using a table variable if you expect a larger number of rows (greater than 100). Temp tables may be a better solution in this case.
I totally agree with Abacus (sorry - don't have enough points to comment).
Also, keep in mind it doesn't necessarily come down to how many records you have, but the size of your records.
For instance, have you considered the performance difference between 1,000 records with 50 columns each vs 100,000 records with only 5 columns each?
Lastly, maybe you're querying/storing more data than you need? Here's a good read on SQL optimization strategies. Limit the amount of data you're pulling, especially if you're not using it all (some SQL programmers do get lazy and just select everything even though they only use a tiny subset). Don't forget the SQL query analyzer may also become your best friend.
A table variable is available only to the current scope. For example, if you need to EXEC another stored procedure from within the current one, you will have to pass the table as a table-valued parameter, and of course this will affect performance. With temporary tables you can do this by simply referencing the temporary table's name, as sketched below.
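A minimal sketch of that difference (all names are made up; note that the table-valued parameter must be passed READONLY):

-- Option 1: a nested proc can read the caller's temp table by name.
CREATE PROCEDURE dbo.Child_Temp
AS
    SELECT * FROM #work;  -- visible: created by the calling proc
GO
CREATE PROCEDURE dbo.Parent_Temp
AS
    CREATE TABLE #work (Id INT);
    INSERT #work VALUES (1);
    EXEC dbo.Child_Temp;
GO

-- Option 2: a table variable must be passed as a table-valued parameter.
CREATE TYPE dbo.IdList AS TABLE (Id INT);
GO
CREATE PROCEDURE dbo.Child_TVP @Ids dbo.IdList READONLY
AS
    SELECT * FROM @Ids;
GO
DECLARE @Ids dbo.IdList;
INSERT @Ids VALUES (1);
EXEC dbo.Child_TVP @Ids = @Ids;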
To test a temporary table:
Open a Management Studio query editor window
Create a temporary table
Run a second batch (after GO) in the same window
Select from the table: available

To test a table variable:
Open a Management Studio query editor window
Declare a table variable
Run a second batch (after GO) in the same window
Select from the table: not available
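The scoping difference can be seen in a single query window (the batch separator GO ends a table variable's scope, while the #temp table survives for the session):

CREATE TABLE #Scope (i INT);
GO
SELECT * FROM #Scope;   -- OK: the temp table lives for the session

DECLARE @Scope TABLE (i INT);
GO
SELECT * FROM @Scope;   -- Error: @Scope must be declared in the
                        -- same batch that references it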
Something else I have experienced: if your schema doesn't have the privilege to create tables, then use table variables.
Writing data into table variables (DECLARE @tb) and then joining them with other tables, I realized that the response time is much higher compared to temporary tables (tempdb..#tb). When I join with @tb, the result takes much longer to return; with #tb, the return is almost instantaneous.
I did tests with a 10,000-row join, joined with 5 other tables.

Explicitly drop temp table or let SQL Server handle it

What is best practice for handling the dropping of a temp table? I have read that you should explicitly handle the drop, and also that SQL Server should handle the drop. What is the correct method? I was always under the impression that you should do your own cleanup of the temp tables you create in a sproc, etc. But then I found other bits that suggest otherwise.
Any insight would be greatly appreciated. I am just concerned I am not following best practice with the temp tables I create.
Thanks,
S
My view is, first see if you really need a temp table, or whether you can make do with a Common Table Expression (CTE). Second, I would always drop my temp tables. Sometimes you need a temp table that outlives the current scope (e.g. a global ##temp), so if you run the query a second time and you have explicit code to create the temp table, you'll get an error that says the table already exists. Cleaning up after yourself is ALWAYS a good software practice. A guard for that error is sketched below.
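A guard pattern for that error, as a sketch (DROP TABLE IF EXISTS requires SQL Server 2016+; the OBJECT_ID check works on older versions; the ##temp name is illustrative):

-- SQL Server 2016 and later:
DROP TABLE IF EXISTS ##temp;

-- Older versions:
IF OBJECT_ID('tempdb..##temp') IS NOT NULL
    DROP TABLE ##temp;

CREATE TABLE ##temp (Id INT);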
EDIT: 03-Nov-2021
Another alternative is a TABLE variable, which will fall out of scope once the query completes:
DECLARE @MyTable AS TABLE (
    MyID INT,
    MyText NVARCHAR(256)
)

INSERT INTO
    @MyTable
VALUES
    (1, 'One'),
    (2, 'Two'),
    (3, 'Three')

SELECT
    *
FROM
    @MyTable
CREATE TABLE (Transact-SQL)
Temporary tables are automatically dropped when they go out of scope, unless explicitly dropped by using DROP TABLE:
A local temporary table created in a stored procedure is dropped automatically when the stored procedure is finished. The table can be referenced by any nested stored procedures executed by the stored procedure that created the table. The table cannot be referenced by the process that called the stored procedure that created the table.
All other local temporary tables are dropped automatically at the end of the current session.
Global temporary tables are automatically dropped when the session that created the table ends and all other tasks have stopped referencing them. The association between a task and a table is maintained only for the life of a single Transact-SQL statement. This means that a global temporary table is dropped at the completion of the last Transact-SQL statement that was actively referencing the table when the creating session ended.
I used to fall into the crowd of letting the objects get cleaned up by background server processes; however, recently having issues with extreme TempDB log file growth changed my opinion. I'm not sure if this has always been the case with every version of SQL Server, but since moving to SQL 2016 and putting the drives on a PureStorage SSD array, things run a bit differently. Processes are typically CPU bound rather than I/O bound, and explicitly dropping the temp objects results in no issues with log growth.

While I haven't dug in too deeply as to why, I suspect it's not unlike garbage collection in the .NET world, where it's synchronous when called explicitly and asynchronous when left to the system. This would matter because the explicit drop would release the storage in the log file and make it available at the next log backup, whereas this appears not to be the case when the object isn't explicitly dropped. On most systems this is likely not a big issue, but on a system supporting a high-volume ERP and web storefront with many concurrent transactions and heavy TempDB use, it has had a big impact.

As for why to create the TempDB objects in the first place: with the amount of data in most of the queries, it would spill over into TempDB storage anyway, so it's usually more efficient to create the object with the necessary indexes rather than let the system handle it automatically.
In a multi-threaded scenario where each thread creates its own set of tables and the number of threads is throttled, not dropping your own tables means that the governor will consider your thread done and spawn more threads; however, the temp tables are still around (and thus the connections to the server), so you'll exceed the limits of your governor. If you manually drop the temp tables, the thread doesn't finish until they've been dropped and no new threads are spawned, maintaining the governor's ability to keep from overwhelming the SQL engine.
In my view, there is no need to drop temp tables explicitly. SQL Server handles dropping temp tables stored in tempdb, for example when space is needed to process queries.

Table variable poor performance on insert in SQL Server Stored Procedure

We are experiencing performance problems using a table variable in a Stored Procedure.
Here is what actually happens :
DECLARE @tblTemp TABLE (iId_company INT)

INSERT INTO @tblTemp (iId_company)
SELECT id FROM .....
The SELECT returns 138 results, but inserting into the table variable takes 1 min 15 s, whereas when I use a temp table with the same SELECT, whoops, it takes 0 seconds:
CREATE TABLE #temp (iId_company INT)
INSERT INTO #temp(iId_company)
SELECT id FROM ...
What could cause this behavior?
Use a temporary table. You will see much better performance.
A detailed explanation of the reasoning behind this is beyond the scope of the initial question; however, to summarise:
A table variable is optimized for one row by SQL Server, i.e. it assumes one row will be returned.
A table variable does not create statistics.
Google temp table Vs. table variable for a wealth of resources and discussions. If you then need specific assistance, fire me an email or contact me on Twitter.
Generally, for smaller sets of data, a table variable should be faster than a temp table. For larger sets of data, performance will fall off because table variables don't support parallelism (see this post).
That said, I haven't experienced, or found reports of, a set of data this small being slower for a table variable than for a temp table.
Not that it should matter, but what does your SELECT look like? I had an issue in SQL Server 2005 where my SELECT on its own ran relatively fast for what my query was doing, say 5 minutes to return all the data (about 150,000 rows) over the wire. But when I tried to insert that same SELECT into a temp table or table variable, the statement ran for more than 1 hour before I killed it. I have yet to figure out what was really going on. I ended up adding the query hint FORCE ORDER and it started inserting faster.
A key point about temp tables, too, is that you can put indexes, etc. on them, whereas you can't with table variables.

Are temporary tables thread-safe?

I'm using SQL Server 2000, and many of its stored procedures use temp tables extensively. The database has a lot of traffic, and I'm concerned about the thread-safety of creating and dropping temp tables.
Let's say I have a stored procedure which creates a few temp tables; it may even join temp tables to other temp tables, etc. And let's also say that two users execute the stored procedure at the same time.
Is it possible for one user to run the sp, which creates a temp table called #temp, while another user running the same sp gets stopped because a table called #temp already exists in the database?
How about if the same user executes the same stored procedure twice on the same connection?
Are there any other weird scenarios that might cause two users queries to interfere with one another?
For the first case, no, it is not possible, because #temp is a local temporary table and therefore not visible to other connections (it's assumed that your users are using separate database connections). Internally, the temp table name is aliased to a generated unique name, and references to your local temp table are resolved against it.
In your case, since you are creating a local temp table in a stored procedure, that temp table will be dropped when the scope of the procedure is exited (see the "remarks section").
A local temporary table created in a stored procedure is dropped automatically when the stored procedure completes. The table can be referenced by any nested stored procedures executed by the stored procedure that created the table. The table cannot be referenced by the process which called the stored procedure that created the table.
For the second case, yes, you will get this error, because the table already exists, and the table lasts for as long as the connection does. If this is the case, then I recommend you check for the existence of the table before you try to create it.
Local-scope temp tables (with a single #) are created with an identifier at the end of them that makes them unique; multiple callers (even with the same login) should never overlap.
(Try it: create the same temp table from two connections and same login. Then query tempdb.dbo.sysobjects to see the actual tables created...)
Local temp tables are thread-safe, because they only exist within the current context. Please don't confuse context with current connection (from MSDN: "A local temporary table created in a stored procedure is dropped automatically when the stored procedure is finished"); the same connection can safely call a stored procedure that creates a local temp table (like #TMP) two or more times.
You can test this behavior by executing the following stored procedure from two connections. The SP waits 30 seconds, so we can be sure the two threads will be running over their own versions of the #TMP table at the same time:
CREATE PROCEDURE myProc(@n INT)
AS BEGIN
    RAISERROR('running with (%d)', 0, 1, @n);
    CREATE TABLE #TMP(n INT);
    INSERT #TMP VALUES(@n);
    INSERT #TMP VALUES(@n * 10);
    INSERT #TMP VALUES(@n * 100);
    WAITFOR DELAY '00:00:30';
    SELECT * FROM #TMP;
END;
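To reproduce the test, something like this can be run from two separate query windows (the parameter values are arbitrary):

-- Window 1:
EXEC myProc 1;
-- Window 2, within the 30-second window:
EXEC myProc 2;
-- Each session selects only the rows from its own #TMP.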
The short answer is:
Isolation of temporary tables is guaranteed per query, and there's nothing to worry about either in regard to threading, locks, or concurrent access.
I'm not sure why answers here talk about the significance of 'connections' and threads, as these are programming concepts, whereas query isolation is handled at the database level.
Local temporary objects are separated by session in SQL Server. If you have two queries running concurrently, then they are two completely separate sessions and won't interfere with one another. The login doesn't matter, so for example if you are using a single connection string with ADO.NET (meaning that multiple concurrent queries will use the same SQL Server login), your queries will all still run in separate sessions. Connection pooling also doesn't matter. Local temporary objects (tables and stored procedures) are completely safe from being seen by other sessions.
To clarify how this works; while your code has a single, common name for the local temporary objects, SQL Server appends a unique string to each object per each session to keep them separate. You can see this by running the following in SSMS:
CREATE TABLE #T (Col1 INT)
SELECT * FROM tempdb.sys.tables WHERE [name] LIKE N'#T%';
You will see something like the following for the name:
#T_______________00000000001F
Then, without closing that query tab, open up a new query tab and paste in that same query and run it again. You should now see something like the following:
#T_______________00000000001F
#T_______________000000000020
So, each time your code references #T, SQL Server will translate it to the proper name based on the session. The separation is all handled auto-magically.
Temp tables are tied to the session, so if different users run your procedure simultaneously there's no conflict...
Temp tables are created only in the context of the query or proc that creates them. Each new query gets a context in the database that is free of other queries' temp tables. As such, name collision is not a problem.
If you look in the tempdb database you can see the temporary tables there, and they have system-generated names. So other than regular deadlocks you should be OK.
Unless you use two pound signs (##temp), the temp table will be local and will only exist for that connection.
First, let's make sure you are using real temp tables: do they start with # or ##? If you are creating actual tables on the fly and then dropping and recreating them repeatedly, you will indeed have problems with concurrent users. If you are creating global temp tables (ones that start with ##) you can also have issues. If you do not want concurrency issues, use local temp tables (they start with #). It is also good practice to explicitly drop them at the end of the proc (or when they are no longer needed by the proc, if you are talking about long multi-step procs) and to check for existence (and drop if so) before creating.
