temp DB advice - Using Temporary tables - sql-server

I am working on a report where the result is a combination of multiple #temp tables. The structure is as below:
Stored procedure 1 has a temp table which returns about 0.5 million rows.
Stored procedure 2 has a temp table which returns about 0.1 million rows.
Finally, I need to combine the result sets of the above 2 SPs, again using a temp table, to make one final result set for the report. Now I am worried about performance: later, if the data increases, will it affect tempdb? We usually stage the data monthly; in a month the database may contain about 1 million rows. What is the maximum capacity tempdb accommodates? Will it be affected by the above approach?

First, it is not about the number of rows, it is about the size of each row. So, if you have 7 KB per row, then 0.6 million rows come to roughly 4 GB. And that is not the end of it: SQL Server uses tempdb for storing internal objects, version-store objects and user objects, which also includes intermediate results. You can expect the usage to grow beyond 4 GB in your case.
There are two possible ways to overcome this:
Tune your queries; minimize the use of temp tables, table variables, CTEs, large objects like VARCHAR(MAX), and cursors.
Increase your tempdb file size. Estimate the maximum size you need from observation (rebuilding indexes is a good worst-case test).
In a real-world scenario, there is always a chance to improve the query itself. Check if you can avoid using tempdb by joining tables correctly or by using views.

The size of tempdb is limited only by the size of the disk on which it is stored (or it can be capped in the properties of the database).
As for 1 million rows: nowadays that is not much, even "a little", especially if we are talking about data for a report.
But I would check whether you really need those temp tables. By getting rid of them (if they are unnecessary) you can speed up the query and decrease tempdb usage.
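To see how large the tempdb files currently are, and whether a size cap has been configured in the database properties, a query along these lines can help (a minimal sketch; `size` and `max_size` are counted in 8 KB pages, and `max_size = -1` means limited only by disk space):

```sql
-- Current and maximum size of each tempdb file, in MB
SELECT name,
       size * 8 / 1024 AS size_mb,
       CASE max_size WHEN -1 THEN NULL
            ELSE max_size * 8 / 1024 END AS max_size_mb
FROM tempdb.sys.database_files;
```

A NULL in `max_size_mb` means the file can grow until the disk is full.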

Related

What is it that makes temp tables more efficient than table variables when working with large data?

In SQL Server, the performance of temp tables is much better (in terms of time) than table variables when working with large data (say, inserting or updating 100,000 rows) (reference: SQL Server Temp Table vs Table Variable Performance Testing).
I've seen many articles comparing temp table and table variable, but still don't get what exactly makes temp tables more efficient when working with large data? Is it just how they are designed to behave or anything else?
Table variables don't have statistics, so the cardinality estimate for a table variable is 1 row.
You can at least force a correct total cardinality estimate using the RECOMPILE option, but there is no way to produce column statistics, i.e. the data-distribution histograms on column values that exist for temporary tables.
The consequence is evident: every query that uses a table variable is likely to suffer from cardinality underestimation.
Another con is this one:
Queries that insert into (or otherwise modify) @table_variables cannot have a parallel plan; #temp_tables are not restricted in this manner.
You can read more on it here:
Parallelism with temp table but not table variable?
The answer in that topic has another link to additional reading that is very helpful.
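The recompile workaround mentioned above can be sketched as follows (a hedged example; the join against `sys.objects` is just an arbitrary data source for illustration):

```sql
DECLARE @T TABLE (id INT PRIMARY KEY);

INSERT INTO @T
SELECT object_id FROM sys.objects;

-- Without the hint the optimizer assumes @T holds 1 row;
-- OPTION (RECOMPILE) lets it see the actual row count at compile time,
-- but it still has no column statistics to work with
SELECT o.name
FROM @T AS t
JOIN sys.objects AS o ON o.object_id = t.id
OPTION (RECOMPILE);
```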

Deallocate unused space in tempdb (SQL Server)

Is there any script in SQL Server to find the space used by temporary tables, plus the database name where each temp table was created in tempdb?
The size of my tempdb has grown to 100 GB, I am not able to recover the space, and I am unsure what is occupying so much of it.
Thanks for any help.
Temporary tables always get created in tempdb. However, it is not necessarily temporary tables that account for the size of tempdb. Tempdb is used in various ways:
Internal objects (sort and spool, CTEs, index rebuilds, hash joins, etc.)
User objects (temporary tables, table variables)
Version store (AFTER/INSTEAD OF triggers, MARS)
Since it is used by so many SQL operations, the size can grow for other reasons as well.
You can check what is causing tempdb to grow using the query below:
SELECT
SUM (user_object_reserved_page_count)*8 as usr_obj_kb,
SUM (internal_object_reserved_page_count)*8 as internal_obj_kb,
SUM (version_store_reserved_page_count)*8 as version_store_kb,
SUM (unallocated_extent_page_count)*8 as freespace_kb,
SUM (mixed_extent_page_count)*8 as mixedextent_kb
FROM sys.dm_db_file_space_usage
If the above query shows:
A high number of user-object pages, it means there is heavy usage of temp tables, cursors or table variables.
A high number of internal-object pages, it indicates the query plans are doing a lot of work in tempdb, e.g. sorting, GROUP BY, hash joins.
A high number of version-store pages, it points to a long-running transaction or high transaction throughput.
Based on that, you can configure the tempdb file size. I've written an article recently about tempdb configuration best practices. You can read that here.
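To drill down to which sessions are consuming the space, a related DMV can be queried while the growth is happening (a sketch; page counts are again converted to KB by multiplying by 8):

```sql
-- Top tempdb consumers by session (user objects + internal objects)
SELECT session_id,
       user_objects_alloc_page_count * 8     AS user_obj_kb,
       internal_objects_alloc_page_count * 8 AS internal_obj_kb
FROM sys.dm_db_session_space_usage
ORDER BY user_objects_alloc_page_count
       + internal_objects_alloc_page_count DESC;
```

Cross-referencing the worst `session_id` with `sys.dm_exec_sessions` then tells you which login or application is responsible.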
Perhaps you can also use the following command on each tempdb file separately (the target size of 1024 MB here is just an example):
DBCC SHRINKFILE (tempdev, 1024);
Please refer to https://support.microsoft.com/en-us/kb/307487 for more information

Efficient DELETE TOP?

Is it more efficient and ultimately FASTER to delete rows from a DB in blocks of 1000 or 10000? I am having to remove approx 3 million rows from many tables. I first did the deletes in blocks of 100K rows, but the performance wasn't looking good. I changed to 10,000 and seem to be removing them faster. Wondering if even smaller, like 1K per DELETE statement, is better still.
Thoughts?
I am deleting like this:
DELETE TOP(10000)
FROM TABLE
WHERE Date < '1/1/2012'
Yes, it is. It all depends on your server, though. Last time I did that, I was using this approach to delete things in 64-million-row increments (on a table that at that point had around 14 billion rows, 80% of which ultimately got deleted). I got a delete through every 10 seconds or so.
It really depends on your hardware. Going more granular is more work, but it means less waiting on the transaction log for other things operating on the table. You have to experiment and find where you are comfortable; there is no ultimate answer, because it is totally dependent on the usage of the table and the hardware.
We used Table Partitioning to remove 5 million rows in less than a sec but this was from just one table. It took some work up-front but ultimately was the best way. This may not be the best way for you.
From our document about partitioning:
Let’s say you want to add 5 million rows to a table but don’t want to lock the table up while you do it. I ran into a case in an ordering system where I couldn’t insert the rows without stopping the system from taking orders. BAD! Partitioning is one way of doing it if you are adding rows that don’t overlap current data.
WHAT TO WATCH OUT FOR:
Data CANNOT overlap current data. You have to partition the data on a value; the new data cannot be intertwined within the currently partitioned data.
If removing data, you have to remove an entire partition (or partitions). You will not have a WHERE clause.
If you are doing this on a production database and want to limit the locking on the table, create your indexes with “ONLINE = ON”.
OVERVIEW OF STEPS:
FOR ADDING RECORDS
Partition the table you want to add records to (leave a blank partition for the new data). Do not forget to partition all of your indexes.
Create new table with the exact same structure (keys, data types, etc.).
Add a constraint to the new table to limit that data so that it would fit into the blank partition in the old table.
Insert new rows into new table.
Add indexes to match old table.
Swap the new table with the blank partition of the old table.
Un-partition the old table if you wish.
FOR DELETING RECORDS
Partition the table into sets so that the data you want to delete is all on partitions by itself (this could be many different partitions).
Create a new table with the same partitions.
Swap the partitions with the data you want to delete to the new table.
Un-partition the old table if you wish.
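The delete-by-partition steps above boil down to a SWITCH followed by a TRUNCATE. A minimal sketch, assuming a hypothetical `dbo.Orders` table partitioned so that partition 2 holds exactly the rows to delete, and a staging table `dbo.Orders_Staging` with identical structure on the same filegroup:

```sql
-- Move the doomed partition out of the live table.
-- SWITCH is a metadata-only operation, so it completes almost instantly
-- regardless of how many rows the partition holds.
ALTER TABLE dbo.Orders SWITCH PARTITION 2 TO dbo.Orders_Staging;

-- Now the rows live only in the staging table; dropping them is trivial.
TRUNCATE TABLE dbo.Orders_Staging;
```

This is why the answer above could remove 5 million rows "in less than a sec": the expensive part is the up-front partitioning work, not the removal itself.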
Yes and no; it depends on the usage of the table, because of locking. I would try to delete the records at a slower pace, so the opposite of the OP's question:
-- DELETE TOP is preferred over the deprecated SET ROWCOUNT; note that a SET
-- statement resets @@ROWCOUNT, so the loop tests it right after the DELETE
while 1 = 1
begin
    waitfor delay '0:0:1'
    delete top (10000)
    from [table]
    where [date] < convert(datetime, '20120101', 112)
    if @@rowcount = 0 break
end

When should I use a table variable vs temporary table in sql server?

I'm learning more details about table variables. It is said that temp tables are always on disk while table variables are in memory, that is to say, the performance of a table variable is better than a temp table because a table variable uses fewer IO operations than a temp table.
But sometimes, if there are too many records in a table variable to be held in memory, the table variable will be put on disk like a temp table.
But I don't know what "too many records" means. 100,000 records? Or 1,000,000 records? How can I know whether a table variable I'm using is in memory or on disk? Is there any function or tool in SQL Server 2005 to measure the size of a table variable, or to let me know when the table variable is moved from memory to disk?
Your question shows you have succumbed to some of the common misconceptions surrounding table variables and temporary tables.
I have written quite an extensive answer on the DBA site looking at the differences between the two object types. This also addresses your question about disk vs memory (I didn't see any significant difference in behaviour between the two).
Regarding the question in the title, though, as to when to use a table variable vs a local temporary table: you don't always have a choice. In functions, for example, it is only possible to use a table variable, and if you need to write to the table in a child scope then only a #temp table will do (table-valued parameters allow read-only access).
Where you do have a choice some suggestions are below (though the most reliable method is to simply test both with your specific workload).
If you need an index that cannot be created on a table variable then you will of course need a #temporary table. The details of this are version dependent, however. For SQL Server 2012 and below, the only indexes that could be created on table variables were those implicitly created through a UNIQUE or PRIMARY KEY constraint. SQL Server 2014 introduced inline index syntax for a subset of the options available in CREATE INDEX. This has since been extended to allow filtered index conditions. Indexes with INCLUDE-d columns or columnstore indexes are still not possible to create on table variables, however.
If you will be repeatedly adding and deleting large numbers of rows from the table then use a #temporary table. That supports TRUNCATE (which is more efficient than DELETE for large tables) and additionally subsequent inserts following a TRUNCATE can have better performance than those following a DELETE as illustrated here.
If you will be deleting or updating a large number of rows then the temp table may well perform much better than a table variable - if it is able to use rowset sharing (see "Effects of rowset sharing" below for an example).
If the optimal plan using the table will vary dependent on data then use a #temporary table. That supports creation of statistics which allows the plan to be dynamically recompiled according to the data (though for cached temporary tables in stored procedures the recompilation behaviour needs to be understood separately).
If the optimal plan for the query using the table is unlikely to ever change then you may consider a table variable to skip the overhead of statistics creation and recompiles (would possibly require hints to fix the plan you want).
If the source for the data inserted to the table is from a potentially expensive SELECT statement then consider that using a table variable will block the possibility of this using a parallel plan.
If you need the data in the table to survive a rollback of an outer user transaction then use a table variable. A possible use case for this might be logging the progress of different steps in a long SQL batch.
When using a #temp table within a user transaction locks can be held longer than for table variables (potentially until the end of transaction vs end of statement dependent on the type of lock and isolation level) and also it can prevent truncation of the tempdb transaction log until the user transaction ends. So this might favour the use of table variables.
Within stored routines, both table variables and temporary tables can be cached. The metadata maintenance for cached table variables is less than that for #temporary tables. Bob Ward points out in his tempdb presentation that this can cause additional contention on system tables under conditions of high concurrency. Additionally, when dealing with small quantities of data this can make a measurable difference to performance.
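The inline index syntax mentioned above (SQL Server 2014 and later) can be sketched like this; the table and index names are illustrative:

```sql
-- SQL Server 2014+: a nonclustered index declared inline on a table variable,
-- alongside the PRIMARY KEY constraint that was already possible before
DECLARE @Orders TABLE
(
    id       INT PRIMARY KEY,
    customer INT INDEX IX_customer NONCLUSTERED
);
```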
Effects of rowset sharing
DECLARE @T TABLE(id INT PRIMARY KEY, Flag BIT);
CREATE TABLE #T (id INT PRIMARY KEY, Flag BIT);
INSERT INTO @T
OUTPUT inserted.* INTO #T
SELECT TOP 1000000 ROW_NUMBER() OVER (ORDER BY @@SPID), 0
FROM master..spt_values v1, master..spt_values v2
SET STATISTICS TIME ON
/*CPU time = 7016 ms, elapsed time = 7860 ms.*/
UPDATE @T SET Flag=1;
/*CPU time = 6234 ms, elapsed time = 7236 ms.*/
DELETE FROM @T
/* CPU time = 828 ms, elapsed time = 1120 ms.*/
UPDATE #T SET Flag=1;
/*CPU time = 672 ms, elapsed time = 980 ms.*/
DELETE FROM #T
DROP TABLE #T
Use a table variable for a very small quantity of data (thousands of bytes).
Use a temporary table for a lot of data
Another way to think about it: if you think you might benefit from an index, automated statistics, or any SQL optimizer goodness, then your data set is probably too large for a table variable.
In my example, I just wanted to put about 20 rows into a format and modify them as a group, before using them to UPDATE / INSERT a permanent table. So a table variable is perfect.
But I am also running SQL to back-fill thousands of rows at a time, and I can definitely say that the temporary tables perform much better than table variables.
This is not unlike how CTEs are a concern for a similar size reason: if the data in the CTE is very small, I find a CTE performs as well as or better than what the optimizer comes up with, but if it is quite large then it hurts you badly.
My understanding is mostly based on http://www.developerfusion.com/article/84397/table-variables-v-temporary-tables-in-sql-server/, which has a lot more detail.
Microsoft says here
Table variables do not have distribution statistics and will not trigger recompiles. Therefore, in many cases the optimizer will build a query plan on the assumption that the table variable has no rows. For this reason, you should be cautious about using a table variable if you expect a larger number of rows (greater than 100). Temp tables may be a better solution in this case.
I totally agree with Abacus (sorry - don't have enough points to comment).
Also, keep in mind it doesn't necessarily come down to how many records you have, but the size of your records.
For instance, have you considered the performance difference between 1,000 records with 50 columns each vs 100,000 records with only 5 columns each?
Lastly, maybe you're querying/storing more data than you need? Here's a good read on SQL optimization strategies. Limit the amount of data you're pulling, especially if you're not using it all (some SQL programmers do get lazy and just select everything even though they only use a tiny subset). Don't forget the SQL query analyzer may also become your best friend.
A table variable is available only to the current scope; for example, if you need to EXEC another stored procedure from within the current one, you will have to pass the table as a table-valued parameter, and of course this will affect performance. With temporary tables you can do this by passing only the temporary table's name.
To test a temporary table:
Open a Management Studio query editor.
Create a temporary table.
EXEC a child batch or stored procedure from the same session.
Select from the table: "Available". (Note: a local #temp table is visible to child scopes of the same session, not to another query editor window; for cross-session visibility you would need a global ##temp table.)
To test a table variable:
Open a Management Studio query editor.
Declare a table variable.
EXEC a child batch or stored procedure from the same session.
Select from the table: "Not Available".
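The visibility test above can be condensed into a single batch (a sketch; the table name is arbitrary):

```sql
-- A #temp table created in the outer scope is visible to a child batch
CREATE TABLE #T (id INT);
INSERT INTO #T VALUES (1);
EXEC('SELECT id FROM #T');      -- works: the child scope sees the parent''s #temp table

-- A table variable is not visible to the child scope
DECLARE @T TABLE (id INT);
INSERT INTO @T VALUES (1);
-- EXEC('SELECT id FROM @T');   -- fails: @T is not declared in the child scope
DROP TABLE #T;
```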
Something else I have experienced: if your schema does not have the GRANT privilege to create tables, then use table variables.
Writing data into table variables declared with DECLARE @tb and then joining with other tables, I realized that the response time was much higher compared to temporary tables (tempdb..#tb).
When I join them with @tb, it takes much longer to return the result; with #tb the return is almost instantaneous.
I did tests with a 10,000-row join against 5 other tables.

SQL Server stored procedure intermediate tables

In SQL Server 2005, I have a query that involves a bunch of large-ish joins (each table is on the order of a few thousand rows to a few million rows, and the tables average probably the equivalent of 10-15 columns of integers and datetimes).
To make the query faster, I am thinking about splitting up the one big query into a stored procedure that does a couple of the joins, stores that result in some temporary table, and then joins that temporary table with another temporary table that was also the result of a few joins.
I am currently using table variables to store the intermediate tables, and the one-off performance is noticeably better. But in production, tempdb seems to be hitting an IO bottleneck.
Is there a better way to think about solving such a problem? I mean, is using table variables way off base here?
Table Variables can take up quite a lot of memory in TempDB.
In large production environments, I have seen better SQL coders than I utilize standard tables for this purpose; They are essentially temp tables, but they create them as regular tables and give them a special prefix or suffix. This has the added benefit (as with temp tables) of being able to utilize indexes to help with execution.
If you can use a standard table or utilize a temp table that is accessed by all steps of your complex execution, you may be able to resolve your memory problem.
Think of it as a place to cache data. In fact, you can update this "cache" each time your main stored procedure runs, just make sure to utilize appropriate transactions and locking.
Imagine the alternative -- If you use an enormous table variable in a stored procedure, and the stored procedure is executed 10 or 20 times simultaneously... that table variable may no longer be living just in memory.
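The "permanent staging table" pattern described above can be sketched like this; all object names are illustrative:

```sql
-- Rebuild the staging/"cache" table on each run of the main procedure
IF OBJECT_ID('dbo.stg_ReportJoin', 'U') IS NOT NULL
    DROP TABLE dbo.stg_ReportJoin;

-- Materialize the expensive intermediate join once
SELECT o.id, o.customer, o.order_date
INTO dbo.stg_ReportJoin
FROM dbo.Orders    AS o
JOIN dbo.Customers AS c ON c.id = o.customer;

-- Unlike a table variable, a regular table can carry real indexes
-- and statistics to help the subsequent joins
CREATE INDEX IX_stg_customer ON dbo.stg_ReportJoin (customer);
```

If several concurrent executions can run the procedure, the table needs a per-run suffix or appropriate locking, as the answer notes.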
