Fast replace table in T-SQL with another - sql-server

I have two tables with the same structure (keys/columns/etc.). I want to replace the data in the first table with the data from the second one. I use the following code to do it:
DROP TABLE T1
SELECT *
INTO T1
FROM T2
DROP TABLE T2
but this code is quite slow when T2 is large. The T2 table is temporary, so I want to rewrite it as:
drop table T1
EXEC sp_rename 'T2', 'T1'
This should execute very fast for a table of any size, but am I missing something here? Are there side effects that may break this code? I'm not very familiar with dynamic SQL, so please advise.

Renaming the tables should be fine. Sometimes, there can be issues with triggers or foreign key constraints (and the like). However, you are dropping T1 anyway, so this is not a concern.
The one issue is where the data is actually stored. If by temporary table, you mean a table name that starts with #, then this is not a good approach, because temporary tables are stored in tempdb, separately from your other tables. Instead, create the table in the same place where T1 would be stored, perhaps calling it something like temp_T1.
You might want to revisit your logic to see if there is a way to "reconstruct" T1 in place. However, when there are large numbers of updates and deletes in the processing, recreating the table is often the fastest approach.
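If both tables are regular (non-#) tables in the same schema, a minimal sketch of the swap might look like this, wrapping both steps in one transaction so other sessions never see a moment when neither table exists:
BEGIN TRANSACTION;
    DROP TABLE T1;
    EXEC sp_rename 'T2', 'T1';
COMMIT TRANSACTION;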

Related

Delete large amount of data on SQL server

I need to delete about 900,000,000 (900 million) records in SQL Server.
I would like to know the best way.
I wrote the following DELETE loop.
DeleteTable:
DELETE TOP(1000) TAB1
FROM TABLE1 TAB1
LEFT JOIN TABLE2 TAB2 ON TAB1.ID_PRODUCT = TAB2.ID_PRODUCT
WHERE TAB2.ID_PRODUCT IS NULL;
IF @@ROWCOUNT <> 0 GOTO DeleteTable;
I would like to know if there is how I can optimize this query for better delete performance
Thank you.
Deleting 900,000,000 rows is going to take a long time, and you might run out of temporary storage -- unless you have lots and lots of storage. Your approach of deleting rows in increments is one approach.
If your recovery model is not set to "simple", then you might want to consider that. Combined with your incremental delete approach, that will at least prevent the log from filling up.
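For example, switching the recovery model is a one-line change (a sketch; the database name is a placeholder):
-- Sketch only: "MyDatabase" is a placeholder. Simple recovery gives up
-- point-in-time restore, so check your backup strategy first.
ALTER DATABASE MyDatabase SET RECOVERY SIMPLE;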
For your query, you want tab2(id_product) to have an index. I'm not sure if an index on tab1(id_product) would really help.
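For instance, such an index might look like this (a sketch; the index name is made up, table and column names are taken from the question):
CREATE INDEX IX_TABLE2_ID_PRODUCT ON TABLE2 (ID_PRODUCT);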
Another approach is just to recreate the table, because inserts and table creation are much more efficient than deletes.
For this, you can essentially do:
select t1.*
into temp_tab1
from tab1 t1
where exists (select 1 from table2 t2 where t2.id_product = t1.id_product);
truncate table tab1; -- back it up first!
insert into tab1
select *
from temp_tab1;
Note: If you have an identity column, you may need to use SET IDENTITY_INSERT ON. Also, if there are foreign key constraints referencing this table, then you need extra care.
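For example, if tab1 has an identity column, the reload step might look roughly like this (a sketch; the column names are assumptions, and IDENTITY_INSERT requires an explicit column list):
SET IDENTITY_INSERT tab1 ON;

-- Explicit column list is required while IDENTITY_INSERT is ON.
INSERT INTO tab1 (id, id_product, other_col)
SELECT id, id_product, other_col
FROM temp_tab1;

SET IDENTITY_INSERT tab1 OFF;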
Finally, if this is something that you need to do repeatedly, then you should consider partitioning the table. It is much more efficient to drop partitions than to delete rows.
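As a rough illustration, on SQL Server 2016 and later you can truncate individual partitions (a sketch; the table name and partition number are assumptions):
-- Removes only the rows in partition 1; this is a metadata operation, so it is very fast.
TRUNCATE TABLE dbo.tab1 WITH (PARTITIONS (1));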
You need to be careful if the database is highly transactional and the table has heavy read-write activity, mainly because you may block other sessions while the delete is in progress. A slower but less impactful approach is to use a cursor to delete the records. The way to do it is to throw the product_id values into a #temp table and then delete from the actual table using the product_id as a predicate.
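A hedged sketch of that staged-ID idea, using small batches rather than a literal cursor (table and column names reuse those from the question):
-- Stage the IDs of the rows to remove once, so the expensive join runs only once.
SELECT TAB1.ID_PRODUCT
INTO #ids_to_delete
FROM TABLE1 TAB1
LEFT JOIN TABLE2 TAB2 ON TAB1.ID_PRODUCT = TAB2.ID_PRODUCT
WHERE TAB2.ID_PRODUCT IS NULL;

-- Delete in small batches using the staged IDs as the predicate.
DeleteBatch:
DELETE TOP (1000) TAB1
FROM TABLE1 TAB1
JOIN #ids_to_delete ids ON ids.ID_PRODUCT = TAB1.ID_PRODUCT;
IF @@ROWCOUNT <> 0 GOTO DeleteBatch;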

Indexing a single-use temporary table

A colleague works in a business which uses Microsoft SQL Server. Their team creates stored procedures that are executed daily to create data extracts. The underlying tables are huge (some have billions of rows), so most stored procedures are designed such that first they extract only the relevant rows of these huge tables into temporary tables, and then the temp tables are joined with each other and with other smaller tables to create a final extract. Something similar to this:
SELECT COL1, COL2, COL3
INTO #TABLE1
FROM HUGETABLE1
WHERE COL4 IN ('foo', 'bar');
SELECT COL1, COL102, COL103
INTO #TABLE2
FROM HUGETABLE2
WHERE COL14 = 'blah';
SELECT COL1, COL103, COL306
FROM #TABLE1 AS T1
JOIN #TABLE2 AS T2
ON T1.COL1 = T2.COL1
LEFT JOIN SMALLTABLE AS ST
ON T1.COL3 = ST.COL3
ORDER BY T1.COL1;
Generally, the temporary tables are not modified after their creation (so no subsequent ALTER, UPDATE or INSERT operations). For the purpose of this discussion, let's assume the temporary tables are only used once later on (so only one SELECT query would rely on them).
Here is the question: is it a good idea to index these temporary tables after they are created and before they are used in the subsequent query?
My colleague believes that creating an index will make the join and the sort operations faster. I believe, however, that the total time will be larger, because index creation takes time. In other words, I assume that except for edge cases (like a temporary table which itself is extremely large, or the final SELECT query is very complex), SQL Server will use the statistics it has on the temporary tables to optimize the final query, and in doing so it will effectively index the temp tables as it sees fit.
In other words, I am used to thinking that creating an index is only useful if you know the table will be used often; a single-use temporary table that is dropped once the stored procedure completes is not worth indexing.
Neither of us knows enough about SQL Server optimizer to know in what ways we are right or wrong. Can you please help us better understand which of our assumptions are closer to truth?
Your colleague is probably correct. Even if a table is going to be used in a single query, without seeing the query (and even if we do see it, we still don't have a great idea of what its execution plan looks like) we have no idea how many times SQL Server will need to find data within various columns of each of those tables for joins, sorts, etc.
However, we'll never know for sure until it's actually done both ways and the results measured and compared.
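As a concrete experiment, indexing the join column of the question's temp tables might look like this (a sketch; the choice of clustered indexes and of COL1 as the key are assumptions to be tested):
-- Build the indexes after the temp tables are loaded, before the final join.
CREATE CLUSTERED INDEX IX_T1_COL1 ON #TABLE1 (COL1);
CREATE CLUSTERED INDEX IX_T2_COL1 ON #TABLE2 (COL1);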
If you are doing daily data extracts with billions of rows, I would recommend you use staging tables instead of temporary tables. This will isolate your extracts from other workloads that use tempdb.
Here is the question: is it a good idea to index these temporary tables after they are created and before they are used in the subsequent query?
Create the index after loading the data into the temp table. This will eliminate fragmentation, and statistics will be created.
The optimizer will use statistics to generate the optimal plan. So if you don't have statistics, it could dramatically affect your query performance, especially for large datasets.
The example below shows a before-and-after comparison of creating the index on a temp table:
/* Create index after data load into temp table -- stats is created */
CREATE TABLE #temp ( [text] varchar(50), [num] int);
INSERT INTO #temp([text], [num]) VALUES ('aaa', 1), ('bbb', 2) , ('ccc',3);
CREATE UNIQUE CLUSTERED INDEX [IX_num] ON #temp (num);
DBCC SHOW_STATISTICS ('tempdb..#temp', 'IX_num');
/* Create index before data load into temp table -- stats is not created */
CREATE TABLE #temp_nostats ( [text] varchar(50), [num] int);
CREATE UNIQUE CLUSTERED INDEX [IX_num] ON #temp_nostats (num);
INSERT INTO #temp_nostats([text], [num]) VALUES ('aaa', 1), ('bbb', 2) , ('ccc',3);
DBCC SHOW_STATISTICS ('tempdb..#temp_nostats', 'IX_num');
You need to test whether the index will help you or not. You also need to balance how many indexes you have, because too many indexes can hurt performance as well.

Why is this procedure not working?

This is my first question here. I am very new to SQL Server and T-SQL.
I would like to create a table with a column whose values come from another table. I thought I could use a SELECT for that column, but it is not allowed.
How do I do it?
It is very simple to create a view in this way, but I would like to have a table, not a view.
It should look like
Column A, ColumnB,
Column C=select count(*) from [another table] where....
Could you please advise?
SELECT [COLUMN A], [COLUMN B], COUNT(*) AS [COLUMN C]
INTO [destination table]
FROM [another table]
WHERE ...
GROUP BY [COLUMN A], [COLUMN B]
You should use an alias
You create a table using the CREATE TABLE syntax, because you will need to define the field names and sizes. Look the syntax up in Books Online. Do not ever use SELECT INTO unless you are creating a staging table for one-time use or a temp table; it is not a good choice for creating a new permanent table. Plus, you don't say where any of the other columns come from except column C, so it may be impossible to properly set up the correct field sizes from the initial insert. Frankly, you should take the time to think about what columns you need and what data types they should be; it is irresponsible to skip this for a table that will be used permanently.
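A minimal sketch of that, with made-up column names, types, and sizes:
CREATE TABLE dbo.Table1 (
    ColA varchar(50) NOT NULL,
    ColB varchar(50) NOT NULL,
    ColC int         NOT NULL
);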
To populate the table, you use the INSERT statement with a SELECT instead of the VALUES clause. If only column C comes from another table, then it might be something like:
INSERT INTO table1 (ColA, ColB, ColC)
SELECT 'test', 10, COUNT(*)
FROM tableb
WHERE ...
If you have to get the data from multiple tables, then you may need a join.
If you need to maintain the computed column as the values change in TableB, then you may need to write triggers on TableB or better (easier to develop and maintain and less likely to be buggy or create a data integrity problem) use a view for this instead of a separate table.
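A rough sketch of the view alternative (table and column names are assumptions about your schema, with Table1 storing only ColA and ColB and ColC computed on the fly):
CREATE VIEW dbo.vw_Table1
AS
SELECT t.ColA,
       t.ColB,
       (SELECT COUNT(*) FROM dbo.TableB b WHERE b.ColA = t.ColA) AS ColC
FROM dbo.Table1 t;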

Speed of using SQL temp table vs long list of variables in stored procedure

I have a stored procedure with a list of about 50 variables of different types, repeated about 8 times as part of different groups (declaration, initialization, loading, calculations, result, etc.).
In order to avoid the duplication I want to use temp tables instead (not table variables, which do not give me the advantage I am after - inferred column types).
I've read that temp tables may start as "in-memory" tables and then spill to disk as they grow, depending on the amount of memory and many other conditions.
My question is: if I use a temp table to store and manipulate one record with 50 fields, will it be much slower than using 50 variables?
I would not use a temp #Table unless I need to store temporary results for multiple rows. Our code uses lots of variables in some stored procedures. The ability to initialize during declaration helps reduce clutter.
Temp #Tables have some interesting side effects with regards to query compilation. If your stored procedure calls any child procedures, and queries in the child procs refer to this #Table, then these queries will be recompiled upon every execution.
Also, note that if you modify the temp #Table schema in any way, then SQL Server will not be able to cache the table definition. You'll be incurring query recompilation penalties in every query that refers to the table. Also, SQL Server will hammer various system tables as it continually creates and drops the table metadata.
On the other hand, if you don't call child procs, and you don't change the #Table schema, it might perform OK.
But stylistically, it does not make sense to me to add another join to a query just to get a variable for use in a WHERE clause. In other words, I'd rather see a lot of this:
declare @id int
select @id = ...
select tbl.id, ...
from tbl
inner join tbl2 ...
where tbl.id = @id
Instead of this:
create table #VarTbl (...)
insert into #VarTbl (...) select ...
select tbl.id, ...
from tbl
inner join tbl2 ...
cross join #VarTbl as VarTbl
where tbl.id = VarTbl.VarTbl_ID
Another thought: can you break apart the stored procedure into logical groups of operations? That might help readability. It can also help reduce query recompilations. If one child proc needs to be recompiled, this will not affect the parent proc or other child procs.
No, it will not be much slower; you would probably even have a hard time showing it is slower at all in normal use cases.
I always use temp tables in this situation; the performance difference is negligible, and readability and ease of use are better in my opinion. I normally start looking at using a temp table once I get above 10 variables, especially if those are related.
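A minimal sketch of the one-row temp table pattern (column names, types, and the source table are made up):
-- One row holds values that would otherwise be separate variables;
-- SELECT ... INTO infers the column types from the expressions.
SELECT CAST(0 AS int)              AS RowsLoaded,
       CAST(0.0 AS decimal(18, 2)) AS RunningTotal,
       CAST(NULL AS datetime)      AS LastLoadDate
INTO #Vars;

UPDATE #Vars
SET RowsLoaded = (SELECT COUNT(*) FROM dbo.SourceTable);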

Data Warehousing with minimal changes

Ok, I have a table that has 10 years' worth of data, and performance is taking a hit. I am planning on moving the older data to a separate historical table. The problem is I need to select from the first table if the row is in there and from the second table if not. I do not want to do a join because then it will always do a lookup on the second table. HELP?
If you still need to query the data, in no way would I move it to another table. How big is the table now? What are the indexes? Have you considered partitioning the table?
If you must move the data to another table, you could query in stored procs with an IF statement. Query the main table first, and then if the rowcount is 0, query the other table. It will be slower for records not in the main table, but should stay fast if they are in there. However, it won't work when you need records from both.
Sample of code to do this:
CREATE PROC myproc (@test INT)
AS
-- Look up in the main table first (the filter column is illustrative).
SELECT field1, field2 FROM table1 WHERE field1 = @test
IF @@ROWCOUNT = 0
BEGIN
    SELECT field1, field2 FROM table2 WHERE field1 = @test
END
But really, partitioning and indexing correctly is probably your best choice. Also optimize existing queries. If you are using known poorly performing techniques such as cursors, correlated subqueries, views that call views, scalar functions, non-sargable WHERE clauses, etc., just fixing your queries may mean you don't have to archive.
Sometimes, buying a better server would help as well.
Rather than using a separate historical table, you might want to look into partitioning the table by some function of the date (year perhaps?) to improve performance instead.
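A rough sketch of what partitioning by year could look like (the object names, date column, and boundary values are all assumptions):
-- Partition function and scheme splitting rows by year.
CREATE PARTITION FUNCTION pf_ByYear (datetime)
AS RANGE RIGHT FOR VALUES ('2014-01-01', '2015-01-01', '2016-01-01');

CREATE PARTITION SCHEME ps_ByYear
AS PARTITION pf_ByYear ALL TO ([PRIMARY]);

-- Create the clustered index on the partition scheme so the data is split by date.
CREATE CLUSTERED INDEX IX_BigTable_EntryDate
ON dbo.BigTable (EntryDate)
ON ps_ByYear (EntryDate);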
