Fastest way to implement delete, then insert from one db to another? - sql-server

I have this SQL now:
delete from [db_dest].[dbo].[Table1];
insert into [db_dest].[dbo].[Table1] select * from [db_src].[dbo].[Table1];
What other solid options do I have? Could MERGE be faster?

In general, if you need to remove all records from a table, consider using TRUNCATE TABLE instead. It is much faster, especially when the table contains many records, because DELETE logs each individual row deletion, while TRUNCATE only logs the page deallocations.
MERGE will generally not be faster than a plain TRUNCATE TABLE followed by an INSERT, because MERGE has to compare the records in the two tables, field by field, in order to update rows that have changed in the source and remove rows that are no longer present there.
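A minimal sketch of that approach for the tables in the question (assuming Table1 is not referenced by foreign keys, which would make TRUNCATE fail, and has no identity column, which would require an explicit column list plus SET IDENTITY_INSERT):
TRUNCATE TABLE [db_dest].[dbo].[Table1];
INSERT INTO [db_dest].[dbo].[Table1]
SELECT * FROM [db_src].[dbo].[Table1];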

Related

Postgres lookup tables over clustered data

Background
This is a simplified version of the postgres database I am managing:
TableA: id,name
TableB: id,id_a,prop1,prop2
This database has a peculiarity: when I select data, I only consider rows of TableB that have the same id_a. So I am never interested in selecting data from TableB with mixed values of id_a. Therefore, queries are always of this kind:
SELECT something FROM TableB INNER JOIN TableA ON TableA.id=id_a
Some time ago, the number of rows in TableA grew to 20,000 and TableB grew to 10^7 rows.
As a first attempt to speed up queries, I added a B-tree index on one of TableB's columns. Something like the following:
"my_index" btree (prop1)
The problem
Now I have to insert new data, and the database will grow to more than double its current size. Inserting data into TableB has become too slow.
I understand that the slowness comes from updating my_index: each time I add a new row to TableB, the database has to maintain the my_index structure.
I feel like this could be sped up if my_index did not cover all rows, because I never need a new row with a given id_a value to be ordered together with rows having a different id_a value.
The question
How can I create an index on a table where the elements are ordered only within groups of rows that share a common property (e.g. a column called id_a)?
You can't.
The question that I would immediately ask you if you want such an index is: Yes, but for what values of id_a do you want the index? And your answer would be “for all of them”.
If you actually would want an index only for some values, you could use a partial index:
CREATE INDEX partidx ON tableb(prop1) WHERE id_a = 42;
But really you want an index for the whole table.
Besides, an INSERT would be just as slow unless the inserted row happens not to satisfy the WHERE condition of your index.
There are three things you can do to speed up INSERT:
Run as many INSERT statements as possible in a single transaction, ideally all of them.
Then you don't have to pay the price of a COMMIT after every single INSERT, and COMMITs are quite expensive: they require data to be written to the disk hardware (not just the cache), and that is incredibly slow (1 ms is a decent time).
You can speed this up even more if you use prepared statements. That way the INSERT doesn't have to be parsed and prepared every time.
Use the SQL command COPY to insert many rows. COPY is specifically designed for bulk data import and will be faster than INSERT.
If COPY is too slow, usually because you need to insert a lot of data, the fastest way is to drop all indexes, insert the data with COPY and then recreate the indexes, as sketched below. That can speed up the process by an order of magnitude, but of course the database is not fully available while the indexes are dropped.
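A minimal sketch of that last workflow, using the TableB schema from the question; the CSV path is a placeholder, and COPY ... FROM requires a file readable by the database server. Since DDL is transactional in PostgreSQL, the whole swap can run inside one transaction:
BEGIN;
DROP INDEX my_index;
-- '/path/to/data.csv' is a placeholder; the file must be readable by the server process
COPY tableb (id, id_a, prop1, prop2) FROM '/path/to/data.csv' WITH (FORMAT csv);
CREATE INDEX my_index ON tableb (prop1);
COMMIT;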

Sql server bulk delete by known record ids

I need to delete many rows from a SQL Server 2008 database, and it must be scalable, so I was thinking about a bulk delete; the problem is that there are not many references on this, at least for my case.
The first factor is that I will know the exact ID of every row to delete, so any tips involving TOP are not an option. Also, I will delete fewer rows than I want to retain, so there will be no need for any "drop/temp table/re-create" methods.
So I was thinking of using WHERE IN, either supplying the IDs directly or as XML data with the IDs; there is also the option of using MERGE to delete rows.
If I have to delete 1000+ rows, could sending all the IDs to WHERE IN be a problem? And what about MERGE: is it really a cure for all bulk insert/update/delete problems? What should I choose?
One option would be to store the known IDs in a "controller" table, and then delete the rows from your main data table whose IDs show up in the controller table.
That way, you could easily "batch" up your deletes, e.g.
DELETE FROM dbo.YourMainDataTable
WHERE ID IN (SELECT TOP (250) ID FROM dbo.DeleteControllerTable)
You could easily run this delete statement in, e.g., a SQL Agent job that comes around every 15 minutes to check whether there is anything to delete. In the meantime, you can keep adding IDs to the DeleteController table, thus "decoupling" the process of entering the IDs to be deleted from the actual deletion process.
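A minimal sketch of such a batched loop, using the table names from the example above; it assumes the processed IDs should also be removed from the controller table, so that each pass picks up a fresh batch:
DECLARE @batch TABLE (ID int PRIMARY KEY);

WHILE 1 = 1
BEGIN
    DELETE FROM @batch;  -- clear the previous batch

    -- pull the next 250 IDs out of the controller table, capturing them
    DELETE TOP (250) FROM dbo.DeleteControllerTable
    OUTPUT deleted.ID INTO @batch;

    IF @@ROWCOUNT = 0 BREAK;  -- controller table is empty, we are done

    DELETE FROM dbo.YourMainDataTable
    WHERE ID IN (SELECT ID FROM @batch);
END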

SQL Server trigger - copy row before updating

I'd like to copy a table's row before updating and I'm trying to do it like this:
CREATE TRIGGER first_trigger_test
on Triggertest
FOR UPDATE
AS
insert into Triggertest select * from Inserted
Unfortunately, I get the error message
Msg 8101, Level 16, State 1, Procedure first_trigger_test, Line 6
An explicit value for the identity column in table 'Triggertest' can only be specified when a column list is used and IDENTITY_INSERT is ON.
I assume it's because of the id-column; can't I do something like 'except' id? I do not want to list all the columns in the trigger as it should be as dynamic as possible...
You can't, basically. You'll either have to specify the columns, or use a separate table:
CREATE TRIGGER first_trigger_test
on Triggertest
FOR UPDATE
AS
insert into Triggertest_audit select * from deleted
(where Triggertest_audit is a second table that looks like Triggertest, but without the primary key/identity/etc., and which commonly holds multiple rows per logical source row; note that I assumed you actually wanted to copy the old values, not the new ones)
The problem happens because you are trying to set an identity column in Triggertest.
Is that your plan?
If you want to copy the new identity values from INSERTED into Triggertest, then define the column in Triggertest without IDENTITY.
If Triggertest has its own IDENTITY column, use this:
insert into Triggertest (col1, col2, col3) select col1, col2, col3 from Inserted
After comment:
No, you can't without dynamic SQL to detect the table and find all of its non-identity columns.
However, if you add or remove columns you'll then have a mis-match between trigger table and Triggertest and you'll get a different error.
If you really want it that dynamic, you'd have to concatenate all the columns into one, or use XML to ignore the schema.
Finally:
Do all your tables have exactly the same number of columns, datatypes and nullability as Triggertest? Because that is the assumption here...
If you want the table to be built each time the trigger runs, then you have no choice but to use the system tables to find the columns and create a table with those column definitions. Of course, your first step will have to be to drop the existing table, or the trigger won't work the second time someone updates a record.
However, I think you need to rethink this process. Dropping a table and then creating a new one every time you change a record is a seriously bad idea. How is this table in any way useful when it may get wiped out and rebuilt every second or so?
What you might consider doing instead is to create a dynamic process that generates the CREATE TRIGGER scripts with the correct information for each table, but where the triggers themselves are not dynamic (a sketch follows below). Then your configuration people need to run this process every time table changes are made.
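A hedged sketch of such a generator for a single table, using the FOR XML PATH trick (which works on SQL Server 2005/2008) to build the non-identity column list. The table and audit-table names are the ones from this question, and the generated script should be reviewed before being run:
DECLARE @cols nvarchar(max), @sql nvarchar(max);

-- build a comma-separated list of all non-identity columns of the table
SELECT @cols = STUFF((
    SELECT ', ' + QUOTENAME(c.name)
    FROM sys.columns AS c
    WHERE c.object_id = OBJECT_ID('dbo.Triggertest')
      AND c.is_identity = 0
    ORDER BY c.column_id
    FOR XML PATH('')), 1, 2, '');

SET @sql = N'CREATE TRIGGER first_trigger_test ON Triggertest FOR UPDATE AS'
         + N' INSERT INTO Triggertest_audit (' + @cols + N')'
         + N' SELECT ' + @cols + N' FROM deleted;';

PRINT @sql;  -- review the generated script, then run it manually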
Remember, it is critical for triggers to do two things: run as fast as humanly possible, and account for processing all the records in the batch (triggers should never do row-by-row processing or other slow processes, or assume that only one row will be in the inserted or deleted tables). Dynamic SQL in a trigger is probably also a bad idea, as you can't test out all the possibilities beforehand, and it can bring your whole production server to a screaming halt when something unexpected happens.

Delete all rows in table

Normally I would do a DELETE FROM XXX, but on this table that's very slow; it normally has about 500k to 1m rows in it (one column is a varbinary(MAX), if that matters).
Basically, I'm wondering if there is a quick way to empty the table of all content; it's actually quicker to drop and recreate it than to delete the content via the DELETE statement.
The reason I don't want to recreate the table is that it's heavily used, and dropping/recreating it would, I assume, destroy the indexes and statistics gathered by SQL Server.
I'm also hoping there is a way to do this because there is a "clever" way to get the row count via sys.sysindexes, so I'm hoping there is an equally clever way to delete content.
TRUNCATE TABLE is faster than DELETE FROM XXX. DELETE is slow because it works one row at a time. There are a few situations where TRUNCATE doesn't work, which you can read about on MSDN.
As others have said, TRUNCATE TABLE is far quicker, but it does have some restrictions (taken from here):
You cannot use TRUNCATE TABLE on tables that:
- Are referenced by a FOREIGN KEY constraint. (You can truncate a table that has a foreign key that references itself.)
- Participate in an indexed view.
- Are published by using transactional replication or merge replication.
For tables with one or more of these characteristics, use the DELETE statement instead.
The biggest drawback is that if the table you are trying to empty has foreign keys pointing to it, then the truncate call will fail.
You can rename the table in question, create a table with an identical schema, and then drop the original table at your leisure.
See the MySQL 5.1 Reference Manual for the RENAME TABLE and CREATE TABLE commands.
RENAME TABLE tbl TO tbl_old;
CREATE TABLE tbl LIKE tbl_old;
DROP TABLE tbl_old; -- at your leisure
This approach can help minimize application downtime.
I would suggest using TRUNCATE TABLE; it's quicker and uses fewer resources than DELETE FROM xxx.
Here's the related MSDN article
Truncate table in MS Sql Server
Truncate table in Mysql

Fastest way to delete all the data in a large table

I had to delete all the rows from a log table that contained about 5 million rows. My initial try was to issue the following command in query analyzer:
delete from client_log
which took a very long time.
Check out TRUNCATE TABLE, which is a lot faster.
I discovered TRUNCATE TABLE in the MSDN Transact-SQL reference. For all interested, here are the remarks:
TRUNCATE TABLE is functionally identical to DELETE statement with no WHERE clause: both remove all rows in the table. But TRUNCATE TABLE is faster and uses fewer system and transaction log resources than DELETE.
The DELETE statement removes rows one at a time and records an entry in the transaction log for each deleted row. TRUNCATE TABLE removes the data by deallocating the data pages used to store the table's data, and only the page deallocations are recorded in the transaction log.
TRUNCATE TABLE removes all rows from a table, but the table structure and its columns, constraints, indexes and so on remain. The counter used by an identity for new rows is reset to the seed for the column. If you want to retain the identity counter, use DELETE instead. If you want to remove table definition and its data, use the DROP TABLE statement.
You cannot use TRUNCATE TABLE on a table referenced by a FOREIGN KEY constraint; instead, use DELETE statement without a WHERE clause. Because TRUNCATE TABLE is not logged, it cannot activate a trigger.
TRUNCATE TABLE may not be used on tables participating in an indexed view.
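If you are not sure whether the foreign-key restriction applies to your table, a quick hedged check is to look for inbound foreign keys in the catalog views (the table name here is the one from this question):
-- list any foreign keys that reference client_log; any rows returned
-- mean TRUNCATE TABLE will fail on that table
SELECT OBJECT_NAME(fk.parent_object_id) AS referencing_table, fk.name AS fk_name
FROM sys.foreign_keys AS fk
WHERE fk.referenced_object_id = OBJECT_ID('dbo.client_log');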
There is a common myth that TRUNCATE somehow skips transaction log.
This is a misunderstanding, and it is clearly addressed in MSDN.
This myth is invoked in several comments here. Let's eradicate it together ;)
For reference, TRUNCATE TABLE also works on MySQL.
I use the following method to zero out tables, with the added bonus that it leaves me with an archive copy of the table.
CREATE TABLE `new_table` LIKE `table`;
RENAME TABLE `table` TO `old_table`, `new_table` TO `table`;
Forget truncate and delete: maintain your table definitions (in case you want to recreate the table) and just use DROP TABLE.
truncate table client_log
is your best bet. TRUNCATE kills all content in the table and the indexes, and resets any identity seeds you've got, too.
TRUNCATE TABLE is not platform-independent SQL. If you suspect that you might ever change database providers, be wary of using it.
On SQL Server you can use the Truncate Table command which is faster than a regular delete and also uses less resources. It will reset any identity fields back to the seed value as well.
The drawbacks of truncate are that it can't be used on tables that are referenced by foreign keys and it won't fire any triggers. Also, you won't be able to roll back the data if anything goes wrong.
Note that TRUNCATE will also reset any auto incrementing keys, if you are using those.
If you do not wish to lose your auto-incrementing keys, you can speed up the delete by deleting in sets (e.g., DELETE FROM table WHERE id > 1 AND id < 10000), as sketched below. This will speed things up significantly and in some cases prevent data from being locked up.
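A minimal sketch of deleting in sets with a loop over a fixed batch size, as an alternative to hand-picked id ranges; the table name is the one from the question, and the batch size of 10,000 is an assumption to tune:
-- delete in batches so each transaction, and its transaction log usage, stays small
WHILE 1 = 1
BEGIN
    DELETE TOP (10000) FROM client_log;
    IF @@ROWCOUNT = 0 BREAK;  -- the table is empty
END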
Yes, well, deleting 5 million rows is probably going to take a long time. The only potentially faster way I can think of would be to drop the table, and re-create it. That only works, of course, if you want to delete ALL data in the table.
The suggestion of "Drop and recreate the table" is probably not a good one because that goofs up your foreign keys.
You ARE using foreign keys, right?
If you cannot use TRUNCATE TABLE because of foreign keys and/or triggers, you can consider the following:
drop all indexes;
do the usual DELETE;
re-create all indexes.
This may speed up DELETE somewhat; a rough sketch follows.
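A hedged sketch of that sequence; the index name and column are made-up examples, since the question doesn't name the table's indexes:
DROP INDEX IX_client_log_date ON dbo.client_log;  -- hypothetical index name

DELETE FROM dbo.client_log;

CREATE INDEX IX_client_log_date ON dbo.client_log (log_date);  -- log_date is assumed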
I am revising my earlier statement:
You should understand that by using TRUNCATE the data will be cleared but nothing will be logged to the transaction log. Writing to the log is why DELETE will take forever on 5 million rows. I use TRUNCATE often during development, but you should be wary about using it on a production database because you will not be able to roll back your changes. You should immediately make a full database backup after doing a TRUNCATE to establish a new basis for restoration.
The above statement was intended to prompt you to be sure that you understand there is a difference between the two. Unfortunately, it is poorly written and makes unsupported claims, as I have not actually done any testing myself; it is based on statements that I have heard from others.
From MSDN:
The DELETE statement removes rows one at a time and records an entry in the transaction log for each deleted row. TRUNCATE TABLE removes the data by deallocating the data pages used to store the table's data, and only the page deallocations are recorded in the transaction log.
I just wanted to say that there is a fundamental difference between the two, and because there is a difference, there will be applications where one or the other may be inappropriate.
DELETE FROM table_name;
Premature optimization may be dangerous. Optimizing may mean doing something weird, but if it works you may want to take advantage of it.
SELECT DbVendor_SuperFastDeleteAllFunction(tablename, BOZO_BIT) FROM dummy;
For speed I think it depends on...
The underlying database: Oracle, Microsoft, MySQL, PostgreSQL, others, custom...
The table, its content, and related tables:
There may be deletion rules. Is there an existing procedure to delete all content in the table? Can this be optimized for the specific underlying database engine? How much do we care about breaking things / related data? Performing a DELETE may be the 'safest' way, assuming that other related tables do not depend on this table. Are there other tables and queries that are related to / depend on the data within this table? If we don't care much about this table being around, using DROP might be a fast method, again depending on the underlying database.
DROP TABLE table_name;
How many rows are being deleted? Is there other information that can be quickly gleaned to optimize the deletion? For example, can we tell if the table is already empty? Can we tell if there are hundreds, thousands, millions, billions of rows?
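As a footnote to that last question (and to the "clever" row count mentioned in the earlier question), a hedged SQL Server sketch for checking the approximate size of a table without scanning it; sys.sysindexes is the legacy view, and on SQL Server 2005+ sys.dm_db_partition_stats is the preferred source:
-- approximate row count from metadata; indid 0 = heap, 1 = clustered index
SELECT rows
FROM sys.sysindexes
WHERE id = OBJECT_ID('dbo.client_log')
  AND indid IN (0, 1);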
