I need to delete many rows from a SQL Server 2008 database, and it must be scalable, so I was thinking about a bulk delete. The problem is that there are not many references on this, at least for my case.
The first factor is that I will know the exact ID of every row to delete, so tips involving TOP are not an option. Also, I will delete fewer rows than I will retain, so there is no need for any of the "drop/temp table/re-create" methods.
So I was thinking of using WHERE IN, either supplying the IDs directly or as XML data; there is also the option of using MERGE to delete rows.
If I have to delete 1000+ rows, could sending all the IDs in a WHERE IN clause be a problem? And what about MERGE - is it really a cure for all bulk insert/update/delete problems? Which should I choose?
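For example, the XML variant I have in mind would look roughly like this (dbo.YourTable is just a placeholder name):
DECLARE @ids xml;
SET @ids = N'<ids><id>1</id><id>2</id><id>3</id></ids>';

-- shred the XML into a list of integers and delete the matching rows
DELETE FROM dbo.YourTable
WHERE ID IN (SELECT x.n.value('.', 'int')
             FROM @ids.nodes('/ids/id') AS x(n));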
One option would be to store the known IDs in a "controller" table, and then delete the rows from your main data table whose ID shows up in the controller table.
That way, you could easily "batch" up your deletes, e.g.
DELETE FROM dbo.YourMainDataTable
WHERE ID IN (SELECT TOP (250) ID FROM dbo.DeleteControllerTable)
You could easily run this delete statement in, e.g., a SQL Agent job that comes around every 15 minutes to check whether there's anything to delete. In the meantime, you can keep adding IDs to your DeleteControllerTable, thus "decoupling" the process of queuing the IDs to be deleted from the actual deletion process.
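As a rough sketch (same hypothetical table names as above), the job could drain the controller table in chunks like this:
DECLARE @Batch TABLE (ID int PRIMARY KEY);

WHILE 1 = 1
BEGIN
    DELETE FROM @Batch;

    -- grab the next chunk of IDs queued for deletion
    INSERT INTO @Batch (ID)
    SELECT TOP (250) ID
    FROM dbo.DeleteControllerTable
    ORDER BY ID;

    IF NOT EXISTS (SELECT 1 FROM @Batch) BREAK;   -- nothing left to do

    -- remove the data rows, then clear the processed IDs from the queue
    DELETE FROM dbo.YourMainDataTable
    WHERE ID IN (SELECT ID FROM @Batch);

    DELETE FROM dbo.DeleteControllerTable
    WHERE ID IN (SELECT ID FROM @Batch);
END
Keeping the chunks small keeps each delete and its locks short, which is the whole point of batching.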
I have a simple mapping that deletes records in the target table. I have not used an "UPDATE STRATEGY" transformation; instead I set the session property to "delete" in order to delete records.
The table has a composite primary key made up of 10 columns. The delete works fine when all of these columns have values, but for a few records one of the columns is NULL, and in that case the record is not deleted.
Can someone let me know how to handle this situation?
That's expected, because Informatica fires SQL like this when deleting data: DELETE FROM tab WHERE key1 = v1 AND key2 = v2. In SQL, a comparison against NULL is never true, so if v2 is NULL the delete query will skip that record.
You can use the target update override property to handle this: write your own SQL to delete the data.
DELETE FROM mytable
WHERE ID = :TU.ID
  AND ISNULL(EMP_NAME, 'Unspecified') = ISNULL(:TU.EMP_NAME, 'Unspecified')
Since you have the keys defined in Informatica, you shouldn't face any problem. But please note that these deletes are done on a row-by-row basis, so if it's a large table and the delete doesn't follow the primary key index, deleting each row could take time!
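To illustrate the NULL behaviour the override works around (the literal values here are made up):
-- a NULL never compares equal to anything, so this matches no rows
-- whenever EMP_NAME is NULL on either side of the comparison
DELETE FROM mytable
WHERE ID = 7
  AND EMP_NAME = NULL;

-- wrapping both sides in ISNULL() maps NULL to a comparable sentinel,
-- so rows with a NULL key column can still be matched and deleted
DELETE FROM mytable
WHERE ID = 7
  AND ISNULL(EMP_NAME, 'Unspecified') = ISNULL(NULL, 'Unspecified');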
I have a tree table with foreign keys to itself (depth-unlimited parents and children). Given that SQL Server lacks the same-table CASCADE support (with loops) that MySQL offers, I need to order the IDs before deleting them. I order them with a CTE from most-deletable (no children) to least-deletable (top parents).
I want to throw this set of IDs at a DELETE, but I need them deleted in the exact order I send them to SQL Server. Can this be done in a single query?
I could write a stored procedure to delete the records, or just DELETE them one by one with multiple queries from C#... but I'm wondering whether this can be achieved with a simple DELETE ... WHERE ID IN (...) query. I have no clue whether DELETE preserves the order of the IN (...) collection, because if it doesn't, the foreign key constraint will block it. Coming from MySQL, I never cared, since it can cascade within the same table.
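For reference, the leaf-first ordering and batched delete I have in mind looks roughly like this (table, column, and variable names are placeholders):
DECLARE @RootId int;
SET @RootId = 42;                      -- the subtree I want to remove

-- walk the subtree and record each node's depth
WITH Tree AS (
    SELECT ID, 0 AS Depth
    FROM dbo.TreeTable
    WHERE ID = @RootId
    UNION ALL
    SELECT c.ID, t.Depth + 1
    FROM dbo.TreeTable AS c
    JOIN Tree AS t ON c.ParentID = t.ID
)
SELECT ID, Depth
INTO #ToDelete
FROM Tree
OPTION (MAXRECURSION 0);               -- the tree can be arbitrarily deep

-- delete the deepest level first, so children are always gone before their parent
DECLARE @Level int;
SELECT @Level = MAX(Depth) FROM #ToDelete;

WHILE @Level >= 0
BEGIN
    DELETE FROM dbo.TreeTable
    WHERE ID IN (SELECT ID FROM #ToDelete WHERE Depth = @Level);

    SET @Level = @Level - 1;
END

DROP TABLE #ToDelete;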
I have this SQL now:
delete from [db_dest].[dbo].[Table1];
insert into [db_dest].[dbo].[Table1] select * from [db_src].[dbo].[Table1];
What other solid options do I have? Could MERGE be faster?
In general, if you need to remove all records from a table, consider using TRUNCATE TABLE instead. It is far faster, especially when the table contains many records, because DELETE FROM logs each individual record deletion.
MERGE will generally not be faster than a plain TRUNCATE TABLE followed by an INSERT, because MERGE has to compare the records in the two tables field by field in order to update records that have changed in the source and remove records that are no longer present in the source.
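A sketch of the TRUNCATE-based variant of your two statements (same table names as in your question):
-- minimally logged; note that TRUNCATE fails if the table is referenced by a foreign key
TRUNCATE TABLE [db_dest].[dbo].[Table1];

INSERT INTO [db_dest].[dbo].[Table1]
SELECT * FROM [db_src].[dbo].[Table1];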
Is there a way with SQL Server to see what will be deleted before the actual delete is run?
Say I have a customer table that is referenced by foreign keys from orders and addresses tables, with cascading deletes.
If I delete a customer then SQL Server will delete all the orders and addresses for that customer too.
Is there a way to get the primary keys of the rows that will be deleted from those tables, without running the actual delete statement?
NOTE: Obviously I could hand-code a sproc to do this for this specific example. But my question is whether the mechanism SQL Server uses for the cascading delete can also be used to get this information, rather than just performing the delete.
The only way I know of to do this (sketched below) is to:
start a transaction
run your deletion
select from tables
roll back your transaction
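A minimal sketch of those steps, with hypothetical table and key names (dbo.Customers, dbo.Orders, dbo.Addresses, CustomerId):
BEGIN TRANSACTION;

-- run the real delete; ON DELETE CASCADE removes the dependent rows too
DELETE FROM dbo.Customers
WHERE CustomerId = 123;

-- still inside the transaction, this connection sees the effect, so you can
-- inspect the child tables to confirm what the cascade touched
SELECT COUNT(*) FROM dbo.Orders    WHERE CustomerId = 123;   -- 0 if cascaded
SELECT COUNT(*) FROM dbo.Addresses WHERE CustomerId = 123;   -- 0 if cascaded

-- undo everything; no rows are actually removed
ROLLBACK TRANSACTION;
If you need to carry the affected keys out of the transaction, it is worth knowing that table variables are not undone by ROLLBACK, so you can capture the child keys into one before the delete and read it back afterwards.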
I have a database in which there is a parent "Account" row that then has a 1-Many relationship with another table, and that table has a 1-Many relationship with another table. This goes on about 6 levels deep (with Account at the top). At the very bottom there could possibly be thousands (can even go beyond 100k) of rows. On each table there is a foreign key set to cascade on delete.
The issue is that if I try to delete the very top row (an "Account"), it can take minutes, sometimes well over 10 minutes. Is there a faster way to delete all the rows (such as going from the bottom up with individual delete statements), or is cascading pretty much it?
I am using MSSQL 2005 & MSSQL 2008 for the server, and LINQ to SQL (L2S) to perform the delete, although I can use a T-SQL statement if it is faster.
I've tried doing the delete from SQL Server Management Studio too, and that takes just as long.
Edit: we have tried re-indexing the database, with negligible difference (maybe a minute or two). I appreciate all your answers; it looks like I am going to have to start writing some code to do soft deletes!
A delete is a delete, and if you want to delete massive amounts of rows (100k), it will take a while.
If you do a soft delete (set a status to "D", for example), you can then run a job that actually deletes the rows in batches of, say, 1,000 or so over time; that may work better for you. The soft delete updates only the header row and would be very fast. You'd need to code your application to ignore these "D"-status rows and their children, though.
EDIT
To follow up on @Kane's comment: you could do only the soft delete, or you could do a soft delete followed by a batch process to perform the actual deletes if you really want to. I'd just stick with the soft deletes if drive space is not an issue.
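A rough sketch of that pattern, assuming a Status column on the Account table and (purely for illustration) that the lowest-level child table carries the AccountId:
-- the "delete" the application performs: instant, only the header row changes
UPDATE dbo.Account
SET Status = 'D'                       -- 'D' = marked for deletion
WHERE AccountId = 42;

-- a scheduled job later purges the flagged data in small batches,
-- working from the bottom of the hierarchy upwards
DECLARE @Rows int;
SET @Rows = 1;

WHILE @Rows > 0
BEGIN
    DELETE TOP (1000) d
    FROM dbo.BottomLevelTable AS d
    JOIN dbo.Account AS a ON a.AccountId = d.AccountId
    WHERE a.Status = 'D';

    SET @Rows = @@ROWCOUNT;
END
-- ...repeat a loop like this for each level up the chain, and finally
-- delete the flagged Account rows themselves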
Have you indexed all the foreign keys? That's a common issue.
It sounds like you might have indexing issues.
Assume a parent-to-child relationship on column ParentId. By definition, column ParentId in the Parent table must have a primary or unique constraint, and thus be indexed. The child table, however, need not be indexed on ParentId. When you delete a parent entry, SQL has to delete all rows in the child table that have been assigned that foreign key... and if that column is not indexed, the work will have to be done with table scans. This could occur once for each table in your "deletion chain".
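For instance, a sketch of the kind of index that helps, using the hypothetical Parent/Child tables and ParentId column from above:
-- without this, every cascaded delete against Child scans the whole table
CREATE NONCLUSTERED INDEX IX_Child_ParentId
    ON dbo.Child (ParentId);
With the index in place, the cascaded lookup of child rows for a deleted parent becomes an index seek instead of a scan.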
Of course, it might just be volume. Deleting a few thousand rows from tables of 100k+ rows with multiple indexes, even if the "delete lookup" field is indexed, can take significant time; and don't forget locking and blocking if you've got users accessing your system during the delete!
Deferring the delete until a scheduled maintenance window, as KM suggests, would definitely be an option, though it might require a serious modification to your code base.