SQL Server: delete everything from the database for a specific user

Just playing around in SQL Server to get better with query writing. I'm using the Northwind sample database from Microsoft.
I want to delete 'Robert King', EmployeeID = 7.
So normally I would do:
DELETE FROM Employees
WHERE EmployeeID = 7
but it's linked to another table and throws
The DELETE statement conflicted with the REFERENCE constraint "FK_Orders_Employees". The conflict occurred in database "Northwind", table "dbo.Orders", column 'EmployeeID'
So I have to delete the rows from the Orders table first, but then I get another error because the OrderIDs are linked to yet another table, [Order Details].
How can I delete everything at once?
I have a query that shows me everything for EmployeeID = 7, but how can I delete it in one go?
Query to show all data for EmployeeID = 7:
SELECT
    Employees.EmployeeID,
    Orders.OrderID,
    Employees.FirstName,
    Employees.LastName
FROM
    Employees
    INNER JOIN Orders ON Employees.EmployeeID = Orders.EmployeeID
    INNER JOIN [Order Details] ON Orders.OrderID = [Order Details].OrderID
WHERE
    Employees.EmployeeID = 7

Can you change the design of the database?
If you have access to change it, the best way is to set a CASCADE delete rule on the foreign keys that reference the Employees table.
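As a rough sketch, assuming the Northwind constraint names (FK_Orders_Employees comes from the error message above; FK_Order_Details_Orders may be named differently in your copy), that could look like:
-- Recreate the Orders -> Employees foreign key with a cascading delete rule
ALTER TABLE dbo.Orders DROP CONSTRAINT FK_Orders_Employees;
ALTER TABLE dbo.Orders ADD CONSTRAINT FK_Orders_Employees
    FOREIGN KEY (EmployeeID) REFERENCES dbo.Employees (EmployeeID)
    ON DELETE CASCADE;

-- The same is needed on [Order Details] -> Orders so the whole chain cascades
ALTER TABLE dbo.[Order Details] DROP CONSTRAINT FK_Order_Details_Orders;
ALTER TABLE dbo.[Order Details] ADD CONSTRAINT FK_Order_Details_Orders
    FOREIGN KEY (OrderID) REFERENCES dbo.Orders (OrderID)
    ON DELETE CASCADE;
After that, DELETE FROM Employees WHERE EmployeeID = 7 would remove the employee's orders and order lines automatically.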

Don't do Physical Deletes of important data on a Source Of Truth RDBMS
If this is an OLTP system, then what you are suggesting, i.e. deleting Orders rows linked to an employee, looks dangerous as it could break the data integrity of your system.
An OrderDetails row is likely also foreign keyed to a parent Orders table. Deleting an OrderDetails row will likely corrupt your Order processing data (since the Order table Totals will no longer match the cumulative line item rows).
By deleting what appears to be important transactional data, you may be destroying important business records, which could have dire consequences for both yourself and your company.
If the employee has left the company, physical deletion of data is NOT the answer. Instead, you should reconsider the table design, possibly by using a soft delete pattern on the Employee (and potentially associated data, but not important transactional data like Orders fulfilled by an employee). This way data integrity and the audit trail will be preserved.
For important business data like Orders, if the order itself was placed in error, a compensating mechanism or a status indication on the order (e.g. a Cancelled status) should be used in preference to physical data deletion.
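A minimal sketch of the soft-delete idea, assuming an IsDeleted flag that is not part of the stock Northwind schema:
-- Assumed column for illustration only
ALTER TABLE dbo.Employees ADD IsDeleted bit NOT NULL DEFAULT 0;

-- "Deleting" the employee becomes an update, so references from Orders stay intact
UPDATE dbo.Employees SET IsDeleted = 1 WHERE EmployeeID = 7;

-- Queries that should only see active employees filter on the flag
SELECT EmployeeID, FirstName, LastName
FROM dbo.Employees
WHERE IsDeleted = 0;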
Cascading Deletes on non-critical data
In general, if DELETING data in cascading fashion is a deliberate, designed-for use case for the tables, the original design could include ON DELETE CASCADE definitions on the applicable foreign keys. To repeat the concerns others have mentioned, this decision should be taken at design time of the tables, not arbitrarily taken once the database is in Production.
If ON DELETE CASCADE is not defined and your team agrees that a cascading delete is warranted, then an alternative is to run a script (or better, create a stored procedure) which simulates the cascading delete. This can be somewhat tedious, but provided all dependency tables have foreign keys ultimately dependent on your Employee row (@EmployeeId), the script is of the form below (note that you should define a transaction boundary around the deletions to ensure an all-or-nothing outcome):
BEGIN TRAN;

-- Delete all Nth-level nested dependencies via foreign keys
DELETE FROM [TableNth-Dependency]
WHERE ForeignKeyNId IN
(
    SELECT PrimaryKey
    FROM [TableNth-1 Dependency]
    WHERE ForeignKeyN-1 IN
    (
        SELECT PrimaryKey
        FROM [TableNth-2 Dependency]
        WHERE ForeignKeyN-2 IN
        (
            ... -- innermost query filters on the first-level foreign key
            WHERE ForeignKey = @PrimaryKey
        )
    )
);

-- Repeat the delete for all intermediate levels. Each level becomes one level simpler.

-- Finally delete the root-level object by its primary key
DELETE FROM dbo.SomeUnimportantTable
WHERE PrimaryKey = @PrimaryKey;

COMMIT TRAN;
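Applied to the original Northwind question (EmployeeID = 7), the same pattern would look roughly like the sketch below. It assumes [Order Details] and Orders are the only tables referencing the rows being removed; heed the warnings above before running anything like this against real order data.
BEGIN TRAN;

-- Second-level dependency: order lines belonging to the employee's orders
DELETE FROM [Order Details]
WHERE OrderID IN (SELECT OrderID FROM Orders WHERE EmployeeID = 7);

-- First-level dependency: the orders themselves
DELETE FROM Orders
WHERE EmployeeID = 7;

-- Finally the root row
DELETE FROM Employees
WHERE EmployeeID = 7;

COMMIT TRAN;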

Related

Change dependent records on delete in SQL

I'm adding a new job category to a database. There are something like 20 tables that use jobCategoryID as a foreign key. Is there a way to create a function that would go through those tables and set the jobCategoryID to NULL if the category is ever deleted in the parent table? Inserting the line isn't the issue. It's just for a backout script if the product owners decide at a later date that they don't want to keep the new job category on.
You need a delete rule (referential action) on those foreign keys. First of all, update the dirty records to NULL. For each referencing table use:
UPDATE Referencing_Table
SET jobCategoryID = NULL
WHERE jobCategoryID NOT IN (SELECT jobCategoryID FROM Referenced_Table);
Then set the delete rule of the foreign keys to SET NULL.
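A hedged sketch of that delete rule for one of the referencing tables (table and constraint names are placeholders; the jobCategoryID column must be nullable):
ALTER TABLE dbo.SomeReferencingTable DROP CONSTRAINT FK_SomeReferencingTable_JobCategory;
ALTER TABLE dbo.SomeReferencingTable ADD CONSTRAINT FK_SomeReferencingTable_JobCategory
    FOREIGN KEY (jobCategoryID) REFERENCES dbo.JobCategory (jobCategoryID)
    ON DELETE SET NULL;
Repeat for each of the ~20 referencing tables; deleting a row from the parent then sets the matching jobCategoryID values to NULL automatically.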
If you care about performance, follow the instructions below too.
When you have foreign keys but dirty records, those constraints are not trusted, which means the query optimizer cannot use them to create the best plans. Run this code to see which of them are untrusted by the optimizer:
Select * from sys.foreign_keys Where is_not_trusted = 1
For each constraint returned by the query above, edit and run the code below to re-validate it:
ALTER TABLE Table_Name WITH CHECK CHECK CONSTRAINT FK_Name

How to find unused rows in a dimension table

I have a dimension table in my database that has grown too large. By that I mean it has too many records - over a million - because it grew at the same pace as the linked facts. This is mostly due to a bad design, and I'm trying to clean it up.
One of the things I am trying to do is to remove dimension records which are no longer used. The fact tables are regularly maintained and old snapshots are removed. Because the dimensions were not maintained like that, there are many rows in the table whose primary key value no longer appears in any of the linked fact tables.
All the fact tables have foreign key constraints.
Is there a way to locate table rows whose primary key value no longer appears in any of the tables which are linked with a foreign key constraint?
I tried writing a script to track this. Basically this:
select key from dimension
where not exists (select 1 from fact1 where fk = pk)
and not exists (select 1 from fact2 where fk = pk)
and not exists (select 1 from fact3 where fk = pk)
But with a lot of linked tables this query dies after some time - at least, my management studio crashed. So I'm not sure if there are any other options.
We had to do something similar to this at one of my clients. The query, like yours with "not exists ... and not exists ... and not exists ...", was taking ~22 hours to run before we changed our strategy and got it down to ~20 minutes.
As Nsousa suggests, you have to split the query so SQL Server doesn't have to handle all the data in one shot and unnecessarily spill into tempdb.
First, create a new table with all the keys in it. The reason for this table is to avoid a full scan of the dimension for every query, to fit more keys on each 8 KB page, and to work with a smaller and smaller set of keys after each delete.
create table DimensionkeysToDelete (Dimkey char(32) primary key nonclustered);
insert into DimensionkeysToDelete
select key from dimension order by key;
Then, instead of deleting the unused keys, delete the keys that exist in the fact tables, beginning with the fact table that has the fewest rows.
Make sure the fact tables have proper indexing for performance.
delete from DimensionkeysToDelete
from DimensionkeysToDelete d
inner join fact1 f on f.fk = d.Dimkey;
delete from DimensionkeysToDelete
from DimensionkeysToDelete d
inner join fact2 f on f.fk = d.Dimkey;
delete from DimensionkeysToDelete
from DimensionkeysToDelete d
inner join fact3 f on f.fk = d.Dimkey;
Once all the fact tables are done, only unused keys remain in DimensionkeysToDelete. To answer your question, just run a SELECT on this table to get all the unused keys for that particular dimension, or join it with the dimension to get the data.
But, from what I understand of your need to clean up your warehouse, use this table to delete from the original dimension table. At this step you might also want to take some action for auditing purposes (i.e. insert into an audit table something like 'Key ' + key + ' deleted on ' + convert(varchar(23), getdate(), 121) + ' by script X').
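A rough sketch of that final step, assuming an audit table dbo.DimensionCleanupAudit and the table/column names used in the pseudocode above:
-- Log what is about to be removed, then delete only the unused keys
INSERT INTO dbo.DimensionCleanupAudit (AuditText)
SELECT 'Key ' + d.Dimkey + ' deleted on ' + CONVERT(varchar(23), GETDATE(), 121) + ' by script X'
FROM DimensionkeysToDelete d;

DELETE dim
FROM dimension dim
INNER JOIN DimensionkeysToDelete d ON d.Dimkey = dim.[key];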
I think this can be optimized further (take a look at the execution plan), but my client was happy with it, so we didn't put much more effort into it.
You may want to split that into different queries. Check unused rows in fact1, then on fact2, etc, individually. Then intersect all those results to get to the rows that are unused in all fact tables.
I would also suggest a left outer join instead of nested queries, counting rows in the fact table for each pk and filtering out from the result set those that have a non-zero count.
Your query will struggle as it’ll scan every fact table at the same time.
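A sketch of that left-join approach against a single fact table (column names follow the pseudocode above); repeat it per fact table and intersect the results:
-- Keys with a zero match count in fact1 are unused by that fact table
SELECT d.[key]
FROM dimension d
LEFT OUTER JOIN fact1 f ON f.fk = d.[key]
GROUP BY d.[key]
HAVING COUNT(f.fk) = 0;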

SQL Server Error: Introducing Foreign Key Constraint [duplicate]

I have a problem when I try to add constraints to my tables. I get the error:
Introducing FOREIGN KEY constraint 'FK74988DB24B3C886' on table 'Employee' may cause cycles or multiple cascade paths. Specify ON DELETE NO ACTION or ON UPDATE NO ACTION, or modify other FOREIGN KEY constraints.
My constraint is between a Code table and an employee table. The Code table contains Id, Name, FriendlyName, Type and a Value. The employee has a number of fields that reference codes, so that there can be a reference for each type of code.
I need for the fields to be set to null if the code that is referenced is deleted.
Any ideas how I can do this?
SQL Server does simple counting of cascade paths and, rather than trying to work out whether any cycles actually exist, it assumes the worst and refuses to create the referential actions (CASCADE): you can and should still create the constraints without the referential actions. If you can't alter your design (or doing so would compromise things) then you should consider using triggers as a last resort.
FWIW resolving cascade paths is a complex problem. Other SQL products will simply ignore the problem and allow you to create cycles, in which case it is a race to see which one overwrites the value last, probably without the designer realizing it (e.g. ACE/Jet does this). I understand some SQL products will attempt to resolve simple cases. The fact remains, SQL Server doesn't even try, plays it ultra safe by disallowing more than one path, and at least it tells you so.
Microsoft itself advises the use of triggers instead of FK constraints.
A typical situation with multiple cascading paths is this:
A master table with two details, let's say "Master", "Detail1" and "Detail2". Both details are cascade delete. So far no problems. But what if both details have a one-to-many relation with some other table (say "SomeOtherTable")? SomeOtherTable has a Detail1ID column AND a Detail2ID column.
Master { ID, masterfields }
Detail1 { ID, MasterID, detail1fields }
Detail2 { ID, MasterID, detail2fields }
SomeOtherTable {ID, Detail1ID, Detail2ID, someothertablefields }
In other words: some of the records in SomeOtherTable are linked with Detail1 records and some of the records in SomeOtherTable are linked with Detail2 records. Even if it is guaranteed that SomeOtherTable records never belong to both details, it is now impossible to make SomeOtherTable's records cascade delete for both details, because there are multiple cascading paths from Master to SomeOtherTable (one via Detail1 and one via Detail2).
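A minimal T-SQL sketch that reproduces the error with exactly this layout (the table names follow the outline above):
CREATE TABLE Master  (ID int PRIMARY KEY);
CREATE TABLE Detail1 (ID int PRIMARY KEY, MasterID int REFERENCES Master(ID) ON DELETE CASCADE);
CREATE TABLE Detail2 (ID int PRIMARY KEY, MasterID int REFERENCES Master(ID) ON DELETE CASCADE);

-- The second cascading reference creates two cascade paths from Master to SomeOtherTable,
-- so SQL Server rejects this CREATE TABLE with the "multiple cascade paths" error
CREATE TABLE SomeOtherTable (
    ID int PRIMARY KEY,
    Detail1ID int REFERENCES Detail1(ID) ON DELETE CASCADE,
    Detail2ID int REFERENCES Detail2(ID) ON DELETE CASCADE
);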
Now you may already have understood this. Here is a possible solution:
Master { ID, masterfields }
DetailMain { ID, MasterID }
Detail1 { DetailMainID, detail1fields }
Detail2 { DetailMainID, detail2fields }
SomeOtherTable {ID, DetailMainID, someothertablefields }
All ID fields are key fields and auto-increment. The crux lies in the DetailMainID fields of the Detail tables. These fields are both key and referential constraint. It is now possible to cascade delete everything by only deleting master records. The downside is that for each Detail1 record AND for each Detail2 record, there must also be a DetailMain record (which is actually created first to get the correct and unique id).
I would point out that (functionally) there's a BIG difference between cycles and/or multiple paths in the SCHEMA and in the DATA. While cycles and perhaps multipaths in the DATA could certainly complicate processing and cause performance problems (the cost of "properly" handling them), the cost of these characteristics in the schema should be close to zero.
Since most apparent cycles in RDBs occur in hierarchical structures (org chart, part, subpart, etc.) it is unfortunate that SQL Server assumes the worst; i.e., schema cycle == data cycle. In fact, if you're using RI constraints you can't actually build a cycle in the data!
I suspect the multipath problem is similar; i.e., multiple paths in the schema don't necessarily imply multiple paths in the data, but I have less experience with the multipath problem.
Of course if SQL Server did allow cycles it'd still be subject to a depth of 32, but that's probably adequate for most cases. (Too bad that's not a database setting however!)
"Instead of Delete" triggers don't work either. The second time a table is visited, the trigger is ignored. So, if you really want to simulate a cascade you'll have to use stored procedures in the presence of cycles. The Instead-of-Delete-Trigger would work for multipath cases however.
Celko suggests a "better" way to represent hierarchies that doesn't introduce cycles, but there are tradeoffs.
There is an article available which explains how to handle multiple deletion paths using triggers. Maybe this is useful for complex scenarios.
http://www.mssqltips.com/sqlservertip/2733/solving-the-sql-server-multiple-cascade-path-issue-with-a-trigger/
By the sounds of it you have an OnDelete/OnUpdate action on one of your existing Foreign Keys, that will modify your codes table.
So by creating this foreign key, you'd be creating a cyclic problem.
E.g. updating Employees causes Codes to be changed by an ON UPDATE action, which causes Employees to be changed by an ON UPDATE action... etc...
If you post your Table Definitions for both tables, & your Foreign Key/constraint definitions we should be able to tell you where the problem is...
This is because Employee might have a collection of another entity, say Qualifications, and Qualification might have some other collection, Universities.
e.g.
public class Employee{
public virtual ICollection<Qualification> Qualifications {get;set;}
}
public class Qualification{
public Employee Employee {get;set;}
public virtual ICollection<University> Universities {get;set;}
}
public class University{
public Qualification Qualification {get;set;}
}
In the DbContext it could be like below:
protected override void OnModelCreating(DbModelBuilder modelBuilder){
    modelBuilder.Entity<Qualification>().HasRequired(x => x.Employee).WithMany(e => e.Qualifications);
    modelBuilder.Entity<University>().HasRequired(x => x.Qualification).WithMany(e => e.Universities);
}
In this case there is a chain from Employee to Qualification and from Qualification to University, so it was throwing the same exception for me.
It worked for me when I changed
modelBuilder.Entity<Qualification>().HasRequired(x => x.Employee).WithMany(e => e.Qualifications);
to
modelBuilder.Entity<Qualification>().HasOptional(x => x.Employee).WithMany(e => e.Qualifications);
A trigger is a solution for this problem:
IF OBJECT_ID('dbo.fktest2', 'U') IS NOT NULL
drop table fktest2
IF OBJECT_ID('dbo.fktest1', 'U') IS NOT NULL
drop table fktest1
IF EXISTS (SELECT name FROM sysobjects WHERE name = 'fkTest1Trigger' AND type = 'TR')
DROP TRIGGER dbo.fkTest1Trigger
go
create table fktest1 (id int primary key, anQId int identity)
go
create table fktest2 (id1 int, id2 int, anQId int identity,
FOREIGN KEY (id1) REFERENCES fktest1 (id)
ON DELETE CASCADE
ON UPDATE CASCADE/*,
FOREIGN KEY (id2) REFERENCES fktest1 (id) this causes compile error so we have to use triggers
ON DELETE CASCADE
ON UPDATE CASCADE*/
)
go
CREATE TRIGGER fkTest1Trigger
ON fkTest1
AFTER INSERT, UPDATE, DELETE
AS
if @@ROWCOUNT = 0
return
set nocount on
-- This code is a replacement for the foreign key cascade (auto update of the field in the destination
-- table when its referenced primary key in the source table changes).
-- The compiler complains only when you use multiple cascades. It throws this compile error:
-- Introducing FOREIGN KEY constraint on table may cause cycles or multiple cascade paths. Specify ON DELETE NO ACTION or ON UPDATE NO ACTION,
-- or modify other FOREIGN KEY constraints.
IF (UPDATE (id) and exists(select 1 from fktest1 A join deleted B on B.anqid = A.anqid where B.id <> A.id))
begin
update fktest2 set id2 = i.id
from deleted d
join fktest2 on d.id = fktest2.id2
join inserted i on i.anqid = d.anqid
end
if exists (select 1 from deleted)
DELETE one FROM fktest2 one LEFT JOIN fktest1 two ON two.id = one.id2 where two.id is null -- drop all from dest table which are not in source table
GO
insert into fktest1 (id) values (1)
insert into fktest1 (id) values (2)
insert into fktest1 (id) values (3)
insert into fktest2 (id1, id2) values (1,1)
insert into fktest2 (id1, id2) values (2,2)
insert into fktest2 (id1, id2) values (1,3)
select * from fktest1
select * from fktest2
update fktest1 set id=11 where id=1
update fktest1 set id=22 where id=2
update fktest1 set id=33 where id=3
delete from fktest1 where id > 22
select * from fktest1
select * from fktest2
This is an error related to the database's cascade delete policies. A trigger is code and can add some intelligence or conditions to a cascade relation like cascade deletion. You may need to adjust the options of the related tables, for example by turning off CascadeOnDelete:
protected override void OnModelCreating( DbModelBuilder modelBuilder )
{
modelBuilder.Entity<TableName>().HasMany(i => i.Member).WithRequired().WillCascadeOnDelete(false);
}
Or Turn off this feature completely:
modelBuilder.Conventions.Remove<OneToManyCascadeDeleteConvention>();
Some databases, most notably SQL Server, have limitations on the cascade behaviors that form cycles.
There are two ways to handle this situation:
1. Change one or more of the relationships to not cascade delete.
2. Configure the database without one or more of these cascade deletes, then ensure all dependent entities are loaded so that EF Core can perform the cascading behavior.
please refer to this link:
Database cascade limitations
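At the database level, the second option boils down to creating the foreign key without a cascading rule and letting the application handle dependents itself; a hedged sketch against the Employee/Code tables from the question (constraint and column names are placeholders):
-- No cascading rule on this path; the application deletes or nulls dependent rows itself
ALTER TABLE dbo.Employee ADD CONSTRAINT FK_Employee_Code
    FOREIGN KEY (CodeID) REFERENCES dbo.Code (Id)
    ON DELETE NO ACTION;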
Mass database update to offset PKs: make a copy of the database instead.
Special use case: company A uses a database with the same schema as company B. Because they have merged, they want to use a single database. Hence, many tables from company B's database must have their primary keys offset to avoid collision with company A's records.
One solution could have been to define the foreign keys as ON UPDATE CASCADE, offset the primary keys and have the foreign keys follow. But there are many hurdles if you do that (Msg 1785, Msg 8102, ...).
So a better idea that occurs to me is simply to make a copy of the database, DROP and re-CREATE the tables that must have their PKs/FKs offset, and copy the data (and while doing so, offset the primary keys and the foreign keys).
Avoiding all the hassle.
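A rough sketch of that copy-with-offset step for one table (database, table and column names are placeholders; @Offset is assumed to be larger than any ID in company A's data):
DECLARE @Offset int = 1000000;

SET IDENTITY_INSERT MergedDb.dbo.ChildTable ON;

INSERT INTO MergedDb.dbo.ChildTable (Id, ParentId, SomeColumn)
SELECT b.Id + @Offset,       -- offset the primary key
       b.ParentId + @Offset, -- offset the foreign key by the same amount
       b.SomeColumn
FROM CompanyB.dbo.ChildTable AS b;

SET IDENTITY_INSERT MergedDb.dbo.ChildTable OFF;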
My solution to this problem encountered using ASP.NET Core 2.0 and EF Core 2.0 was to perform the following in order:
Run update-database command in Package Management Console (PMC) to create the database (this results in the "Introducing FOREIGN KEY constraint ... may cause cycles or multiple cascade paths." error)
Run script-migration -Idempotent command in PMC to create a script that can be run regardless of the existing tables/constraints
Take the resulting script and find ON DELETE CASCADE and replace with ON DELETE NO ACTION
Execute the modified SQL against the database
Now, your migrations should be up-to-date and the cascading deletes should not occur.
Too bad I was not able to find any way to do this in Entity Framework Core 2.0.
Good luck!

How to speedup delete from table with multiple references

I have a table which depends on several other ones.
When I delete an entry in this table I should also delete entries in its "masters" (it's a 1-1 relation). But here is the problem: when I delete, I get unnecessary table scans, because it checks the references before deleting. I am sure that it's safe (because I get the IDs from the OUTPUT clause):
DELETE TOP (@BatchSize) [doc].[Document]
OUTPUT DELETED.A, DELETED.B, DELETED.C, DELETED.D
INTO #DocumentParts
WHERE Id IN (SELECT d.Id FROM #DocumentIds d);
SET @r = @@ROWCOUNT;
DELETE [doc].[A]
WHERE Id IN (SELECT DISTINCT dp.A FROM #DocumentParts dp);
DELETE [doc].[B]
WHERE Id IN (SELECT DISTINCT dp.B FROM #DocumentParts dp);
DELETE [doc].[C]
WHERE Id IN (SELECT DISTINCT dp.C FROM #DocumentParts dp);
... several others
But here is what plan I get for each delete:
If I drop the constraints from the document table, the plan changes:
But the problem is that I cannot drop the constraints, because inserts are performed in parallel in other sessions. I also cannot lock the whole table because it's very large, and that lock would also block a lot of other transactions.
The only way I have found for now is to create an index for every foreign key (which can be used instead of a PK scan), but I wanted to avoid this scan at all (indexed or not), because I am SURE that documents with such IDs don't exist, since I have just deleted them. Maybe there is some hint for SQL Server or some way to disable a reference check for one transaction instead of the whole database.
SQL Server is rather stubborn in preserving the referential integrity, so no, you cannot "hint" to disable the check. The fact that you deleted the referencing rows doesn't matter at all (in a high transactional environment, there was plenty of time for some process to modify the tables between the deletes).
Creating the proper indexes is the way to go.
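In this schema that means indexing the referencing columns on [doc].[Document], so the foreign key check on each master delete becomes a seek instead of a scan. A sketch (index names are assumptions):
CREATE NONCLUSTERED INDEX IX_Document_A ON [doc].[Document] (A);
CREATE NONCLUSTERED INDEX IX_Document_B ON [doc].[Document] (B);
CREATE NONCLUSTERED INDEX IX_Document_C ON [doc].[Document] (C);
CREATE NONCLUSTERED INDEX IX_Document_D ON [doc].[Document] (D);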

Does a foreign key make sense when I do not use Referential Integrity with On Delete/Update Cascade

I do not want the orders to be deleted when a customer is deleted. (On Delete Cascade)
I use identity columns so I do not need On Update Cascade
It should be possible to delete a customer even though there exist orders pointing/referencing to that customer. I do not care when the customer is gone, because I still need the order table for other tables.
Does a foreign key make sense in this scenario when I do not use Referential Integrity with On Delete/Update Cascade ?
Yes. The foreign key is not in place only to clean up after yourself but primarily to make sure the data is right in the first place (it can also assist the optimizer in some cases). I use foreign keys all over the place but I have yet to find a need to implement on cascade actions. I do understand the purpose of cascade but I've always found it better to control those processes myself.
EDIT even though I tried to explain already that you can work around the cascade issue (thus still satisfying your third condition), I thought I would add an illustration:
You can certainly still allow for orders to remain after you've deleted a customer. The key is to make the Orders.CustomerID column nullable, e.g.
CREATE TABLE dbo.Customers(CustomerID INT PRIMARY KEY);
CREATE TABLE dbo.Orders(OrderID INT PRIMARY KEY, CustomerID INT NULL
FOREIGN KEY REFERENCES dbo.Customers(CustomerID));
Now when you want to delete a customer, assuming you control these operations via a stored procedure, you can do it this way, first setting their Orders.CustomerID to NULL:
CREATE PROCEDURE dbo.Customer_Delete
@CustomerID INT
AS
BEGIN
SET NOCOUNT ON;
UPDATE dbo.Orders SET CustomerID = NULL
WHERE CustomerID = @CustomerID;
DELETE dbo.Customers
WHERE CustomerID = @CustomerID;
END
GO
If you can't control ad hoc deletes from the Customers table, then you can still achieve this with an instead of trigger:
CREATE TRIGGER dbo.Cascade_CustomerDelete
ON dbo.Customers
INSTEAD OF DELETE
AS
BEGIN
SET NOCOUNT ON;
UPDATE o SET CustomerID = NULL
FROM dbo.Orders AS o
INNER JOIN deleted AS d
ON o.CustomerID = d.CustomerID;
DELETE c
FROM dbo.Customers AS c
INNER JOIN deleted AS d
ON c.CustomerID = d.CustomerID;
END
GO
That all said, I'm not sure I understand the purpose of deleting a customer and keeping their orders (or any indication at all about who placed that order).
So to be clear you have a FK from Customer to Orders presently. Cascade update/delete is not enabled on this relationship. Your plan is to delete customers but allow the orders to remain.
This would VIOLATE the foreign key constraint; and prevent the delete from occurring.
If you disable the constraint execute the delete then re-enable you could make it work.
However, this will leave orphaned order records in the system; which might make it harder to support in the long run. What's the next guy who has to support this going to think?
Wouldn't it be better to keep the records and add a status for Active/inactive or created and inactive dates?
I'm struggling with the idea of violating the integrity of the database just to reduce space... or what is the main reason for removing them?
If you don't want to always have to filter out the no-longer-active records, use a view or a package which creates a collection of active customers. Eliminating some but not all data just seems wrong to me.
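A minimal sketch of that view, assuming an IsActive flag is added to Customers (it is not part of the table definition shown above):
-- Assumed soft-delete flag for illustration only
ALTER TABLE dbo.Customers ADD IsActive bit NOT NULL DEFAULT 1;
GO
CREATE VIEW dbo.ActiveCustomers
AS
SELECT CustomerID
FROM dbo.Customers
WHERE IsActive = 1;
GO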
