I'm adding a new job category to a database. There are something like 20 tables that use jobCategoryID as a foreign key. Is there a way to create a function that would go through those tables and set the jobCategoryID to NULL if the category is ever deleted in the parent table? Inserting the line isn't the issue. It's just for a backout script if the product owners decide at a later date that they don't want to keep the new job category on.
You need to do two things. First, set the dirty (orphaned) references to NULL. For each referencing table, run:
UPDATE child_table
SET jobCategoryID = NULL
WHERE jobCategoryID NOT IN (SELECT jobCategoryID FROM referenced_table)
Then set the delete rule of the foreign keys to SET NULL.
If you care about performance, also follow the instructions below.
When a foreign key exists but the table contains dirty records, the constraint is not trusted. This means the SQL Server optimizer cannot use it to build the best plans. Run this query to see which constraints are untrusted:
Select * from sys.foreign_keys Where is_not_trusted = 1
For each constraint returned by the query above, adapt and run the following statement to fix the issue:
ALTER TABLE Table_Name WITH CHECK CHECK CONSTRAINT FK_Name
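The SET NULL delete rule described above can be tried out in isolation. This is only a minimal sketch using Python's built-in sqlite3 module rather than SQL Server, so the way foreign keys are enabled differs; the Job table and its columns are hypothetical stand-ins for the referencing tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE JobCategory (jobCategoryID INTEGER PRIMARY KEY, name TEXT);
    -- Hypothetical referencing table; the delete rule is SET NULL.
    CREATE TABLE Job (
        jobID INTEGER PRIMARY KEY,
        jobCategoryID INTEGER
            REFERENCES JobCategory(jobCategoryID) ON DELETE SET NULL
    );
    INSERT INTO JobCategory VALUES (1, 'existing'), (2, 'new category');
    INSERT INTO Job VALUES (10, 1), (11, 2), (12, 2);
""")

# Backing out the new category: the referencing rows are nulled automatically.
conn.execute("DELETE FROM JobCategory WHERE jobCategoryID = 2")
conn.commit()

rows = conn.execute(
    "SELECT jobID, jobCategoryID FROM Job ORDER BY jobID"
).fetchall()
print(rows)
```

With the rule in place, the backout script only has to delete the row from the parent table.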
I have a dimension table in my database that has grown too large. By that I mean that it has too many records (over a million) because it grew at the same pace as the linked facts. This is mostly due to a bad design, and I'm trying to clean it up.
One of the things I am trying to do is remove dimension records that are no longer used. The fact tables are regularly maintained and old snapshots are removed. Because the dimensions were not maintained like that, there are many rows in the table whose primary key value no longer appears in any of the linked fact tables.
All the fact tables have foreign key constraints.
Is there a way to locate table rows whose primary key value no longer appears in any of the tables which are linked with a foreign key constraint?
I tried writing a script to track this. Basically this:
select key from dimension
where not exists (select 1 from fact1 where fk = pk)
and not exists (select 1 from fact2 where fk = pk)
and not exists (select 1 from fact3 where fk = pk)
But with a lot of linked tables this query dies after some time - at least, my management studio crashed. So I'm not sure if there are any other options.
We had to do something similar at one of my clients. The query, like yours with "not exists ... and not exists ... and not exists ...", was taking ~22 hours to run before we changed our strategy and got it down to ~20 minutes.
As Nsousa suggests, you have to split the query so SQL Server doesn't have to handle all the data in one shot, unnecessarily using tempdb and so on.
First, create a new table holding all the keys. The reasons for this table are to avoid a full scan of the dimension table for every query, to fit more keys on an 8 KB page, and to work with a smaller and smaller set of keys after each delete.
create table DimensionkeysToDelete (Dimkey char(32) primary key nonclustered);
insert into DimensionkeysToDelete
select [key] from dimension order by [key];
Then, instead of deleting unused keys, delete the keys that exist in the fact tables, beginning with the fact table that has the fewest rows.
Make sure the fact tables have proper indexes for performance.
delete d
from DimensionkeysToDelete d
inner join fact1 f on f.fk = d.Dimkey;
delete d
from DimensionkeysToDelete d
inner join fact2 f on f.fk = d.Dimkey;
delete d
from DimensionkeysToDelete d
inner join fact3 f on f.fk = d.Dimkey;
Once all fact tables are done, only unused keys remain in DimensionkeysToDelete. To answer your question, just select from this table to get all unused keys for that particular dimension, or join it with the dimension to get the data.
But, from what I understand of your need to clean up your warehouse, you would use this table to delete from the original dimension table. At this step, you might also want to take some action for auditing purposes (e.g. insert into an audit table 'Key ' + key + ' deleted on ' + convert(varchar(23), getdate(), 121) + ' by script X' ...).
I think this can be optimized (take a look at the execution plan), but my client was happy with it, so we didn't have to put much more effort into it.
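The staged elimination described in this answer can be sketched end to end. The following is a small illustration using Python's sqlite3 module, not SQL Server; the table and column names (dimension, pk, fact1, fact2, fk, keys_to_delete) are hypothetical, and the table names passed in are assumed to be trusted identifiers.

```python
import sqlite3

def find_unused_keys(conn, dimension, fact_tables):
    """Return dimension keys referenced by none of the fact tables.

    Works on a separate, shrinking table of candidate keys and visits
    one fact table at a time instead of checking all of them at once.
    """
    conn.execute("DROP TABLE IF EXISTS keys_to_delete")
    conn.execute("CREATE TABLE keys_to_delete (dimkey TEXT PRIMARY KEY)")
    conn.execute(f"INSERT INTO keys_to_delete SELECT pk FROM {dimension}")
    for fact in fact_tables:
        # Remove every candidate key that this fact table still uses.
        conn.execute(
            f"DELETE FROM keys_to_delete "
            f"WHERE dimkey IN (SELECT fk FROM {fact})"
        )
    cursor = conn.execute("SELECT dimkey FROM keys_to_delete ORDER BY dimkey")
    return [row[0] for row in cursor]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dimension (pk TEXT PRIMARY KEY);
    CREATE TABLE fact1 (fk TEXT);
    CREATE TABLE fact2 (fk TEXT);
    INSERT INTO dimension VALUES ('a'), ('b'), ('c'), ('d');
    INSERT INTO fact1 VALUES ('a'), ('a');
    INSERT INTO fact2 VALUES ('c');
""")

unused = find_unused_keys(conn, "dimension", ["fact1", "fact2"])
print(unused)
```

Whatever is left in keys_to_delete after the loop is the set of keys that can be removed from the dimension.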
You may want to split that into different queries. Check unused rows in fact1, then on fact2, etc, individually. Then intersect all those results to get to the rows that are unused in all fact tables.
I would also suggest a left outer join instead of nested queries, counting rows in the fact table for each pk, and filter out from the resultset those that have a non zero count.
Your query will struggle as it’ll scan every fact table at the same time.
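As a rough illustration of this suggestion, the sketch below checks each fact table separately with a left outer join and a row count, then intersects the per-table results. It uses Python's sqlite3 module rather than SQL Server, and all table and column names are hypothetical.

```python
import sqlite3

def unused_in(conn, fact):
    """Keys of the dimension that have zero matching rows in one fact table."""
    sql = f"""
        SELECT d.pk
        FROM dimension d
        LEFT OUTER JOIN {fact} f ON f.fk = d.pk
        GROUP BY d.pk
        HAVING COUNT(f.fk) = 0  -- COUNT ignores the NULLs of unmatched rows
    """
    return {row[0] for row in conn.execute(sql)}

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dimension (pk TEXT PRIMARY KEY);
    CREATE TABLE fact1 (fk TEXT);
    CREATE TABLE fact2 (fk TEXT);
    INSERT INTO dimension VALUES ('a'), ('b'), ('c'), ('d');
    INSERT INTO fact1 VALUES ('a'), ('a');
    INSERT INTO fact2 VALUES ('c');
""")

per_table = [unused_in(conn, fact) for fact in ("fact1", "fact2")]
# A key is unused overall only if it is unused in every fact table.
unused_everywhere = sorted(set.intersection(*per_table))
print(unused_everywhere)
```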
I have a problem when I try to add constraints to my tables. I get the error:
Introducing FOREIGN KEY constraint 'FK74988DB24B3C886' on table 'Employee' may cause cycles or multiple cascade paths. Specify ON DELETE NO ACTION or ON UPDATE NO ACTION, or modify other FOREIGN KEY constraints.
My constraint is between a Code table and an employee table. The Code table contains Id, Name, FriendlyName, Type and a Value. The employee has a number of fields that reference codes, so that there can be a reference for each type of code.
I need for the fields to be set to null if the code that is referenced is deleted.
Any ideas how I can do this?
SQL Server does simple counting of cascade paths and, rather than trying to work out whether any cycles actually exist, it assumes the worst and refuses to create the referential actions (CASCADE): you can and should still create the constraints without the referential actions. If you can't alter your design (or doing so would compromise things) then you should consider using triggers as a last resort.
FWIW resolving cascade paths is a complex problem. Other SQL products will simply ignore the problem and allow you to create cycles, in which case it will be a race to see which will overwrite the value last, probably to the ignorance of the designer (e.g. ACE/Jet does this). I understand some SQL products will attempt to resolve simple cases. Fact remains, SQL Server doesn't even try, plays it ultra safe by disallowing more than one path and at least it tells you so.
Microsoft themselves advise the use of triggers instead of FK constraints.
A typical situation with multiple cascading paths is this:
A master table with two details, let's say "Master" and "Detail1" and "Detail2". Both details are cascade delete. So far no problems. But what if both details have a one-to-many-relation with some other table (say "SomeOtherTable"). SomeOtherTable has a Detail1ID-column AND a Detail2ID-column.
Master { ID, masterfields }
Detail1 { ID, MasterID, detail1fields }
Detail2 { ID, MasterID, detail2fields }
SomeOtherTable {ID, Detail1ID, Detail2ID, someothertablefields }
In other words: some of the records in SomeOtherTable are linked with Detail1 records and some of the records in SomeOtherTable are linked with Detail2 records. Even if it is guaranteed that SomeOtherTable records never belong to both details, it is now impossible to make SomeOtherTable's records cascade delete for both details, because there are multiple cascading paths from Master to SomeOtherTable (one via Detail1 and one via Detail2).
Now you may already have understood this. Here is a possible solution:
Master { ID, masterfields }
DetailMain { ID, MasterID }
Detail1 { DetailMainID, detail1fields }
Detail2 { DetailMainID, detail2fields }
SomeOtherTable {ID, DetailMainID, someothertablefields }
All ID fields are key fields and auto-increment. The crux lies in the DetailMainID fields of the Detail tables. These fields are both the key and a referential constraint. It is now possible to cascade delete everything by deleting only Master records. The downside is that for each Detail1 record AND for each Detail2 record, there must also be a DetailMain record (which is actually created first to get the correct and unique id).
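The restructured schema can be sketched as follows. This uses Python's sqlite3 module, and SQLite does not enforce SQL Server's multiple-cascade-path restriction, so the example only shows the shape of the design and that a single delete on Master cascades through DetailMain; the column names beyond those listed above are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for cascades in SQLite
conn.executescript("""
    CREATE TABLE Master (ID INTEGER PRIMARY KEY);
    CREATE TABLE DetailMain (
        ID INTEGER PRIMARY KEY,
        MasterID INTEGER NOT NULL REFERENCES Master(ID) ON DELETE CASCADE
    );
    -- DetailMainID is both the primary key and the foreign key.
    CREATE TABLE Detail1 (
        DetailMainID INTEGER PRIMARY KEY
            REFERENCES DetailMain(ID) ON DELETE CASCADE,
        detail1field TEXT
    );
    CREATE TABLE Detail2 (
        DetailMainID INTEGER PRIMARY KEY
            REFERENCES DetailMain(ID) ON DELETE CASCADE,
        detail2field TEXT
    );
    -- Only one path leads here from Master: via DetailMain.
    CREATE TABLE SomeOtherTable (
        ID INTEGER PRIMARY KEY,
        DetailMainID INTEGER NOT NULL
            REFERENCES DetailMain(ID) ON DELETE CASCADE
    );
    INSERT INTO Master VALUES (1), (2);
    INSERT INTO DetailMain VALUES (10, 1), (11, 1), (12, 2);
    INSERT INTO Detail1 VALUES (10, 'd1');
    INSERT INTO Detail2 VALUES (11, 'd2'), (12, 'kept');
    INSERT INTO SomeOtherTable VALUES (100, 10), (101, 11), (102, 12);
""")

conn.execute("DELETE FROM Master WHERE ID = 1")
conn.commit()

tables = ("Master", "DetailMain", "Detail1", "Detail2", "SomeOtherTable")
counts = {
    t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0] for t in tables
}
print(counts)
```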
I would point out that (functionally) there's a BIG difference between cycles and/or multiple paths in the SCHEMA and the DATA. While cycles and perhaps multipaths in the DATA could certainly complicate processing and cause performance problems (cost of "properly" handling), the cost of these characteristics in the schema should be close to zero.
Since most apparent cycles in RDBs occur in hierarchical structures (org chart, part, subpart, etc.) it is unfortunate that SQL Server assumes the worst; i.e., schema cycle == data cycle. In fact, if you're using RI constraints you can't actually build a cycle in the data!
I suspect the multipath problem is similar; i.e., multiple paths in the schema don't necessarily imply multiple paths in the data, but I have less experience with the multipath problem.
Of course if SQL Server did allow cycles it'd still be subject to a depth of 32, but that's probably adequate for most cases. (Too bad that's not a database setting however!)
"Instead of Delete" triggers don't work either. The second time a table is visited, the trigger is ignored. So, if you really want to simulate a cascade you'll have to use stored procedures in the presence of cycles. The Instead-of-Delete-Trigger would work for multipath cases however.
Celko suggests a "better" way to represent hierarchies that doesn't introduce cycles, but there are tradeoffs.
There is an article available that explains how to handle multiple deletion paths using triggers. It may be useful for complex scenarios.
http://www.mssqltips.com/sqlservertip/2733/solving-the-sql-server-multiple-cascade-path-issue-with-a-trigger/
By the sounds of it you have an OnDelete/OnUpdate action on one of your existing Foreign Keys, that will modify your codes table.
So by creating this Foreign Key, you'd be creating a cyclic problem,
E.g. updating Employees causes Codes to be changed by an On Update action, which causes Employees to be changed by an On Update action, and so on.
If you post your Table Definitions for both tables, & your Foreign Key/constraint definitions we should be able to tell you where the problem is...
This is because Employee might have a collection of another entity, say Qualifications, and Qualification might have another collection, Universities,
e.g.
public class Employee{
public virtual ICollection<Qualification> Qualifications {get;set;}
}
public class Qualification{
public Employee Employee {get;set;}
public virtual ICollection<University> Universities {get;set;}
}
public class University{
public Qualification Qualification {get;set;}
}
On DataContext it could be like below
protected override void OnModelCreating(DbModelBuilder modelBuilder){
modelBuilder.Entity<Qualification>().HasRequired(x=> x.Employee).WithMany(e => e.Qualifications);
modelBuilder.Entity<University>().HasRequired(x => x.Qualification).WithMany(e => e.Universities);
}
In this case there is a chain from Employee to Qualification and from Qualification to Universities, so it threw the same exception for me.
It worked for me when I changed
modelBuilder.Entity<Qualification>().HasRequired(x => x.Employee).WithMany(e => e.Qualifications);
To
modelBuilder.Entity<Qualification>().HasOptional(x => x.Employee).WithMany(e => e.Qualifications);
A trigger is a solution for this problem:
IF OBJECT_ID('dbo.fktest2', 'U') IS NOT NULL
drop table fktest2
IF OBJECT_ID('dbo.fktest1', 'U') IS NOT NULL
drop table fktest1
IF EXISTS (SELECT name FROM sysobjects WHERE name = 'fkTest1Trigger' AND type = 'TR')
DROP TRIGGER dbo.fkTest1Trigger
go
create table fktest1 (id int primary key, anQId int identity)
go
create table fktest2 (id1 int, id2 int, anQId int identity,
FOREIGN KEY (id1) REFERENCES fktest1 (id)
ON DELETE CASCADE
ON UPDATE CASCADE/*,
FOREIGN KEY (id2) REFERENCES fktest1 (id) this causes compile error so we have to use triggers
ON DELETE CASCADE
ON UPDATE CASCADE*/
)
go
CREATE TRIGGER fkTest1Trigger
ON fkTest1
AFTER INSERT, UPDATE, DELETE
AS
if @@ROWCOUNT = 0
return
set nocount on
-- This code replaces the foreign key cascade (automatic update of the column in the destination table when the referenced primary key in the source table changes).
-- The compiler complains only when you use multiple cascades. It throws this compile error:
-- Introducing FOREIGN KEY constraint on table may cause cycles or multiple cascade paths. Specify ON DELETE NO ACTION or ON UPDATE NO ACTION,
-- or modify other FOREIGN KEY constraints.
IF ((UPDATE (id) and exists(select 1 from fktest1 A join deleted B on B.anqid = A.anqid where B.id <> A.id)))
begin
update fktest2 set id2 = i.id
from deleted d
join fktest2 on d.id = fktest2.id2
join inserted i on i.anqid = d.anqid
end
if exists (select 1 from deleted)
DELETE one FROM fktest2 one LEFT JOIN fktest1 two ON two.id = one.id2 where two.id is null -- drop all from dest table which are not in source table
GO
insert into fktest1 (id) values (1)
insert into fktest1 (id) values (2)
insert into fktest1 (id) values (3)
insert into fktest2 (id1, id2) values (1,1)
insert into fktest2 (id1, id2) values (2,2)
insert into fktest2 (id1, id2) values (1,3)
select * from fktest1
select * from fktest2
update fktest1 set id=11 where id=1
update fktest1 set id=22 where id=2
update fktest1 set id=33 where id=3
delete from fktest1 where id > 22
select * from fktest1
select * from fktest2
This error comes from the database's cascade rules. A trigger is code, so it can add logic or conditions to a cascade relationship such as a cascading delete. Alternatively, you may need to adjust the options of the related tables, for example by turning off CascadeOnDelete:
protected override void OnModelCreating( DbModelBuilder modelBuilder )
{
modelBuilder.Entity<TableName>().HasMany(i => i.Member).WithRequired().WillCascadeOnDelete(false);
}
Or Turn off this feature completely:
modelBuilder.Conventions.Remove<OneToManyCascadeDeleteConvention>();
Some databases, most notably SQL Server, have limitations on the cascade behaviors that form cycles.
There are two ways to handle this situation:
1. Change one or more of the relationships to not cascade delete.
2. Configure the database without one or more of these cascade deletes, then ensure all dependent entities are loaded so that EF Core can perform the cascading behavior.
For details, see "Database cascade limitations" in the EF Core documentation.
Mass database update to offset PKs: make a copy of the database instead.
Special use case: company A uses a database with the same schema as company B. Because they have merged, they want to use a single database. Hence, many tables from company B's database must have their primary keys offset to avoid collision with company A's records.
One solution could have been to define foreign keys as ON UPDATE CASCADE, and offset the primary keys having the foreign keys follow. But there are many hurdles if you do that (Msg 1785, Msg 8102, ...).
So a better idea that occurs to me is simply to make a copy of the database, DROP and re CREATE the tables that must have their PKs|FKs offset, and copy the data (and while doing so, offset the primary keys and the foreign keys).
Avoiding all the hassle.
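The copy-and-offset idea can be illustrated on a toy schema. This is only a sketch using Python's sqlite3 module: the Customers/Orders tables, the B_ prefix for company B's data, and the OFFSET value are all hypothetical, and a real merge would have to pick an offset larger than every existing key in company A's tables.

```python
import sqlite3

OFFSET = 1000  # hypothetical; must exceed company A's highest key

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Merged tables, already holding company A's rows.
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Orders (
        OrderID INTEGER PRIMARY KEY,
        CustomerID INTEGER REFERENCES Customers(CustomerID)
    );
    INSERT INTO Customers VALUES (1, 'A-customer');
    INSERT INTO Orders VALUES (1, 1);

    -- Company B's data, with keys that collide with company A's.
    CREATE TABLE B_Customers (CustomerID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE B_Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER);
    INSERT INTO B_Customers VALUES (1, 'B-customer');
    INSERT INTO B_Orders VALUES (1, 1), (2, NULL);
""")

# Copy parents first, then children, shifting PKs and FKs by the same offset.
conn.execute(
    "INSERT INTO Customers SELECT CustomerID + ?, name FROM B_Customers",
    (OFFSET,),
)
conn.execute(
    "INSERT INTO Orders SELECT OrderID + ?, CustomerID + ? FROM B_Orders",
    (OFFSET, OFFSET),
)
conn.commit()

customers = conn.execute("SELECT * FROM Customers ORDER BY CustomerID").fetchall()
orders = conn.execute("SELECT * FROM Orders ORDER BY OrderID").fetchall()
print(customers)
print(orders)
```

Because the offset is applied during the copy, no existing key is ever updated, so no ON UPDATE CASCADE is needed. A NULL foreign key stays NULL, since NULL plus a number is NULL.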
My solution to this problem, encountered using ASP.NET Core 2.0 and EF Core 2.0, was to perform the following in order:
1. Run the update-database command in the Package Manager Console (PMC) to create the database (this results in the "Introducing FOREIGN KEY constraint ... may cause cycles or multiple cascade paths." error).
2. Run the script-migration -Idempotent command in the PMC to create a script that can be run regardless of the existing tables/constraints.
3. Take the resulting script, find ON DELETE CASCADE, and replace it with ON DELETE NO ACTION.
4. Execute the modified SQL against the database.
Now, your migrations should be up-to-date and the cascading deletes should not occur.
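The find-and-replace step can be done by hand in an editor, or scripted. Below is a small sketch of the scripted variant in Python; the function name and the sample statement are hypothetical, and the pattern is deliberately tolerant of case and whitespace differences in the generated script.

```python
import re

def disable_cascade_deletes(script: str) -> str:
    """Replace every ON DELETE CASCADE with ON DELETE NO ACTION."""
    return re.sub(
        r"ON\s+DELETE\s+CASCADE",
        "ON DELETE NO ACTION",
        script,
        flags=re.IGNORECASE,
    )

# Hypothetical line of the kind a migration script might contain.
sample = (
    "ALTER TABLE [Employee] ADD CONSTRAINT [FK_Employee_Code] "
    "FOREIGN KEY ([CodeId]) REFERENCES [Code] ([Id]) ON DELETE CASCADE;"
)
patched = disable_cascade_deletes(sample)
print(patched)
```

Note that this rewrites every cascade in the script, which matches the steps above; if only some relationships should lose their cascade, the script needs to be edited selectively.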
Too bad I was not able to find any way to do this in Entity Framework Core 2.0.
Good luck!
I have a table which depends on several other ones.
When I delete an entry in this table I should also delete the entries in its "masters" (it's a 1-1 relation). But here is the problem: when I delete them I get unnecessary table scans, because the reference is checked before deleting. I am sure that it's safe (because I get the ids from the OUTPUT clause):
DELETE TOP (@BatchSize) [doc].[Document]
OUTPUT DELETED.A, DELETED.B, DELETED.C, DELETED.D
INTO #DocumentParts
WHERE Id IN (SELECT d.Id FROM #DocumentIds d);
SET @r = @@ROWCOUNT;
DELETE [doc].[A]
WHERE Id IN (SELECT DISTINCT dp.A FROM #DocumentParts dp);
DELETE [doc].[B]
WHERE Id IN (SELECT DISTINCT dp.B FROM #DocumentParts dp);
DELETE [doc].[C]
WHERE Id IN (SELECT DISTINCT dp.C FROM #DocumentParts dp);
... several others
But here is what plan I get for each delete:
If I drop the constraints from the document table, the plan changes:
But the problem is that I cannot drop the constraints, because inserts are performed in parallel in other sessions. I also cannot lock the whole table, because it's very large and the lock would also block a lot of other transactions.
The only way I have found so far is to create an index for every foreign key (which can be used instead of the PK scan), but I wanted to avoid this check altogether (indexed or not), because I am SURE that documents with such ids don't exist, since I have just deleted them. Maybe there is some hint for SQL Server, or some way to disable the reference check for one transaction instead of for the whole database.
SQL Server is rather stubborn in preserving the referential integrity, so no, you cannot "hint" to disable the check. The fact that you deleted the referencing rows doesn't matter at all (in a high transactional environment, there was plenty of time for some process to modify the tables between the deletes).
Creating the proper indexes is the way to go.
I do not want the orders to be deleted when a customer is deleted, so I do not use On Delete Cascade.
I use identity columns, so I do not need On Update Cascade.
It should be possible to delete a customer even though orders still reference that customer. I do not care that the customer is gone, because I still need the order rows for other tables.
Does a foreign key make sense in this scenario when I do not use Referential Integrity with On Delete/Update Cascade ?
Yes. The foreign key is not in place only to clean up after yourself but primarily to make sure the data is right in the first place (it can also assist the optimizer in some cases). I use foreign keys all over the place but I have yet to find a need to implement on cascade actions. I do understand the purpose of cascade but I've always found it better to control those processes myself.
EDIT even though I tried to explain already that you can work around the cascade issue (thus still satisfying your third condition), I thought I would add an illustration:
You can certainly still allow for orders to remain after you've deleted a customer. The key is to make the Orders.CustomerID column nullable, e.g.
CREATE TABLE dbo.Customers(CustomerID INT PRIMARY KEY);
CREATE TABLE dbo.Orders(OrderID INT PRIMARY KEY, CustomerID INT NULL
FOREIGN KEY REFERENCES dbo.Customers(CustomerID));
Now when you want to delete a customer, assuming you control these operations via a stored procedure, you can do it this way, first setting their Orders.CustomerID to NULL:
CREATE PROCEDURE dbo.Customer_Delete
@CustomerID INT
AS
BEGIN
SET NOCOUNT ON;
UPDATE dbo.Orders SET CustomerID = NULL
WHERE CustomerID = @CustomerID;
DELETE dbo.Customers
WHERE CustomerID = @CustomerID;
END
GO
If you can't control ad hoc deletes from the Customers table, then you can still achieve this with an instead of trigger:
CREATE TRIGGER dbo.Cascade_CustomerDelete
ON dbo.Customers
INSTEAD OF DELETE
AS
BEGIN
SET NOCOUNT ON;
UPDATE o SET CustomerID = NULL
FROM dbo.Orders AS o
INNER JOIN deleted AS d
ON o.CustomerID = d.CustomerID;
DELETE c
FROM dbo.Customers AS c
INNER JOIN deleted AS d
ON c.CustomerID = d.CustomerID;
END
GO
That all said, I'm not sure I understand the purpose of deleting a customer and keeping their orders (or any indication at all about who placed that order).
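The stored-procedure approach above (null the references, then delete the parent) can be sketched outside SQL Server as well. The following uses Python's sqlite3 module; the delete_customer helper is hypothetical and wraps both statements in one transaction so that either both happen or neither does.

```python
import sqlite3

def delete_customer(conn, customer_id):
    """Detach a customer's orders, then delete the customer, atomically."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "UPDATE Orders SET CustomerID = NULL WHERE CustomerID = ?",
            (customer_id,),
        )
        conn.execute(
            "DELETE FROM Customers WHERE CustomerID = ?", (customer_id,)
        )

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce the FK
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY);
    -- Nullable foreign key with no cascade action.
    CREATE TABLE Orders (
        OrderID INTEGER PRIMARY KEY,
        CustomerID INTEGER REFERENCES Customers(CustomerID)
    );
    INSERT INTO Customers VALUES (1), (2);
    INSERT INTO Orders VALUES (100, 1), (101, 1), (102, 2);
""")

delete_customer(conn, 1)
orders = conn.execute(
    "SELECT OrderID, CustomerID FROM Orders ORDER BY OrderID"
).fetchall()
print(orders)
```

The foreign key still rejects orders that point at a customer that does not exist, while the orders of a deleted customer survive with a NULL reference.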
So to be clear you have a FK from Customer to Orders presently. Cascade update/delete is not enabled on this relationship. Your plan is to delete customers but allow the orders to remain.
This would VIOLATE the foreign key constraint; and prevent the delete from occurring.
If you disable the constraint, execute the delete, and then re-enable the constraint, you could make it work.
However, this will leave orphaned order records in the system; which might make it harder to support in the long run. What's the next guy who has to support this going to think?
Wouldn't it be better to keep the records and add a status for Active/inactive or created and inactive dates?
I'm struggling with violating the integrity of the database to reduce space...? Or what's the main reason to remove?
If you don't want to have to always filter out the no longer active records use a view or a package which creates a collection of active customers. Eliminating some but not all data seems just wrong to me.
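The status-plus-view alternative suggested here might look like the sketch below. It uses Python's sqlite3 module, and the IsActive column and ActiveCustomers view are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Customers (
        CustomerID INTEGER PRIMARY KEY,
        name TEXT,
        IsActive INTEGER NOT NULL DEFAULT 1  -- 1 = active, 0 = inactive
    );
    CREATE TABLE Orders (
        OrderID INTEGER PRIMARY KEY,
        CustomerID INTEGER NOT NULL REFERENCES Customers(CustomerID)
    );
    -- Callers that only want current customers query the view.
    CREATE VIEW ActiveCustomers AS
        SELECT CustomerID, name FROM Customers WHERE IsActive = 1;
    INSERT INTO Customers (CustomerID, name) VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Orders VALUES (100, 1);
""")

# "Deleting" a customer is just a status change; no order is orphaned.
conn.execute("UPDATE Customers SET IsActive = 0 WHERE CustomerID = 1")
conn.commit()

active = conn.execute(
    "SELECT name FROM ActiveCustomers ORDER BY CustomerID"
).fetchall()
order_owner = conn.execute(
    "SELECT c.name FROM Orders o "
    "JOIN Customers c ON c.CustomerID = o.CustomerID "
    "WHERE o.OrderID = 100"
).fetchone()
print(active, order_owner)
```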