cascading deletes causing multiple cascade paths - sql-server

I am using SQL Server 2008, and an extract of some of the tables is shown below:
Users
Id (PK)
UserItems
UserId (PK)
ItemId (PK) - (Compound key of 2 columns)
...
UserItemVotes
UserId (PK)
ItemId (PK)
VoterId (PK) - (Compound key of 3 columns)
I have the following relationships defined:
Users.Id -> UserItems.UserId
(UserItems.UserId, UserItems.ItemId) -> (UserItemVotes.UserId, UserItemVotes.ItemId)
Users.Id -> UserItemVotes.VoterId
Now, I am having a problem when turning on cascading deletes. When adding the 3rd relationship I receive the error "...may cause cycles or multiple cascade paths. Specify ON DELETE NO ACTION or ON UPDATE NO ACTION, or modify other FOREIGN KEY constraints."
I do not really want to do this; ideally, if a user is deleted, I would like their user items and/or their votes to be removed as well.
Is this a bad design? Or is there a way to get behaviour I want from SQL Server?

The approved answer is not a good answer. The scenario described is not bad design, nor is it "risky" to rely on the database to do its job.
The original question describes a perfectly valid scenario, and the design is well thought-out. Clearly, deleting a user should delete both the user's items (and any votes on them), and delete the user's votes on any item (even items belonging to other users). It is reasonable to ask the database to perform this cascading delete when the user record is deleted.
The problem is that SQL Server can't handle it. Its implementation of cascading deletes is deficient.
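One commonly used workaround keeps the logic in the database. The following is a minimal T-SQL sketch, not the poster's actual schema: the column types and the trigger name trg_Users_Delete are assumptions. The idea is to leave the VoterId foreign key at NO ACTION and let an INSTEAD OF DELETE trigger on Users remove the user's votes before the user row itself is deleted.

```sql
-- Assumed minimal versions of the three tables from the question.
CREATE TABLE Users (
    Id INT NOT NULL PRIMARY KEY
);

CREATE TABLE UserItems (
    UserId INT NOT NULL REFERENCES Users (Id) ON DELETE CASCADE,
    ItemId INT NOT NULL,
    PRIMARY KEY (UserId, ItemId)
);

CREATE TABLE UserItemVotes (
    UserId  INT NOT NULL,
    ItemId  INT NOT NULL,
    -- No cascade here: this is the relationship SQL Server rejects as a
    -- second cascade path, so it is left at the default NO ACTION.
    VoterId INT NOT NULL REFERENCES Users (Id),
    PRIMARY KEY (UserId, ItemId, VoterId),
    FOREIGN KEY (UserId, ItemId)
        REFERENCES UserItems (UserId, ItemId) ON DELETE CASCADE
);
GO  -- batch separator (SSMS/sqlcmd); CREATE TRIGGER must start its own batch

CREATE TRIGGER trg_Users_Delete ON Users
INSTEAD OF DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- First remove the votes cast by the users being deleted.
    DELETE FROM UserItemVotes
    WHERE VoterId IN (SELECT Id FROM deleted);

    -- Then delete the users; the two remaining cascades remove their
    -- items and any votes on those items.
    DELETE FROM Users
    WHERE Id IN (SELECT Id FROM deleted);
END;
```

With this in place, a plain DELETE FROM Users WHERE Id = ... should behave as if all three relationships cascaded.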

"UserItems.ItemId -> UserItemVotes.UserId"
This one seems extremely suspect.

I would lean toward bad design. While most DBMSs can manage cascading deletes, it is risky to use this built-in functionality. Your scenario is a perfect example of why these types of things are often managed in application code. There you can determine exactly what needs to be deleted and in what order.

Related

Relational database: indirect reference to a "foreign key"

I have a data schema similar to the following:
USERS:
id
name
email
phone number
...
PHOTOS:
id
width
height
filepath
...
I have an auditing table for any changes to the system
LOGS:
id
acting_user
date
record_type (enum: "users", "photos", "...")
record_id
record_field
new_value
Is there a name for this setup where an enum in one of the fields refers to the name of one of the other tables? And effectively, the record_type and record_id together are a foreign key to the record in the other table? Is this an anti-pattern? (Note: new_value, and all the things we would be logging, are the same data type: strings.)
Is this an anti-pattern?
Yes. Any pattern that makes you enforce referential integrity manually [1] is an anti-pattern.
Here is why using FOREIGN KEYs is so important and here is what to do in cases like yours.
Is there a name for this setup where an enum in one of the fields refers to the name of one of the other table?
There is no standard term that I know of, but I have heard people call it "generic" or "polymorphic" FKs.
[1] As opposed to FOREIGN KEYs built into the DBMS.
Actually, I think 'Anti-Pattern' is a pretty good name for this setup, but it can be a realistic way to go, especially in this example.
I'll add a similar example with a new table which records LIKES of users, photos, etc., and show why it's bad. Then I'll explain why it might not be too bad for your LOGS example.
The LIKES table is:
Id
LikedByUserId
RecordType ("users", "photos", "...")
RecordId
This is pretty much the same as the LOGS table. The problem is that you cannot make RecordId a foreign key to the USERS table as well as to the PHOTOS table and any other tables. If you did, you couldn't insert a like for User 1234 unless there was also a PHOTO with ID 1234, and so on. No RDBMS that I know of lets a single foreign key point at whichever of several primary keys happens to apply; if you declare several foreign keys on the same column, every one of them has to be satisfied.
So you'd have to create the LIKES table with no referential integrity. This may not be a bad thing sometimes, but in this case I think I'd want an important table such as LIKES to have valid entries.
To do LIKES properly, I would create the table as:
Id
LikedByUserId (allow null)
PhotoId (allow null)
OtherThingId (allow null)
...and create the appropriate foreign keys. This will actually make queries that read the data easier to read and maintain and probably more efficient too.
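As a rough illustration of the layout above, here is a sketch in standard SQL. The users and photos tables, the lowercase column names, the liked_user_id column and the NOT NULL on liked_by_user_id are assumptions made for the example, as is the CHECK that forces each row to point at exactly one liked thing.

```sql
-- Assumed parent tables, reduced to their keys.
CREATE TABLE users  (id INTEGER NOT NULL PRIMARY KEY);
CREATE TABLE photos (id INTEGER NOT NULL PRIMARY KEY);

CREATE TABLE likes (
    id               INTEGER NOT NULL PRIMARY KEY,
    liked_by_user_id INTEGER NOT NULL REFERENCES users (id),
    -- One nullable column per kind of thing that can be liked,
    -- each with a real foreign key.
    liked_user_id    INTEGER REFERENCES users (id),
    photo_id         INTEGER REFERENCES photos (id),
    -- Exactly one of the target columns must be filled in.
    CHECK (
        (CASE WHEN liked_user_id IS NOT NULL THEN 1 ELSE 0 END)
      + (CASE WHEN photo_id      IS NOT NULL THEN 1 ELSE 0 END) = 1
    )
);
```

Adding a new kind of likeable thing then means adding one nullable column, one foreign key and one more term in the CHECK.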
However, for a table like LOGS, which probably isn't central to the functionality of my system and which I only query ad hoc to check what's been happening, I might not want to put in the extra effort and add the complexity that results in more efficient reading. I'm not sure I would actually skip it, though. It is an anti-pattern, but depending on usage it might be OK.
To emphasise the point, I would only do this if the system never queried the table; if the only people who look at the data are admins running ad-hoc queries against it, then it might be OK.
Cheers -

Complex Database Relations (Junction Tables)

My question is about the idea of combining two junction tables into one, for similarly related tables. Please read on to see what I mean. Also note that this is indeed a problem I am faced with and therefore relevant to this forum. It is just a topic of broad consequence for which I'm hoping to elicit a bit more participation from various professionals to get a better consensus on "best practice", if you will.
I have this rather challenging database design problem. I'm hoping this will be sort of a wiki that many people can contribute to and learn from. To make this easier, I've created a set of graphics, and will break the problem down into 1) Process, and 2) Structure.
Process Steps
A request (DocRequest) for documentation (Publication) is made.
A new publication is created IF said publication does not already exist.
A running log (StatusReport) is kept for progress on fulfilling the request.
Note: For any given Publication there may be many DocRequests and StatusReports (including updates)
Database Structure
Note: Both the DocRequest and StatusReport tables have numerous fields and supporting tables not shown in the attached graphics. Furthermore, a particular Publication is the master record to which all records in those tables belong.
--Current Implementation--
Note: The major flaw with this design is that whenever you create either a new DocRequest or StatusReport record, you also have to create a new record in the Publications table (which acts like a junction table), but this also creates a new Publication as a result. This is not the desired behavior.
--Typical Implementation-- (for this type of relationship)
Note: This is OK, and probably ideal, but handles updates to either the DocRequest or StatusReport tables, independently linking them to the Publication to which they belong.
--My Preferred Implementation-- (for this special case)
Note: The idea I had here was simply to combine the dual junction tables into one. In this case the junction table would get a new record any time either the DocRequest or StatusReport had an insert occur. I will likely handle this with a trigger.
Discussion
Now for the discussion. I would like to know from my fellow Database Developers if you think this is a bad idea, and what issues might arise from this. I think the net number of records should be identical as with the two separate junction tables, and in fact uses slightly less space by saving an extra ID column. :)
Let me know what you guys think. I would really like to get many people involved in this discussion. Cheers! :)
I think you're hurting yourself by thinking in terms of junction tables. Just think of tables.
Since StatusReport has to do with the status of the document request, you need a table that relates those two somehow.
"StatusReport" is an awful name for a table that stores facts about the status of a document request.
"ID" is an awful name for any column in any table.
The id number of the publication seems to have more to do with the document request than with the status of the request. (You said, "A new publication is created IF said publication does not already exist." Frankly, that's skating pretty close to the edge of not making sense.) So the publication number almost certainly belongs in the DocRequest table.
Referring to the diagram of your preferred implementation, I'd drop the table TripleJunction, and replace StatusReport with this.
-- Predicate: Document request number (doc_request_id) has status (status)
-- as of date and time (status_as_of).
create table document_request_status (
    doc_request_id integer not null references DocRequest (id),
    status_as_of timestamp not null default current_timestamp,
    status varchar(10) not null,
    -- other columns go here
    primary key (doc_request_id, status_as_of)
);
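To show how such a table might be read, here is a sketch of a query that returns the most recent status of every request. It assumes only the columns defined above; the aliases s and s2 are arbitrary.

```sql
-- Current status per document request: the row whose status_as_of
-- is the latest one recorded for that request.
select s.doc_request_id, s.status, s.status_as_of
from document_request_status as s
where s.status_as_of = (
    select max(s2.status_as_of)
    from document_request_status as s2
    where s2.doc_request_id = s.doc_request_id
);
```

Because (doc_request_id, status_as_of) is the primary key, this returns at most one row per request.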

Database schema consistency issue

Part of my database schema involves the entities:
Jobs
Agencies
Agents
and relation JobAgent
Each Job has one Agency it belongs to
Each Agent belongs to one agency
Each Job has 0-n agents
The database will be SQL Server 2008
Here is my schema:
My problem is that Jobs.agencyid must always be equal to Agents.agencyid when related through JobAgent.
If Jobs.agencyid were updated to a new agency, the agents would then belong to a different agency than the job.
What would be the best way to redesign my schema to avoid relying on triggers or application code to ensure this consistency?
AGENCIES
agency_id (pk)
JOBS
job_id (pk)
agency_id (fk to AGENCIES.agency_id)
AGENTS
agent_id (pk)
agency_id (fk to AGENCIES.agency_id)
JOBAGENT
job_id
fk to JOBS.job_id
agent_id
fk to AGENTS.agent_id
agency_id
fk to JOBS.agency_id
fk to AGENTS.agency_id
You can define more than one foreign key constraint on a column; it just means that the value in JOBAGENT has to satisfy BOTH foreign key constraints to be allowed. But you'll have fun if you ever want to update jobs to a different agency... ;) SQL Server supports composite foreign keys: http://msdn.microsoft.com/en-us/library/ms175464.aspx
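The outline above can be sketched in T-SQL roughly as follows. The INT column types and the constraint names are assumptions. Note the extra UNIQUE constraints: a composite foreign key has to reference a column set that is declared unique in the parent table.

```sql
CREATE TABLE AGENCIES (
    agency_id INT NOT NULL PRIMARY KEY
);

CREATE TABLE JOBS (
    job_id    INT NOT NULL PRIMARY KEY,
    agency_id INT NOT NULL REFERENCES AGENCIES (agency_id),
    -- Target for the composite foreign key from JOBAGENT.
    CONSTRAINT uq_jobs_job_agency UNIQUE (job_id, agency_id)
);

CREATE TABLE AGENTS (
    agent_id  INT NOT NULL PRIMARY KEY,
    agency_id INT NOT NULL REFERENCES AGENCIES (agency_id),
    -- Target for the composite foreign key from JOBAGENT.
    CONSTRAINT uq_agents_agent_agency UNIQUE (agent_id, agency_id)
);

CREATE TABLE JOBAGENT (
    job_id    INT NOT NULL,
    agent_id  INT NOT NULL,
    agency_id INT NOT NULL,
    PRIMARY KEY (job_id, agent_id),
    -- The same agency_id takes part in both keys, so the job and the
    -- agent on a row are forced to belong to the same agency.
    CONSTRAINT fk_jobagent_job FOREIGN KEY (job_id, agency_id)
        REFERENCES JOBS (job_id, agency_id),
    CONSTRAINT fk_jobagent_agent FOREIGN KEY (agent_id, agency_id)
        REFERENCES AGENTS (agent_id, agency_id)
);
```

With these constraints, changing JOBS.agency_id for a job that still has JOBAGENT rows is rejected until those rows are removed or reassigned.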
Update Regarding Updating
You have two choices -
Perform by hand, because ON UPDATE CASCADE etc won't handle agency and agent updates without using triggers
Have a status column in JOB, so you can cancel a job in order to recreate the job with the new supporting records (Agent, jobagent, etc). Further cleanup can be automated, based on job status if you desire
The problem is that if a job moves from one agency to another (as you say, if Jobs.agencyid were to be updated...) then the corresponding records in JobAgent become meaningless: those agents can't be attached to a job that's no longer with their agency, so the JobAgent records connecting them to the jobs should therefore be deleted...
One way to enforce this is to add a JobAgent.agencyid field, and make it a foreign key on Jobs.agencyid, with ON UPDATE RESTRICT to force (manual) deletion of the relevant JobAgent records before Jobs.agencyid can be changed. (SQL Server has no RESTRICT keyword; its default, ON UPDATE NO ACTION, blocks the update in the same way.)
Edit: the other issue, which I hadn't really considered, is that when you first associate a job to an agent (ie create a new JobAgent record) you need to ensure they both belong to the same agency... for this, I think OMG's solution works best - I'm happy to defer to the better answer.
OMG also raises the question of how to handle updates: you can either
1. Change the Jobs.agencyid field and delete (by hand) all associated JobAgent records: in this case the old agents no longer work on this job, and you can assign someone from the new agency to work on it.
2. Change the Jobs.agencyid field and also change all associated JobAgent records (i.e. all those agents move with the job to the new agency), but this is very messy, because those agents will also be associated with other jobs that are still with the original agency.
3. As OMG suggests, make a new Jobs record and mark the old one as defunct (for later deletion).
4. As above, but keep the defunct Jobs record to preserve historical information.
Whether you choose 3 or 4 depends a bit on what your system is for: do you just want to maintain the current state of who has which jobs, or do you need to keep some kind of history? For example, if there are billing records attached to the job, that info needs to stay associated with the original agency (but this is all outside the scope of your original question).
You could use ON UPDATE CASCADE with the foreign keys. See this Wikipedia Page.
Or maybe, if agencyid is something that you expect to be mutable, you can have a unique constraint for it and use some other meaningless field for the agency id (say, an auto-increment column).
Does the following scheme answer your question?
Jobs    Agents    Agencies
  ^        ^         ^
  |        |         |
   \       |        /
    \      |       /
     AgentiatedJob
Normally, I have a single-field primary key for every table, because it makes it easier to identify a record in a table and to refer to it from the tables below. Following this approach, AgentiatedJob would have at least the fields:
AgentiatedJobId
JobId
AgentId
AgencyId

What would you do to avoid conflicting data in this database schema?

I'm working on a multi-user internet database-driven website with SQL Server 2008 / LinqToSQL / custom-made repositories as the DAL. I have run across a normalization problem which can lead to an inconsistent database state if exploited correctly and I am wondering how to deal with the problem.
The problem: Several different companies have access to my website. They should be able to track their Projects and Clients at my website. Some (but not all) of the projects should be assignable to clients.
This results in the following database schema:
**Companies:**
ID
CompanyName
**Clients:**
ID
CompanyID (not nullable)
FirstName
LastName
**Projects:**
ID
CompanyID (not nullable)
ClientID (nullable)
ProjectName
This leads to the following relationships:
Companies-Clients (1:n)
Companies-Projects (1:n)
Clients-Projects(1:n)
Now, if a user is malicious, he might for example insert a Project with his own CompanyID, but with a ClientID belonging to another user, leaving the database in an inconsistent state.
The problem occurs in a similar fashion all over my database schema, so I'd like to solve this in a generic way if at all possible. I had the following two ideas:
Check in the DAL for database writes that might lead to inconsistencies. This would be generic, but it requires additional database queries before each update or create query is performed, so performance will suffer.
Create an additional table for the Clients-Projects relationship and make sure the relationships created this way are consistent. This also requires some additional select queries, but far fewer than in the first case. On the other hand it is not generic, so it is easier to miss something in the long run, especially when adding more tables / dependencies to the database.
What would you do? Is there any better solution I missed?
Edit: You might wonder why the Projects table has a CompanyID. This is because I want users to be able to add projects with and without clients. I need to keep track of which company (and therefore which website user) a clientless project belongs to, which is why a project needs a CompanyID.
I'd go with the latter, having one or more tables that define the allowable relationships between entities.
Note, there's no circularity in the references you have, so the title is misleading.
What you have is the possibility of conflicting data, that's different.
Why do you have "CompanyID" in the project table? The ID of the company involved is implicitly given by the client you link to. You don't need it.
Remove that column and you've removed your problem.
Additionally, what is the purpose of the "name" column in the client table? Can you have a client with one name, differing from the name of the company?
Or is "client" the person at that company?
Edit: OK, with the clarification about projects without clients, I would separate out the references, but you're not going to get rid of the problem you're describing without constraints that prevent multiple references being made.
A simple constraint for your existing tables would be that the CompanyID and ClientID fields of a project row cannot both be non-null at the same time.
If you want to use the table like this and avoid all the new queries, just put triggers on the table; when a user tries to insert a row with wrong data, the trigger will stop him.
Best Regards,
Iordan
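A trigger along the lines suggested above might look like the following T-SQL sketch. It assumes the Clients and Projects tables from the question; the trigger name and the error message are made up for the example.

```sql
CREATE TRIGGER trg_Projects_ClientCompany ON Projects
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- Reject the statement if any new or changed project references a
    -- client that belongs to a different company. Projects without a
    -- client (ClientID IS NULL) never match the join and pass through.
    IF EXISTS (
        SELECT 1
        FROM inserted AS i
        JOIN Clients AS c ON c.ID = i.ClientID
        WHERE c.CompanyID <> i.CompanyID
    )
    BEGIN
        RAISERROR('Project client belongs to a different company.', 16, 1);
        ROLLBACK TRANSACTION;
    END
END;
```

This covers writes to Projects only; a change to Clients.CompanyID would need a similar check on the Clients table.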
My first thought would be to create a special client record for each company with the name "No client". Then eliminate the CompanyId from the Project table, and if a project has no client, use the "No client" record rather than a "normal" client record. If processing of such no-client records is special, add a flag to the no-client record to explicitly identify it. (I'd hate to rely on the name being "No Client" or something like that: too fuzzy.)
Then there would be no way to store inconsistent data so the problem would go away.
In the end I implemented a completely generic solution which solves my problem without much runtime overhead and without requiring any changes to the database. I'll describe it here in case someone else has the same problem.
First off, the approach only works because the only table that other tables are referencing through multiple paths is the Companies table. Since this is the case in my database, I only have to check whether all n:1 referenced entities of each entity that is to be created / updated / deleted are referencing the same company (or no company at all).
I am enforcing this by deriving all of my Linq entities from one of the following types:
SingleReferenceEntityBase - The norm. Only checks (via reflection) if there really is only one reference (no matter if transitive or intransitive) to the Companies table. If this is the case, the references to the companies table cannot become inconsistent.
MultiReferenceEntityBase - For special cases such as the Projects table above. Asks all directly referenced entities what company ID they are referencing. Raises an exception if there is an inconsistency. This costs me a few select queries per CRUD operation, but since MultiReferenceEntities are much rarer than SingleReferenceEntities, this is negligible.
Both of these types implement a "CheckReferences" method, and I call it whenever the Linq entity is written to the database by partially implementing the OnValidate(System.Data.Linq.ChangeAction action) method, which is automatically generated for all Linq entities.

database design: a 'code' table that get referenced by other entities

I am building a database as a simple exercise. It could be hosted on any database server, so I am trying to keep things as standard as possible. Basically, what I would like to have is a 'code' table that gets referenced by other entities. To explain:
xcode
id code
r role
p property
code
r admin
r staff
p title
....
then I would like to have some view like:
role (select * from code where xcode='r')
r admin
r staff
property (select * from code where xcode='p')
p title
then, suppose we have an entity
myentity
id - 1
role - admin (foreign key to role)
title - title (foreign key to property)
Obviously I cannot create a foreign key to a view, but this should convey the idea I have in mind. How can I get this behaviour using standard SQL syntax wherever possible, and, as a second option, additional database features like triggers, etc.?
Because if I declare that role and title in myentity are foreign keys to 'code', instead of to the views, nothing would stop me from inserting a role in the title field.
I have worked on systems with a single table for all codes and others with one table per code. I definitely prefer the latter approach.
The advantages of a table per code are:
Foreign keys. As you have already spotted it is not possible to enforce compliance to permitted values through foreign keys with a single table. Using check constraints is an alternative approach but it has a higher maintenance cost.
Performance. Code lookups are not normally a performance bottleneck, but it undoubtedly helps the optimizer to make sensible decisions about execution paths if it knows it is retrieving records from a table with four rows rather than four hundred.
Code groups. Sometimes we want to organise a code into sub-divisions, usually to make it easier to render complex lists of values. If we have a table per code we have more flexibility when it comes to structure.
In addition, I notice that you want to be able to deploy "on any database server". In that case avoid triggers. Triggers are bad news in most scenarios anyway, and on top of that they have product-specific syntax.
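For comparison, a table-per-code layout for the example in the question could be sketched like this in standard SQL. The column names and VARCHAR lengths are assumptions.

```sql
-- One small lookup table per kind of code.
CREATE TABLE role (
    code VARCHAR(20) NOT NULL PRIMARY KEY   -- e.g. 'admin', 'staff'
);

CREATE TABLE property (
    code VARCHAR(20) NOT NULL PRIMARY KEY   -- e.g. 'title'
);

CREATE TABLE myentity (
    id    INTEGER NOT NULL PRIMARY KEY,
    -- Each column can only hold a value from its own lookup table,
    -- so a role can never end up in the title field.
    role  VARCHAR(20) NOT NULL REFERENCES role (code),
    title VARCHAR(20) NOT NULL REFERENCES property (code)
);
```

No views, triggers or check constraints are needed; plain foreign keys do all the enforcement.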
What you are trying to do is in most cases an anti-pattern and a design mistake. Just create the different tables instead of views.
There are some rare cases where this kind of design makes sense. In such cases, include the xcode field in the primary key / foreign key. So your entity will look like this:
myentity
id - 1
role_xcode
role - admin (foreign key to role)
title_xcode
title - title (foreign key to property)
You can then create check constraints to enforce role_xcode='r' and title_xcode='p'.
(Sorry, I don't know whether they are standard; they do exist in Oracle and are so simple that I'd expect them in other RDBMSs as well.)
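Put together, the single-table variant described above might look like this sketch in standard SQL. The data types, the defaults and the choice of (xcode, code) as the primary key of the code table are assumptions. CHECK constraints are part of the SQL standard, although how strictly they are enforced has varied between products.

```sql
CREATE TABLE xcode (
    id   CHAR(1)     NOT NULL PRIMARY KEY,  -- 'r', 'p', ...
    code VARCHAR(20) NOT NULL               -- 'role', 'property', ...
);

CREATE TABLE code (
    xcode CHAR(1)     NOT NULL REFERENCES xcode (id),
    code  VARCHAR(20) NOT NULL,
    PRIMARY KEY (xcode, code)
);

CREATE TABLE myentity (
    id          INTEGER     NOT NULL PRIMARY KEY,
    -- The discriminator columns are pinned to one code group each.
    role_xcode  CHAR(1)     DEFAULT 'r' NOT NULL CHECK (role_xcode = 'r'),
    role        VARCHAR(20) NOT NULL,
    title_xcode CHAR(1)     DEFAULT 'p' NOT NULL CHECK (title_xcode = 'p'),
    title       VARCHAR(20) NOT NULL,
    -- Composite foreign keys: the value must exist within its own group.
    FOREIGN KEY (role_xcode, role)   REFERENCES code (xcode, code),
    FOREIGN KEY (title_xcode, title) REFERENCES code (xcode, code)
);
```

Because the group letter is part of each foreign key, inserting a role value into the title column fails unless the same value also exists as a property code.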

Resources