I have encountered a problem in my project using PostgreSQL.
Say there are two tables A and B, both A and B have a (unique) field named ID. The ID column of table A is declared as a primary key, while the ID column of table B is declared as a foreign key pointing back to table A.
My problem is that every time we have new data inputted into database, the values in table B tend to be updated prior to the ones in table A (this problem can not be avoided as the project is designed this way). So I have to modify the relationship between A and B.
My goal is to achieve a situation where I can insert data into A and B separately while having the ON DELETE CASCADE clause enabled. What's more, INSERT and DELETE queries may happen at the same time.
Any suggestions?
It sounds like you have a badly designed project, if you can't use deferred constraints. Your basic problem is that you can't guarantee internal consistency of the data because transactions may occur which do not move the data from one consistent state to another.
Here is what I would do to be honest:
Catalog affected keys.
Drop affected key constraints.
Write a periodic job that looks for orphaned rows. Use LEFT JOIN because antijoins do not perform as well in PostgreSQL.
The problem with a third table is it doesn't solve your basic problem, which is that writes are not atomically consistent. And once you sacrifice that a lot of your transactional controls go out the window.
Long term, the project needs to be rewritten.
Related
Lets say I have the following 3 tables:
Table A, Table B & Table C.
Table C has a foreign key to Table A.
Table C has a foreign key to Table B.
When I delete a row from Table B I want it to delete the orphan in Table C but only if it doesn't hold any references to Table A. If it does hold a reference to table A I want it to delete the row in Table B and set the foreign key to null in table C.
Is that even possible? what is it's default behavior?
Your scenario is typical "Business requirement" not a "cascade" requirement.
As in detail discussed here: When/Why to use Cascading in SQL Server? (mechanism behind is the same for SQL cascade delete / NHibernate cascade delete)
...
Cascade Delete may make sense when the semantics of the relationship can involve an exclusive "is part of" description. For example, and OrderLine record is part of it's parent order, and OrderLines will never be shared between multiple orders. If the Order were to vanish, the OrderLine should as well, and a line without an Order would be a problem.
The canonical example for Cascade Delete is SomeObject and SomeObjectItems, where it doesn't make any sense for an items record to ever exist without a corresponding main record.
...
Again, while these are about feature on a SQL server, they apply to NHibernate cascade feature as well.
So, in your case, where there is a pretty complex logic about deletion, the answer should be:
move the deletion defintion outside of the Data layer (NHibernate mapping)
place it inside of the Business layer. Create rules arround that, assure in tests that your logic is working (regardless on pesistence engine, i.e. NHibernate)
profit in the future from your/custom deletion logic, once users will adjust their demands. You won't be limited by NHibernate...
I have two tables as follows:
department(alpha, college, etc.)
course(id, alpha, college, title, etc.)
College and alpha are present in both tables by design. I decided to de-normalize a little because the college and alpha are always desired when viewing a course.
I have one trigger that executes after the department table is updated so that it updates all rows in the course table with the new alpha and college values. I also have a trigger that executes before updating the course table to make sure that the alpha-college pair that the user submitted in his or her edits exists in the department table; if the pair isn't there it raises and application error.
These triggers conflict. The second one checks that the new values for the department table are in their, but they aren't yet so it fails like it should.
Is it possible to ignore the second trigger if the first trigger is executed first? I really don't want to execute the second trigger in this case, since I know the values are in the first table. If that's not possible, is there a better way to do this without changing my schema?
Your second trigger sounds like nothing more than a foreign key. Drop it and create a foreign key constraint on course instead. That works in my tests.
However, it seems like unnecessary work to support a denormalization that provides little benefit. If you just want to write simple queries, create a view that joins the two tables and use that in your queries. If you are concerned about the join performance, I doubt very much that it will be a problem, unless you are missing obvious indexes on the tables.
I would sincerely recommend removing your trigger approach all together since it's burdened by dirty reads. Whenever I faced a challenge such as this I would implement the DML using Stored Procedures only. You get all the advantages of triggers without the headaches if implemented properly.
If your fear is you want to make sure all updates to the department table follow your logic as do changes in course, remove update permissions to any user except the owner of the stored procedure. This ensures the only caller who can modify that table is the stored procedure you control and understand. And by coincidence, it becomes the only way to update the tables.
Just $0.02
Like most other cases implemented with triggers, you can see the burden here because the data-model itself has defects.
You can implement the same logic as below and maintain all rules using PK and FK constraints.
---Department references College...
Create table department(
department_id number primary key,
aplha varchar2(20) not null,
college varchar2(20) not null
);
***--Course belongs to a department.. so should be a child of department.
--If it's possible for different depts to give the same course (IT and CS),
--you'll have
--a dept_course_asc table***
Create table Course(
course_id number primary key
department_id number references department(department_id),
course_name varchar2(100) not null
);
if you have a student table, you'll associate it with the course table with another student_table association table.
It might appear these are a lot more tables than you intially showed, but if you want to avoid data redundancies and don't want to have the burden of updating columns in all tables whenever they change in the parent table, the above model is the only way to go.
There are two possible solutions to the problem.
Solution 1: Using DEFERRABLE FK constraint.
This solution is only possible if (alpha, college) combination is unique and can be defined as a PK for department table.
In this case, you do not need the trigger on the course table.
Instead, you define a DEFERRABLE FK (alpha, college) on course that referenced the department table.
And before the update on department you must execute SET CONSTRAINT ... DEFERRED statement see documentation.
Then the FK will be not verified until the commit.
Solution 2: Using system context
You switch off the second trigger using the local system_context.
The context must be created first. see User Created Contexts
The trigger on department set a variable in the context to a some value.
And in the second trigger on courses you check the value of the variable in the context
Do you ever use a separate table for "generating" artificial primary keys for DB (and why)? What I mean is to have a table with two columns, table name and current ID - with which you could get new "ID" for some table by simply locking the row with that table name, getting the current value of the key, increment it by one, and unlock the row. Why would you prefer this over standard integer identity column?
P.S. The "idea" is from Fowlers Patterns of Enterprise Application Architecture, btw...
This is called Hi/Lo assignment.
You would do this having either a trigger on INSERT on your tables getting the ID from this table and incrementing it before or after you get your ID, depending of your choice.
This is commonly used when you have to deal with multiple database engines. The autoincremental identifier in Oracle is through a SEQUENCE, which you increment with SEQUENCE.NEXTVALUE from within a BEFORE INSERT TRIGGER on your data table.
Oppositly, SQL Server has IDENTITY columns, autoincrementing natively and this is managed by the DBE itself.
In order for your software to work on both DBE, you have to come to some sort of a standard, then the most common "standard" used for this is the Hi/Lo assignment to the primary key.
This is one approach amongst others. These days, with ORM Mapping tools such as NHibernate, it is offered through configuration so that you need less to care on both the application and the database sides.
EDIT #1
Because this kind of maneuvre can't be used for a global scope, you'd have to have such a table per database, or database schema. This way, each schema is indenpendant from the other. However, data in one schema can't implicitly be moved toward another with the same key, as it would perhaps be conflicted with an already existing row.
As for a security schema, it accesses the same database as another schema or user, so no additional table should exist for specific security schema.
Whenever you can use sql server's identity or guid features, you should. However, there are a few situations where this may not be possible.
One example is that sql server only allows one identity column per table. Rarely, a table will have records that need both a private id and a public id, and a limit of one identity column means generating both as integers can be a pain. You could always use a guid for one, but you want the integer on the private id for speed and you may also want the public id to be more human readable than a guid.
In this situation, an extra table for generating the ids can make sense. However, I'd do it a bit differently. Still have two columns in the table, but make one "shadow" or "Id mapping" table for every real table. One of the columns will be your private id (unique constraint) and one will be your public id (identity with maybe an increment value of '7' or '13' or other number that's less obvious than '1').
The key difference here is that you don't want to do the locking yourself. Let sql server handle it.
The only time I have ever used this is when I had an application in BTrieve, and it didn't have an identity column. And I should also say when they tried to use this table, it caused a massive slow down when they tried to import data, because of all the extra reads and writes. My friend looked at it and rewrote how they did it to speed it up, but the moral of the story is that if you do something like this incorrectly, there can be brutal consequences.
Personally, I don't think I would ever want to do this. There is too much possibility for error. Two people try and use the same key, because they forgot to lock the table before grabbing the id. This just seems like something that should be left up to the RDBMS if at all possible. As Will brought up, it's easy to minimize this situation, but if you don't know what you are doing it can happen.
You wouldn't prefer it at all.
Whatever you gain by using the pattern or becoming DB agnostic, you'll lose in headaches, support and performance.
locking the row with that table name,
getting the current value of the key,
increment it by one, and unlock the
row
This sounds simple, doesn't it?
UPDATE TableOfId
SET Id += 1
OUTPUT Inserted.Id
WHERE Name = #Name;
In reality, its a disaster. No activity occurs in the application as a standalone operation: all operations are part of transactions. One cannot simply 'unlock' the row because the 'unlock' will actually occur only at commit time. Which means that all transactions that need an Id on a table are serialized and only one can proceed at any time. It also means that transaction that access more than one table will likely deadlock on updating the table of Ids because enforcing the 'get the next Id' update order is hard in practice.
To avoid complete serialization one needs to obtain the Ids on separate, standalone, transactions that can commit immediately (usually implicit auto-commit transaction on the UPDATE itself). But this complicates the application logic tremendously. Every operation needs to maintain two separate connections to the database, one to do the normal transaction logic and another one to obtain the needed Ids. Even then, the update of Ids can become such a hot spot that it can still cause visible contention and blocking (similar to the dreaded 'update page hit count +1' prevalent on web apps).
In short: use IDENTITY. The identity generation is optimized for high concurrency.
I have seen this pattern used when data created in one database needs to be migrated, backed-up, clustered or staged to another database. In this situation, first of all your want to ensure the primary keys will not need to change. Secondly the foreign keys. Thirdly, externally exposed keys or durable references.
I need help figuring out a workflow and I'm not sure how to go about it... Let's say I'm transforming (ETL?) data from Table A to Table B. Table A has a composite primary key A.a+A.b+A.c, while Table B has just an automatically populated identity column. How can I map the composite keys from A back to the identities created when inserting into B?
Preferably I would like to not have any columns in table B related to A's composite key because there are many other tables that need to undergo the same operation but don't have the same composite key structure.
If I understand you correctly, you can't relate records from table B back to the records of table A after the transformation unless you somehow capture a mapping between A's composite key and B's identifier during the transformation.
You could add a column to A and pre-compute the identifiers to be used when inserting into B. Then you would have a mapping. This could also be done using a separate mapping table, if you don't want to add a column to A.
If you don't want to override the default assignment of identifiers, then you will have to capture them during the load. Oracle provides the returning clause for insert in PL/SQL for this purpose. I'm not sure about SQL Server. It may also be possible to accomplish this by using a trigger on B to insert into a separate mapping table or update a column in A. Though that's likely to slow down your load considerably.
If nothing else, you could create additional columns in B to hold the keys of A during the load, query out the mappings into a separate table afterwards, and then drop the extra columns.
I hope that helps.
Ask yourself exactly what you need the original keys for. The answer may vary depending on the source system. This may lead you to maintain a "source system" column and a "original source keys" column. The latter may need to be a comma-delimited list of the original keys.
Or, you may find that you never actually need to map back, so don't need to keep anything.
I've got a table structure I'm not really certain of how to create the best way.
Basically I have two tables, tblSystemItems and tblClientItems. I have a third table that has a column that references an 'Item'. The problem is, this column needs to reference either a system item or a client item - it does not matter which. System items have keys in the 1..2^31 range while client items have keys in the range -1..-2^31, thus there will never be any collisions.
Whenever I query the items, I'm doing it through a view that does a UNION ALL between the contents of the two tables.
Thus, optimally, I'd like to make a foreign key reference the result of the view, since the view will always be the union of the two tables - while still keeping IDs unique. But I can't do this as I can't reference a view.
Now, I can just drop the foreign key, and all is well. However, I'd really like to have some referential checking and cascading delete/set null functionality. Is there any way to do this, besides triggers?
sorry for the late answer, I've been struck with a serious case of weekenditis.
As for utilizing a third table to include PKs from both client and system tables - I don't like that as that just overly complicates synchronization and still requires my app to know of the third table.
Another issue that has arisen is that I have a third table that needs to reference an item - either system or client, it doesn't matter. Having the tables separated basically means I need to have two columns, a ClientItemID and a SystemItemID, each having a constraint for each of their tables with nullability - rather ugly.
I ended up choosing a different solution. The whole issue was with easily synchronizing new system items into the tables without messing with client items, avoiding collisions and so forth.
I ended up creating just a single table, Items. Items has a bit column named "SystemItem" that defines, well, the obvious. In my development / system database, I've got the PK as an int identity(1,1). After the table has been created in the client database, the identity key is changed to (-1,-1). That means client items go in the negative while system items go in the positive.
For synchronizations I basically ignore anything with (SystemItem = 1) while synchronizing the rest using IDENTITY INSERT ON. Thus I'm able to synchronize while completely ignoring client items and avoiding collisions. I'm also able to reference just one "Items" table which covers both client and system items. The only thing to keep in mind is to fix the standard clustered key so it's descending to avoid all kinds of page restructuring when the client inserts new items (client updates vs system updates is like 99%/1%).
You can create a unique id (db generated - sequence, autoinc, etc) for the table that references items, and create two additional columns (tblSystemItemsFK and tblClientItemsFk) where you reference the system items and client items respectively - some databases allows you to have a foreign key that is nullable.
If you're using an ORM you can even easily distinguish client items and system items (this way you don't need to negative identifiers to prevent ID overlap) based on column information only.
With a little more bakcground/context it is probably easier to determine an optimal solution.
You probably need a table say tblItems that simply store all the primary keys of the two tables. Inserting items would require two steps to ensure that when an item is entered into the tblSystemItems table that the PK is entered into the tblItems table.
The third table then has a FK to tblItems. In a way tblItems is a parent of the other two items tables. To query for an Item it would be necessary to create a JOIN between tblItems, tblSystemItems and tblClientItems.
[EDIT-for comment below] If the tblSystemItems and tblClientItems control their own PK then you can still let them. You would probably insert into tblSystemItems first then insert into tblItems. When you implement an inheritance structure using a tool like Hibernate you end up with something like this.
Add a table called Items with a PK ItemiD, And a single column called ItemType = "System" or "Client" then have ClientItems table PK (named ClientItemId) and SystemItems PK (named SystemItemId) both also be FKs to Items.ItemId, (These relationships are zero to one relationships (0-1)
Then in your third table that references an item, just have it's FK constraint reference the itemId in this extra (Items) table...
If you are using stored procedures to implement inserts, just have the stored proc that inserts items insert a new record into the Items table first, and then, using the auto-generated PK value in that table insert the actual data record into either SystemItems or ClientItems (depending on which it is) as part of the same stored proc call, using the auto-generated (identity) value that the system inserted into the Items table ItemId column.
This is called "SubClassing"
I've been puzzling over your table design. I'm not certain that it is right. I realise that the third table may just be providing detail information, but I can't help thinking that the primary key is actually the one in your ITEM table and the FOREIGN keys are the ones in your system and client item tables. You'd then just need to do right outer joins from Item to the system and client item tables, and all constraints would work fine.
I have a similar situation in a database I'm using. I have a "candidate key" on each table that I call EntityID. Then, if there's a table that needs to refer to items in more than one of the other tables, I use EntityID to refer to that row. I do have an Entity table to cross reference everything (so that EntityID is the primary key of the Entity table, and all other EntityID's are FKs), but I don't find myself using the Entity table very often.