Associate many child tables to a parent table - sql-server

I will do my best to lay this out in text. Essentially, we have an application that tracks actions performed by users. Each action has its own table, since each action has different parameters that need to be stored. This allows us to store those extra attributes and run analytics on the actions across multiple or single users rather easily. The actions are not really associated with each other, other than by which user performed them.
Example:
ActionTableA Id | UserId | AttributeA | AttributeB
ActionTableB Id | UserId | AttributeC | AttributeD | AttributeE
ActionTableC Id | UserId | AttributeF
Next, we need to allocate a value to each action performed by the user and keep a running total of those values.
Example:
ValueTable: Id | UserId | Value | ActionType | ActionId
What would be the best way to link the value in the value table to the actual action performed? We know the action type (A, B, C) - but from a SQL design perspective, I cannot see a good way to have an indexed relationship between the Values of the actions in the ActionsTable and the actual actions themselves. The only thing that makes sense would be to modify the ValueTable to the following:
ValueTable
Id | UserId | Value | ActionType | ActionAId(FK Nullable) | ActionBId(FK Nullable) | ActionCId(FK Nullable)
But the problem I have with this is that only one of the 3 actionTableId columns would have a value; the rest would be Null. Additionally, as action types are added, columns would have to be added to the value table too. Lastly, to programmatically find the data, I would either a) have to check the ActionType to get the appropriate column for the Id or b) scan the row and pick the non-null actionId.
Is there a better way/design or is this just 'the way it is' for this particular issue.
EDIT
Attached is a diagram of the above setup:
Sorry for the clarity issues, typing SQL questions is always challenging. So I think your comment gave me an idea of something... I could have a SystemActions table that essentially has an auto-generated value
SystemActions: Id | Type
Then, each ActionTable would have an additional FK to the SystemAction table. Lastly, in the ValueTable - associate it to the SystemActions table as well. This would allow us to tie values to specific actions. I would need to join the action tables to the system actions table where
JOIN (((SystemActions.Id = ActionTableA.Id) JOIN (SystemActions.Id = ActionTableB.Id)) JOIN (SystemActions.Id = ActionTableC.Id)
crappy quick sql syntax
Is this what you were alluding to in the answer below? A snapshot of what that could potentially look like:

Your question is a little unclear, but it looks like you should either have a (denormalized) value column in each action table, or have an artificial key in the value table that is keyed to by each of the separate action tables.

You have essentially a supertype/subtype structure, or an exclusive arc. Attributes common to all actions bubble "up" into the supertype (the table "actions"). Columns unique to each subtype bubble "down" into the distinct subtypes.
create temp table actions (
action_id integer not null,
action_type char(1) not null check (action_type in ('a', 'b', 'c')),
user_id integer not null, -- references users, not shown.
primary key (action_id),
-- Required for data integrity. See below.
unique (action_id, action_type)
);
create temp table ActionTableA (
action_id integer primary key,
-- default and check constraint guarantee that only an 'a' row goes
-- in this table.
action_type char(1) not null default 'a' check (action_type = 'a'),
-- FK guarantees that this row matches only an 'a' row in actions.
-- To make this work, you need a UNIQUE constraint on these two columns
-- in the table "actions".
foreign key (action_id, action_type)
references actions (action_id, action_type),
attributeA char(1) not null,
attributeB char(1) not null
);
-- ActionTableB and ActionTableC are similar.
create temp table ValueTable (
action_id integer primary key,
action_type char(1) not null,
-- Since this applies to all actions, FK should reference the supertype,
-- which is the table "actions". You can reference either action_id alone,
-- which has a PRIMARY KEY constraint, or the pair {action_id, action_type},
-- which has a UNIQUE constraint. Using the pair makes some kinds of
-- accounting queries easier and faster.
foreign key (action_id, action_type)
references actions (action_id, action_type),
value integer not null default 0 check (value >= 0)
);
To round this out, build updatable views that join the supertype to each subtype, and have all users use the views instead of the base tables.
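As a rough way to see the composite foreign key doing its job, here is a minimal sketch that runs the same idea through Python's built-in sqlite3 module. It is SQLite rather than SQL Server, and the names action_a, action_values and attribute_a are made up for the demo, so treat it as an illustration of the technique only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
create table actions (
    action_id   integer primary key,
    action_type text not null check (action_type in ('a', 'b', 'c')),
    user_id     integer not null,
    unique (action_id, action_type)       -- target of the composite FKs
);
create table action_a (                   -- hypothetical subtype table
    action_id   integer primary key,
    action_type text not null default 'a' check (action_type = 'a'),
    attribute_a text not null,
    foreign key (action_id, action_type)
        references actions (action_id, action_type)
);
create table action_values (              -- hypothetical value table
    action_id   integer primary key,
    action_type text not null,
    value       integer not null default 0 check (value >= 0),
    foreign key (action_id, action_type)
        references actions (action_id, action_type)
);
""")

conn.execute("insert into actions values (1, 'a', 100)")
conn.execute("insert into actions values (2, 'b', 100)")
conn.execute("insert into action_a (action_id, attribute_a) values (1, 'x')")
conn.execute("insert into action_values values (1, 'a', 5)")
conn.execute("insert into action_values values (2, 'b', 7)")

# Action 2 is a 'b' row, so it must not get a subtype row in action_a.
try:
    conn.execute("insert into action_a (action_id, attribute_a) values (2, 'y')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Running total per user goes through the supertype only.
total = conn.execute("""
    select sum(v.value)
    from action_values v
    join actions a on a.action_id = v.action_id
    where a.user_id = 100
""").fetchone()[0]
```

The point of the sketch is that the value table never needs one nullable column per action type: it references the supertype, and the subtype tables are kept honest by the (action_id, action_type) pair.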

I would just have a single table for actions, to be honest. Is there a reason (other than denormalization) for having multiple tables? Especially when it will increase the complexity of your business logic?
Are the attribute columns significant in the context of the schema? Could you compress it into an object storage column "attributes"?
Actions: actionID, type, attributes

I think you need something similar to an Audit Trail. Can we have a simple design so that all the actions will be captured in a single table?

If the way you want it to work is that for every time a user performs action A you insert a new row in table ActionTableA and a row in ValueTable, and having them both linked, why not have a value column in each action table? This would work only if you want to insert a new row each time the user performs the action rather than if you want to update the value if the user performs the same action again. It seems overly complicated to have a separate table for values if it can be stored in a column. On the other hand if a "value" is a set of different pieces of data (or if you want to have all values in one place) then you do need an extra table but I would still have a foreign key column pointing from the action tables to the value table.

Related

Bad design to compare to computed columns?

Using SQL Server I have a table with a computed column. That column concatenates 60 columns:
CREATE TABLE foo
(
Id INT NOT NULL,
PartNumber NVARCHAR(100),
field_1 INT NULL,
field_2 INT NULL,
-- and so forth
field_60 INT NULL
)
ALTER TABLE foo
ADD RecordKey AS CONCAT (field_1, '-', field_2, '-', -- and so on up to 60
) PERSISTED
CREATE INDEX ix_foo_RecordKey ON dbo.foo (RecordKey);
Why I used a persisted column:
Not having the need to index 60 columns
To test to see if a current record exists by checking just one column
This table will contain no fewer than 20 million records. Adds/Inserts/updates happen a lot, and some binaries do tens of thousands of inserts/updates/deletes per run and we want these to be quick and live.
Currently we have C# code that manages records in table foo. It has a function which concatenates the same fields, in the same order, as the computed column. If a record with that same concatenated key already exists we might not insert, or we might insert but call other functions that we may not normally.
Is this a bad design? The big danger I see is if the code for any reason doesn't match the concatenation order of the computed column (if one is edited but not the other).
Rules/Requirements
We want to show records in JQGrid. We already have C# that can do so if the records come from a single table or view
We need the ability to check two records to verify if they both have the same values for all of the 60 columns
A better table design would be
parts table
-----------
id
partnumber
other_common_attributes_for_all_parts
attributes table
----------------
id
attribute_name
attribute_unit (if needed)
part_attributes table
---------------------
part_id (foreign key to parts)
attribute_id (foreign key to attributes)
attribute_value
It looks complicated but due to proper indexing this is super fast even if part_attributes contain billions of records!
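To make the "do two records have the same values" requirement concrete under this layout, here is a small sketch using Python's sqlite3 module. It assumes integer attribute values and a helper named same_attributes, both invented for the demo; the comparison uses EXCEPT in both directions, which is only one of several ways to do it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
create table parts (
    id         integer primary key,
    partnumber text not null
);
create table attributes (
    id             integer primary key,
    attribute_name text not null unique
);
create table part_attributes (
    part_id         integer not null references parts (id),
    attribute_id    integer not null references attributes (id),
    attribute_value integer,
    primary key (part_id, attribute_id)
);
""")
conn.executemany("insert into parts values (?, ?)",
                 [(1, "P-1"), (2, "P-2"), (3, "P-3")])
conn.executemany("insert into attributes values (?, ?)",
                 [(1, "field_1"), (2, "field_2")])
conn.executemany("insert into part_attributes values (?, ?, ?)", [
    (1, 1, 10), (1, 2, 20),
    (2, 1, 10), (2, 2, 20),   # same values as part 1
    (3, 1, 10), (3, 2, 99),   # differs from part 1 in field_2
])


def same_attributes(conn, part_a, part_b):
    """Return True when both parts carry exactly the same (attribute, value) pairs."""
    sql = """
        select count(*) from (
            select attribute_id, attribute_value
            from part_attributes where part_id = :a
            except
            select attribute_id, attribute_value
            from part_attributes where part_id = :b
        )
    """
    only_in_a = conn.execute(sql, {"a": part_a, "b": part_b}).fetchone()[0]
    only_in_b = conn.execute(sql, {"a": part_b, "b": part_a}).fetchone()[0]
    return only_in_a == 0 and only_in_b == 0
```

With this shape there is no concatenated key to keep in sync between the application code and the database, which was the main danger raised in the question.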

How to use foreign key in two tables based on a flag column?

I have a parent table Tree and two child tables Post and Department.
Based on the Flag column this relation must be set.
How can I do this?
You cannot do that with foreign keys. You could implement a trigger which would check for the ReferenceID presence either on the Post or on the Department table based on the Flag column.
Although the best approach would be to change your design to have 2 nullable columns as follows, and ensuring only one of them has a value:
CREATE TABLE Tree (
ID Integer NOT NULL,
PostID Integer REFERENCES Post(ID),
DepartmentID Integer REFERENCES Department(ID),
Flag INTEGER NOT NULL
)
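The "ensuring only one of them has a value" part can be pushed into the table itself with a CHECK constraint. Below is a minimal sketch in SQLite via Python's sqlite3 module; the constraint and the try_insert helper are my own additions, and the Flag column is left out because the constraint makes it redundant here. Dialects that cannot compare boolean expressions directly would need a CASE-based equivalent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
create table Post (ID integer primary key);
create table Department (ID integer primary key);
create table Tree (
    ID           integer primary key,
    PostID       integer references Post (ID),
    DepartmentID integer references Department (ID),
    -- Assumed rule: exactly one of the two references must be set.
    check ((PostID is null) <> (DepartmentID is null))
);
insert into Post values (1);
insert into Department values (1);
""")


def try_insert(row):
    """Attempt the insert and report whether the database accepted it."""
    try:
        conn.execute("insert into Tree values (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert((1, 1, None)),     # only PostID set
    try_insert((2, None, 1)),     # only DepartmentID set
    try_insert((3, 1, 1)),        # both set
    try_insert((4, None, None)),  # neither set
]
```

Only the first two rows should be accepted; the last two violate the CHECK constraint.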

Tricky constraint combining 2 columns

I need to be sure about the integrity of data in an MSSQL database. My data model contains two important fields which are foreign keys to other tables. For example TripId and ReservationId (random names so don't bother about them).
I need the ability to insert data if:
- ReservationId and TripId are not null
- ReservationId is null and TripId is not null
- ReservationId is not null and TripId is null
It is tricky, because I need to reject inserts when one of the Ids was already used in another combination, so for example:
My db contains record with RES111 and TRIP666. I must be able to insert another record with the same Ids for reservation and trip.
I must reject data that reuses only one of those Ids, or uses it in another combination (for example, RES111 with TRIP777 must be rejected).
The same when one Id is provided, for example ReservationId.
Inserts containing used ReservationId with any tripId must be rejected.
I can provide such filtering in application code but it has to be done on database level
You could write a conditional insert
For Example
insert into tblname (value, name)
select 'foo', 'bar'
where not exists (select 1 from tblname where ReservationId is null or TripId is null);
Of course you will need to tailor this to what you need exactly.
The basic concept is, as long as the subquery returns no rows, the insert will happen, so the subquery can be a really complicated constraint or check.
You could do this either with a Trigger, or with a CHECK CONSTRAINT that calls a UDF that encapsulates the logic you want to enforce.
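For what the trigger route might look like, here is a sketch of my reading of the rule (an Id may only ever appear with the same partner, or with no partner, as it did the first time). It is written for SQLite through Python's sqlite3 module, so the syntax differs from a SQL Server trigger; the table name bookings, the trigger name and the try_insert helper are invented for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table bookings (
    id            integer primary key,
    ReservationId text,
    TripId        text,
    check (ReservationId is not null or TripId is not null)
);
create trigger bookings_pairing before insert on bookings
begin
    -- "is not" is SQLite's null-safe inequality, so NULL counts as a
    -- distinct partner value.
    select raise(abort, 'id already used in another combination')
    where exists (
        select 1 from bookings b
        where (b.ReservationId = new.ReservationId and b.TripId is not new.TripId)
           or (b.TripId = new.TripId and b.ReservationId is not new.ReservationId)
    );
end;
""")


def try_insert(reservation_id, trip_id):
    """Attempt the insert and report whether the trigger let it through."""
    try:
        conn.execute(
            "insert into bookings (ReservationId, TripId) values (?, ?)",
            (reservation_id, trip_id),
        )
        return True
    except sqlite3.DatabaseError:
        return False


results = [
    try_insert("RES111", "TRIP666"),  # first use of both Ids
    try_insert("RES111", "TRIP666"),  # same combination again
    try_insert("RES111", "TRIP777"),  # RES111 with a different trip
    try_insert("RES111", None),       # RES111 alone
    try_insert(None, "TRIP666"),      # TRIP666 alone
    try_insert("RES222", None),       # new reservation, no trip
    try_insert("RES222", "TRIP1"),    # RES222 was first used without a trip
]
```

Note that a trigger like this only sees committed or same-transaction rows, so concurrent inserts would still need appropriate locking or isolation in a real system.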

how do I model subtyping in a relational schema?

Is the following DB-schema ok?
REQUEST-TABLE
REQUEST-ID | TYPE | META-1 | META-2 |
This table stores all the requests each of which has a unique REQUEST-ID. The TYPE is either A, B or C. This will tell us which table contains the specific request parameters. Other than that we have the tables for the respective types. These tables store the parameters for the respective requests. META-1 are just some additional info like timestamps and stuff.
TYPE-A-TABLE
REQUEST-ID | PARAM_X | PARAM_Y | PARAM_Z
TYPE-B-TABLE
REQUEST-ID | PARAM_I | PARAM_J
TYPE-C-TABLE
REQUEST-ID | PARAM_L | PARAM_M | PARAM_N | PARAM_O | PARAM_P | PARAM_Q
The REQUEST-ID is the foreign key into the REQUEST-TABLE.
Is this design normal/best-practice? Or is there a better/smarter way? What are the alternatives?
It somehow feels strange to me, having to do a query on the REQUEST-TABLE to find out which TYPE-TABLE contains the information I need, to then do the actual query I'm interested in.
For instance imagine a method which given an ID should retrieve the parameters. This method would need to do 2 db-access.
- Find correct table to query
- Query table to get the parameters
Note: In reality we have like 10 types of requests, i.e. 10 TYPE tables. Moreover there are many entries in each of the tables.
Meta-Note: I find it hard to come up with a proper title for this question (one that is not overly broad). Please feel free to make suggestions or edit the title.
For exclusive types, you just need to make sure rows in one type table can't reference rows in any other type table.
create table requests (
request_id integer primary key,
request_type char(1) not null
-- You could also use a table to constrain valid types.
check (request_type in ('A', 'B', 'C', 'D')),
meta_1 char(1) not null,
meta_2 char(1) not null,
-- Foreign key constraints don't reference request_id alone. If they
-- did, they might reference the wrong type.
unique (request_id, request_type)
);
You need that apparently redundant unique constraint so the pair of columns can be the target of a foreign key constraint.
create table type_a (
request_id integer not null,
request_type char(1) not null default 'A'
check (request_type = 'A'),
primary key (request_id),
foreign key (request_id, request_type)
references requests (request_id, request_type) on delete cascade,
param_x char(1) not null,
param_y char(1) not null,
param_z char(1) not null
);
The check() constraint guarantees that only 'A' can be stored in the request_type column. The foreign key constraint guarantees that each row will reference an 'A' row in the table "requests". Other type tables are similar.
create table type_b (
request_id integer not null,
request_type char(1) not null default 'B'
check (request_type = 'B'),
primary key (request_id),
foreign key (request_id, request_type)
references requests (request_id, request_type) on delete cascade,
param_i char(1) not null,
param_j char(1) not null
);
Repeat for each type table.
I usually create one updatable view for each type. The views join the table "requests" with one type table. Application code uses the views instead of the base tables. When I do that, it usually makes sense to revoke privileges on the base tables. (Not shown.)
If you don't know which type something is, then there's no alternative to running one query to get the type, and another query to select or update.
select request_type from requests where request_id = 42;
-- Say it returns 'A'. I'd use the view type_a_only.
update type_a_only
set param_x = '!' where request_id = 42;
In my own work, it's pretty rare to not know the type, but it does happen sometimes.
The phrase you may be looking for is "how do I model inheritance in a relational schema". It's been asked before. Whilst this is a reference to object oriented software design, the basic question is the same: how do I deal with data where there is a "x is a type of y" relationship.
In your case, "request" is the abstract class, and typeA, TypeB etc. are the subclasses.
Your solution is one of the classic answers - "table per subclass". It's clean and easy to maintain, but does mean you can have multiple database access requests to retrieve the data.
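If the extra round trip really hurts, one possible workaround (not something the answers above propose) is to outer-join the parent to every type table and let the non-matching ones come back as NULL. A minimal sketch with Python's sqlite3 module, using a cut-down schema and a hypothetical load_request helper:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us read columns by name
conn.executescript("""
create table requests (
    request_id   integer primary key,
    request_type text not null check (request_type in ('A', 'B'))
);
create table type_a (
    request_id integer primary key references requests (request_id),
    param_x    text not null
);
create table type_b (
    request_id integer primary key references requests (request_id),
    param_i    text not null
);
insert into requests values (42, 'A'), (43, 'B');
insert into type_a values (42, 'x-value');
insert into type_b values (43, 'i-value');
""")


def load_request(conn, request_id):
    """Fetch a request and its type-specific parameters in one query."""
    row = conn.execute("""
        select r.request_id, r.request_type, a.param_x, b.param_i
        from requests r
        left join type_a a on a.request_id = r.request_id
        left join type_b b on b.request_id = r.request_id
        where r.request_id = ?
    """, (request_id,)).fetchone()
    if row is None:
        return None
    # Drop the NULL columns contributed by the type tables that did not match.
    # This assumes the parameters themselves are declared NOT NULL.
    return {key: row[key] for key in row.keys() if row[key] is not None}
```

The trade-off is a wide result with mostly NULL columns and one more join per type, so with ten types it is worth measuring before preferring it over the two-step lookup.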

Cascade UPDATE to related objects

I've set up my database and application to soft delete rows. Every table has an is_active column where the values should be either TRUE or NULL. The problem I have right now is that my data is out of sync because unlike a DELETE statement, setting a value to NULL doesn't cascade to rows in separate tables for which the "deleted" row in another table is a foreign key.
I have already taken measures to correct the data by finding inactive rows from the source table and manually setting related rows in other tables to be inactive as well. I recognize that I could do this at the application level (I'm using Django/Python for this project), but I feel like this should be a database process. Is there a way to utilize something like PostgreSQL's ON UPDATE constraint so that when a row has is_active set to NULL, all rows in separate tables referencing the updated row as a foreign key automatically have is_active set to NULL as well?
Here's an example:
An assessment has many submissions. If the assessment is marked inactive, all submissions related to it should also be marked inactive.
To my mind, it doesn't make sense to use NULL to represent a Boolean value. The semantics of "is_active" suggest that the only sensible values are True and False. Also, NULL interferes with cascading updates.
So I'm not using NULL.
First, create the "parent" table with both a primary key and a unique constraint on the primary key and "is_active".
create table parent (
p_id integer primary key,
other_columns char(1) default 'x',
is_active boolean not null default true,
unique (p_id, is_active)
);
insert into parent (p_id) values
(1), (2), (3);
Create the child table with an "is_active" column. Declare a foreign key constraint referencing the columns in the parent table's unique constraint (last line in the CREATE TABLE statement above), and cascade updates.
create table child (
p_id integer not null,
is_active boolean not null default true,
foreign key (p_id, is_active) references parent (p_id, is_active)
on update cascade,
some_other_key_col char(1) not null default '!',
primary key (p_id, some_other_key_col)
);
insert into child (p_id, some_other_key_col) values
(1, 'a'), (1, 'b'), (2, 'a'), (2, 'c'), (2, 'd'), (3, '!');
Now you can set the "parent" to false, and that will cascade to all referencing tables.
update parent
set is_active = false
where p_id = 1;
select *
from child
order by p_id;
p_id is_active some_other_key_col
--
1 f a
1 f b
2 t a
2 t c
2 t d
3 t !
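The same cascade can be tried without a PostgreSQL server. This is a minimal sketch using Python's sqlite3 module, where booleans are stored as 0/1 integers (an SQLite detail, not part of the answer above) and the sample data is trimmed down.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for cascades in SQLite
conn.executescript("""
create table parent (
    p_id      integer primary key,
    is_active integer not null default 1 check (is_active in (0, 1)),
    unique (p_id, is_active)
);
create table child (
    p_id               integer not null,
    is_active          integer not null default 1,
    some_other_key_col text not null,
    primary key (p_id, some_other_key_col),
    foreign key (p_id, is_active) references parent (p_id, is_active)
        on update cascade
);
insert into parent (p_id) values (1), (2);
insert into child (p_id, some_other_key_col) values (1, 'a'), (1, 'b'), (2, 'a');
""")

# Deactivate parent 1; the composite FK drags its children along.
conn.execute("update parent set is_active = 0 where p_id = 1")

rows = conn.execute("""
    select p_id, is_active, some_other_key_col
    from child
    order by p_id, some_other_key_col
""").fetchall()
```

After the update, both children of parent 1 carry is_active = 0 while the child of parent 2 is untouched.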
Soft deletes are a lot simpler and have much better semantics if you implement them as valid-time state tables. FWIW, I think the terms soft delete, undelete, and undo are all misleading in this context, and I think you should avoid them.
PostgreSQL's range data types are particularly useful for this kind of work. I'm using date ranges, but timestamp ranges work the same way.
For this example, I'm treating only "parent" as a valid-time state table. That means that invalidating a particular row (soft deleting a particular row) also invalidates all the rows that reference it through foreign keys. It doesn't matter whether they reference it directly or indirectly.
I'm not implementing soft deletes on "child". I can do that, but I think that would make the essential technique unreasonably hard to understand.
create extension btree_gist; -- Necessary for the kind of exclusion
-- constraint below.
create table parent (
p_id integer not null,
other_columns char(1) not null default 'x',
valid_from_to daterange not null,
primary key (p_id, valid_from_to),
-- No overlapping date ranges for a given value of p_id.
exclude using gist (p_id with =, valid_from_to with &&)
);
create table child (
p_id integer not null,
valid_from_to daterange not null,
foreign key (p_id, valid_from_to) references parent on update cascade,
other_key_columns char(1) not null default 'x',
primary key (p_id, valid_from_to, other_key_columns),
other_columns char(1) not null default 'x'
);
Insert some sample data. In PostgreSQL, the daterange data type has a special value 'infinity'. In this context, it means that the row that has the value 1 for "parent"."p_id" is valid from '2015-01-01' until forever.
insert into parent values
(1, 'x', daterange('2015-01-01', 'infinity'));
insert into child values
(1, daterange('2015-01-01', 'infinity'), 'a', 'x'),
(1, daterange('2015-01-01', 'infinity'), 'b', 'y');
This query will show you the joined rows.
select *
from parent p
left join child c
on p.p_id = c.p_id
and p.valid_from_to = c.valid_from_to;
To invalidate a row, update the date range. This row (below) was valid from '2015-01-01' to '2015-01-31'. That is, it was soft deleted on 2015-01-31.
update parent
set valid_from_to = daterange('2015-01-01', '2015-01-31')
where p_id = 1 and valid_from_to = daterange('2015-01-01', 'infinity');
Insert a new valid row for p_id 1, and pick up the child rows that were invalidated on Jan 31.
insert into parent values (1, 'r', daterange(current_date, 'infinity'));
update child set valid_from_to = daterange(current_date, 'infinity')
where p_id = 1 and valid_from_to = daterange('2015-01-01', '2015-01-31');
Richard T Snodgrass's seminal book Developing Time-Oriented Database Applications in SQL is available free from his university web page.
You can use a trigger:
CREATE OR REPLACE FUNCTION trg_upaft_upd_trip()
RETURNS TRIGGER AS
$func$
BEGIN
UPDATE submission s
SET is_active = NULL
WHERE s.assessment_id = NEW.assessment_id
AND NEW.is_active IS NULL; -- recheck to be sure
RETURN NEW; -- call this BEFORE UPDATE
END
$func$ LANGUAGE plpgsql;
CREATE TRIGGER upaft_upd_trip
BEFORE UPDATE ON assessment
FOR EACH ROW
WHEN (OLD.is_active AND NEW.is_active IS NULL)
EXECUTE PROCEDURE trg_upaft_upd_trip();
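For quick experimentation, roughly the same behaviour can be reproduced in SQLite through Python's sqlite3 module. This is only a sketch: the trigger syntax differs from PL/pgSQL, it fires AFTER rather than BEFORE the update, and the column names assessment_id and submission_id are assumptions about the asker's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table assessment (
    assessment_id integer primary key,
    is_active     integer default 1          -- 1 or NULL, as in the question
);
create table submission (
    submission_id integer primary key,
    assessment_id integer not null references assessment (assessment_id),
    is_active     integer default 1
);
create trigger assessment_deactivate
after update of is_active on assessment
when old.is_active is not null and new.is_active is null
begin
    update submission
    set is_active = null
    where assessment_id = new.assessment_id;
end;
insert into assessment (assessment_id) values (1), (2);
insert into submission (submission_id, assessment_id) values (10, 1), (11, 1), (20, 2);
""")

# Soft delete assessment 1; the trigger propagates NULL to its submissions.
conn.execute("update assessment set is_active = null where assessment_id = 1")

active = conn.execute("""
    select submission_id from submission
    where is_active is not null
    order by submission_id
""").fetchall()
```

Only submission 20, which belongs to the still-active assessment 2, should remain active.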
Related:
How do I make a trigger to update a column in another table?
Be aware that a trigger has more possible points of failure than an FK constraint with ON UPDATE CASCADE ON DELETE CASCADE.
@Mike added a solution with a multi-column FK constraint that I would consider as an alternative.
Related answer on dba.SE:
Enforcing constraints “two tables away”
Related answer one week later:
Cross table constraints in PostgreSQL
This is more a schematic problem than a procedural one.
You may have dodged creating a solid definition of "what constitutes a record". At the moment you have object A that may be referenced by object B, and when A is "deleted" (has its is_active column set to FALSE, or NULL, in your current case) B is not reflecting that. It sounds like this is a single table (you only mention rows, not separate classes or tables...) and you have a hierarchical model formed by self-reference. If that is the case you can think of the problem in a few ways:
Recursive lineage
In this model you have one table that contains all the data in one place, whether it's a parent, a child, etc. and you check the table for recursive references to traverse the tree.
It is tricky to do this properly in an ORM that lacks explicit support for this without accidentally writing routines that either:
iteratively pound the crap out of your DB by making at least one query per node or
pull the entire table at once and traverse it in application code
It is, however, straightforward to do this in Postgres and let Django access it via a model over an unmanaged view on the lineage query you build. (I wrote a little about this once.) Under this model your query will descend the tree until it hits the first row of the current branch that is marked as not active and stop, thus effectively truncating all the rows below associated with that one (no need for propagating the is_active column!).
If this were, say, a blog entry + comments within the same structure (a fairly common CMS schema) then any row that is its own parent is a primary entity and anything that has a parent that is not itself is a comment. To remove a whole blog post + its children you mark just the blog post's row as inactive; to remove a thread within the comments mark as inactive the comment that begins that thread.
For a blog + comments type feature this is usually the most straightforward way to do things -- though most CMS systems get it wrong (but usually only in ways that matter if you start doing serious data stuff later, if you're just setting up some place for people to argue on the internet then Worse is Better).
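A lineage query of this kind might look roughly like the sketch below. It uses a recursive CTE in SQLite via Python's sqlite3 module rather than a Postgres view, and the node table, its columns and the visible_nodes helper are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table node (
    id        integer primary key,
    parent_id integer not null references node (id),
    is_active integer not null default 1,
    body      text not null
);
-- A row that is its own parent is a primary entity (the blog post).
insert into node values
    (1, 1, 1, 'post'),
    (2, 1, 1, 'comment'),
    (3, 2, 1, 'reply'),
    (4, 1, 0, 'removed thread'),
    (5, 4, 1, 'reply under removed thread');
""")


def visible_nodes(conn, root_id):
    """Walk down from an active root, stopping at the first inactive row of each branch."""
    return [row[0] for row in conn.execute("""
        with recursive lineage (id) as (
            select id from node
            where id = :root and parent_id = id and is_active = 1
            union all
            select n.id from node n
            join lineage l on n.parent_id = l.id
            where n.id <> n.parent_id   -- do not loop on the self-parented root
              and n.is_active = 1
        )
        select id from lineage order by id
    """, {"root": root_id})]


before = visible_nodes(conn, 1)

# Marking only the post inactive hides the whole tree beneath it.
conn.execute("update node set is_active = 0 where id = 1")
after = visible_nodes(conn, 1)
```

Row 5 never shows up even though its own is_active is 1, because the walk stops at its inactive parent; nothing had to be propagated.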
Recursive lineage + External "record" definition
In this model you have your tree of nodes and your primary entities separated. The primary entities are marked as being active or not, and that attribute is common to all the elements that are related to it within the context of that primary entity (they exist and have a meaning independent of it). This means two tables, one for primary entities, and one for your tree of nodes.
Use this when you have something more interesting going on than simply threaded discussion. For example, a model of components where a tree of things may be aggregated separately into other larger things, and you need to have a way to mark those "other larger things" as active or not independently of the components themselves.
Further down the rabbit hole...
There are other takes on this idea, but they get increasingly non-trivial, which is probably not suitable. For example, consider a third basic take on this model where the hierarchy structure, the node bodies, and the primary entities are all separated into different tables. One node body might appear in multiple trees by reference, and multiple trees may be considered active or inactive in the context of a single primary entity, etc.
Consider heading this direction if your data is more complex. If you wind up really needing models this far decomposed ("normalized") then I would caution that any ORM is probably going to wind up being a lot more trouble than it's worth -- you will start running headlong into the problem that ORMs are fundamentally leaky abstractions (1 object can never really equate to 1 table...).

Resources