how do I model subtyping in a relational schema? - database

Is the following DB-schema ok?
REQUEST-TABLE
REQUEST-ID | TYPE | META-1 | META-2 |
This table stores all the requests each of which has a unique REQUEST-ID. The TYPE is either A, B or C. This will tell us which table contains the specific request parameters. Other than that we have the tables for the respective types. These tables store the parameters for the respective requests. META-1 are just some additional info like timestamps and stuff.
TYPE-A-TABLE
REQUEST-ID | PARAM_X | PARAM_Y | PARAM_Z
TYPE-B-TABLE
REQUEST-ID | PARAM_I | PARAM_J
TYPE-C-TABLE
REQUEST-ID | PARAM_L | PARAM_M | PARAM_N | PARAM_O | PARAM_P | PARAM_Q
The REQUEST-ID is the foreign key into the REQUEST-TABLE.
Is this design normal/best-practice? Or is there a better/smarter way? What are the alternatives?
It somehow feels strange to me, having to do a query on the REQUEST-TABLE to find out which TYPE-TABLE contains the information I need, to then do the actual query I'm interested in.
For instance imagine a method which given an ID should retrieve the parameters. This method would need to do 2 db-access.
- Find correct table to query
- Query table to get the parameters
Note: In reality we have like 10 types of requests, i.e. 10 TYPE tables. Moreover there are many entries in each of the tables.
Meta-Note: I find it hard to come up with a proper title for this question (one that is not overly broad). Please feel free to make suggestions or edit the title.

For exclusive types, you just need to make sure rows in one type table can't reference rows in any other type table.
create table requests (
request_id integer primary key,
request_type char(1) not null
-- You could also use a table to constrain valid types.
check (request_type in ('A', 'B', 'C', 'D')),
meta_1 char(1) not null,
meta_2 char(1) not null,
-- Foreign key constraints don't reference request_id alone. If they
-- did, they might reference the wrong type.
unique (request_id, request_type)
);
You need that apparently redundant unique constraint so the pair of columns can be the target of a foreign key constraint.
create table type_a (
request_id integer not null,
request_type char(1) not null default 'A'
check (request_type = 'A'),
primary key (request_id),
foreign key (request_id, request_type)
references requests (request_id, request_type) on delete cascade,
param_x char(1) not null,
param_y char(1) not null,
param_z char(1) not null
);
The check() constraint guarantees that only 'A' can be stored in the request_type column. The foreign key constraint guarantees that each row will reference an 'A' row in the table "requests". Other type tables are similar.
create table type_b (
request_id integer not null,
request_type char(1) not null default 'B'
check (request_type = 'B'),
primary key (request_id),
foreign key (request_id, request_type)
references requests (request_id, request_type) on delete cascade,
param_i char(1) not null,
param_j char(1) not null
);
Repeat for each type table.
I usually create one updatable view for each type. The views join the table "requests" with one type table. Application code uses the views instead of the base tables. When I do that, it usually makes sense to revoke privileges on the base tables. (Not shown.)
If you don't know which type something is, then there's no alternative to running one query to get the type, and another query to select or update.
select request_type from requests where request_id = 42;
-- Say it returns 'A'. I'd use the view type_a_only.
update type_a_only
set param_x = '!' where request_id = 42;
In my own work, it's pretty rare to not know the type, but it does happen sometimes.

The phrase you may be looking for is "how do I model inheritance in a relational schema". It's been asked before. Whilst this is a reference to object oriented software design, the basic question is the same: how do I deal with data where there is a "x is a type of y" relationship.
In your case, "request" is the abstract class, and typeA, TypeB etc. are the subclasses.
Your solution is one of the classic answers - "table per subclass". It's clean and easy to maintain, but does mean you can have multiple database access requests to retrieve the data.

Related

Apache Derby Tables, Writing a query that indicates primary and foreign keys in a table

I am new to Apache Derby so this question may seem remedial but I am trying to find a way to write a select statement from the ij prompt that will show all of the fields along with their data types, i.e. CHAR(), VARCHAR(), INT, etc. and indicate whether they are a foreign or primary key. The only thing I have come across so far is the describe table utility. That provides some useful information but it does not, as far as I can tell, give any info pertaining to primary and foreign keys. I'm thinking that I need to be looking in the SYS.SYSCONSTRAINTS table but when I ran a simple select statement on that table the results I got back were critic. The output that I am hoping to get would be something like this
==================================================
FIELD NAME FIELD TYPE CONSTRAINTS
==================================================
FIRST NAME VARCHAR(15) NOT NULL
LAST NAME VARCHAR(15) NOT NULL
MIDDLE INTIAL CHAR(1)
BIRTH DATE DATE NOT NULL
DEPT NUMBER CHAR(3) NOT NULL FOREIGN - DEPARTMENT
SSN CHAR(9) NOT NULL PRIMARY
I assume the answer, if there is one is in joining two or more of the sys. tables but I have not been able to find out which ones.
Does anybody know how to do this?
Thank you.

Database Design For Multiple Product Types with variable attributes

I have a database containing different product types. Each type contains fields that differ greatly with each other. The first type of product, is classified in three categories. The second type of product, is classified in three categories. But the third and the fourth one, is not classified in anything.
Each product can have any number of different properties.
I am using database model which is basically like below:
(see the link)
https://www.damirsystems.com/static/img/product_model_01.png
I have a huge database, containing about 500000 products in product table.
So when I am going to fetch a product from database with all its attributes, or going to search product filtering by attributes, it affects performance badly.
Could anyone help me what will be the tables structure in sql or do some more indexing or any feasible solution for this problem. Because different ecommerce sites are using this kind of database and working fine with huge different types of products.
EDIT : The link to the image (on my site) is blocked, so here is the image
The model you link to looks like partial entity–attribute–value (EAV) model. EAV is very flexible, but offers poor data integrity, and is cumbersome and usually inefficient. It's not really in the spirit of the relational model. Having worked on some large e-commerce sites, i can tell you that this is not standard or good database design practice in this field.
If you don't have an enormous number of types of product (up to tens, but not hundreds) then you can handle this using one of a two common approaches.
The first approach is simply to have a single table for products, with columns for all the attributes that might be needed in each different kind of product. You use whichever columns are appropriate to each kind of product, and leave the rest null. Say you sell books, music, and video:
create table Product (
id integer primary key,
name varchar(255) not null,
type char(1) not null check (type in ('B', 'M', 'V')),
number_of_pages integer, -- book only
duration_in_seconds integer, -- music and video only
classification varchar(2) check (classification in ('U', 'PG', '12', '15', '18')) -- video only
);
This has the advantage of being simple, and of not requiring joins. However, it doesn't do a good job of enforcing integrity on your data (you could have a book without a number of pages, for example), and if you have more than a few types of products, the table will get highly unwieldy.
You can plaster over the integrity problem with table-level check constraints that require each type of products to have values certain columns, like this:
check ((case when type = 'B' then (number_of_pages is not null) else true end)))
(hat tip to Joe Celko there - i looked up how to do logical implication in SQL, and found an example where he does it with this construction to construct a very similar check constraint!)
You might even say:
check ((case when type = 'B' then (number_of_pages is not null) else (number_of_pages is null) end)))
To ensure that no row has a value in a column not appropriate to its type.
The second approach is to use multiple tables: one base table holding columns common to all products, and one auxiliary table for each type of product holding columns specific to products of that type. So:
create table Product (
id integer primary key,
type char(1) not null check (type in ('B', 'M', 'V')),
name varchar(255) not null
);
create table Book (
id integer primary key references Product,
number_of_pages integer not null
);
create table Music (
id integer primary key references Product,
duration_in_seconds integer not null
);
create table Video (
id integer primary key references Product,
duration_in_seconds integer not null,
classification varchar(2) not null check (classification in ('U', 'PG', '12', '15', '18'))
);
Note that the auxiliary tables have the same primary key as the main table; their primary key column is also a foreign key to the main table.
This approach is still fairly straightforward, and does a better job of enforcing integrity. Queries will typically involve joins, though:
select
p.id,
p.name
from
Product p
join Book b on p.id = b.id
where
b.number_of_pages > 300;
Integrity is still not perfect, because it's possible to create a row in an auxiliary tables corresponding to a row of the wrong type in the main table, or to create rows in multiple auxiliary tables corresponding to a single row in the main table. You can fix that by refining the model further. If you make the primary key a composite key which includes the type column, then the type of a product is embedded in its primary key (a book would have a primary key like ('B', 1001)). You would need to introduce the type column into the auxiliary tables so that they could have foreign keys pointing to the main table, and that point you could add a check constraint in each auxiliary table that requires the type to be correct. Like this:
create table Product (
type char(1) not null check (type in ('B', 'M', 'V')),
id integer not null,
name varchar(255) not null,
primary key (type, id)
);
create table Book (
type char(1) not null check (type = 'B'),
id integer not null,
number_of_pages integer not null,
primary key (type, id),
foreign key (type, id) references Product
);
This also makes it easier to query the right tables given only a primary key - you can immediately tell what kind of product it refers to without having to query the main table first.
There is still a potential problem of duplication of columns - as in the schema above, where the duration column is duplicated in two tables. You can fix that by introducing intermediate auxiliary tables for the shared columns:
create table Media (
type char(1) not null check (type in ('M', 'V')),
id integer not null,
duration_in_seconds integer not null,
primary key (type, id),
foreign key (type, id) references Product
);
create table Music (
type char(1) not null check (type = 'M'),
id integer not null,
primary key (type, id),
foreign key (type, id) references Product
);
create table Video (
type char(1) not null check (type = 'V'),
id integer not null,
classification varchar(2) not null check (classification in ('U', 'PG', '12', '15', '18')),
primary key (type, id),
foreign key (type, id) references Product
);
You might not think that was worth the extra effort. However, what you might consider doing is mixing the two approaches (single table and auxiliary table) to deal with situations like this, and having a shared table for some similar kinds of products:
create table Media (
type char(1) not null check (type in ('M', 'V')),
id integer not null,
duration_in_seconds integer not null,
classification varchar(2) check (classification in ('U', 'PG', '12', '15', '18')),
primary key (type, id),
foreign key (type, id) references Product,
check ((case when type = 'V' then (classification is not null) else (classification is null) end)))
);
That would be particularly useful if there were similar kinds of products that were lumped together in the application. In this example, if your shopfront presents audio and video together, but separately to books, then this structure could support much more efficient retrieval than having separate auxiliary tables for each kind of media.
All of these approaches share a loophole: it's still possible to create rows in the main table without corresponding rows in any auxiliary table. To fix this, you need a second set of foreign key constraints, this time from the main table to the auxiliary tables. This is particular fun for couple of reasons: you want exactly one of the possible foreign key relationships to be enforced at once, and the relationship creates a circular dependency between rows in the two tables. You can achieve the former using some conditionals in check constraints, and the latter using deferrable constraints. The auxiliary tables can be the same as above, but the main table needs to grow what i will tentatively call 'type flag' columns:
create table Product (
type char(1) not null check (type in ('B', 'M', 'V')),
id integer not null,
is_book char(1) null check (is_book is not distinct from (case type when 'B' then type else null end)),
is_music char(1) null check (is_music is not distinct from (case type when 'M' then type else null end)),
is_video char(1) null check (is_video is not distinct from (case type when 'V' then type else null end)),
name varchar(255) not null,
primary key (type, id)
);
The type flag columns are essentially repetitions of the type column, one for each potential type, which are set if and only if the product is of that type (as enforced by those check constraints). These are real columns, so values will have to be supplied for them when inserting rows, even though the values are completely predictable; this is a bit ugly, but hopefully not a showstopper.
With those in place, then after all the tables are created, you can form foreign keys using the type flags instead of the type, pointing to specific auxiliary tables:
alter table Product add foreign key (is_book, id) references Book deferrable initially deferred;
alter table Product add foreign key (is_music, id) references Music deferrable initially deferred;
alter table Product add foreign key (is_video, id) references Video deferrable initially deferred;
Crucially, for a foreign key relationship to be enforced, all its constituent columns must be non-null. Therefore, for any given row, because only one type flag is non-null, only one relationship will be enforced. Because these constraints are deferrable, it is possible to insert a row into the main table before the required row in the auxiliary table exists. As long as it is inserted before the transaction is committed, it's all above board.

Associate many child tables to a parent table

I will do my best to lay this out in text. Essentially, we have an application that tracks actions performed by users. Each action has it's own table since each action has different parameters that need to be stored. This allows us to store those extra attributes and run analytics on the actions across multiple or single users rather easily. The actions are not really associated with each other other than by what user performed these actions.
Example:
ActionTableA Id | UserId | AttributeA | AttributeB
ActionTableB Id | UserId | AttributeC | AttributeD | AttributeE
ActionTableC Id | UserId | AttributeF
Next, we need to allocate a value to each action performed by the user and keep a running total of those values.
Example:
ValueTable: Id | UserId | Value | ActionType | ActionId
What would be the best way to link the value in the value table to the actual action performed? We know the action type (A, B, C) - but from a SQL design perspective, I cannot see a good way to have an indexed relationship between the Values of the actions in the ActionsTable and the actual actions themselves. The only thing that makes sense would be to modify the ValueTable to the following:
ValueTable
Id | UserId | Value | ActionType | ActionAId(FK Nullable) | ActionBId(FK Nullable) | ActionCId(FK Nullable)
But the problem I have with this that only one of the 3 actionTableId columns would have a value, the rest would be Null. Additionally, as action types are added, the columns in the value table would too. Lastly, to programatically find the data, I would either a) have to check the ActionType to get the appropriate column for the Id or b) scan the row and pick the non-null actionId.
Is there a better way/design or is this just 'the way it is' for this particular issue.
EDIT
Attached is a diagram of the above setup:
Sorry for the clarity issues, typing SQL questions is always challenging. So I think your comment gave me an idea of something... I could have an SystemActionId table that essentially has an auto-generated value
SystemActions: Id | Type
Then, each ActionTable would have an additional FK to the SystemAction table. Lastly, in the ValueTable - associate it to the SystemActions table as well. This would allow us to tie values to specific actions. I would need to join the action tables to the system actions table where
JOIN (((SystemActions.Id = ActionTableA.Id) JOIN (SystemActions.Id = ActionTableB.Id)) JOIN (SystemActions.Id = ActionTableC.Id)
crappy quick sql syntax
Is this what you were alluding to in the answer below? A snapshot of what that could potentially look like:
Your question is a little unclear, but it looks like you should either have a (denormalized) value column in each action table, or have an artificial key in the value table that is keyed to by each of the seperate action tables.
You have essentially a supertype/subtype structure, or an exclusive arc. Attributes common to all actions bubble "up" into the supertype (the table "actions"). Columns unique to each subtype bubble "down" into the distinct subtypes.
create temp table actions (
action_id integer not null,
action_type char(1) not null check (action_type in ('a', 'b', 'c')),
user_id integer not null, -- references users, not shown.
primary key (action_id),
-- Required for data integrity. See below.
unique (action_id, action_type)
);
create temp table ActionTableA (
action_id integer primary key,
-- default and check constraint guarantee that only an 'a' row goes
-- in this table.
action_type char(1) not null default 'a' check (action_type = 'a'),
-- FK guarantees that this row matches only an 'a' row in actions.
-- To make this work, you need a UNIQUE constraint on these two columns
-- in the table "actions".
foreign key (action_id, action_type)
references actions (action_id, action_type),
attributeA char(1) not null,
attributeB char(1) not null
);
-- ActionTableB and ActionTableC are similar.
create temp table ValueTable (
action_id integer primary key,
action_type char(1) not null,
-- Since this applies to all actions, FK should reference the supertype,
-- which is the table "actions". You can reference either action_id alone,
-- which has a PRIMARY KEY constraint, or the pair {action_id, action_type},
-- which has a UNIQUE constraint. Using the pair makes some kinds of
-- accounting queries easier and faster.
foreign key (action_id, action_type)
references actions (action_id, action_type),
value integer not null default 0 check (value >= 0)
);
To round this out, build updatable views that join the supertype to each subtype, and have all users use the views instead of the base tables.
I would just have a single table for actions, to be honest. Is there a reason (other than denormalization) for having multiple tables? Especially when it will increase the complexity of your business logic?
Are the attribute columns significant in the context of the schema? Could you compress it into an object storage column "attributes"?
Actions: actionID, type, attributes
I think you need something similar to an Audit Trail. Can we have a simple design so that all the actions will be captured in a singe table ?
If the way you want it to work is that for every time a user performs action A you insert a new row in table ActionTableA and a row in ValueTable, and having them both linked, why not have a value column in each action table? This would work only if you want to insert a new row each time the user performs the action rather than if you want to update the value if the user performs the same action again. It seems overly complicated to have a separate table for values if it can be stored in a column. On the other hand if a "value" is a set of different pieces of data (or if you want to have all values in one place) then you do need an extra table but I would still have a foreign key column pointing from the action tables to the value table.

Database design - composite key relationship issue

I had posted a similar question before, but this is more specific. Please have a look at the following diagram:
The explanation of this design is as follows:
Bakers produce many Products
The same Product can be produced by more than one Baker
Bakers change their pricing from time-to-time for certain (of their) Products
Orders can be created, but not necessarily finalised
The aim here is to allow the store manager to create an Order "Basket" based on whatever goods are required, and also allow the system being created to determine the best price at that time based on what Products are contained within the Order.
I therefore envisaged the ProductOrders table to initially hold the productID and associated orderID, whilst maintaining a null (undetermined) value for bakerID and pricingDate, as that would be determined and updated by the system, which would then constitute a finalised order.
Now that you have an idea of what I am trying to do, please advise me on how to to best set these relationships up.
Thank you!
If I understand correctly, an unfinalised order is not yet assigned a baker / pricing (meaning when an order is placed, no baker has yet been selected to bake the product).
In which case, the order is probably placed against the Products Table and then "Finalized" against the BakersProducts table.
A solution could be to give ProductsOrders 2 separate "ProductID's", one being for the original ordered ProductId (i.e. Non Nullable) - say ProductId, and the second being part of the Foreign key to the assigned BakersProducts (say ProductId2). Meaning that in ProductsOrders, the composite foreign keys BakerId, ProductId2 and PricingDate are all nullable, as they will only be set once the order is Finalized.
In order to remove this redundancy, what you might also consider is using surrogate keys instead of the composite keys. This way BakersProducts would have a surrogate PK (e.g. BakersProductId) which would then be referenced as a nullable FK in ProductsOrders. This would also avoid the confusion with the Direct FK in ProductsOrders to Product.ProductId (which from above, was the original Product line as part of the Order).
HTH?
Edit:
CREATE TABLE dbo.BakersProducts
(
BakerProductId int identity(1,1) not null, -- New Surrogate PK here
BakerId int not null,
ProductId int not null,
PricingDate datetime not null,
Price money not null,
StockLevel bigint not null,
CONSTRAINT PK_BakerProducts PRIMARY KEY(BakerProductId),
CONSTRAINT FK_BakerProductsProducts FOREIGN KEY(ProductId) REFERENCES dbo.Products(ProductId),
CONSTRAINT FK_BakerProductsBaker FOREIGN KEY(BakerId) REFERENCES dbo.Bakers(BakerId),
CONSTRAINT U_BakerProductsPrice UNIQUE(BakerId, ProductId, PricingDate) -- Unique Constraint mimicks the original PK for uniqueness ... could also use a unique index
)
CREATE TABLE dbo.ProductOrders
(
OrderId INT NOT NULL,
ProductId INT NOT NULL, -- This is the original Ordered Product set when order is created
BakerProductId INT NULL, -- This is nullable and gets set when Order is finalised with a baker
OrderQuantity BIGINT NOT NULL,
CONSTRAINT FK_ProductsOrdersBakersProducts FOREIGN KEY(BakersProductId) REFERENCES dbo.BakersProducts(BakerProductId)
.. Other Keys here
)

How to solve the A/B key problem?

I have a table my_table with these fields: id_a, id_b.
So this table basically can reference either an row from table_a with id_a, or an row from table_b with id_b. If I reference a row from table_a, id_b is NULL. If I reference a row from table_b, id_a is NULL.
Currently I feel this is my only/best option I have, so in my table (which has a lot more other fields, btw) I will live with the fact that always one field is NULL.
If you care what this is for: If id_a is specified, I'm linking to a "data type" record set in my meta database, that specifies a particular data type. like varchar(40), for example. But if id_b is specified, I'm linking to a relationship definition recordset that specifies details about an relationship (wheather it's 1:1, 1:n, linking what, with which constraints, etc.). The fields are called a little bit better, of course ;) ...just try to simplify it to the problem.
Edit: If it matters: MySQL, latest version. But don't want to constrain my design to MySQL specific code, as much as possible.
Are there better solutions?
A and B are disjoint subtypes in your model.
This can be implemented like this:
refs (
type CHAR(1) NOT NULL, ref INT NOT NULL,
PRIMARY KEY (type, ref),
CHECK (type IN ('A', 'B'))
)
table_a (
type CHAR(1) NOT NULL, id INT NOT NULL,
PRIMARY KEY (id),
FOREIGN KEY (type, id) REFERENCES refs (type, id),
CHECK (type = 'A'),
…)
table_b (
type CHAR(1) NOT NULL, id INT NOT NULL,
PRIMARY KEY (id),
FOREIGN KEY (type, id) REFERENCES refs (type, id) ON DELETE CASCADE,
CHECK (type = 'B'),
…)
mytable (
type CHAR(1) NOT NULL, ref INT NOT NULL,
FOREIGN KEY (type, ref) REFERENCES refs (type, id) ON DELETE CASCADE,
CHECK (type IN ('A', 'B')),
…)
Table refs constains all instances of both A and B. It serves no other purpose except policing referential integrity, it won't even participate in the joins.
Note that MySQL accepts CHECK constraints but does not enforce them. You will need to watch your types.
You also should not delete the records from table_a and table_b directly: instead, delete the records from refs which will trigger ON DELETE CASCADE.
Create a parent "super-type" table for both A and B. Reference that table in my_table.
Yes, there are better solutions.
However, since you didn't describe what you're allowed to change, it's difficult to know which alternatives could be used.
Principally, this "exclusive-or" kind of key reference means that A and B are actually two subclasses of a common superclass. You have several ways to changing the A and B tables to unify them into a single table.
One of which is to simply merge the A and B table into a big table.
Another of which is to have a superclass table with the common features of A and B as well as a subtype flag that says which subtype it is. This still involves a join with the subclass table, but the join has an explicit discriminator, and can be done "lazily" by the application rather than in the SQL.
I see no problem with your solution. However, I think you should add CHECK constraints to make sure that exactly one of the fields is null.
you know, it's hard to tell if there are any better solutions since you've stripped the question of all vital information. with the tiny amount that's still there i'd say that most better solutions involve getting rid of my_table.

Resources