How to solve the A/B key problem? - database

I have a table my_table with these fields: id_a, id_b.
So this table can reference either a row from table_a via id_a, or a row from table_b via id_b. If I reference a row from table_a, id_b is NULL. If I reference a row from table_b, id_a is NULL.
Currently I feel this is my best (or only) option, so in my table (which has a lot more other fields, btw) I will live with the fact that one field is always NULL.
If you care what this is for: If id_a is specified, I'm linking to a "data type" record in my meta database that specifies a particular data type, like varchar(40), for example. But if id_b is specified, I'm linking to a relationship definition record that specifies details about a relationship (whether it's 1:1, 1:n, linking what, with which constraints, etc.). The fields have somewhat better names, of course ;) ...I just tried to simplify it to the problem.
Edit: If it matters: MySQL, latest version. But I don't want to constrain my design to MySQL-specific code, as much as possible.
Are there better solutions?

A and B are disjoint subtypes in your model.
This can be implemented like this:
CREATE TABLE refs (
type CHAR(1) NOT NULL, ref INT NOT NULL,
PRIMARY KEY (type, ref),
CHECK (type IN ('A', 'B'))
);
CREATE TABLE table_a (
type CHAR(1) NOT NULL, id INT NOT NULL,
PRIMARY KEY (id),
FOREIGN KEY (type, id) REFERENCES refs (type, ref) ON DELETE CASCADE,
CHECK (type = 'A'),
…);
CREATE TABLE table_b (
type CHAR(1) NOT NULL, id INT NOT NULL,
PRIMARY KEY (id),
FOREIGN KEY (type, id) REFERENCES refs (type, ref) ON DELETE CASCADE,
CHECK (type = 'B'),
…);
CREATE TABLE my_table (
type CHAR(1) NOT NULL, ref INT NOT NULL,
FOREIGN KEY (type, ref) REFERENCES refs (type, ref) ON DELETE CASCADE,
CHECK (type IN ('A', 'B')),
…);
Table refs contains all instances of both A and B. It serves no purpose other than policing referential integrity; it won't even participate in the joins.
Note that MySQL versions before 8.0.16 parse CHECK constraints but do not enforce them, so there you will need to watch your types yourself.
You also should not delete records from table_a and table_b directly: instead, delete the records from refs, which will trigger ON DELETE CASCADE.
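For anyone who wants to try the pattern, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in for MySQL here, table_b is left out for brevity, and the foreign keys point at refs (type, ref), the columns refs actually has. It is an illustration under those assumptions, not a drop-in schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE refs (
    type TEXT NOT NULL CHECK (type IN ('A', 'B')),
    ref  INTEGER NOT NULL,
    PRIMARY KEY (type, ref)
);
CREATE TABLE table_a (
    type TEXT NOT NULL CHECK (type = 'A'),
    id   INTEGER NOT NULL PRIMARY KEY,
    FOREIGN KEY (type, id) REFERENCES refs (type, ref) ON DELETE CASCADE
);
CREATE TABLE my_table (
    type TEXT NOT NULL,
    ref  INTEGER NOT NULL,
    FOREIGN KEY (type, ref) REFERENCES refs (type, ref) ON DELETE CASCADE
);
INSERT INTO refs VALUES ('A', 1);
INSERT INTO table_a VALUES ('A', 1);
INSERT INTO my_table VALUES ('A', 1);
""")


def rejected(sql):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


dangling = rejected("INSERT INTO my_table VALUES ('B', 1)")   # no ('B', 1) row in refs
wrong_type = rejected("INSERT INTO table_a VALUES ('B', 2)")  # CHECK (type = 'A') fails

# Deleting from refs cascades to the subtype table and to my_table.
conn.execute("DELETE FROM refs WHERE type = 'A' AND ref = 1")
left_in_a = conn.execute("SELECT COUNT(*) FROM table_a").fetchone()[0]
left_in_my_table = conn.execute("SELECT COUNT(*) FROM my_table").fetchone()[0]
```

Both bad inserts are refused, and the single delete on refs clears the dependent rows.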

Create a parent "super-type" table for both A and B. Reference that table in my_table.

Yes, there are better solutions.
However, since you didn't describe what you're allowed to change, it's difficult to know which alternatives could be used.
In principle, this "exclusive-or" kind of key reference means that A and B are actually two subclasses of a common superclass. You have several ways to change the A and B tables to unify them into a single table.
One is to simply merge the A and B tables into one big table.
Another is to have a superclass table with the common features of A and B as well as a subtype flag that says which subtype it is. This still involves a join with the subclass table, but the join has an explicit discriminator, and can be done "lazily" by the application rather than in the SQL.

I see no problem with your solution. However, I think you should add CHECK constraints to make sure that exactly one of the fields is null.
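A minimal sketch of such a constraint, run through Python's sqlite3 module so it can be tried without a server. The column names come from the question; everything else (the stripped-down table, the helper function) is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE my_table (
    id_a INTEGER,
    id_b INTEGER,
    -- exactly one of the two references must be set
    CHECK ((id_a IS NULL AND id_b IS NOT NULL)
        OR (id_a IS NOT NULL AND id_b IS NULL))
)
""")


def try_insert(id_a, id_b):
    """Return True if the row is accepted, False if the CHECK rejects it."""
    try:
        conn.execute("INSERT INTO my_table (id_a, id_b) VALUES (?, ?)", (id_a, id_b))
        return True
    except sqlite3.IntegrityError:
        return False


results = {
    "only_a": try_insert(1, None),
    "only_b": try_insert(None, 2),
    "both": try_insert(1, 2),
    "neither": try_insert(None, None),
}
```

The CHECK expression itself is plain SQL, so it should carry over to any engine that enforces CHECK constraints.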

You know, it's hard to tell if there are any better solutions, since you've stripped the question of all vital information. With the tiny amount that's still there, I'd say that most better solutions involve getting rid of my_table.

Related

SQL Query returning duplicate with any command with foreign keys

Other answers seem to be using JOIN, which (correct me if I'm wrong) you don't need if you have a foreign key? If you do need to use JOIN, would you mind showing what would work with this?
CREATE TABLE Dogs (
DogID int NOT NULL,
DogSize int NOT NULL,
fName VARCHAR(255),
ID int,
PRIMARY KEY (DogID)
);
CREATE TABLE Owners (
ID int NOT NULL,
Fname VARCHAR(255) NOT NULL,
Lname VARCHAR(255),
Area VARCHAR(255) NOT NULL,
Pay INT NOT NULL,
Extra INT,
PRIMARY KEY (ID),
FOREIGN KEY (ID) REFERENCES Owners(ID)
);
A foreign key is only there to make sure your data stays consistent. It does not help you "behind the scenes" when making queries, and it certainly does not make joins unnecessary.
Your query is an old-style query, which for many reasons you shouldn't use. The equivalent of what you wrote is the following:
SELECT * FROM Dogs CROSS JOIN Owners
This is called a cartesian product, which means all the combinations are returned, since you didn't provide a join condition. I assume you didn't because you thought it would be automatically provided by the foreign key....but it doesn't work that way. You have to provide what you want yourself.
Now for the next problem. You do not really have a connection. The foreign key on Owners is targeting not only the table itself, but actually the column itself. That makes zero sense. Did you maybe want to assign one dog to each owner? There should be another column on Owners (not the primary key one) that would be a foreign key to Dogs:
[DogID] int, FOREIGN KEY (DogID) REFERENCES Dogs(DogID)
This is better. Still, not very good; it would limit the dogs owned by each owner to one or zero. A better idea would be to make a new column in Dogs that would reference the owner:
[OwnerID] int, FOREIGN KEY (OwnerID) REFERENCES Owners(ID)
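To see the difference between the cartesian product and a real join, here is a small sketch with Python's sqlite3 module. The tables are cut down to a few columns, Dogs carries the OwnerID column suggested above, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Owners (ID INTEGER PRIMARY KEY, Fname TEXT NOT NULL);
CREATE TABLE Dogs (
    DogID   INTEGER PRIMARY KEY,
    fName   TEXT,
    OwnerID INTEGER REFERENCES Owners(ID)
);
INSERT INTO Owners VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO Dogs VALUES (10, 'Rex', 1), (11, 'Fido', 1), (12, 'Spot', 2);
""")

# No join condition: every dog is paired with every owner (3 x 2 rows).
cross = conn.execute("SELECT * FROM Dogs CROSS JOIN Owners").fetchall()

# Explicit join condition: each dog is paired only with its own owner.
joined = conn.execute("""
    SELECT d.fName, o.Fname
    FROM Dogs d
    JOIN Owners o ON o.ID = d.OwnerID
    ORDER BY d.DogID
""").fetchall()
```

The foreign key plays no part in either query; only the ON clause decides which rows are matched.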

Postgres INSERT INTO... SELECT violates foreign key constraint

I'm having a really, really strange issue with postgres. I'm trying to generate GUIDs for business objects in my database, and I'm using a new schema for this. I've done this with several business objects already; the code I'm using here has been tested and has worked in other scenarios.
Here's the definition for the new table:
CREATE TABLE guid.public_obj
(
guid uuid NOT NULL DEFAULT uuid_generate_v4(),
id integer NOT NULL,
CONSTRAINT obj_guid_pkey PRIMARY KEY (guid),
CONSTRAINT obj_id_fkey FOREIGN KEY (id)
REFERENCES obj (obj_id)
ON UPDATE CASCADE ON DELETE CASCADE
)
However when I try to backfill this using the following code, I get a SQL state 23503 claiming that I'm violating the foreign key constraint.
INSERT INTO guid.public_obj (guid, id)
SELECT uuid_generate_v4(), o.obj_id
FROM obj o;
ERROR: insert or update on table "public_obj" violates foreign key constraint "obj_id_fkey"
SQL state: 23503
Detail: Key (id)=(-2) is not present in table "obj".
However, if I do a SELECT on the source table, the value is definitely present:
SELECT uuid_generate_v4(), o.obj_id
FROM obj o
WHERE obj_id = -2;
"0f218286-5b55-4836-8d70-54cfb117d836";-2
I'm baffled as to why postgres might think I'm violating the fkey constraint when I'm pulling the value directly out of the corresponding table. The only constraint on obj_id in the source table definition is that it's the primary key. It's defined as a serial; the select returns it as an integer. Please help!
Okay, apparently the reason this is failing is because unbeknownst to me the table (which, I stress, does not contain many elements) is partitioned. If I do a SELECT COUNT(*) FROM obj; it returns 348, but if I do a SELECT COUNT(*) FROM ONLY obj; it returns 44. Thus, there are two problems: first, some of the data in the table has not been partitioned correctly (there exists unpartitioned data in the parent table), and second, the data I'm interested in is split out across multiple child tables and the fkey constraint on the parent table fails because the data isn't actually in the parent table. (As a note, this is not my architecture; I'm having to work with something that's been around for quite some time.)
The partitioning is by implicit type (there are three partitions, each of which contains rows relating to a specific subtype of obj) and I think the eventual solution is going to be creating GUID tables for each of the subtypes. I'm going to have to handle the stuff that's actually in the obj table probably by selecting it into a temp table, dropping the rows from the obj table, then reinserting them so that they can be partitioned properly.

How to manage uniqueness of a value that is stored in multiple databases tables

I have separate assets tables for storing different kind of physical and logical assets, such as:-
Vehicle table( ID, model, EngineSize, Drivername, lastMaintenanceDate)
Server table ( ID, IP, OSName, etc…)
VM (ID, Size, etc…).
VM_IP (VM_ID,IP)
Now the problems I have is:-
For the IP column in the Server table and in the VM_IP table, I need the value to be unique across these two tables, so for example the database should not allow a server and a VM to have the same IP. In the current design I can only guarantee uniqueness within each table separately.
So can anyone advise on how I can handle this uniqueness requirement at the database level?
Regards
::EDITED::
I have currently the following database structure:-
Currently I see these points:-
I have introduced a redundant AssetTypeID column in the base Asset table, so I can know the asset type without having to join tables. This might break normalization.
In the above architecture, I cannot control (at the database level) which assets should have an IP, which should not, and which can or cannot have multiple IPs.
So is there a way to improve my architecture to handle these two points?
Thanks in advance for any help.
Create an IP table and use foreign keys
If I were facing this problem at the design level, I would add two more tables:
A valid_IP table (containing the valid IP range)
A Network_Enabeled table, a base table for all entities that may have an IP, like the Server table, VM_IP, ... The primary key of this base table will be the primary key of the child tables.
In the Network_Enabeled table, a foreign key to the valid_IP table with a unique key on that field will be the answer.
Hope this helps.
You can use an indexed view.
CREATE VIEW YourViewName with SCHEMABINDING
as
...
GO
CREATE UNIQUE CLUSTERED INDEX IX_YourIndexName
on YourViewName (..., ...)
Based on your edit, you can introduce a superkey on the asset table and use various constraints to enforce most of what it sounds like you're looking for:
create table Asset (
AssetID int not null primary key,
AssetTypeID int not null
--Skip all of the rest, foreign keys, etc, irrelevant to example
,constraint UQ_Asset_TypeCheck
UNIQUE (AssetID,AssetTypeID) --This is the superkey
)
The above means that the AssetTypeID column can now be checked/enforced in other tables, and there's no risk of inconsistency:
create table Servers (
AssetID int not null primary key,
AssetTypeID as 1 persisted,
constraint FK_Servers_Assets FOREIGN KEY (AssetID)
references Asset (AssetID), --Strictly, this will become redundant
constraint FK_Servers_Assets_TypeCheck FOREIGN KEY (AssetID,AssetTypeID)
references Asset (AssetID,AssetTypeID)
)
So, in the above, we enforce that all entries in this table must actually be of the correct asset type, by making it a fixed computed column that is then used in a foreign key back to the superkey.
--So on for other asset types
create table Asset_IP (
AssetID int not null,
IPAddress int not null primary key, --Wrong type, for IPv6
AssetTypeID int not null,
constraint FK_Asset_IP_Assets FOREIGN KEY (AssetID)
references Asset (AssetID), --Again, redundant
constraint CK_Asset_Types CHECK (
AssetTypeID in (1/*, Other types allowed IPs */)),
constraint FK_Asset_IP_Assets_TypeCheck FOREIGN KEY (AssetID,AssetTypeID)
references Asset (AssetID,AssetTypeID)
)
And now, above, we again reference the superkey to ensure that we've got a local (to this table) correct AssetTypeID value, which we can then use in a check constraint to limit which asset types are actually allowed entries in this table.
create unique index UQ_Asset_SingleIPs on Asset_IP (AssetID)
where AssetTypeID in (1/* Type IDs that are only allowed 1 IP address */)
And finally, for certain AssetTypeID values, we ensure that this table only contains one row for that AssetID.
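The pieces above can be tried together in a small sketch using Python's sqlite3 module. This is SQLite rather than SQL Server, so the persisted computed column is left out, IPAddress is stored as text, and the type IDs (1 = server with a single IP, 2 = VM with several IPs, 3 = an asset type with no IPs) are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE Asset (
    AssetID     INTEGER NOT NULL PRIMARY KEY,
    AssetTypeID INTEGER NOT NULL,
    UNIQUE (AssetID, AssetTypeID)           -- the superkey
);
CREATE TABLE Asset_IP (
    AssetID     INTEGER NOT NULL,
    IPAddress   TEXT NOT NULL PRIMARY KEY,  -- one owner per address, whatever the asset type
    AssetTypeID INTEGER NOT NULL CHECK (AssetTypeID IN (1, 2)),
    FOREIGN KEY (AssetID, AssetTypeID) REFERENCES Asset (AssetID, AssetTypeID)
);
-- assumed rule: type 1 assets may hold only one address
CREATE UNIQUE INDEX UQ_Asset_SingleIPs ON Asset_IP (AssetID) WHERE AssetTypeID = 1;
INSERT INTO Asset VALUES (100, 1), (200, 2), (300, 3);
INSERT INTO Asset_IP VALUES (100, '10.0.0.1', 1), (200, '10.0.0.2', 2), (200, '10.0.0.3', 2);
""")


def rejected(asset_id, ip, type_id):
    """Return True if the insert violates one of the constraints."""
    try:
        conn.execute("INSERT INTO Asset_IP VALUES (?, ?, ?)", (asset_id, ip, type_id))
        return False
    except sqlite3.IntegrityError:
        return True


checks = {
    "ip_shared_by_server_and_vm": rejected(200, "10.0.0.1", 2),  # primary key on IPAddress
    "second_ip_for_server": rejected(100, "10.0.0.9", 1),        # filtered unique index
    "lying_about_type": rejected(100, "10.0.0.8", 2),            # FK to the superkey
    "type_without_ips": rejected(300, "10.0.0.7", 3),            # CHECK on AssetTypeID
    "third_ip_for_vm": rejected(200, "10.0.0.4", 2),             # allowed
}
```

Each rule is enforced by a different constraint, which is the point of carrying AssetTypeID into the child table.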
I hope that gives you enough ideas of how to implement your various checks based on types. If you want/need to, you can now construct some views (through which the rest of your code will interact) that hide the extra columns, and provide triggers to ease INSERT statements.
On a side note, I'd recommend picking a convention and sticking to it when it comes to table naming. My preferred one is to use the plural/collective name, unless the table is only intended to contain one row. So I'd rename Asset as Assets, for example, or Asset_IP as Asset_IPs. At the moment, you have a mixture.

Database best practices

I have a table which stores comments, the comment can either come from another user, or another profile which are separate entities in this app.
My original thinking was that the table would have both user_id and profile_id fields, so if a user submits a comment, it sets the user_id and leaves the profile_id blank.
Is this right or wrong? Is there a better way?
Whatever is the best solution depends IMHO on more than just the table, but also how this is used elsewhere in the application.
Assuming that the comments are all associated with some other object, let's say you extract all the comments from that object. In your proposed design, extracting all the comments requires selecting from just one table, which is efficient. But that is extracting the comments without extracting the information about the poster of each comment. Maybe you don't want to show it, or maybe it is already cached in memory.
But what if you had to retrieve information about the poster while retrieving the comments? Then you have to join with two different tables, and now the resulting record set is getting polluted with a lot of NULL values (for a profile comment, all the user fields will be NULL). The code that has to parse this result set also could get more complex.
Personally, I would probably start with the fully normalized version, and then denormalize when I start seeing performance problems.
There is also a completely different possible solution to the problem, but this depends on whether or not it makes sense in the domain. What if there are other places in the application where a user and a profile can be used interchangeably? What if a User is just a special kind of a Profile? Then I think that the problem should be solved generally in the user/profile tables. For example (some abbreviated pseudo-sql):
create table AbstractProfile (ID primary key, type ) -- type can be 'user' or 'profile'
create table User(ProfileID primary key references AbstractProfile , ...)
create table Profile(ProfileID primary key references AbstractProfile , ...)
Then any place in your application where a user or a profile can be used interchangeably, you can reference the ID of AbstractProfile.
If the comments are general for several objects you could create a table for each object:
user_comments (user_id, comment_id)
profile_comments (profile_id, comment_id)
Then you do not have to have any empty columns in your comments table. It will also make it easy to add new comment-source-objects in the future without touching the comments table.
Another way to solve this is to always denormalize (copy) the name of the commenter onto the comment and also store a reference back to the commenter via a type and an id field. That way you have a unified comments table on which you can search, sort and trim quickly. The drawback is that there isn't any real FK relationship between a comment and its owner.
In the past I have used a centralized comments table and had a field for the fk_table it is referencing.
eg:
comments(id,fk_id,fk_table,comment_text)
That way you can use UNION queries to concatenate the data from several sources.
SELECT c.comment_text FROM comments c JOIN user u ON u.id=c.fk_id WHERE c.fk_table='user'
UNION ALL
SELECT c.comment_text FROM comments c JOIN profile p ON p.id=c.fk_id WHERE c.fk_table='profile'
This ensures that you can expand the number of objects that have comments without creating redundant tables.
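A runnable sketch of that query, using Python's sqlite3 module. The name columns and the sample rows are invented for the example. Note that nothing stops a comment from pointing at a row that does not exist, because fk_id cannot carry a real foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user    (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE profile (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE comments (
    id           INTEGER PRIMARY KEY,
    fk_id        INTEGER,
    fk_table     TEXT,
    comment_text TEXT
);
INSERT INTO user VALUES (1, 'alice');
INSERT INTO profile VALUES (1, 'acme');
INSERT INTO comments VALUES
    (1, 1, 'user', 'hi'),
    (2, 1, 'profile', 'hello'),
    (3, 99, 'user', 'orphan');   -- accepted, although user 99 does not exist
""")

rows = conn.execute("""
    SELECT u.name, c.comment_text
    FROM comments c JOIN user u ON u.id = c.fk_id
    WHERE c.fk_table = 'user'
    UNION ALL
    SELECT p.name, c.comment_text
    FROM comments c JOIN profile p ON p.id = c.fk_id
    WHERE c.fk_table = 'profile'
""").fetchall()
```

The orphaned comment silently drops out of the result, since neither branch of the UNION finds a matching row for it.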
Here's another approach, which allows you to maintain referential integrity through foreign keys, manage centrally, and provide the highest performance using standard database tools such as indexes and if you really need, partitioning etc:
create table actor_master_table(
type char(1) not null, /* e.g. 'u' or 'p' for user / profile */
id varchar(20) not null, /* e.g. 'someuser' or 'someprofile' */
primary key(type, id)
);
create table user(
type char(1) not null,
id varchar(20) not null,
...
check (type = 'u'),
foreign key (type, id) references actor_master_table(type, id)
);
create table profile(
type char(1) not null,
id varchar(20) not null,
...
check (type = 'p'),
foreign key (type, id) references actor_master_table(type, id)
);
create table comment(
creator_type char(1) not null,
creator_id varchar(20) not null,
comment text not null,
foreign key(creator_type, creator_id) references actor_master_table(type, id)
);

Bidirectional foreign key constraint

I'm thinking of designing a database schema similar to the following:
Person (
PersonID int primary key,
PrimaryAddressID int not null,
...
)
Address (
AddressID int primary key,
PersonID int not null,
...
)
Person.PrimaryAddressID and Address.PersonID would be foreign keys to the corresponding tables.
The obvious problem is that it's impossible to insert anything into either table. Is there any way to design a working schema that enforces every Person having a primary address?
"I believe this is impossible. You cannot create an Address Record until you know the ID of the person and you cannot insert the person record until you know an AddressId for the PrimaryAddressId field."
On the face of it, that claim seems SO appealing. However, it is quite preposterous.
This is a very common kind of problem that the SQL DBMS vendors have been trying to attack for perhaps decades already.
The key is that all constraint checking must be "deferred" until both inserts are done. That can be achieved in different forms. Database transactions may offer the possibility to do something like "SET deferred constraint checking ON", and you're done (were it not for the fact that in this particular example, you'd likely have to mess very hard with your design in order to be able to just DEFINE the two FK constraints, because one of them simply ISN'T a 'true' FK in the SQL sense!).
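As a rough illustration of deferred checking, here is a sketch with Python's sqlite3 module. SQLite lets a foreign key be declared DEFERRABLE INITIALLY DEFERRED, and PostgreSQL has a similar clause; MySQL's InnoDB checks foreign keys immediately, so this does not carry over there. The table and column names are the ones from the question; the rest is assumed.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode,
# so the transactions below are opened and closed explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Person (
    PersonID         INTEGER PRIMARY KEY,
    PrimaryAddressID INTEGER NOT NULL
        REFERENCES Address (AddressID) DEFERRABLE INITIALLY DEFERRED
);
CREATE TABLE Address (
    AddressID INTEGER PRIMARY KEY,
    PersonID  INTEGER NOT NULL
        REFERENCES Person (PersonID) DEFERRABLE INITIALLY DEFERRED
);
""")

# Both rows go in within one transaction; the FKs are only checked at COMMIT.
conn.execute("BEGIN")
conn.execute("INSERT INTO Person VALUES (1, 10)")   # Address 10 does not exist yet
conn.execute("INSERT INTO Address VALUES (10, 1)")
conn.execute("COMMIT")

# A person whose address never arrives is rejected when the transaction commits.
conn.execute("BEGIN")
conn.execute("INSERT INTO Person VALUES (2, 20)")
try:
    conn.execute("COMMIT")
    orphan_committed = True
except sqlite3.IntegrityError:
    conn.execute("ROLLBACK")
    orphan_committed = False

people = conn.execute("SELECT PersonID FROM Person").fetchall()
```

Only person 1 survives, so every stored Person ends up with an existing primary address.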
Trigger-based solutions as described here achieve essentially the same effect, but those are exposed to all the maintenance problems that exist with application-enforced integrity.
In their work, Chris Date & Hugh Darwen describe what is imo the true solution to the problem : multiple assignment. That is, essentially, the possibility to compose several distinct update statements and have the DBMS act upon it as if that were one single statement. Implementations of that concept do exist, but you won't find any that talks SQL.
This is a perfect example of a many-to-many relationship. To resolve that you should have an intermediate PERSON_ADDRESS table. In other words:
PERSON table
person_id (PK)
ADDRESS table
address_id (PK)
PERSON_ADDRESS
person_id (FK) <= PERSON
address_id (FK) <= ADDRESS
is_primary (BOOLEAN - Y/N)
This way you can assign multiple addresses to a PERSON and also reuse ADDRESS records in multiple PERSONs (for family members, employees of the same company etc.). Using the is_primary field in the PERSON_ADDRESS table, you can identify whether that person_address combination is a primary address for a person.
We mark the primary address in our address table and then have triggers that enforce that only one record per person can have it (but one record must have it). If you change the primary address, it will update the old primary address as well as the new one. If you delete a primary address and other addresses exist, it will promote one of them (based on a series of rules) to the primary address. If the address is inserted and is the first address inserted, it will mark that one automatically as the primary address.
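A minimal sketch of the junction table, using Python's sqlite3 module. The partial unique index is an addition that is not part of the answer: it is one possible way to allow at most one primary address per person, and it does not guarantee that a person has a primary address at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE person  (person_id  INTEGER PRIMARY KEY);
CREATE TABLE address (address_id INTEGER PRIMARY KEY);
CREATE TABLE person_address (
    person_id  INTEGER NOT NULL REFERENCES person (person_id),
    address_id INTEGER NOT NULL REFERENCES address (address_id),
    is_primary INTEGER NOT NULL DEFAULT 0 CHECK (is_primary IN (0, 1)),
    PRIMARY KEY (person_id, address_id)
);
-- assumption, not from the answer: at most one primary address per person
CREATE UNIQUE INDEX one_primary_per_person
    ON person_address (person_id) WHERE is_primary = 1;
INSERT INTO person VALUES (1), (2);
INSERT INTO address VALUES (10), (11);
-- address 10 is shared by both people and is primary for each of them
INSERT INTO person_address VALUES (1, 10, 1), (1, 11, 0), (2, 10, 1);
""")

try:
    conn.execute(
        "UPDATE person_address SET is_primary = 1 WHERE person_id = 1 AND address_id = 11"
    )
    second_primary_allowed = True
except sqlite3.IntegrityError:
    second_primary_allowed = False

primaries = conn.execute(
    "SELECT person_id, address_id FROM person_address "
    "WHERE is_primary = 1 ORDER BY person_id"
).fetchall()
```

Switching a person's primary address then means clearing the old flag and setting the new one inside the same transaction.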
The second FK (PersonId from Address to Person) is too restrictive, IMHO. Are you suggesting that one address can only have a single person?
From your design, it seems that an address can apply to only one person, so just use the PersonID as the key to the address table, and drop the AddressID key field.
I know I'll probably be crucified or whatever, but here goes...
I've done it like this for my "particular very own unique and non-standard" business need ( =( God I'm starting to sound like SQL DDL even when I speak).
Here's an example:
CREATE TABLE IF NOT EXISTS PERSON(
ID INT,
CONSTRAINT PRIMARY KEY (ID),
ADDRESS_ID INT NOT NULL DEFAULT 1,
DESCRIPTION VARCHAR(255),
CONSTRAINT PERSON_UQ UNIQUE KEY (ADDRESS_ID, ...));
INSERT INTO PERSON(ID, DESCRIPTION)
VALUES (1, 'GOVERNMENT');
CREATE TABLE IF NOT EXISTS ADDRESS(
ID INT,
CONSTRAINT PRIMARY KEY (ID),
PERSON_ID INT NOT NULL DEFAULT 1,
DESCRIPTION VARCHAR(255),
CONSTRAINT ADDRESS_UQ UNIQUE KEY (PERSON_ID, ...),
CONSTRAINT ADDRESS_PERSON_FK FOREIGN KEY (PERSON_ID) REFERENCES PERSON(ID));
INSERT INTO ADDRESS(ID, DESCRIPTION)
VALUES (1, 'ABANDONED HOUSE AT THIS ADDRESS');
ALTER TABLE PERSON ADD CONSTRAINT PERSON_ADDRESS_FK FOREIGN KEY (ADDRESS_ID) REFERENCES ADDRESS(ID);
<...life goes on... whether you provide an address or not to the person and vice versa>
I defined one table, then the other table referencing the first, and then altered the first to reflect the reference to the second (which didn't exist at the time of the first table's creation). It's not meant for a particular database; if I need it I just try it and if it works then I use it, if not then I try to avoid having that need in the design (I can't always control that, sometimes the design is handed to me as-is). If you have an address without a person then it belongs to the "government" person. If you have a "homeless person" then it gets the "abandoned house" address. I run a process to determine which houses have no users.

Resources