Database joins / FK - basic questions

Database joins / FK - basic questions - database

There are 2 tables A & B. Each has say 10 colums.
Table A has 8 columns as FK to other tables. Table B uses enums and std colunms without any FK.
So which table is faster / better to
use?
If i do any action with table A,
i assume I only have to touch colunms
I am relating the action too and do
not have to join all the 10 FK tables
even if i only need 1 FK colunm?
If i do
need to perform any action on a FK,
like write, update or delete a value,
do i need to join to the parent
table?
If i understand correctly,
EAV model is better than a expanded
colunm table because if i need to
display two text from the table then
i need to use a inner join for the
colunm table for for a EAV table i
can use a regular select only with no join?

For only a few values and if the amount of values doesn't change, ENUM can be faster and takes up less space. However, to later add possible values, you'll need to alter the entire table, which is not good design. Table A is in most cases the better option.
Offcourse you only join the table A with the tables you need.
No, you can just modify the table containing the value, unless you change the PK. You should however design your tables in such way that changing the PK is not often needed - use artificial PK's (autoincrements are perfect). Even countries cease to exist or change names...
No, for your EAV you'll need the join. However, joining on keys is extremely fast... this is what relational databases are all about, it's their strong point.

Related

To use a FK in a one-to-many relationship versus using a join table

First of all a little bit of context:
TableA as ta
TableB as tb
One 'ta' has many 'tb', but one 'tb' can only be owned by one 'ta'.
I'd often just ad a FK to 'tb' pointing to 'ta' and it's done. Now i'm willing to model it differently (to improve it's readability); i want to use a join table, be it named 'ta_tb' and set a PK to 'tb_id', to enforce the 'one-to-many' clause.
Are there any performance issues when using the approach b in spite of approach a?

If this is a clear 1:n-relation (and ever will be!) there is no need (and no advantage) of a new table in between.
Such a joining table you would use to build a m:n-relation.
There could be one single reason to use a joining table with a 1:n-relation: If you want to manage additional data specifying details of this relation.
HTH

Whenever you normalize your database, there is always a performance hit. If you do a join table (or sometimes referred to as a cross reference) the dbms will need to do work to join the right records.
DBMS's these days do pretty well with creating indexes and reducing these performance hits. It just depends on your situation.
Is it more important to have readability and normalization? Then use a join/xref table.
Is this for a small application that you want to perform well? Just make Table B have a FK to its parent.

If you index correctly, there should be very little performance impact although there will be a very slight extra cost that is likely not noticeable unless your database is maxed out already.
However, if you do this and want to maintain the idea that each id from column b must have one and only 1 a, then you need to make sure to put a unique index on the id for table b in the join table. Later if you need to change that to a many to many relationship, you can remove the index which is certainly easier than changing the database structure. Otherwise this design puts your data integrity at risk.
So the join table might be the structure I woudl recommend if I knew that eventually a many to many relationship was possible. Otherwise, I would probably stick with the simpler structure of the FK in table b.

Yes, at least you will need one more join to access TableB fields and this will impact the performance. (there is a question regarding this here When and why are database joins expensive?)
Also relations that uses a join table is known as many to many, in your case you have a PK on the middle table "making" this relation one to many, and this is less readable than the approach A.
Keep it simple as possible, the approach A is perfect for one to many relationships.

Just curious about SQL Joins

I'm just curious here. If I have two tables, let's say Clients and Orders.
Clients have a unique and primary key ID_Client. Orders have an ID_Client field also and a relation to maintain integrity to Client's table by ID_Client field.
So when I want to join both tables i do:
SELECT
Orders.*, Clients.Name
FROM
Orders
INNER JOIN
Clients ON Clients.ID_Client = Orders.ID_Client
So if I took the job to create the primary key, and the relation between the tables,
Is there a reason why I need to explicitly include the joined columns in on clause?
Why can't I do something like:
SELECT
Orders.*, Clients.Name
FROM
Orders
INNER JOIN
Clients
So SQL should know which columns relate both tables...

I had this same question once and I found a great explanation for it on Database Administrator Stack Exchange, the answer below was the one that I found to be the best, but you can refer to the link for additional explanations as well.
A foreign key is meant to constrain the data. ie enforce
referential integrity. That's it. Nothing else.
You can have multiple foreign keys to the same table. Consider the following where a shipment has a starting point, and an ending point.
table: USA_States
StateID
StateName
table: Shipment
ShipmentID
PickupStateID Foreign key
DeliveryStateID Foreign key
You may want to join based on the pickup state. Maybe you want to join on the delivery state. Maybe you want to perform 2 joins for
both! The sql engine has no way of knowing what you want.
You'll often cross join scalar values. Although scalars are usually the result of intermediate calculations, sometimes you'll have a
special purpose table with exactly 1 record. If the engine tried to
detect a foriegn key for the join.... it wouldn't make sense because
cross joins never match up a column.
In some special cases you'll join on columns where neither is unique. Therefore the presence of a PK/FK on those columns is
impossible.
You may think points 2 and 3 above are not relevant since your questions is about when there IS a single PK/FK relationship
between tables. However the presence of single PK/FK between the
tables does not mean you can't have other fields to join on in
addition to the PK/FK. The sql engine would not know which fields you
want to join on.
Lets say you have a table "USA_States", and 5 other tables with a FK to the states. The "five" tables also have a few foreign keys to
each other. Should the sql engine automatically join the "five" tables
with "USA_States"? Or should it join the "five" to each other? Both?
You could set up the relationships so that the sql engine enters an
infinite loop trying to join stuff together. In this situation it's
impossible fore the sql engine to guess what you want.
In summary: PK/FK has nothing to do with table joins. They are separate unrelated things. It's just an accident of nature that you
often join on the PK/FK columns.
Would you want the sql engine to guess if it's a full, left, right, or
inner join? I don't think so. Although that would arguably be a lesser
sin than guessing the columns to join on.

If you don't explicitly give the field names in the query, SQL doesn't know which fields to use. You won't always have fields that are named the same and you won't always be joining on the primary key. For example, a relationship could be between two foreign key fields named "Client_Address" and "Delivery_Address". In that case, you can easily see how you would need to give the field name.
As an example:
SELECT o.*, c.Name
FROM Clients c
INNER JOIN Orders o
ON o.Delivery_Address = c.Client_Address

Is there a reason why do i need to explicit include then joinned fields in on clause?
Yes, because you still need to tell the database server what you want. "Do what I mean" is not within the capabilities of any software system so far.
Foreign keys are tools for enforcing data integrity. They do not dictate how you can join tables. You can join on any condition that is expressible through an SQL expression.
In other words, a join clause relates two tables to each other by a freely definable condition that needs to evaluate to true given the two rows from left hand side and the right hand side of the join. It does not have to be the foreign key, it can be any condition.
Want to find people that have last names equal to products you sell?
SELECT
Products.Name,
Clients.LastName
FROM
Products
INNER JOIN Clients ON Products.Name = Clients.LastName
There isn't even a foreign key between Products and Clients, still the whole thing works.

It's like that. :)
The sql standard says that you have to say on which columns to join. The constraints are just for referential integrity. With mysql the join support "join table using (column1, column2)" but then those columns have to be present in both tables

Reasons why this behaviour is not default
Because one Table can have multiple columns referencing back to one column in another table.
In a lot of legacy databases there are no Foreign key constraints but yet the columns are “Supposed to be” referencing some column in some other table.
The join conditions are not always as simple as A.Column = B.Column . and the list goes on…….
Microsoft developers were intelligent enough to let us make this decision rather than them guessing that it will always be A.Column = B.Column

Data structure using master id

I have a database with tables A and B in a one-to-many relationship. So one entity in A can be assigned to multiple and differing entities in B. A and B each have their own specific fields, but there are also fields and workflows related to either A or B, which are basically the same data but related only to either A or B.
As an example, an entity in A can have multiple comments for differing reasons and so can entities in B. Since there can be multiple comments for a single record I have to have a related comment table outside of tables A and B. I didn't want to have two comment tables, one for A and a separate table for B, so I set up a MasterID table that is related to both A and B and has referential integrity enforced. This means that when I want to add a record in A or B, I have to make sure that a MasterID already exists in the MasterID table. There are other tables that have the same type of functionality, comments is just one example, but if I didn't use a MasterID I'd have to create multiple tables each for A and B.
So my question is, is this the correct way to do this? Is there another way? The front-end will be in Access so I'm running into a little bit of trouble making sure a MasterID is created right before creating a new record in A or B.
MasterID(MasterID)
TableA(TableAID, FK_MasterID)
TableB(TableBID, FK_MasterID, FK_TableAID)
Comments(CommentID, MasterID, Comment)
Thanks for any help.

From a pure data design standpoint, you are on the right track, but not quite. You can use an entity-subtyping approach in which A and B are subtypes of another entity (MasterID). It is this supertype entity which attracts comments. However, for this to be true subtyping, the PK of A and the PK of B would be the FK to MasterID.
The way you've designed your tables, they have two candidate keys. If you eliminate the redundant candidate keys, then you have a standard entity-subtyping pattern, which is a legitimate and commonly used design approach.

Based on my understanding for the problem, I think this is too complex for a little value. If I understand you correctly, you have a situation like the picture and you want to make the key for Comment unique.
Creating a fourth table could work but it adds unnecessary complexity.
What you could do instead is to make the key for the Comments table a compound key of the two columns one is a sequence number and the other is a character field indicating the parent table. So you get keys like (A,1), (A,2), (B,3), (A,4), (B,5) ...etc.
This way you don't need the master table, and you don't need FKs in Table A or B.

Database Mapping - Multiple Foreign Keys

I want to make sure this is the best way to handle a certain scenario.
Let's say I have three main tables I will keep them generic. They all have primary keys and they all are independent tables referencing nothing.
Table 1
PK
VarChar Data
Table 2
PK
VarChar Data
Table 3
PK
VarChar Data
Here is the scenario, I want a user to be able to comment on specific rows on each of the above tables. But I don't want to create a bunch of comment tables. So as of right now I handled it like so..
There is a comment table that has three foreign key columns each one references the main tables above. There is a constraint that only one of these columns can be valued.
CommentTable
PK
FK to Table1
FK to Table2
FK to Table3
VarChar Comment
FK to Users
My question: is this the best way to handle the situation? Does a generic foreign key exist? Or should I have a separate comments table for each main table.. even though the data structure would be exactly the same? Or would a mapping table for each one be a better solution?

My question: is this the best way to handle the situation?
Multiple FKs with a CHECK that allows only one of them to be non-NULL is a reasonable approach, especially for relatively few tables like in this case.
The alternate approach would be to "inherit" the Table 1, 2 and 3 from a common "parent" table, then connect the comments to the parent.
Look here and here for more info.
Does a generic foreign key exist?
If you mean a FK that can "jump" from table to table, then no.
Assuming all 3 FKs are of the same type1, you could theoretically implement something similar by keeping both foreign key value and referenced table name2 and then enforcing it through a trigger, but declarative constraints should be preferred over that, even at a price of slightly more storage space.
If your DBMS fully supports "virtual" or "calculated" columns, then you could do something similar to above, but instead of having a trigger, generate 3 calculated columns based on FK value and table name. Only one of these calculated columns would be non-NULL at any given time and you could use "normal" FKs for them as you would for the physical columns.
But, all that would make sense when there are many "connectable" tables and your DBMS is not thrifty in storing NULLs. There is very little to gain when there are just 3 of them or even when there are many more than that but your DBMS spends only one bit on each NULL field.
Or should I have a separate comments table for each main table, even though the data structure would be exactly the same?
The "data structure" is not the only thing that matters. If you happen to have different constraints (e.g. a FK that applies to one of them but not the other), that would warrant separate tables even though the columns are the same.
But, I'm guessing this is not the case here.
Or would a mapping table for each one be a better solution?
I'm not exactly sure what you mean by "mapping table", but you could do something like this:
Unfortunately, that would allow a single comment to be connected to more than one table (or no table at all), and is in itself a complication over what you already have.
All said and done, your original solution is probably fine.
1 Or you are willing to store it as string and live with conversions, which you should be reluctant to do.
2 In practice, this would not really be a name (as in string) - it would be an integer (or enum if DBMS supports it) with one of the well-known predefined values identifying the table.

Thanks for all the help folks, i was able to formulate a solution with the help of a colleague of mine. Instead of multiple mapping tables i decided to just use one.
This mapping table holds a group of comments, so it has no primary key. And each group row links back to a comment. So you can have multiple of the same group id. one-many-one would be the relationship.

how can i have a unique column in many tables

I have ten or more(i don't know) tables that have a column named foo with same datatype.
how can i tell sql that values in all the tables should be unique.
I mean If(i have value "1" in table1) I should NOT be able to have value "1" in table2

Have a common ID's table, which these ten tables reference. That will work well in that it will ensure unique ID's, but doesn't mean you couldn't duplicate the ID's in the table if someone really wants to.
What I mean is a common ID's table ensures that you don't have duplicates for insert (by also inserting an ID into this common table), but the thing is the way to guarantee that it never happens is by building the business rules into the system or placing check constraints to cross reference the other tables (which would ensure uniqueness, but degrade performance).

The question is phrased vaguely; if you need to generate a column that's unique among several tables, use row GUIDs or a common ID generator table; if you need to enforce uniqueness (and the field values are already there), use triggers.
Generally, if you generate the values, you don't need to enforce anything. The generation logic, if done right, will take care of that. If you are inserting, say, user input, then you can and should enforce uniqueness during insertion. As a validation rule or something.

You can define the field as a GUID (or a UNIQUEIDENTIFIER in SQL server). Then it will always be unique no matter what.

How about setting a check constraint on each table, such that ID % 10 = N (where N is the table number, from 0-9). And use IDENTITY(N,10) each time.

I would suggest that possibly your design is flawed. Why are these separate tables? It ouwld be better to put them in one table with one id field and another filed to identify whatever is making these spearate tables (cusotmer id for instance). Then you can read about partioning tables if you want them to be split by customer for performance reasons.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight