I am designing an airline database (the outline of one anyway) for an assignment and seem to be running around in circles.
Three tables are concerned:
Customer        Booking_Reference    Flight
cust_id (pk)    reference_id (pk)    Flight_id (pk)
                cust_id (fk)
A booking reference can have many flights.
A flight will have many booking references.
I am trying to break up the many-to-many relationship. Is it possible to have a relational table with the flight_id values as the attributes (columns) and the booking references as the rows (data)? If so, there can be no primary key, which is a no-go as I understand it.
Alternatively, I could make the booking_reference/flight relational table with 2 attributes and a compound primary key of booking_reference/flight. Each booking reference and each flight would then appear in several rows, but the combination of the two would be unique. Is this acceptable design practice?
I was going to just list a maximum of 8 flights as columns in the booking reference table (with NULL for the entries where there are fewer than 8 flights) and give customers with more than 8 flights a new reference_id, but this seems more ridiculous the more I learn about databases, since it results in more reference ids and more NULL data.
Any ideas on which route to take?
Rather than having eight (or any arbitrary number of) columns, create what's sometimes called a join table, with three columns:
Table: references_flights
id (Primary key)
reference_id (fk)
flight_id (fk)
You should then be able to query data across them with the right JOINs, but I'll leave that for someone with more database expertise.
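For the JOIN part, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names follow the question; the sample rows and the UNIQUE constraint on (reference_id, flight_id) are my own assumptions, the latter added so the same flight can't be attached to a booking twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked

conn.executescript("""
CREATE TABLE customer (cust_id INTEGER PRIMARY KEY);
CREATE TABLE booking_reference (
    reference_id INTEGER PRIMARY KEY,
    cust_id      INTEGER NOT NULL REFERENCES customer(cust_id)
);
CREATE TABLE flight (flight_id INTEGER PRIMARY KEY);

-- The join table: one row per (booking, flight) pair.
CREATE TABLE references_flights (
    id           INTEGER PRIMARY KEY,
    reference_id INTEGER NOT NULL REFERENCES booking_reference(reference_id),
    flight_id    INTEGER NOT NULL REFERENCES flight(flight_id),
    UNIQUE (reference_id, flight_id)  -- assumption: no duplicate pairs
);

-- Made-up sample data.
INSERT INTO customer VALUES (1);
INSERT INTO booking_reference VALUES (100, 1);
INSERT INTO flight VALUES (10), (20), (30);
INSERT INTO references_flights (reference_id, flight_id)
VALUES (100, 10), (100, 20);
""")

# All flights on booking 100, reached through the join table.
flights_for_booking = [row[0] for row in conn.execute(
    "SELECT f.flight_id "
    "FROM booking_reference b "
    "JOIN references_flights rf ON rf.reference_id = b.reference_id "
    "JOIN flight f ON f.flight_id = rf.flight_id "
    "WHERE b.reference_id = ? "
    "ORDER BY f.flight_id", (100,))]
print(flights_for_booking)
```

A booking with one flight takes one row and a booking with twenty takes twenty rows, so there are no NULL padding columns and no need for extra reference ids.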
Suppose info about vendors and customers is kept in one table named Partners (since one partner can be a vendor at one point in time and a customer at another).
The Partners table has the usual stuff: company name, short name, address, city, country. Now, for domestic partners there is DomesticVatNumber and for non-domestic ones there is InternationalVatNumber. Usually, the VAT number would be a perfect candidate for a primary key, but the problem here is that not all domestic partners have an InternationalVatNumber and international ones don't have a DomesticVatNumber.
I am trying to find the best way to design this in the db. Is a surrogate key the only option in this case, or should I maybe reconsider having domestic and international partners in the same table? Should I maybe split them into 2 tables, DomesticPartners (which always have a DomesticVatNumber) and InternationalPartners (which always have an InternationalVatNumber), and then put the primary key on the DomesticVat and InternationalVat columns respectively?
What are pros/cons of each approach?
Personally, I would never make a primary key out of something assigned by an external party, nor would I use a value that the user would ever see. I would always use a meaningless key (either an identity column or a unique identifier).
Given what you are saying, I wouldn't split them into separate tables, since any table that referenced your partner tables through a foreign key would then need either two nullable columns (one per partner table) or a single column with no foreign key relationship (shudder...).
The best option is to have one table, with the domestic and international VAT numbers as separate fields in the table but not as the primary key. Since they will both be nullable, you would have limited options for a unique constraint on them.
Just my 2 cents
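A minimal sketch of that single-table layout, using Python's built-in sqlite3 module. PartnerId, the sample rows and the CHECK constraint (at least one VAT number present) are my assumptions, not something from the question. SQLite happens to allow many NULLs in a UNIQUE column; other engines may treat NULLs in unique constraints differently, so check yours.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Partners (
    PartnerId              INTEGER PRIMARY KEY,  -- meaningless surrogate key
    CompanyName            TEXT NOT NULL,
    DomesticVatNumber      TEXT UNIQUE,          -- nullable
    InternationalVatNumber TEXT UNIQUE,          -- nullable
    -- Assumption: every partner has at least one of the two numbers.
    CHECK (DomesticVatNumber IS NOT NULL
           OR InternationalVatNumber IS NOT NULL)
);
""")

insert_sql = (
    "INSERT INTO Partners "
    "(CompanyName, DomesticVatNumber, InternationalVatNumber) "
    "VALUES (?, ?, ?)"
)
# Made-up partners: two domestic, one international.
conn.executemany(insert_sql, [
    ("Domestic One", "D-001", None),
    ("Domestic Two", "D-002", None),
    ("Foreign One", None, "I-001"),
])

# Reusing an existing VAT number is rejected by the UNIQUE constraint.
try:
    conn.execute(insert_sql, ("Copycat", "D-001", None))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

partner_count = conn.execute("SELECT COUNT(*) FROM Partners").fetchone()[0]
print(duplicate_rejected, partner_count)
```

Other tables can then reference PartnerId with a single, non-nullable foreign key regardless of whether the partner is domestic or international.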
As your business grows, your systems get more complex, and it makes more sense to have one table. An example can be an ENTITIES table which stores everyone and everything, including vendors and customers. This can include individuals, groups and businesses, clients and staff, etc. Later on you will be glad you did it this way, because it reduces the number of complex joins you are going to have with multiple tables. You can use ENTITY_NO as a surrogate key and ENTITY_TYPE to differentiate entities. VAT number fields can be indexed separately and made nullable.
I have a database that has master tables prefixed with mt_ and transaction tables prefixed with tr_. But when I went through the database, I started to wonder what the actual definition of a transaction table and a master table is. To my understanding, a transaction table should have a composite primary key (a primary key made from two or more PKs of other tables). But when I looked at the transaction tables in the database, some have a composite key as described above, while others tagged as tr_ have only one key marked as PK; they also contain columns that are PKs of other tables, but these weren't even marked as FKs...
So could anyone here explain the difference between a master table and a transaction table, and how to identify them in a DB?
Updated
Here is an example from my db:
tr_orders
OrderId int PK
CustomerId int Fk
OrderDate datetime etc
tr_reciept
RecieptId int PK
OrderId int **(but not FK)**
PaindAmount money
recieptDate datetime
Here is the complete structure of the two tables:
[screenshots of tr_orders and tr_reciept]
I don't understand why these tables are tr_ tables.
Why you don't always put a Primary Key on all the Foreign Keys
When something 'happens' it goes into a transaction. Someone buys a toy at a shop. A row is created recording that it was a toy and the datetime it happened and how much it cost.
Then someone else buys a toy ten minutes later.
We have two records in our transaction table:
Date         Time   Product_Key  Shop_Key  Amount
--------------------------------------------------
18 Dec 2015  13:05  7            12        10
18 Dec 2015  13:15  7            12        10
Here we have two foreign keys: Product_Key and Shop_Key
We can't create a PK on just those two foreign keys, because then each shop could only ever record one sale of a given product.
So the PK does not automatically go on all the FK's
But really the thing to take away is that your data model (tables, fields, keys, datatypes) reflects what your business does. If a shop could truly only ever sell a given toy once, it would be a valid data model to have a PK on those two fields.
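To make the point concrete, here is a small sketch with Python's built-in sqlite3 module that loads the two toy sales into two candidate tables. The table names, the sale_id column and the ISO date strings are my own assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Candidate 1: PK on the two foreign keys only.
CREATE TABLE sale_pk_on_fks (
    product_key INTEGER NOT NULL,
    shop_key    INTEGER NOT NULL,
    sold_at     TEXT    NOT NULL,
    amount      INTEGER NOT NULL,
    PRIMARY KEY (product_key, shop_key)
);
-- Candidate 2: surrogate PK, the foreign keys are ordinary columns.
CREATE TABLE sale (
    sale_id     INTEGER PRIMARY KEY,
    product_key INTEGER NOT NULL,
    shop_key    INTEGER NOT NULL,
    sold_at     TEXT    NOT NULL,
    amount      INTEGER NOT NULL
);
""")

# The two sales from the example: same product, same shop, ten minutes apart.
rows = [(7, 12, "2015-12-18 13:05", 10), (7, 12, "2015-12-18 13:15", 10)]

conn.executemany(
    "INSERT INTO sale (product_key, shop_key, sold_at, amount) "
    "VALUES (?, ?, ?, ?)", rows)

try:
    conn.executemany(
        "INSERT INTO sale_pk_on_fks (product_key, shop_key, sold_at, amount) "
        "VALUES (?, ?, ?, ?)", rows)
    second_sale_rejected = False
except sqlite3.IntegrityError:
    # The second row repeats (product_key, shop_key), so the PK rejects it.
    second_sale_rejected = True

sales_recorded = conn.execute("SELECT COUNT(*) FROM sale").fetchone()[0]
print(second_sale_rejected, sales_recorded)
```

The surrogate-key table keeps both sales, while the table keyed on the two foreign keys refuses the second one.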
Some characteristics of "transactional" vs "master" tables
"Transactional" and "master" tables generally have a many to one relationship, meaning many transactions match one master record. Many purchase records match the same single toy record. A FK is a dead giveaway to this kind of relationship although "master" tables also have FK's
"Transactional" tables usually have a date or some kind of event id and are often 'aggregated' when reporting. This could be a record count or a sum of an amount.
Some characteristics of real world systems
It's entirely likely that someone forgot to put on a FK or PK, or it could be that there is a unique key (not a PK) enforcing what you are expecting to see.
I've seen live systems where the keys were clearly incorrect, or there were no keys at all.
Master - - - - - - - - - - - - - - - - - - - - - - Transaction
Country .... Employee ...... Customer ........... Order
Master and Transaction tables exist on a range, not in a binary on/off state. Where a table falls on that range reflects the amount of activity the table is expected to experience.
Lookup/Reference tables like State, Currency, Country, etc. RARELY have new records or changes -- Very "Master"
Employees add or change records occasionally, but not often (hopefully) so more "Master" than "Transaction"
Customers add or change records more often than Employees (also hopefully) so still a measure of "Master" but also with "Transaction" qualities
Orders are added and changed ALL the time (hopefully) and are Very "Transaction,"
Same with Receipts
If a table has No FK fields, it's likely very "Master"
Tables with FKs have some amount of "Transaction" to them, the more you expect new records - the more "Transaction" the table could be.
Keys:
In my opinion, every table should have a surrogate PK, not related to FKs or any Natural keys. Lots of reasons for this opinion, but whatever works for you is cool, too.
Often, if there is an obvious Natural key, then a table needs a Unique constraint for that key, in addition to the PK
I have read through handfuls of what would seem to make this a duplicate question. But reading through all of these has left me uncertain. I'm hoping to get an answer based on the absolute example below, as many questions/answers trail off into debates back and forth.
If I have:
dbo.Book
--------
BookID PK int identity(1,1)
dbo.Author
----------
AuthorID PK int identity(1,1)
Now I have two choices for a simple junction table:
dbo.BookAuthor
--------------
BookID CPK and FK
AuthorID CPK and FK
The above would be a compound/composite key on both FKs, as well as set up the FK relationships for both columns - also using Cascade on delete.
OR
dbo.BookAuthor
--------------
RecordID PK int identity(1,1)
BookID FK
AuthorID FK
Foreign key relationships on BookID and AuthorID, along with Cascade on delete. Also set up a unique constraint on BookID and AuthorID.
I'm looking for a simple answer as to why one method is better than another in the ABOVE particular example. The answers that I'm reading are very detailed, and I was just about to settle on a compound key, but then watched a video where the example used an Identity column like my second example.
It seems this topic is slightly torn in half, but my gut is telling me that I should just use a composite key.
What's more efficient for querying? It seems having a PK identity column along with setting up a unique constraint on the two columns, AND the FK relationships would be more costly, even if a little.
This is something I've always remembered from my database course way back in college. We were covering the section from the textbook on "Entity Design" and it was talking about junction tables... we called them intersect tables or intersection relations. I was actually paying attention in class that day. The professor said, in his experience, a many-to-many junction table almost always indicates an unidentified missing entity. These entities almost always end up with data of their own.
We were given an example of Student and Course entities. For a student to take a course, you need to junction between those two. What you actually have as a result is a new entity: an Enrollment. The additional data in this case would be things like Credit Type (audit vs regular) or Final Grade.
I remember that advice to this day... but I don't always follow it. What I will do in this situation is stop, and make sure to go back to the stakeholders on the issue and work with them on what data points we might still be missing in this junction. If we really can't find anything, then I'll use the compound key. When we do find data, we think of a better name and it gets a surrogate key.
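As a rough illustration of the Student/Course example, here is a sketch using Python's built-in sqlite3 module where the junction has been promoted to an Enrollment entity with its own surrogate key and its own data. All column names and the sample row are assumptions of mine, not taken from the textbook.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Student (StudentID INTEGER PRIMARY KEY, Name  TEXT NOT NULL);
CREATE TABLE Course  (CourseID  INTEGER PRIMARY KEY, Title TEXT NOT NULL);

-- The "junction" turns out to be an entity with data of its own.
CREATE TABLE Enrollment (
    EnrollmentID INTEGER PRIMARY KEY,  -- surrogate key
    StudentID    INTEGER NOT NULL REFERENCES Student(StudentID),
    CourseID     INTEGER NOT NULL REFERENCES Course(CourseID),
    CreditType   TEXT NOT NULL CHECK (CreditType IN ('audit', 'regular')),
    FinalGrade   TEXT                  -- NULL until the course is finished
);

-- Made-up sample data.
INSERT INTO Student VALUES (1, 'Student A');
INSERT INTO Course  VALUES (1, 'Databases');
INSERT INTO Enrollment (StudentID, CourseID, CreditType, FinalGrade)
VALUES (1, 1, 'regular', 'A');
""")

enrollments = conn.execute(
    "SELECT s.Name, c.Title, e.CreditType, e.FinalGrade "
    "FROM Enrollment e "
    "JOIN Student s ON s.StudentID = e.StudentID "
    "JOIN Course  c ON c.CourseID  = e.CourseID").fetchall()
print(enrollments)
```

Whether (StudentID, CourseID) should also be unique depends on whether a student may enroll in the same course more than once, which is exactly the kind of question to take back to the stakeholders.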
Update in 2020
I still have the textbook, and by amazing coincidence both it and this question were brought to my attention within a few hours of each other. So for the curious, it was Chapter 5, section 6, of the 7th edition of this book:
https://www.amazon.com/Database-Processing-Fundamentals-Design-Implementation-dp-9332549958/dp/9332549958/
As a staunch proponent of, and proselytizer for, the benefits of surrogate keys, I none-the-less make an exception for all-key join tables such as your first example. One of the benefits of surrogate keys is that engines are generally optimized for joining on single integer fields, as the default and most common circumstance.
Your first proposal still obtains this benefit, but also has a 50% greater fan-out on each index level, reducing both the overall size and height of the indices on the join table. Although the performance benefit of this is likely negligible for anything smaller than a massive table, it is best practice and comes at no cost.
When I might opt for the other design is if the relation were to accrue additional columns. At that point it is no longer strictly a join table.
I prefer the first design, using Composite Keys. Having an identity column on the junction table does not give you an advantage even if the parent tables have them. You won't be querying the BookAuthor using the identity column, instead you would query it using the BookID and AuthorID.
Also, adding an identity column would allow duplicate BookID-AuthorID combinations, unless you add a unique constraint.
Additionally, if your primary key is (BookID, AuthorID), you need an index on (AuthorID, BookID). This will help if you want to query the books written by an author.
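Here is a minimal sketch of the composite-key design with the extra reverse index, using Python's built-in sqlite3 module. The index name and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for ON DELETE CASCADE in SQLite
conn.executescript("""
CREATE TABLE Book   (BookID   INTEGER PRIMARY KEY);
CREATE TABLE Author (AuthorID INTEGER PRIMARY KEY);

CREATE TABLE BookAuthor (
    BookID   INTEGER NOT NULL REFERENCES Book(BookID)     ON DELETE CASCADE,
    AuthorID INTEGER NOT NULL REFERENCES Author(AuthorID) ON DELETE CASCADE,
    PRIMARY KEY (BookID, AuthorID)  -- also prevents duplicate pairs
);
-- Reverse index for "which books did this author write?" lookups.
CREATE INDEX IX_BookAuthor_AuthorID_BookID ON BookAuthor (AuthorID, BookID);

-- Made-up sample data.
INSERT INTO Book   VALUES (1), (2);
INSERT INTO Author VALUES (10), (11);
INSERT INTO BookAuthor VALUES (1, 10), (2, 10), (2, 11);
""")

# A repeated (BookID, AuthorID) pair violates the primary key.
try:
    conn.execute("INSERT INTO BookAuthor VALUES (1, 10)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

books_by_author_10 = [row[0] for row in conn.execute(
    "SELECT BookID FROM BookAuthor WHERE AuthorID = ? ORDER BY BookID", (10,))]

# Deleting a book cascades to its junction rows.
conn.execute("DELETE FROM Book WHERE BookID = 2")
remaining_pairs = conn.execute(
    "SELECT BookID, AuthorID FROM BookAuthor "
    "ORDER BY BookID, AuthorID").fetchall()
print(duplicate_rejected, books_by_author_10, remaining_pairs)
```

With this layout the primary key does double duty as the uniqueness rule, so no separate unique constraint is needed.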
Using composite key would be my choice too. Here's why:
Less storage overhead
Let's say you use a surrogate key. Since you'll probably want to query all authors for a specific book and vice versa, you'd need indexes starting with both BookId and AuthorId. For performance reasons you should include the other column in both indexes to prevent a clustered key lookup. You'd probably also want to make one of them unique, to make sure no duplicate BookId/AuthorId combinations are added to the table.
So as a net result:
The data is stored 3 times instead of 2 times
2 unique constraints are to be validated instead of 1
Querying a table that references the junction table
Even if you add a table like Contributions (AuthorId, BookId, ...) that references the junction table, most queries won't need to touch the junction table at all. For example, finding all contributions of a specific author only involves the author and contributions tables.
Depending on the amount of data in the junction table, a compound key might end up performing worse than an auto-generated sequential primary key.
In SQL Server, the primary key is the clustered index for the table by default, which means that it determines the order in which rows are stored on disk. If the primary key's values are not generated sequentially (e.g. it is a composite key made up of foreign keys from tables whose rows do not fall in the same order as the junction table's rows, or it is a GUID or other random key), then new rows are inserted in the middle of the table rather than appended at the end, which can cause page splits and fragmentation.
You probably should use the compound/composite key. This way you are fully relational - one author can write many books and one book can have multiple authors.
This matter confuses me.
I have a college information system with a junction table between the students table and the subjects (curriculum) table. Its primary key is the composite key (StudentID, SubjectID), and both columns are foreign keys. However, a student may fail an exam and repeat the subject, so we would have a duplicate PK, and we need to record all the data. I see two ways to solve this, but I don't know which is best:
1. Add a new column as the primary key instead of the composite key.
2. Add Season and Year columns to the composite key, so that it becomes (StudentID, SubjectID, Season, Year). I should mention that I don't need this composite key as a foreign key.
Which way is better for performance and DB integrity?
Subject and exam are separate (if related) concepts, so you should not try to represent them within the same table. Also, the fact that an exam has been held for the given subject is separate from the fact that any particular student took that exam. Split all these concepts into their own tables, and the model becomes more natural, for example:
Representing a student that took the same exam several times is just a matter of adding multiple rows to the STUDENT_EXAM table.
NOTE: STUDENT_SUBJECT just records the fact that the student has enrolled in the subject, but not when (which year/semester). Keeping semester-specific information may require additional tables and more complicated relationships within the model.
NOTE: There is a diamond-shaped dependency in this model. Since SUBJECT_ID was passed from the "top" (SUBJECT), down both "sides" (STUDENT_SUBJECT, EXAM) and then merged at the "bottom" (STUDENT_EXAM) of the diamond, a student cannot take an exam on a subject (s)he has not enrolled in.
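One possible way to write such a diamond-shaped model down, sketched with Python's built-in sqlite3 module. The table names come from the notes above; every column other than SUBJECT_ID (for example EXAM_NO) is an assumption of mine, as are the sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE STUDENT (STUDENT_ID INTEGER PRIMARY KEY);
CREATE TABLE SUBJECT (SUBJECT_ID INTEGER PRIMARY KEY);

-- Left side of the diamond: the student is enrolled in the subject.
CREATE TABLE STUDENT_SUBJECT (
    STUDENT_ID INTEGER NOT NULL REFERENCES STUDENT(STUDENT_ID),
    SUBJECT_ID INTEGER NOT NULL REFERENCES SUBJECT(SUBJECT_ID),
    PRIMARY KEY (STUDENT_ID, SUBJECT_ID)
);
-- Right side of the diamond: an exam was held for the subject.
CREATE TABLE EXAM (
    SUBJECT_ID INTEGER NOT NULL REFERENCES SUBJECT(SUBJECT_ID),
    EXAM_NO    INTEGER NOT NULL,
    PRIMARY KEY (SUBJECT_ID, EXAM_NO)
);
-- Bottom: the student took that exam. SUBJECT_ID is shared by both FKs.
CREATE TABLE STUDENT_EXAM (
    STUDENT_ID INTEGER NOT NULL,
    SUBJECT_ID INTEGER NOT NULL,
    EXAM_NO    INTEGER NOT NULL,
    PRIMARY KEY (STUDENT_ID, SUBJECT_ID, EXAM_NO),
    FOREIGN KEY (STUDENT_ID, SUBJECT_ID)
        REFERENCES STUDENT_SUBJECT (STUDENT_ID, SUBJECT_ID),
    FOREIGN KEY (SUBJECT_ID, EXAM_NO)
        REFERENCES EXAM (SUBJECT_ID, EXAM_NO)
);

-- Made-up data: student 1 is enrolled in subject 50, student 2 is not.
INSERT INTO STUDENT VALUES (1), (2);
INSERT INTO SUBJECT VALUES (50);
INSERT INTO STUDENT_SUBJECT VALUES (1, 50);
INSERT INTO EXAM VALUES (50, 1), (50, 2);
-- Student 1 fails the first exam and sits the second: just two rows.
INSERT INTO STUDENT_EXAM VALUES (1, 50, 1), (1, 50, 2);
""")

attempts = conn.execute(
    "SELECT COUNT(*) FROM STUDENT_EXAM WHERE STUDENT_ID = 1 AND SUBJECT_ID = 50"
).fetchone()[0]

# Student 2 never enrolled in subject 50, so the first FK rejects the row.
try:
    conn.execute("INSERT INTO STUDENT_EXAM VALUES (2, 50, 1)")
    unenrolled_rejected = False
except sqlite3.IntegrityError:
    unenrolled_rejected = True
print(attempts, unenrolled_rejected)
```

Because the same SUBJECT_ID column takes part in both foreign keys of STUDENT_EXAM, the enrolled subject and the examined subject cannot differ.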
I have two sql tables Players and Teams joined by a PlayerTeam junction table. A player can be on a team for various years. What's the best way to handle this relationship? I started with a PlayerTeamYear table, but the foreign key reference to both columns in PlayerTeam seems unwieldy, especially as I want to convert this to Entity Framework. This has got to be a common scenario.
Thanks!
I would add the year to the PlayerTeam table so it has three columns in its primary key.
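A quick sketch of that three-column key, using Python's built-in sqlite3 module. The column names and the sample rows are guesses for illustration, since the question doesn't list them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Players (PlayerId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Teams   (TeamId   INTEGER PRIMARY KEY, Name TEXT NOT NULL);

CREATE TABLE PlayerTeam (
    PlayerId INTEGER NOT NULL REFERENCES Players(PlayerId),
    TeamId   INTEGER NOT NULL REFERENCES Teams(TeamId),
    Year     INTEGER NOT NULL,
    PRIMARY KEY (PlayerId, TeamId, Year)  -- one row per player, team and year
);

-- Made-up sample data.
INSERT INTO Players VALUES (1, 'Player A');
INSERT INTO Teams   VALUES (1, 'Team X'), (2, 'Team Y');
INSERT INTO PlayerTeam VALUES (1, 1, 2019), (1, 1, 2020), (1, 2, 2021);
""")

years_on_team_x = [row[0] for row in conn.execute(
    "SELECT Year FROM PlayerTeam "
    "WHERE PlayerId = ? AND TeamId = ? ORDER BY Year", (1, 1))]
print(years_on_team_x)
```

This removes the need for a separate PlayerTeamYear table and its two-column foreign key.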
The suggestion above is a good one. I'd actually add two columns to the PlayerTeam table: a year and a surrogate key to act as your primary key.
Think of it like this: The player table describes all the players, the team table describes all the teams, and the PlayerTeam table describes the relationship between players and teams. This relationship should also include when the players were on a given team.
There are a few other things you may want to consider. Is it possible for a player to have left a team and not joined another one? For example, can a player retire or take a year off, and do you want to record this? If so, you may want to consider adding yet another date column to your PlayerTeam table indicating when a player left a given team.
If you're worried about the calls getting complex, you might consider creating your tables such that you can join them using natural joins. Natural joins, if they're available, are joins on two tables in which there are columns with the same name. Generally, the way I like addressing this problem is like this (please pardon the pseudocode):
PlayerTable:
(int) player_id PRIMARY KEY AUTOINCREMENT
...
other data
TeamTable:
(int) team_id PRIMARY KEY AUTOINCREMENT
...
other data
PlayerTeamTable:
(int) player_team_id PRIMARY KEY AUTOINCREMENT
(int) player_id FOREIGN KEY references PlayerTable(player_id)
(int) team_id FOREIGN KEY references TeamTable(team_id)
(datetime) joined_team
(datetime) left_team
This way, you can run a simple query to find everyone who is currently part of a particular team:
select * from PlayerTable natural join PlayerTeamTable natural join TeamTable where team_name = 'Liverpool' and left_team is null;
Or something to that effect. This may not be particularly efficient (you may want to run "explain" on the select query), but it's quite simple to handle.
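For anyone who wants to try it, here is a runnable sketch of the tables and the query using Python's built-in sqlite3 module. The player_name and team_name columns stand in for the "other data" and are assumptions, as are the sample rows. Note that a missing left_team has to be tested with IS NULL, because a comparison written as = NULL never matches any row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PlayerTable (
    player_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    player_name TEXT NOT NULL
);
CREATE TABLE TeamTable (
    team_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    team_name TEXT NOT NULL
);
CREATE TABLE PlayerTeamTable (
    player_team_id INTEGER PRIMARY KEY AUTOINCREMENT,
    player_id      INTEGER NOT NULL REFERENCES PlayerTable(player_id),
    team_id        INTEGER NOT NULL REFERENCES TeamTable(team_id),
    joined_team    TEXT NOT NULL,
    left_team      TEXT              -- NULL while the player is still there
);

-- Made-up sample data.
INSERT INTO PlayerTable (player_name) VALUES ('Player A'), ('Player B');
INSERT INTO TeamTable (team_name) VALUES ('Liverpool'), ('Everton');
INSERT INTO PlayerTeamTable (player_id, team_id, joined_team, left_team)
VALUES (1, 1, '2018-07-01', NULL),
       (2, 1, '2015-07-01', '2017-06-30'),
       (2, 2, '2017-07-01', NULL);
""")

query = (
    "SELECT player_name "
    "FROM PlayerTable NATURAL JOIN PlayerTeamTable NATURAL JOIN TeamTable "
    "WHERE team_name = 'Liverpool' AND left_team {} "
    "ORDER BY player_name"
)

# Correct NULL test: players currently at Liverpool.
current_players = [r[0] for r in conn.execute(query.format("IS NULL"))]
# "= NULL" evaluates to NULL for every row, so nothing is returned.
with_equals_null = [r[0] for r in conn.execute(query.format("= NULL"))]
print(current_players, with_equals_null)
```

Natural joins match on every column name the tables share, so this only stays correct as long as the "other data" columns have distinct names across the three tables.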