I'm building a system whereby students can view their results once they are out. The system is supposed to keep record of the student's marks over the course period when they progress to the next year. I'm not sure about my use of foreign keys in the Module_tests table though. I have two foreign keys in this table to identify a test's mark for a specific student and specific module. I guess this pair of foreign keys are the primary key of Module_tests table. Is this a sensible design? Is there a more efficient one?
Thanks in advance
Student table has three columns: StudentID, first name, surname.
Module table has three columns: ModuleID, ModuleName, Year.
Module_tests table has five columns: StudentID, ModuleID, Assignment, PracticalTest, FinalExam.
(hope it's clear as text as I can't attach images in my posts)
I think you should add an additional column to Module_tests that can be primary and auto-increment.
In the situation whereby the students have to retake a module, the integrity of your database might get compromised. Other than that, your tables seem fine.
A junction table that uniquely identifies rows by two foreign keys has been asked in this question, and the answers pretty much apply here.
Related
I have read through handfuls of what would seem to make this a duplicate question. But reading through all of these has left me uncertain. I'm hoping to get an answer based on the absolute example below, as many questions/answers trail off into debates back and forth.
If I have:
dbo.Book
--------
BookID PK int identity(1,1)
dbo.Author
----------
AuthorID PK int identity(1,1)
Now I have two choices for a simple junction table:
dbo.BookAuthor
--------------
BookID CPK and FK
AuthorID CPK and FK
The above would be a compound/composite key on both FKs, as well as set up the FK relationships for both columns - also using Cascade on delete.
OR
dbo.BookAuthor
--------------
RecordID PK int identity(1,1)
BookID FK
AuthorID FK
Foreign key relationships on BookID and AuthorID, along with Cascade on delete. Also set up a unique constraint on BookID and AuthorID.
I'm looking for a simple answer as to why one method is better than another in the ABOVE particular example. The answers that I'm reading are very detailed, and I was just about to settle on a compound key, but then watched a video where the example used an Identity column like my first example.
It seems this topic is slightly torn in half, but my gut is telling me that I should just use a composite key.
What's more efficient for querying? It seems having a PK identity column along with setting up a unique constraint on the two columns, AND the FK relationships would be more costly, even if a little.
This is something I've always remembered from my database course way back in college. We were covering the section from the textbook on "Entity Design" and it was talking about junction tables... we called them intersect tables or intersection relations. I was actually paying attention in class that day. The professor said, in his experience, a many-to-many junction table almost always indicates an unidentified missing entity. These entities almost always end up with data of their own.
We were given an example of Student and Course entities. For a student to take a course, you need to junction between those two. What you actually have as a result is a new entity: an Enrollment. The additional data in this case would be things like Credit Type (audit vs regular) or Final Grade.
I remember that advice to this day... but I don't always follow it. What I will do in this situation is stop, and make sure to go back to the stakeholders on the issue and work with them on what data points we might still be missing in this junction. If we really can't find anything, then I'll use the compound key. When we do find data, we think of a better name and it gets a surrogate key.
Update in 2020
I still have the textbook, and by amazing coincidence both it and this question were brought to my attention within a few hours of each other. So for the curious, it was Chapter 5, section 6, of the 7th edition of this book:
https://www.amazon.com/Database-Processing-Fundamentals-Design-Implementation-dp-9332549958/dp/9332549958/
As a staunch proponent of, and proselytizer for, the benefits of surrogate keys, I none-the-less make an exception for all-key join tables such as your first example. One of the benefits of surrogate keys is that engines are generally optimized for joining on single integer fields, as the default and most common circumstance.
Your first proposal still obtains this benefit, but also has a 50% greater fan-put on each index level, reducing both the overall size and height of the indices on the join table. Although the performance benefits of this are likely negligible for anything smaller than a massive table it is best practice and comes at no cost.
When I might opt for the other design is if the relation were to accrue additional columns. At that point it is no longer strictly a join table.
I prefer the first design, using Composite Keys. Having an identity column on the junction table does not give you an advantage even if the parent tables have them. You won't be querying the BookAuthor using the identity column, instead you would query it using the BookID and AuthorID.
Also, adding an identity would allow for duplicate BookID-AuthorID combination, unless you put a constraint.
Additionally, if your primary key is (BookID, AuthorID), you need to an index on AuthorID, BookID). This will help if you want to query the the books written by an author.
Using composite key would be my choice too. Here's why:
Less storage overhead
Let's say you would use a surrogate key. Since you'd probably gonna want to query all authors for a specific book and vica versa you'd need indexes starting with both BookId and AuthorId. For performance reasons you should include the other column in both indexes to prevent a clustered key lookup. You'd probably would want to make one of them a unique to make sure no duplicate BookId/AuthorId combinations are added to the table.
So as a net result:
The data is stored 3 times instead of 2 times
2 unique constraints are to be validated instead of 1
Querying a junction table referencing table
Even if you'd add a table like Contributions (AuthorId, BookId, ...) referencing the junction table. Most queries won't require the junction table to be touched at all. E.g.: to find all contribution of a specific author would only involve the author and contributions tables.
Depending on the amount of data in the junction table, a compound key might end up causing poor performance over an auto generated sequential primary key.
The primary key is the clustered index for the table, which means that it determines the order in which rows are stored on disc. If the primary key's values are not generated sequentially (e.g. it is a composite key comprised of foreign keys from tables where rows do not fall in the same order as the junction table's rows, or it is a GUID or other random key) then each time a row is added to the junction table a reshuffle of the junction table's rows will be necessary.
You probably should use the compound/composite key. This way you are fully relational - one author can write many books and one book can have multiple authors.
This matter confuses me,
I have a College Information system the junction table between students table and subjects(curriculum) table, the primary key is composite key (StudentID, SubjectID) and both of them are Foreign keys but the student may be fail in exam and repeat the subject so we will have duplicate PK and we need to record all data. I have two ways to solve this matter but i don't know the best way?
Add new column as primary Key instead of composite key.
Join to the composite key Season Column and year column and the composite key will be(StudentID, SubjectID, Season, Year). I have to mention that i don't need this composite key as foreign key.
Which way is better for performance and DB integrity?
Subject and exam are separate (if related) concepts, so you should not try to represent them within the same table. Also, the fact that an exam has been held for the given subject is separate from the fact that any particular student took that exam. Split all these concepts into their own tables, and the model becomes more natural, for example:
Representing a student that took the same exam several times is just a matter of adding multiple rows to the STUDENT_EXAM table.
NOTE: STUDENT_SUBJECT just records the fact that the student has enrolled in the subject, but not when (which year/semester). Keeping semester-specific information may require additional tables and more complicated relationships within the model.
NOTE: There is a diamond-shaped dependency in this model. Since SUBJECT_ID was passed from the "top" (SUBJECT), down both "sides" (STUDENT_SUBJECT, EXAM) and then merged at the "bottom" (STUDENT_EXAM) of the diamond, a student cannot take an exam on a subject (s)he has not enrolled in.
My database includes a Customer and a Subcategories table.
Customers can belong to one-or-more Subcategories (and of course there are one-or-more Customers for any Subcategory).
I wonder which is the best solution to link these tables that share a many-to-many relationship:
A "standard" junction CustomerSubcategory table that includes just two fields: CustomerID (PK, FK) and SubcategoryID (PK, FK), or
A CustomerSubcategoryDetail table that would also include a CustomerSubcategoryDetailID (PK), as well as the SubcategoryID (PK, FK) and CustomerID (FK) fields.
Any advice?
Cheers, Corbex
I don't think the additional PK on the junction table adds any value. Will you ever have to look up the value(s) in that table without being able to identify them by Customer and/or SubCategoryID? Will the CustomerSubCategoryDetailID value be used anywhere else?
An interesting discussion ensued on my blog when I complained about folks knee-jerking an IDENTITY column onto every single table. Some folks made some good arguments for having an OrderDetailID column on the OrderDetails table. Do any of these situations apply to you?
I don't see the need for a third column with a customersubcategoryid; therefore, I would go with the first option. The extra column is not going to give you any advantage in any of your joins. You won't even use it.
Go with your first choice. Make sure that you create a key that includes both columns and enforces uniqueness. That way you'll never need a separate primary key to distinguish between otherwise identical rows.
There is a additional benefit in having both columns in the unique key. Queries that need to perform a lookup by the first column can use the index, but they never need to read the data rows because the index already contains the second column.
I have a few database tables that really only require a unique id that references another table e.g.
Customer Holiday
******** *******
ID (PK) ---> CustomerID (PK)
Forename From
Surname To
....
These tables such as Holiday, only really exist to hold information regarding a Customer. Therefore, do I need to specify a separate field to hold the ID for the holiday? i.e.
Holiday
*******
ID (PK)
CustomerID (FK)
...
Or would I be ok, in this instance, to just set the CustomerID as the primary key in the table?
Regards,
James.
This really depends on what you are doing.
if each customer can have only 1 holiday, then yes, you could make the customerid the primary key.
If each customer can have multiple holidays, then no, you would want to add a new id column, make it the primary. This allows you to select holidays by each customer AND to select individual records by their unique id.
Additionally if each customer can only have 1 holiday, I'd just add the holiday information to the table, as a one-to-one relationship is typically un-necessary.
If I understand your question correctly, you could only use the Customer table as a primary key in Holiday if there will never be any other holiday for that customer in the table. In other words, two holidays for one customer breaks using the Customer id as a primary key.
If there will ever be an object-oriented program associated with this database, each entity (each row) must have a unique key.
Your second design assures that each instance of Holiday can be uniquely identified and processed by an OO application using a simple Object-Relational Mapping.
Generally, it's best to assure that every entity in the database has a unique, immutable, system-assigned ("surrogate") key. Other "natural" keys can have unique indexes, constraints, etc., to fit the business logic.
Previous answer correct, but also remember, you could have 2 seperate primary keys in each table, and the "holiday" table would have the foreign key to CustomerId.
Then you could manage the assignment of holidays to customers in your code, to make sure that only one holiday can be assigned to a customer, but this brings in the problem concurrency, being 2 people adding a holiday to a customer at the same time will most probably result in a customer having 2 holidays.
You could even place holiday fields in the customer table if a customer can only be created with a holiday, but this design is messy, and not really advised
So once again, option in your question 2 still the best way to go, just giving you your options.
In practice I've found that every table should have a unique primary key identifying the records in those tables. All relationships with other tables should be explicitly declared.
This helps others understand the relationships better, especially if they use a tool to reverse-engineer the schema into a visual representation.
In addition, it gives you more flexibility to expand your solution in the future. You may only have one holiday per customer now, but this is much more difficult to change if you make customer ID the primary key.
If you want to mandate the uniqueness of customer in the holiday table, create a unique index on that foreign key. In fact, this could improve performance when querying on customer ID (although I'm guessing you won't see enough records to notice this improvement).
I have two sql tables Players and Teams joined by a PlayerTeam junction table. A player can be on a team for various years. What's the best way to handle this relationship? I started with a PlayerTeamYear table, but the foreign key reference to both columns in PlayerTeam seems unwieldy, especially as I want to convert this to Entity Framework. This has got to be a common scenario.
Thanks!
I would add the year to the PlayerTeam table so it has three columns in its primary key.
The suggestion above is a good one. I'd actually add two columns to the PlayerTeam table: a year and a surrogate key to act as your primary key.
Think of it like this: The player table describes all the players, the team table describes all the teams, and the PlayerTeam table describes the the relationship between players and teams. This relationship should also include when the players were on a given team.
There are a few other things you may want to consider. Is it possible for a player to have left a team and not joined another one? For example, can a player retire or take a year off, and do you want to record this? If so, you may want to consider adding yet another date column to your PlayerTeam table indicating when a player left a given team.
If you're worried about the calls getting complex, you might consider creating your tables such that you can join them using natural joins. Natural joins, if they're available, are joins on two tables in which there are columns with the same name. Generally, the way I like addressing this problem is like this (please pardon the pseudocode):
PlayerTable:
(int) player_id PRIMARY KEY AUTOINCREMENT
...
other data
TeamTable:
(int) team_id PRIMARY KEY AUTOINCREMENT
...
other data
PlayerTeamTable:
(int) player_team_id PRIMARY KEY AUTOINCREMENT
(int) player_id FOREIGN KEY references PlayerTable(player_id)
(int) team_id FOREIGN KEY references TeamTable(team_id)
(datetime) joined_team
(datetime) left_team
This way, you can run a simple query to find everyone who is currently part of a particular team:
select * from PlayerTable natural join TeamTable, PlayerTeamTable where team_name = 'Liverpool' and left_team = NULL;
Or something to that effect. This may not be particularly efficient (you may want to run "explain" on the select query), but it's quite simple to handle.