This matter confuses me.
I have a college information system with a junction table between the students table and the subjects (curriculum) table. Its primary key is the composite key (StudentID, SubjectID), and both columns are foreign keys. However, a student may fail an exam and repeat the subject, which would produce a duplicate primary key, and we need to record all of the data. I can see two ways to solve this, but I don't know which is best:
1. Add a new column as the primary key instead of the composite key.
2. Add the Season and Year columns to the composite key, so that it becomes (StudentID, SubjectID, Season, Year). I should mention that I don't need this composite key as a foreign key.
Which way is better for performance and DB integrity?
Subject and exam are separate (if related) concepts, so you should not try to represent them within the same table. Also, the fact that an exam has been held for the given subject is separate from the fact that any particular student took that exam. Split all these concepts into their own tables, and the model becomes more natural, for example:
Representing a student that took the same exam several times is just a matter of adding multiple rows to the STUDENT_EXAM table.
NOTE: STUDENT_SUBJECT just records the fact that the student has enrolled in the subject, but not when (which year/semester). Keeping semester-specific information may require additional tables and more complicated relationships within the model.
NOTE: There is a diamond-shaped dependency in this model. Since SUBJECT_ID was passed from the "top" (SUBJECT), down both "sides" (STUDENT_SUBJECT, EXAM) and then merged at the "bottom" (STUDENT_EXAM) of the diamond, a student cannot take an exam on a subject (s)he has not enrolled in.
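The diamond-shaped dependency described above can be sketched as follows. This is only a minimal illustration using Python's sqlite3 module: the lowercase table names, the exam_no column and the sample rows are assumptions, not part of the original model. It also assumes that each EXAM row is one sitting of an exam for a subject, so a repeated exam is a new EXAM row.

```python
import sqlite3

SCHEMA = """
CREATE TABLE student (student_id INTEGER PRIMARY KEY);
CREATE TABLE subject (subject_id INTEGER PRIMARY KEY);

-- Left side of the diamond: the student is enrolled in the subject.
CREATE TABLE student_subject (
    student_id INTEGER NOT NULL REFERENCES student (student_id),
    subject_id INTEGER NOT NULL REFERENCES subject (subject_id),
    PRIMARY KEY (student_id, subject_id)
);

-- Right side: one row per exam sitting held for a subject
-- (exam_no is an assumed column).
CREATE TABLE exam (
    subject_id INTEGER NOT NULL REFERENCES subject (subject_id),
    exam_no    INTEGER NOT NULL,
    PRIMARY KEY (subject_id, exam_no)
);

-- Bottom: the same subject_id column takes part in both foreign keys.
CREATE TABLE student_exam (
    student_id INTEGER NOT NULL,
    subject_id INTEGER NOT NULL,
    exam_no    INTEGER NOT NULL,
    PRIMARY KEY (student_id, subject_id, exam_no),
    FOREIGN KEY (student_id, subject_id)
        REFERENCES student_subject (student_id, subject_id),
    FOREIGN KEY (subject_id, exam_no)
        REFERENCES exam (subject_id, exam_no)
);
"""


def build():
    """Create the schema with one student enrolled in subject 10 only."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce foreign keys in SQLite
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO student VALUES (1)")
    conn.executemany("INSERT INTO subject VALUES (?)", [(10,), (20,)])
    conn.execute("INSERT INTO student_subject VALUES (1, 10)")
    conn.executemany("INSERT INTO exam VALUES (?, ?)",
                     [(10, 1), (10, 2), (20, 1)])
    return conn


def take_exam(conn, student_id, subject_id, exam_no):
    """Record an exam attempt; return False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO student_exam VALUES (?, ?, ?)",
                     (student_id, subject_id, exam_no))
        return True
    except sqlite3.IntegrityError:
        return False
```

With this sketch, take_exam(conn, 1, 10, 1) and take_exam(conn, 1, 10, 2) both succeed (a failed exam followed by a repeat), while take_exam(conn, 1, 20, 1) is rejected because the student never enrolled in subject 20.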
Related
how normalized is this table?
Example SqlFiddle
So I know that the topic and definition of normalization itself has been pretty well discussed, but I was hoping I could get some clarification on my understanding of normalization. An example is a diagram I quickly drew in Access; I believe that these relationships and the tables themselves all fit the 3NF criteria. There is a Projects table with the fields ProjNumber(PK), ProjName, and ProjDesc. Then there is an Assignments table with a compound key consisting of EmpID/ProjNumber, with the fields HourlyBillingRate, NumOfHours, and TeamNum. And lastly there is the Teams table, which consists of the fields TeamNum(PK), TeamName, and ProjNumber.
The ProjNumber from Assignments and Teams are both foreign keys that relate back to the Projects table, and the TeamNum field from Assignments is a foreign key relating back to the Teams table primary key. I'm not too sure if it's necessary to directly relate back to the Teams table, if I have the ProjNumber foreign key because that project would have an associated TeamNum.
The context of these tables is that there is a project that has to be done, a team associated with carrying out that project, and then the employees on that team, who are paid an hourly billing rate for the project they are working on.
The reason I use a compound key is that I wanted to answer the question "What if the employee works on multiple projects?". I couldn't make EmpID the sole primary key, so I chose a compound key, because even if the employee works on multiple projects, the combination of the two will always be unique. I believe that each field is necessary and fully dependent on its respective primary key.
Thoughts? Does it in fact fulfill 3NF criteria?
It depends. Your diagram and discussion appear to assume that the primary key is the only candidate key in each of the tables. That appears not to be the case.
In the Assignments table, it looks as though EmpID and TeamNumber is another candidate key, provided that TeamNumber may not be NULL.
If we look at this table with EmpId, TeamNumber as the key, then it is not in 2NF. ProjNumber is determined by TeamNumber, which is not the whole key.
So now the answer to your question turns on whether FDs are analyzed with respect to all candidate keys or just the declared primary key. I have seen tutorials on normalization that go both ways. I follow the one that considers all candidate keys, so the table is not in 2NF.
That is, unless I've misconstrued the FDs in your case, or Assignments.TeamNumber can be NULL.
HOWEVER, your SQL Fiddle presentation is different. Now, if there are several teams on one project, and an employee is assigned to one project for a few hours, there isn't any way to tell what team the employee was on. The FDs are not the same in the SQL Fiddle example and in the implications I take from your diagram.
I have read through handfuls of what would seem to make this a duplicate question. But reading through all of these has left me uncertain. I'm hoping to get an answer based on the absolute example below, as many questions/answers trail off into debates back and forth.
If I have:
dbo.Book
--------
BookID PK int identity(1,1)
dbo.Author
----------
AuthorID PK int identity(1,1)
Now I have two choices for a simple junction table:
dbo.BookAuthor
--------------
BookID CPK and FK
AuthorID CPK and FK
The above would be a compound/composite key on both FKs, as well as set up the FK relationships for both columns - also using Cascade on delete.
OR
dbo.BookAuthor
--------------
RecordID PK int identity(1,1)
BookID FK
AuthorID FK
Foreign key relationships on BookID and AuthorID, along with Cascade on delete. Also set up a unique constraint on BookID and AuthorID.
I'm looking for a simple answer as to why one method is better than the other in the ABOVE particular example. The answers that I'm reading are very detailed, and I was just about to settle on a compound key, but then watched a video where the example used an Identity column like my second example.
It seems this topic is slightly torn in half, but my gut is telling me that I should just use a composite key.
What's more efficient for querying? It seems having a PK identity column along with setting up a unique constraint on the two columns, AND the FK relationships would be more costly, even if a little.
This is something I've always remembered from my database course way back in college. We were covering the section from the textbook on "Entity Design" and it was talking about junction tables... we called them intersect tables or intersection relations. I was actually paying attention in class that day. The professor said, in his experience, a many-to-many junction table almost always indicates an unidentified missing entity. These entities almost always end up with data of their own.
We were given an example of Student and Course entities. For a student to take a course, you need to junction between those two. What you actually have as a result is a new entity: an Enrollment. The additional data in this case would be things like Credit Type (audit vs regular) or Final Grade.
I remember that advice to this day... but I don't always follow it. What I will do in this situation is stop, and make sure to go back to the stakeholders on the issue and work with them on what data points we might still be missing in this junction. If we really can't find anything, then I'll use the compound key. When we do find data, we think of a better name and it gets a surrogate key.
Update in 2020
I still have the textbook, and by amazing coincidence both it and this question were brought to my attention within a few hours of each other. So for the curious, it was Chapter 5, section 6, of the 7th edition of this book:
https://www.amazon.com/Database-Processing-Fundamentals-Design-Implementation-dp-9332549958/dp/9332549958/
As a staunch proponent of, and proselytizer for, the benefits of surrogate keys, I nonetheless make an exception for all-key join tables such as your first example. One of the benefits of surrogate keys is that engines are generally optimized for joining on single integer fields, as the default and most common circumstance.
Your first proposal still obtains this benefit, but also has a 50% greater fan-out at each index level, reducing both the overall size and the height of the indices on the join table. Although the performance benefit of this is likely negligible for anything smaller than a massive table, it is best practice and comes at no cost.
When I might opt for the other design is if the relation were to accrue additional columns. At that point it is no longer strictly a join table.
I prefer the first design, using Composite Keys. Having an identity column on the junction table does not give you an advantage even if the parent tables have them. You won't be querying the BookAuthor using the identity column, instead you would query it using the BookID and AuthorID.
Also, adding an identity would allow for duplicate BookID-AuthorID combination, unless you put a constraint.
Additionally, if your primary key is (BookID, AuthorID), you need an index on (AuthorID, BookID). This will help if you want to query the books written by an author.
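Here is a rough sketch of that first design, run through Python's sqlite3 module rather than SQL Server, so the dbo schema and identity(1,1) are left out. The index name IX_BookAuthor_Author and the helper functions are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
CREATE TABLE Book   (BookID   INTEGER PRIMARY KEY);
CREATE TABLE Author (AuthorID INTEGER PRIMARY KEY);

CREATE TABLE BookAuthor (
    BookID   INTEGER NOT NULL REFERENCES Book (BookID)     ON DELETE CASCADE,
    AuthorID INTEGER NOT NULL REFERENCES Author (AuthorID) ON DELETE CASCADE,
    PRIMARY KEY (BookID, AuthorID)   -- the key itself rejects duplicate pairs
);

-- Reverse index for "which books did this author write?" lookups.
CREATE INDEX IX_BookAuthor_Author ON BookAuthor (AuthorID, BookID);
""")


def add_link(book_id, author_id):
    """Insert a Book/Author pair; return False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO BookAuthor (BookID, AuthorID) VALUES (?, ?)",
            (book_id, author_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


def books_by_author(author_id):
    """Return the BookIDs linked to one author, in ascending order."""
    rows = conn.execute(
        "SELECT BookID FROM BookAuthor WHERE AuthorID = ? ORDER BY BookID",
        (author_id,),
    )
    return [row[0] for row in rows]


conn.executemany("INSERT INTO Book (BookID) VALUES (?)", [(1,), (2,)])
conn.execute("INSERT INTO Author (AuthorID) VALUES (1)")
first = add_link(1, 1)       # accepted
duplicate = add_link(1, 1)   # rejected by the composite primary key
add_link(2, 1)
```

No separate unique constraint is needed here, because the composite primary key already prevents a second (1, 1) row.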
Using composite key would be my choice too. Here's why:
Less storage overhead
Let's say you use a surrogate key. Since you'll probably want to query all authors for a specific book and vice versa, you'd need indexes starting with both BookId and AuthorId. For performance reasons you should include the other column in both indexes to prevent a clustered key lookup. You'd probably also want to make one of them unique to ensure that no duplicate BookId/AuthorId combinations are added to the table.
So as a net result:
The data is stored 3 times instead of 2 times
2 unique constraints are to be validated instead of 1
Querying a table that references the junction table
Even if you add a table like Contributions (AuthorId, BookId, ...) that references the junction table, most queries won't need to touch the junction table at all. E.g., finding all contributions of a specific author would involve only the author and contributions tables.
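To make that concrete, here is a small sketch using Python's sqlite3 module. The ContributionId and Chapter columns, the Name and Title columns and the sample rows are assumptions added for the illustration; only Contributions (AuthorId, BookId, ...) comes from the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Author (AuthorId INTEGER PRIMARY KEY, Name  TEXT NOT NULL);
CREATE TABLE Book   (BookId   INTEGER PRIMARY KEY, Title TEXT NOT NULL);

CREATE TABLE BookAuthor (
    BookId   INTEGER NOT NULL REFERENCES Book (BookId),
    AuthorId INTEGER NOT NULL REFERENCES Author (AuthorId),
    PRIMARY KEY (BookId, AuthorId)
);

-- Child table pointing at the junction table through its composite key.
CREATE TABLE Contributions (
    ContributionId INTEGER PRIMARY KEY,
    AuthorId       INTEGER NOT NULL,
    BookId         INTEGER NOT NULL,
    Chapter        TEXT NOT NULL,          -- hypothetical payload column
    FOREIGN KEY (BookId, AuthorId)
        REFERENCES BookAuthor (BookId, AuthorId)
);

INSERT INTO Author VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO Book   VALUES (1, 'SQL Basics');
INSERT INTO BookAuthor VALUES (1, 1), (1, 2);
INSERT INTO Contributions (AuthorId, BookId, Chapter) VALUES
    (1, 1, 'Chapter 1'),
    (2, 1, 'Chapter 2'),
    (1, 1, 'Chapter 3');
""")


def contributions_of(author_id):
    """List one author's chapters without joining the junction table."""
    rows = conn.execute(
        """
        SELECT c.Chapter
        FROM Author AS a
        JOIN Contributions AS c ON c.AuthorId = a.AuthorId
        WHERE a.AuthorId = ?
        ORDER BY c.ContributionId
        """,
        (author_id,),
    )
    return [row[0] for row in rows]
```

Because Contributions carries AuthorId itself, the query joins Author to Contributions directly; BookAuthor is only consulted by the foreign key check at insert time.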
Depending on the amount of data in the junction table, a compound key might end up causing poor performance over an auto generated sequential primary key.
In SQL Server, the primary key is by default also the clustered index of the table, which means that it determines the order in which rows are stored on disk. If the primary key's values are not generated sequentially (e.g. it is a composite key made up of foreign keys from tables whose rows do not arrive in the same order as the junction table's rows, or it is a GUID or other random key), then new rows have to be inserted in the middle of the table rather than appended at the end, which can cause page splits and fragmentation.
You probably should use the compound/composite key. This way you are fully relational - one author can write many books and one book can have multiple authors.
I'm building a system whereby students can view their results once they are out. The system is supposed to keep record of the student's marks over the course period when they progress to the next year. I'm not sure about my use of foreign keys in the Module_tests table though. I have two foreign keys in this table to identify a test's mark for a specific student and specific module. I guess this pair of foreign keys are the primary key of Module_tests table. Is this a sensible design? Is there a more efficient one?
Thanks in advance
Student table has three columns: StudentID, first name, surname.
Module table has three columns: ModuleID, ModuleName, Year.
Module_tests table has five columns: StudentID, ModuleID, Assignment, PracticalTest, FinalExam.
(hope it's clear as text as I can't attach images in my posts)
I think you should add an additional column to Module_tests that can be primary and auto-increment.
In the situation whereby the students have to retake a module, the integrity of your database might get compromised. Other than that, your tables seem fine.
A junction table that uniquely identifies rows by two foreign keys has been asked in this question, and the answers pretty much apply here.
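As a rough sketch of the auto-increment suggestion, using Python's sqlite3 module: the TestID column name, the column types and the sample marks are assumptions, while the other column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Student (
    StudentID INTEGER PRIMARY KEY,
    FirstName TEXT,
    Surname   TEXT
);
CREATE TABLE Module (
    ModuleID   INTEGER PRIMARY KEY,
    ModuleName TEXT,
    Year       INTEGER
);
CREATE TABLE Module_tests (
    TestID        INTEGER PRIMARY KEY AUTOINCREMENT,  -- assumed surrogate key
    StudentID     INTEGER NOT NULL REFERENCES Student (StudentID),
    ModuleID      INTEGER NOT NULL REFERENCES Module (ModuleID),
    Assignment    INTEGER,
    PracticalTest INTEGER,
    FinalExam     INTEGER
);

INSERT INTO Student VALUES (1, 'Ada', 'Lovelace');
INSERT INTO Module  VALUES (5, 'Databases', 1);
""")


def record_attempt(student_id, module_id, assignment, practical, final_exam):
    """Store one attempt at a module; retakes simply add another row."""
    conn.execute(
        "INSERT INTO Module_tests "
        "(StudentID, ModuleID, Assignment, PracticalTest, FinalExam) "
        "VALUES (?, ?, ?, ?, ?)",
        (student_id, module_id, assignment, practical, final_exam),
    )


def final_exam_marks(student_id, module_id):
    """Return the FinalExam mark of every attempt, oldest first."""
    rows = conn.execute(
        "SELECT FinalExam FROM Module_tests "
        "WHERE StudentID = ? AND ModuleID = ? ORDER BY TestID",
        (student_id, module_id),
    )
    return [row[0] for row in rows]


record_attempt(1, 5, 40, 35, 30)   # first attempt
record_attempt(1, 5, 60, 70, 65)   # retake of the same module
```

With (StudentID, ModuleID) as the primary key, the second record_attempt call would be rejected; with the surrogate key both attempts are kept.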
I am designing an airline database (the outline of one anyway) for an assignment and seem to be running around in circles.
Three tables are concerned:
Customer       Booking_Reference   Flight
cust_id(pk)    reference_id(pk)    Flight_id(pk)
               cust_id(fk)
A booking reference can have many flights.
A flight will have many booking references.
I am trying to break up the many to many relationship. Is it possible to have a relational table with the flight_id as the attributes (columns) and the booking_reference as the rows (data)? If so there can be no primary key, which is a no-go as I understand.
Alternatively I could make the booking_reference/flight relational table with 2 attributes and a compound primary key of booking_reference/flight, which would result in both entities being duplicated but the primary key being unique (half of it anyway). Is this acceptable design practice?
I was going to just list a maximum of 8 flights as columns in the booking reference table (with NULL for the entries where there are fewer than 8 flights) and give customers with more than 8 flights a new reference_id, but this seems more ridiculous the more I learn about databases, as it results in more reference ids and more NULL data.
Any ideas on which route to take?
Rather than having eight (or any arbitrary number of) columns, create what's sometimes called a join table, with three columns:
Table: references_flights
id (Primary key)
reference_id (fk)
flight_id (fk)
You should then be able to query data across them with the right JOINs, but I'll leave that for someone with more database expertise.
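For what it's worth, the JOINs could look roughly like this. It is a sketch using Python's sqlite3 module; the flight_no column, the UNIQUE constraint on the pair and the sample data are assumptions that are not in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer (cust_id INTEGER PRIMARY KEY);
CREATE TABLE booking_reference (
    reference_id INTEGER PRIMARY KEY,
    cust_id      INTEGER NOT NULL REFERENCES customer (cust_id)
);
CREATE TABLE flight (
    flight_id INTEGER PRIMARY KEY,
    flight_no TEXT NOT NULL                -- hypothetical descriptive column
);
CREATE TABLE references_flights (
    id           INTEGER PRIMARY KEY,
    reference_id INTEGER NOT NULL REFERENCES booking_reference (reference_id),
    flight_id    INTEGER NOT NULL REFERENCES flight (flight_id),
    UNIQUE (reference_id, flight_id)       -- assumed: no duplicate pairs
);

INSERT INTO customer VALUES (1);
INSERT INTO booking_reference VALUES (100, 1), (101, 1);
INSERT INTO flight VALUES (1, 'AB100'), (2, 'AB200'), (3, 'AB300');
INSERT INTO references_flights (reference_id, flight_id) VALUES
    (100, 1), (100, 2), (101, 2);
""")


def flights_for(reference_id):
    """Flight numbers attached to one booking reference."""
    rows = conn.execute(
        """
        SELECT f.flight_no
        FROM references_flights AS rf
        JOIN flight AS f ON f.flight_id = rf.flight_id
        WHERE rf.reference_id = ?
        ORDER BY f.flight_id
        """,
        (reference_id,),
    )
    return [row[0] for row in rows]


def bookings_on(flight_id):
    """Booking references that include one flight."""
    rows = conn.execute(
        "SELECT reference_id FROM references_flights "
        "WHERE flight_id = ? ORDER BY reference_id",
        (flight_id,),
    )
    return [row[0] for row in rows]
```

A booking with any number of flights is just more rows in references_flights, so there are no NULL-padded columns and no need to split a booking across several reference ids.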
I am implementing a system to represent a school schedule in SQL, and I want to have a table called Student which includes all of the student's classes. Do I need to include references to a Class table as attributes class1, class2, class3, ..., class12,
or can I use a sort of array?
Since you are using a relational database, it would be good to make an m:n relationship between the Student and Class tables. That means you will have a Student table with primary key student_id, a Class table with primary key class_id, and one more table, called StudentClass, with foreign keys fk_student_id and fk_class_id, plus some additional properties (depending on what you want to achieve). That would be a good relational design.
You could have a field filled with a comma separated list, or you could keep a separate table of 'allowed classes', with associated data (unique ID number, name, description, teacher), then use foreign keys and an intermediate table to implement a many to many relationship of students to classes.
Many to many relationship
Foreign keys in SQLite
Support for foreign keys in SQLite is pretty good these days, and all the features you'll likely want are there.
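One thing to watch for: in most SQLite builds foreign key enforcement is off by default and has to be switched on per connection with PRAGMA foreign_keys = ON. The sketch below uses Python's sqlite3 module; the Student, Class and StudentClass names follow the answer above, and the helper function is made up for the illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Student (student_id INTEGER PRIMARY KEY);
CREATE TABLE Class   (class_id   INTEGER PRIMARY KEY);

-- Intermediate table implementing the many-to-many relationship.
CREATE TABLE StudentClass (
    fk_student_id INTEGER NOT NULL REFERENCES Student (student_id),
    fk_class_id   INTEGER NOT NULL REFERENCES Class (class_id),
    PRIMARY KEY (fk_student_id, fk_class_id)
);
"""


def orphan_insert_allowed(enforce):
    """Try to link a non-existent student to a non-existent class.

    Returns True if SQLite accepts the row, False if it is rejected.
    """
    conn = sqlite3.connect(":memory:")
    # Set the pragma explicitly so the result does not depend on the build.
    conn.execute("PRAGMA foreign_keys = " + ("ON" if enforce else "OFF"))
    conn.executescript(SCHEMA)
    try:
        conn.execute(
            "INSERT INTO StudentClass (fk_student_id, fk_class_id) "
            "VALUES (1, 99)"
        )
        return True
    except sqlite3.IntegrityError:
        return False
    finally:
        conn.close()
```

Calling orphan_insert_allowed(False) shows the dangling row being accepted, while orphan_insert_allowed(True) shows it being rejected, so the pragma belongs in the connection setup code.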