I'm having issues understanding super keys in a relation when the relation only contains one functional dependency.
When considering a relation R(A,B,C,D,E) where A is the primary key and with the functional dependency A->B, can A be considered a super key to the relation since there is only one FD? Or would one need to expand the functional dependency to include the unmentioned parts of the relation (C,D,E) in order to find a super key?
I'm mainly confused because all the material I've seen on the web until this point has contained multiple functional dependencies all of which have contained all of the attributes within the relation, so I'm not sure how to interpret the unused attributes. If someone could help clarify this I'd appreciate it!
If the only explicitly mentioned FD for the relation schema is A ⟶ B, then there is the implicit, trivial FD {A,B,C,D,E} ⟶ {A,B,C,D,E}. Given that A ⟶ B, we can deduce that:
{A,C,D,E} is the primary key of R.
There is a partial key dependency for A ⟶ B so R is not in BCNF and the relation schema R should be broken down into two non-loss projections:
R1(A,B) with A ⟶ B so A is the primary key.
R2(A,C,D,E) which is all key (primary key is the combination {A,C,D,E}).
And R1 ⋈ R2 ≡ R (assuming ⋈ is the join operation).
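The deduction above can be checked mechanically with an attribute-closure computation. This is only a minimal sketch: the `closure` helper and the way the FD is encoded as a pair of frozensets are illustrative choices, not something taken from the question.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Once the whole determinant is known, its dependents follow.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


R = set("ABCDE")
fds = [(frozenset("A"), frozenset("B"))]  # the only stated FD: A -> B

a_closure = closure("A", fds)
acde_closure = closure("ACDE", fds)

print(sorted(a_closure))     # A alone never reaches C, D or E
print(acde_closure == R)     # {A,C,D,E} determines every attribute
```

Because C, D and E never appear on the right-hand side of an FD, nothing smaller than {A,C,D,E} can reach them, which is why that set ends up as the key.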
As Catcall says in comments, there is mention of A being the primary key of R. If A is a primary key, then each of the singleton FDs A ⟶ B, A ⟶ C, A ⟶ D and A ⟶ E (or, collectively, A ⟶ {B,C,D,E}) applies, and there's no need to call out A ⟶ B separately. If A is a primary key of R, there's no need to decompose R at all; there is really nothing interesting about the table because it is normalized to 5NF given the available information (assuming there are no unstated non-trivial FDs, MVDs or JDs).
If A is a primary key, then it is also a superkey, but not a proper superkey. Any of the attribute combinations with A and one or more of the other attributes is also a superkey, and is a proper superkey.
When considering a relation R(A,B,C,D,E) where A is the primary key
and with the functional dependency A->B, can A be considered a super
key to the relation since there is only one FD?
If it's given that A is the primary key of R, then by definition you have the FD A->BCDE. It also follows that
A->B
A->C
A->D
A->E
If A is a primary key of R, then it follows by definition that A is also a candidate key and a superkey of R.
Let's make it simple.
Here are definitions of super, candidate and primary keys:
Super Keys
Super key stands for superset of a key.
A super key is a set of one or more attributes that, taken collectively, uniquely identify all other attributes.
Candidate Keys
Candidate Keys are super keys for which no proper subset is a super key.
In other words candidate keys are minimal super keys.
Primary Key:
It is a candidate key that is chosen by the database designer to identify entities within an entity set.
A primary key is therefore a minimal super key. In an ER diagram, the primary key is represented by underlining the primary key attribute.
Ideally a primary key is composed of only a single attribute. But it is possible to have a primary key composed of more than one attribute.
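To make these definitions concrete, here is a small brute-force sketch that lists the super keys and candidate keys of a relation. It assumes the relation R(A,B,C,D,E) with A -> BCDE purely as an illustration; the helper names are hypothetical, and enumerating every subset is only practical for a handful of attributes.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def superkeys(relation, fds):
    """Every non-empty attribute subset whose closure is the whole relation."""
    attrs = sorted(relation)
    return [
        frozenset(combo)
        for size in range(1, len(attrs) + 1)
        for combo in combinations(attrs, size)
        if closure(combo, fds) == set(relation)
    ]


def candidate_keys(relation, fds):
    """Super keys for which no proper subset is also a super key."""
    sks = superkeys(relation, fds)
    return [k for k in sks if not any(other < k for other in sks)]


R = "ABCDE"
fds = [("A", "BCDE")]  # assumption for this example: A determines everything

sks = superkeys(R, fds)
cks = candidate_keys(R, fds)
print(len(sks))                     # every subset that contains A
print([sorted(k) for k in cks])     # only the minimal ones remain
```

The primary key is then whichever of the candidate keys the designer picks; with a single candidate key there is no choice to make.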
I'm confused about the question below. I think the answer is that AD is a candidate key, but AD is also the primary key, so I want to know: is the candidate key the same as the primary key?
A relation R={A,B,C,D,E,F} is given with following set of functional dependencies
A->B
AD->C
B->F
A->E
What will be its candidate key? Will it be same as its primary key?
A relation has one or more CKs (candidate keys). (They are the superkeys that don't contain smaller superkeys.) We can call one of the CKs "the" PK (primary key). We then collectively call the other CKs "the" AKs (alternate keys). PKs & AKs are irrelevant to relational theory.
It doesn't make sense to say that a set of columns "is its primary key" unless it is already known or assumed that there is just one CK or that it is a CK and has been chosen as the PK.
PS None of those FDs (functional dependencies) determines A or D so they must be in all CKs. But AD determines all other attributes. So it is a CK & it is the only CK. So if we name a PK then it has to be AD. And if we refer to the PK without explicitly naming AD as the PK then we must mean AD.
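The reasoning in the PS can be reproduced in a few lines. This is a sketch under the FDs given in the question; the `closure` helper is my own illustration.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


R = set("ABCDEF")
fds = [("A", "B"), ("AD", "C"), ("B", "F"), ("A", "E")]

# Attributes that no FD determines have to be in every candidate key.
determined = set().union(*(set(rhs) for _, rhs in fds))
must_have = R - determined

print(sorted(must_have))                 # the attributes every CK must contain
print(closure(must_have, fds) == R)      # they already determine all of R
```

Since every candidate key must contain `must_have`, and `must_have` is itself a superkey, it is the only candidate key.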
I must normalise table data to 3NF.
I have a composite key from 1NF, but all of the non-key attributes appear to be reliant on both of the primary key attributes. I'm trying to take it from 2NF into 3NF. Can I still have a composite key?
You can have a compound key in every normal form.
In fact, when you write functional dependencies in the form A->B, both A and B refer to sets of attributes. That's why they're in uppercase; uppercase letters represent sets in set theory.
...all of the non-key attributes appear to be reliant on Both of the primary keys...
There's only one primary key. In your case, that one primary key has more than one attribute.
There might be more than one candidate key, though. In normalization, every candidate key is equally important. For example, if you're trying to identify transitive dependencies, you need to look for transitivity with respect to every candidate key, not just the primary key.
From the Database Management Systems book: given the relation SNLRWH (each letter denotes an attribute) and the following functional dependencies:
S->SNLRWH (S is the PK)
R->W
My attempt:
First, it is not in 3NF: for the second FD, R does not contain W, R is not a superkey, and W is not part of a key.
Second, it is/not 2NF. If we examine the second FD, W is dependent on R, which in turn is not part of a key. STUCK.
2NF is violated if some proper subset of a candidate key appears as a determinant on the left hand side of one of your (non-trivial) dependencies. Ask yourself whether any of your determinants is a subset of a candidate key.
Usually 2NF is violated only when a relation has a composite key - a key with more than one attribute. It is technically possible for a relation with only simple keys (single attribute keys) to violate 2NF if the empty set (∅) happens to be a determinant. Such cases are fairly unusual and rarely thought worthy of consideration because they are so obviously "wrong". For completeness, here's a fun example of that special case. Consider a relation with the attributes Circumference, Diameter and Pi, in which Circumference and Diameter are both candidate keys. The dependency in violation of 2NF is ∅ -> Pi, where Pi is the ratio of the circumference to the diameter.
2NF has to do with partial key dependencies. In order for a relation to fail the test for 2NF, the relation has to have at least one candidate key that has at least two columns.
Since your relation has only one candidate key, and that candidate key has only one column, you can't possibly have a partial key dependency. It passes the test for 2NF.
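Both tests can be run against the SNLRWH example. The sketch below assumes, as the answers above do, that S is the only candidate key; the variable names are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


relation = set("SNLRWH")
fds = [("S", "SNLRWH"), ("R", "W")]
key = set("S")                      # assumed to be the only candidate key
non_prime = relation - key

# 2NF: does any proper subset of the key determine a non-prime attribute?
# For a one-column key the only proper subset is the empty set.
proper_subsets = [
    set(c) for size in range(len(key)) for c in combinations(sorted(key), size)
]
partial = [s for s in proper_subsets if closure(s, fds) & non_prime]

# 3NF: an FD X -> Y breaks it when X is not a superkey and Y contains
# a non-prime attribute that is not already in X.
breaks_3nf = [
    (lhs, rhs) for lhs, rhs in fds
    if closure(lhs, fds) != relation and (set(rhs) - set(lhs)) & non_prime
]

print(partial)       # empty: no partial key dependency, so 2NF holds
print(breaks_3nf)    # the FD R -> W is what breaks 3NF
```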
Is the relation from below correctly divided into relations in BCNF:
R(a,b,c,d,e) - a and b are primary keys and there are dependencies such as:
a → c
a → e
c → e
I split the above relations into:
AC(a,c)
CE(c,e)
AB(a,b,d)
Is it the case that a is a primary key and b is a primary key, or is it the case that {a,b} is the (composite) primary key? If the columns are separately primary keys, then you have a number of additional but not explicitly stated functional dependencies: a → bd and b → acde. If the columns {a,b} are a composite PK, then you have an additional functional dependency ab → cde. Either way, the AC and CE relations are fine, and the ABD relation is the other necessary one. The only issue is 'what are the candidate keys of ABD'? And the answer is 'either {a,b} as a composite PK, or a and b as two separate candidate keys'.
Are you sure about that primary key? Normally, determining all the candidate keys is part of these kinds of exercises.
An informal way of expressing what we know about candidate keys is that every attribute that's not on the right-hand side (RHS) of any functional dependency must be part of every candidate key.
Since I don't know how you determined that {ab} is a candidate key, I'd be inclined to say that, because none of a, b and d appears on any RHS, all three must be part of every candidate key.
In short, your FDs say that {abd} is the primary key, not {ab}.
In order for your key and your decomposition to be right, you need to have the additional FD ab->d.
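A quick closure check illustrates the point. This is a sketch using only the three FDs stated in the question; the extra FD ab -> d is added in the last step purely as the assumption discussed above.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


relation = set("abcde")
fds = [("a", "c"), ("a", "e"), ("c", "e")]

ab_closure = closure("ab", fds)      # what {a,b} determines as stated
abd_closure = closure("abd", fds)    # what {a,b,d} determines

# Assumption: add ab -> d, which the question never states.
with_extra = closure("ab", fds + [("ab", "d")])

print(sorted(ab_closure))            # d is missing, so {a,b} is not a key
print(abd_closure == relation)       # {a,b,d} is a key under the stated FDs
print(with_extra == relation)        # with ab -> d, {a,b} becomes a key
```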
We say 2NF is "the whole key" and 3NF "nothing but the key".
Referencing this answer by Smashery:
What are database normal forms and can you give examples?
The example used for 3NF is exactly the same as 2NF--it's a field which is dependent on only one key attribute. How is the example for 3NF different from the one for 2NF?
Suppose that some relation satisfies a non-trivial functional dependency of the form A->B, where B is a nonprime attribute.
2NF is violated if A is not a superkey but is a proper subset of a candidate key
3NF is violated if A is not a superkey
You have spotted that the 3NF requirement is just a special case (but not really so special) of the 2NF requirement. 2NF in itself is not very important. The important issue is whether A is a superkey, not whether A just happens to be some part of a candidate key.
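The two conditions above translate almost directly into code. The sketch below is illustrative only: the relation PQXYZ, its FDs and the `classify` helper are hypothetical and chosen so that one FD is a partial key dependency and another is not.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def classify(lhs, rhs_attr, relation, fds, candidate_keys):
    """List the normal forms that the FD lhs -> rhs_attr violates."""
    lhs = set(lhs)
    prime = set().union(*candidate_keys)
    if rhs_attr in lhs or rhs_attr in prime:
        return []                      # trivial FD, or B is a prime attribute
    if closure(lhs, fds) == set(relation):
        return []                      # A is a superkey: nothing is violated
    violated = ["3NF"]                 # A is not a superkey
    if any(lhs < set(k) for k in candidate_keys):
        violated.insert(0, "2NF")      # A is a proper subset of a candidate key
    return violated


# Hypothetical relation with the single candidate key {P,Q}.
relation = "PQXYZ"
fds = [("PQ", "X"), ("P", "Y"), ("X", "Z")]
keys = ["PQ"]

partial = classify("P", "Y", relation, fds, keys)
transitive = classify("X", "Z", relation, fds, keys)
full = classify("PQ", "X", relation, fds, keys)
print(partial, transitive, full)
```

Every FD that is reported as violating 2NF is also reported as violating 3NF, which is the "special case" relationship described above.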
Since you ask a very specific question about an answer to an existing SO question, here is an explanation of it (basically I'll say what dportas already said in his answer, but in more words).
The examples of design that is not in 2NF and not in 3NF are not the same.
Yes, the dependency in both cases is on a single field.
However, in the non-2NF example:
the dependency is on part of the primary key
while in the non-3NF example (which is in 2NF):
the dependency is on a field that is not part of the primary key (notice that this example does satisfy 2NF; this shows that even if you check for 2NF you should also check for 3NF)
In both cases, to normalize you would create an additional table, which would not exhibit update anomalies (example of an update anomaly: in the 2NF example, what happens if you update Coursename for IT101|2009-2, but not for IT101|2009-1? You get inconsistent=meaningless=unusable data).
So, if you memorize "the key, the whole key and nothing but the key", which covers both 2NF and 3NF, that should work for you in practice when normalizing. The distinction between 2NF and 3NF might seem subtle to you (the question is whether the attribute(s) on which the data depends in the additional dependency are part of a candidate key or not) - and, well, it is - so just accept it.
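The update anomaly mentioned above (changing Coursename for IT101|2009-2 but not for IT101|2009-1) can be sketched with a tiny in-memory table. The column names and the course name values are assumptions made for illustration, and `fd_holds` is a hypothetical helper that checks whether a functional dependency holds in the data.

```python
def fd_holds(rows, lhs, rhs):
    """True if all rows that agree on the lhs columns also agree on the rhs columns."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Non-2NF design: coursename depends on course alone, yet it is repeated
# for every (course, semester) row.
rows = [
    {"course": "IT101", "semester": "2009-1", "coursename": "Programming"},
    {"course": "IT101", "semester": "2009-2", "coursename": "Programming"},
]
before = fd_holds(rows, ["course"], ["coursename"])

# Update only one of the two rows, as in the anomaly described above.
rows[1]["coursename"] = "Intro to Programming"
after = fd_holds(rows, ["course"], ["coursename"])

print(before, after)   # the dependency holds before the update, not after
```

Moving coursename into its own table keyed by course leaves a single row to update, so the two values can no longer disagree.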
2NF allows non-prime attributes to be functionally dependent on non-prime attributes, but 3NF allows non-prime attributes to be functionally dependent only on a super key.
Thus, when a table is in 3NF it is also in 2NF, and 3NF is stricter than 2NF.
Hope this helps...
You have achieved 3NF when the table holds no columns that don't depend directly on the key.
Not sure my professor would have said that like this but this is what it is.
If you're "in the field", forget about the definitions and look for "best practices". One is DRY: Don't Repeat Yourself.
If you follow that principle, you have already mastered everything you need for normal forms.
Here is an example.
Your table has the following schema:
PERSONS : id, name, age, car make, car model
Age and name are related to the person entry (=> id), but the model depends on the car and not the person.
Then you would split it into three tables:
PERSONS : id, name, age, car_models_id (references CAR_MODELS.id)
CAR_MODELS : id, name, car_makes_id (references CAR_MAKES.id)
CAR_MAKES : id, name
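The three tables above could be created and queried as follows. This is a sketch using Python's built-in `sqlite3` module with an in-memory database; the column types, the NOT NULL constraints and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE car_makes (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE car_models (
    id           INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    car_makes_id INTEGER NOT NULL REFERENCES car_makes(id)
);
CREATE TABLE persons (
    id            INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    age           INTEGER,
    car_models_id INTEGER REFERENCES car_models(id)
);
""")

# Made-up sample data: the make and the model are each stored exactly once.
conn.execute("INSERT INTO car_makes VALUES (1, 'ExampleMake')")
conn.execute("INSERT INTO car_models VALUES (1, 'ExampleModel', 1)")
conn.executemany(
    "INSERT INTO persons VALUES (?, ?, ?, ?)",
    [(1, "Alice", 30, 1), (2, "Bob", 41, 1)],
)

# The original wide row is recovered with two JOINs.
rows = conn.execute("""
    SELECT p.name, mk.name, md.name
    FROM persons p
    JOIN car_models md ON md.id = p.car_models_id
    JOIN car_makes  mk ON mk.id = md.car_makes_id
    ORDER BY p.id
""").fetchall()
print(rows)
```

Both persons point at the same model row, so renaming the make or the model is a single-row update.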
You can have replication in 2NF but no longer in 3NF.
Normalization is all about non-replication, consistency, and from another point of view foreign keys and JOINs.
The more normalized the better for the data, but not for performance, nor for understanding if it gets really too complicated.
2NF deals with partial dependencies, whereas 3NF deals with transitive functional dependencies. It is important to know that a relation in 3NF must already be in 2NF and must also have no transitive dependency of a non-key attribute on a key.
First, we have to know the tools we work with:
candidate key attribute;
non candidate key attribute;
partial dependency;
full dependency;
Candidate Key Attribute
A candidate key attribute is any column or combination of columns that can be, or form, the primary key. You can have many candidate keys, but you will pick only one of them to be the primary key. Still, every candidate key attribute matters in 2NF: it does not need to be the primary key, it is enough that it is a candidate key attribute. 2NF refers to the CANDIDATE KEY. Those who say "key" or "primary key" instead of "candidate key" add to the confusion.
Non Candidate Key Attribute
Any column that can't be the primary key and can't be part of the primary key.
Partial Dependency
A partial dependency arises when there is a candidate key formed by MORE THAN ONE column, AND a non candidate key attribute depends on only SOME of the columns that constitute the candidate key.
Full Dependency
Any non candidate key attribute, if it depends on a candidate key, depends on the WHOLE candidate key. If the candidate key is formed by more than one column, then the dependent column must depend on every column that forms the candidate key.
Now you have the tools to understand 2NF and 3NF.
2NF does not allow partial dependency. If you find a non candidate key attribute that is partially dependent on a candidate key, you must break the partial dependency to make it a full dependency. So 2NF allows a non candidate key attribute to be fully dependent on a candidate key that is not the primary key. It is just a possible primary key, which you could pick but are not forced to. That is all 2NF requires.
Let's say you have it in 2NF. All non candidate key attributes are fully dependent on the candidate keys. But a non candidate key attribute may still depend on another non candidate key attribute, and so depend on a candidate key only transitively. 3NF does not allow that. Note that 3NF, like 2NF, is defined in terms of candidate keys: a full dependency on a candidate key that you did not pick as the primary key is still allowed.