Given a relation schema R with n attributes R(A1, A2, ..., An). What’s the maximum number of possible super-keys for R? Please justify your answer.
Given a relation schema R with n attributes R(A1, A2, ..., An). What’s the maximum number of possible candidate keys for R? Please justify your answer.
I am still wondering how to answer both of these questions. My answer to the first question would be (2^n) - 1, because the empty set is not included.
As for the second question, my answer would be n.
What do you guys think?
The maximum number of superkeys of a relation with n attributes is the number of all possible non-empty combinations of attributes. This turns out to be (2^n)-1.
This is nothing but taking
1 attribute from n (nC1) + 2 attributes from n (nC2) + ... + n attributes from n (nCn) = (2^n)-1
Or we can simply think of it as follows: represent each of the n attributes as a bit. We put 1 when an attribute is part of the superkey and 0 otherwise. This gives (2^n) patterns, because we have two choices (1 or 0) for each of the n bits/attributes. We subtract 1 to exclude the all-0 pattern, which would treat 'no attribute' as a superkey. So (2^n)-1.
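As a quick sanity check of the two counting arguments above, here is a minimal Python sketch. The function names `count_nonempty_subsets` and `count_by_bitmask` are made up for this illustration.

```python
from math import comb

def count_nonempty_subsets(n: int) -> int:
    """Sum nC1 + nC2 + ... + nCn."""
    return sum(comb(n, k) for k in range(1, n + 1))

def count_by_bitmask(n: int) -> int:
    """Count the n-bit patterns other than all zeros."""
    return sum(1 for _mask in range(1, 2 ** n))

# Both arguments agree with the closed form (2^n) - 1.
for n in range(1, 8):
    assert count_nonempty_subsets(n) == count_by_bitmask(n) == 2 ** n - 1

print(count_nonempty_subsets(4))  # 15
```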
The maximum of (2^n)-1 superkeys is reached when every attribute functionally determines all the other attributes. One way this happens is a cycle of functional dependencies among the attributes. For example, for a relation R(A,B,C,D), the FD cycle would be:
A->B
B->C
C->D
D->A
The superkeys are A, B, C, D, (AB), (AC), (AD), (BC), (BD), (CD), (ABC), (ACD), (ABD), (BCD), (ABCD), a total of (2^4)-1 = 15.
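The count of 15 can be reproduced by brute force. The sketch below is only an illustration: `closure` and `superkeys` are hypothetical helper names, and FDs are encoded as (left side, right side) pairs of attribute strings.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure: repeatedly apply every FD whose left side is covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def superkeys(schema, fds):
    """Every non-empty attribute set whose closure is the whole schema."""
    found = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(schema, size):
            if closure(combo, fds) == set(schema):
                found.append("".join(combo))
    return found

R = "ABCD"
FDS = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]  # the FD cycle

print(len(superkeys(R, FDS)))  # 15
```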
The maximum possible number of candidate keys occurs when all attribute sets of a single size r are candidate keys, with r chosen so that nCr is largest.
This can be seen from the above example. There A, B, C, D are all candidate keys, so none of their proper supersets (say (AB), (BCD) or (ABCD)) are candidate keys. Similarly if, in any relation, (AB) is a candidate key, then none of its proper supersets (say ABC or ABD) can be a candidate key.
In general, nCfloor(n/2) is the maximum number of possible candidate keys for a relation on n attributes.
PS: this uses the definition that a candidate key is a minimal superkey (one from which no attribute can be removed while still leaving it able to uniquely identify / functionally determine all other attributes)
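Because no candidate key may contain another, a set of candidate keys is a family of attribute sets in which no set is a proper subset of another (an antichain). That such a family has at most nCfloor(n/2) members is Sperner's theorem. A small Python sketch, with the helper names `max_candidate_keys` and `is_antichain` invented for this example:

```python
from itertools import combinations
from math import comb

def max_candidate_keys(n: int) -> int:
    """Largest binomial coefficient nCr, reached at r = floor(n/2)."""
    return comb(n, n // 2)

def is_antichain(sets) -> bool:
    """True if no set in the family is a proper subset of another."""
    return not any(a < b for a in sets for b in sets)

attrs = "ABCD"
pairs = [frozenset(c) for c in combinations(attrs, 2)]

# All 2-attribute sets over 4 attributes can be keys at the same time.
assert is_antichain(pairs)
assert len(pairs) == max_candidate_keys(4)

print([max_candidate_keys(n) for n in range(1, 7)])  # [1, 2, 3, 6, 10, 20]
```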
The maximum number of superkeys for R with n attributes is 2^n, which is the size of the power set of R's attributes. This follows once you realise that ∅ (the empty set) may be a candidate key and that ∅ is a subset of every set of attributes, so in that case every set of attributes, including ∅ itself, is a superkey.
The maximum number of candidate keys is given by the binomial coefficient nCfloor(n/2).
I need help finding the candidate keys from the given relation:
R (A,B,C,D,E) with FD:
A -> BE
B -> BE
B -> D
STEPS:
I know all the attributes can identify themselves so: A,B,C,D,E -> {ABCDE}
Now I know A -> BE, so I can cross out BE and get this: ACD -> {ABCDE}
The prime attributes are (A,C,D)
D would now be replaced by B giving me my other candidate key of (ABC)
I have the answer as ACD and ABC but apparently, it's AC. What am I doing wrong?
A simple line of reasoning is the following.
A and C must be present in any candidate key, since they never appear on the right side of any FD.
So, let’s see if they already form a candidate key by computing their closure, AC+.
By applying the rules for computing the closure of a set of attributes, we can easily see that AC+ is equal to ABCDE, so AC is a candidate key. The only other attribute that appears on the left side of an FD is B, so we should check whether any other candidate key contains it. But since A and C must be present in every key and already form a candidate key, adding any attribute to them produces only non-minimal superkeys. We can conclude that AC is the only candidate key.
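The reasoning above can be checked mechanically. This is a brute-force sketch, not an efficient algorithm. `closure` and `candidate_keys` are hypothetical helpers, and the FDs are copied from the question as (left, right) pairs.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(schema, fds):
    """Try attribute sets smallest first and skip supersets of keys already found."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(schema, size):
            candidate = set(combo)
            if any(set(k) <= candidate for k in keys):
                continue  # contains a key, so it is not minimal
            if closure(candidate, fds) == set(schema):
                keys.append("".join(combo))
    return keys

FDS = [("A", "BE"), ("B", "BE"), ("B", "D")]

print(sorted(closure("AC", FDS)))    # ['A', 'B', 'C', 'D', 'E']
print(candidate_keys("ABCDE", FDS))  # ['AC']
```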
Example:
Let R = (A, B, C, D)
Let F = {C -> AD, AB -> C}
Then how can I find the candidate keys?
The answer is {AB, BC}
Why?
Given a relation schema R with a set of attributes T and a non-empty set of non-trivial functional dependencies F describing a certain set of constraints that are assumed to hold in that schema:
1. Every attribute that does not appear in the right part of any FD in F must be present in every candidate key.
2. Every attribute that appears in the right part of some FD in F but not in the left part of any FD in F cannot be present in any candidate key.
3. To find all the candidate keys, take the remaining attributes and try adding every possible combination of them to the attributes of 1 above. Check whether the closure determines all the attributes of the relation, and whether no attribute can be removed from the combination without losing this property.
Note that, if the set F is empty, the only candidate key is constituted by all the attributes T.
In practice there are algorithms that can be relatively efficient, although in the general case the problem of finding all the keys is exponential.
A simple approach is to start from a canonical cover of the functional dependencies, in this case for instance from:
{ A B → C
C → A
C → D }
and, after finding the attributes that must be present in any candidate key (in this case B), try adding to them the attributes from the left hand sides of the dependencies (in this case A, from A B, and C), in any order and possibly in combination, and compute the closure to see if they determine all the attributes. When you discover that some set of attributes determines all the relation attributes, you have found a candidate key (and it is not necessary to add other attributes to it). In your example:
(A B)+ = A B C D
(B C)+ = A B C D
So A B and B C are candidate keys (since you cannot remove any attribute from either of them without losing the property of determining all the other attributes). And since there are no other attributes (apart from D, which cannot be present in a candidate key), you know that you have found all the candidate keys.
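The procedure described in this answer can be sketched in Python as follows. It is one possible reading of the steps, not a reference implementation. The names `closure` and `candidate_keys` are invented here, and attributes that appear only on right hand sides are treated as excluded from every key.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(schema, fds):
    all_attrs = set(schema)
    lhs_attrs = set().union(*(set(lhs) for lhs, _ in fds))
    rhs_attrs = set().union(*(set(rhs) for _, rhs in fds))
    must = all_attrs - rhs_attrs    # never on a right side: in every key
    never = rhs_attrs - lhs_attrs   # only on right sides: in no key
    optional = sorted(all_attrs - must - never)
    keys = []
    for size in range(len(optional) + 1):
        for extra in combinations(optional, size):
            candidate = must | set(extra)
            if any(k <= candidate for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == all_attrs:
                keys.append(frozenset(candidate))
    return sorted("".join(sorted(k)) for k in keys)

F = [("C", "AD"), ("AB", "C")]
print(candidate_keys("ABCD", F))  # ['AB', 'BC']
```

With an empty F the function returns the full attribute set as the only key, which matches the note about an empty set of dependencies.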
I have a question about the second normal form. The rule says: “A relation is in second normal form when it is in 1NF and there is no such non-key attribute that depends on part of the candidate key, but on the entire candidate key.” (Neeraj Sharma, 2010) My question is about the candidate key: does the rule apply only to the primary key of a relation, or to all possible candidate keys?
Thank you for your help
It counts for any candidate key. If it counted only for the primary key, simply adding a surrogate id would be enough to put any table into 3NF. However, that wouldn't help to ensure that each fact is recorded once only and independent of other facts.
Trying to clear your doubt by an example:
According to 2NF "Partial Dependencies are not allowed in a relation."
Assume this relation: R(A,B,C,D)
Let's suppose there are 3 CKs for this relation (assume the CKs are AB, AC, BC).
Then first write down all the attributes that are present in any of the CKs; these are called prime attributes. All other attributes are called non-prime attributes.
Here:
Prime Attributes (3)= {A,B,C}
Non Prime Attributes (1)={D}
According to 2NF, FDs of the following form are not allowed:
"Proper part of any candidate key ---> Non-prime attribute" (a partial dependency)
For example, here C ---> D is not allowed in 2NF, because C is a part of the CK "AC" and D is a non-prime attribute.
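The check can be written down as a short Python sketch. It assumes the candidate keys AB, AC and BC (a set chosen for this illustration so that no key contains another), and the helper names `prime_attributes` and `violates_2nf` are hypothetical.

```python
def prime_attributes(candidate_keys):
    """Attributes that occur in at least one candidate key."""
    return set().union(*candidate_keys)

def violates_2nf(lhs, rhs, candidate_keys):
    """True if lhs -> rhs is a partial dependency onto a non-prime attribute."""
    prime = prime_attributes(candidate_keys)
    lhs, rhs = set(lhs), set(rhs)
    non_prime_rhs = rhs - prime - lhs
    is_proper_part = any(lhs < set(key) for key in candidate_keys)
    return bool(non_prime_rhs) and is_proper_part

CANDIDATE_KEYS = ["AB", "AC", "BC"]  # assumed keys of R(A,B,C,D)

print(sorted(prime_attributes(CANDIDATE_KEYS)))  # ['A', 'B', 'C']
print(violates_2nf("C", "D", CANDIDATE_KEYS))    # True: C is part of AC, D is non-prime
print(violates_2nf("AB", "D", CANDIDATE_KEYS))   # False: AB is a whole key
```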
Hope this helps. For more detail, you can also refer to: Detailed explanation of Normal forms
I am trying to understand the notion of data redundancy. Can someone please help to explain what is the difference between the notion of "a relation schema is redundant" and "a relation schema is value redundant"? Below is the formal definition, which I don't quite get.
So far my understanding is that if some data in a relation can be derived using functional dependencies over that relation, that data is redundant. However, I don't know why they distinguish "redundant" and "value redundant". Many thanks in advance!
A schema is redundant for a sigma if some relation with that heading and satisfying the FDs in sigma has two equal subrows on the attributes of some FD in the closure of sigma. Eg: If X->Y and Y->Z are in sigma but X->Z is not then X->Z is nevertheless in the closure of sigma, so X->Z also has to hold. So if some relation satisfying sigma's FDs has two rows with the same (X,Y), (Y,Z) or (X,Z) value then the schema is redundant. Ie a schema is redundant when some satisfying relation actually exhibits certain (informally) "redundant" subrows per the closure of a sigma.
A schema is value-redundant for a sigma if some relation with that heading and satisfying the FDs in sigma has an element that when given a different value always gives a relation that doesn't satisfy the FDs in sigma. Ie it has an element value that given the rest of the element values must be that value. Eg in any of the above 3 cases of there being equal subrows (ie XY, YZ or XZ), the element in the determined subrow (ie respectively Y, Z or Z) has to have that value given the rest of the element values. Ie a schema is value-redundant when some satisfying relation actually exhibits a certain (informally) "redundant" subrow per a sigma.
Notice that redundancy is in terms of the closure of sigma but value-redundancy is in terms of just sigma.
The text will go on to show that a schema is redundant for a sigma if and only if it is value-redundant. So to determine redundancy, instead of having to (expensively) calculate the closure of sigma we can just use sigma per value-redundancy (in a less expensive way).
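The point in the first paragraph, that X->Z belongs to the closure of sigma without being listed in sigma, can be checked with an attribute closure. This sketch only illustrates that one step. It does not implement either redundancy definition, and `closure` and `implies` are names made up for the example.

```python
def closure(attrs, fds):
    """Attribute closure under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def implies(lhs, rhs, fds):
    """lhs -> rhs is in the closure of fds iff rhs lies within the closure of lhs."""
    return set(rhs) <= closure(lhs, fds)

SIGMA = [("X", "Y"), ("Y", "Z")]

print(("X", "Z") in SIGMA)       # False: not listed in sigma
print(implies("X", "Z", SIGMA))  # True: but it follows from sigma
```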
I have R = {A,B,C,D,E,F,G,H,I,J,K} with F = {ABGH->IJKF, JIGH->ABF, A->CDE}. I need to find all candidate keys of R, and I need to know how to normalize R to BCNF.
I got the following answers so far:
ABGH, GHJIK and AGHIJK.
But I checked my answer on this site: checked site
I don't know why 'K' is not part of the answer and I am not sure if my answers were correct. Thanks!
There are two candidate keys of R: {ABGH} and {GHIJ}.
{GHJIK} is not a candidate key, but if it were, then {AGHIJK} would not be a minimal key.
The attribute K isn't part of the two candidate keys, because the closure of {ABGH} contains K, and the closure of {GHIJ} contains K. For example, for {ABGH} . . .
ABGH->ABGH (trivial)
ABGH->IJKF (given), therefore
ABGH->ABGHIJKF
A->CDE (given), therefore
ABGH->ABCDEGHIJKF, or in alpha order
ABGH->ABCDEFGHIJK
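The same derivation, plus the minimality check, can be sketched in Python. `closure_steps` and `is_candidate_key` are hypothetical helper names, and the FDs are taken from the question as (left, right) pairs.

```python
def closure_steps(attrs, fds):
    """Attribute closure that also records which FD fired at each step."""
    result = set(attrs)
    steps = []
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                steps.append((lhs, rhs, "".join(sorted(result))))
                changed = True
    return result, steps

def is_candidate_key(attrs, schema, fds):
    """A superkey that stops being one when any single attribute is dropped."""
    full = set(schema)
    if closure_steps(attrs, fds)[0] != full:
        return False
    return all(closure_steps(set(attrs) - {a}, fds)[0] != full for a in attrs)

R = "ABCDEFGHIJK"
FDS = [("ABGH", "IJKF"), ("JIGH", "ABF"), ("A", "CDE")]

for lhs, rhs, so_far in closure_steps("ABGH", FDS)[1]:
    print(f"{lhs}->{rhs} gives {so_far}")

for key in ["ABGH", "GHIJ", "GHIJK", "AGHIJK"]:
    print(key, is_candidate_key(key, R, FDS))
```

Dropping one attribute at a time is enough to test minimality, because removing more attributes can only shrink the closure further.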
The BCNF decomposition will be
R1(ACDE) and R2(ABFGHIJK)
because in R, A --> CDE is a partial dependency: A is only part of the candidate key ABGH, so A is not a superkey of R. So we decompose R into R1, where A is the candidate key, and R2, where ABGH and GHIJ are candidate keys.
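One way to double-check this decomposition is a brute-force BCNF test: within a schema, every attribute set must determine either nothing new or the whole schema. This is an exponential sketch that suits small schemas only, and `closure` and `is_bcnf` are names invented for the example.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_bcnf(sub_schema, fds):
    """Check every proper non-empty subset X of the schema against the original FDs."""
    sub = set(sub_schema)
    for size in range(1, len(sub)):
        for combo in combinations(sorted(sub), size):
            x = set(combo)
            implied = closure(x, fds) & sub  # what X determines inside this schema
            if implied != x and implied != sub:
                return False  # non-trivial FD whose left side is not a superkey
    return True

FDS = [("ABGH", "IJKF"), ("JIGH", "ABF"), ("A", "CDE")]

print(is_bcnf("ABCDEFGHIJK", FDS))  # False: A -> CDE, and A is not a superkey
print(is_bcnf("ACDE", FDS))         # True
print(is_bcnf("ABFGHIJK", FDS))     # True
```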
Adding any number of attributes to a candidate key gives a superkey; equivalently, a candidate key is a minimal superkey. Here ABGH and GHIJ each determine all other attributes of the relation, so they are candidate keys. GHIJK is therefore a superkey (GHIJ plus K) and not a candidate key, and by the same principle AGHIJK (GHIJ plus A and K) is also only a superkey.