I am working with a relational database's set of attributes and set of functional dependencies and have a specific question about which keys would be considered candidate keys of this schema.
The set of attributes I am working with is:
R = (A, B, C, D, E, F, G, H)
And the set of functional dependencies are:
F = { AC -> B, AB -> C, AD -> E, C -> D, BC -> A, E -> G, ABE -> D, FG -> E}
So here's what I am trying to figure out: Would this set of attributes have any candidate keys since H is not determined/mentioned at all in the set of functional dependencies?
By definition, candidate keys determine everything else, correct? If H is not determined by anything but itself, would there still be any candidate keys in this set?
Any insight is appreciated. Thanks!
Recall (Wikipedia) that
In the relational model of databases, a candidate key of a relation is
a minimal superkey for that relation; that is, a set of attributes
such that the relation does not have two distinct tuples (i.e. rows or
records in common database language) with the same values for these
attributes (which means that the set of attributes is a superkey)
there is no proper subset of these attributes for which (1) holds
(which means that the set is minimal).
Hence,
So here's what I am trying to figure out: Would this set of attributes have any candidate keys since H is not determined/mentioned at all in the set of functional dependencies?
This simply means that H will be contained in every candidate key R might have. For instance, ACFH is a candidate key. You can infer B because of AC->B, D because of C->D, E because of AD->E, and G because of E->G. On the other hand, you cannot infer F from ACH, H from ACF, C from AFH and A from CFH.
Related
If I Have R(E, F, G, H), what would be the candidate keys from these functional dependencies?
FD1: EF -> G
FD2: EF -> H
FD3: G -> E
FD4: H -> F
My thought process was that EF would be considered a candidate key, since EF -> G and EF -> H, therefore EF+ = {E, F, G, H}. Could I say the same in saying that GH is also a candidate key, since G -> E, H -> F, therefore GH -> EF and GH+ = {E, F, G, H}? Would there be any other candidate keys?
The schema has four candidate keys: EF, EH, FG, GH. You can easily verify this fact by computing the closure of each pair of attributes, and noting that it contains all the attributes.
The question is naturally how to find them. The trivial method is simply to try the closure of all the subsets of attributes of the relation, but this is obviously inefficient, being an exponential process.
There are more efficient algorithms to find all the candidate keys, but they are quite complex. There are simple heuristics that can help in reducing the complexity of the solution, without using a formal algorithm.
First, you should start from a canonical cover, otherwise these heuristics cannot be applied (in your example you have already a canonical cover). The first step is that you can exclude any attribute that appears only in the right hand sides of the dependencies (not in this case), and consider that all the attributes appearing only in left hand sides must be always part of any key (also not in this case).
Then, you can start from the left hand sides of the dependencies, and compute their closures to see if those sets of attributes can determine all the others. If this is not the case, you can add the other attributes, one at time, and again compute the closure of the resulting set, stopping considering those attributes when you have found a key or the set includes a subset already considered.
For instance, from EF you have found that you can determine all the other attributes, so this is a candidate key. Then, considering G, you can add E, noting that EG+ = EG, so this is not a candidate key, then add H, noting that GH+ = EFGH, so this is a candidate key, and finally add F, finding that FG is a candidate key. Of course, when a set of attributes is a candidate key you do not add to it other attributes. Another set of tests starts with H, first HE (which produces a candidate key), then HF, which do not produce a candidate key. At this point we should check if adding an attribute to EG or to HF we obtain a candidate key, but we can safely stop here since we will obtain just a superset of a set already considered (like EGF, for instance, that contains GF).
Example:
Let R = (A, B, C, D)
Let F = {C -> AD, AB -> C}
Then how can I find the candidate keys?
The answer is {AB, BC}
Why?
Given a relation schema R with a set of attributes T and a non-empty set of non-trivial functional dependencies F describing a certain set of constraints that are assumed to hold in that schema:
Every attribute that does not appear in the right part of a FD in F must be present in any candidate key.
Every attribute that does not appear in the left part of a FD in F cannot be present in any candidate key.
To find all the candidate keys, for all the other attributes, you should try to add to the attributes of 1 above every possible combination of them, and see if the closure determines all the attributes of the relation (and such that you cannot remove any attribute from the combination without losing this property).
Note that, if the set F is empty, the only candidate key is constituted by all the attributes T.
In practice there are algorithms that can be relatively efficient (since the problem of finding all the keys is in the general case exponential).
A simple approach is to start from a canonical cover of the functional dependencies, in this case for instance from:
{ A B → C
C → A
C → D }
and after finding the attributes that must be present in any candidate key (in this case B), try to add to them the left hand side of the dependencies (in this case both AB, that is A, and C) (in any order, and possibly combining them) and compute the closure to see if they determine all the attributes. When you discover that some set of attributes determines all the relation attributes, you have found a candidate key (and it is not necessary to add other attributes to it). In your example:
(A B)+ = A B C D
(B C)+ = A B C D
So A B and B C are candidate keys (since you cannot remove any attribute to both of them without losing the property of determining all the other attributes). And since there are no other attributes (a part from D that cannot be present in a candidate key), you know that you have found all the candidate keys.
The relation R=(A,B,C,D,E) and functional dependencies F are given as follows:
F={A->BC, CD->E, B->D, E->A}
E, BC and CD can be a candidate keys, but B cannot.
Anyone could point me how this fact is calculated? I google it but couldn't understand more as what I known before.
You can find all the dependent attributes of a given set of attributes by computing the closure of its functional dependencies. Let me demonstrate:
A -> ABC -> ABCD -> ABCDE
A determines BC (given) as well as itself (trivially) therefore A -> ABC. Add the fact that B -> D to get ABC -> ABCD. Finally, add CD -> E to get ABCD -> ABCDE. We stop here because we've determined the whole relation, therefore A is a candidate key.
You should verify that, starting from E, BC and CD, you can indeed determine the whole relation.
Starting from B, we get:
B -> BD
and that's it. The rest of the relation can't be determined from BD, so it's not a candidate key.
A more visual way of doing it is to sketch the functional dependencies:
Starting from any set of attributes, try finding a path to every other attribute by following the arrows. You can only get to E if you start at E or visited both C and D.
From B, you can reach D, but without C, you're not allowed to go to E, which also excludes A. So B can't be a candidate key.
Consider R(A,B,C,D,E)
F = {BC->AE, A->D, D->C, ABD->E}.
I need to find all candidate key of the schema.
I know that BA,BC,BD are the keys, but i want to know how do discover them.
I saw some answers in candidate keys from functional dependencies = but i didn't fully understand them.
form what they suggest, I got L={B}, M={A,C,D}, R={E}
Now i need to add from M one at a time to L.
I start with A, i get BA. So BA->A, BA->B (trivial) and because A->D so BA->D and because D->C we get BA->C.
But, how we get E?
adapting the answer from https://stackoverflow.com/a/14595217/3591273
Since we have the functional dependencies: BC->AE, A->D, D->C, ABD->E, we have the following superkeys:
ABCDE (All attributes is always a super key)
ABCD (We can get attribute E through ABD -> E)
ABC (Just add D through A -> D)
ABD (Just add C through D -> C)
AB (We can get D through A -> D, and then we can get C through D -> C)
BC (We can get E through BC -> E, and then we can get C through D -> C)
BD (We can get C through D -> C, and then we can get AE through BC -> AE)
(One trick here to realize, is that since B never appears on the right side of a functional dependency, every key must include B, ie key B is independent and cannot be derived from other keys)
Now that we have all our super keys, we can see that only the last
three are candidate keys. Since the first four can all be trimmed
down. But we cannot take any attributes away from the last three
superkeys and still have them remain a superkey.
so the minimal keys are AB, BC, BD
update
this was a reduction approach, i.e succesively reduce the trivial superkey by use of functional dependencies, but one can take the opposite road and use an augment approach, i.e start with single trivial keys and augment them with other keys wrt dependency relations untill keys become superflous
This is more of a concept check than anything, but suppose I have a relation R on attributes ABCD with the functional dependencies B -> ACD and C -> D. The solitary key for this relation is B, and a superkey for this relation is BC, correct?
The only candidate key in your example is indeed B. Any collection of attributes containing B will be a superkey, e.g., B itself, AB, BC or ABD.
candidate key is B.
As all the attributes can be derived from B, B becomes superkey..