How can I find candidate keys? - database

Example:
Let R = (A, B, C, D)
Let F = {C -> AD, AB -> C}
Then how can I find the candidate keys?
The answer is {AB, BC}
Why?

Given a relation schema R with a set of attributes T and a non-empty set of non-trivial functional dependencies F describing a certain set of constraints that are assumed to hold in that schema:
1. Every attribute that does not appear in the right part of any FD in F must be present in every candidate key.
2. Every attribute that appears in the right part of some FD in F but in the left part of none cannot be present in any candidate key.
To find all the candidate keys, take the attributes that never appear in a right part and try adding to them every possible combination of the remaining attributes (leaving out those that cannot be in any key). Check whether the closure determines all the attributes of the relation, and keep only the sets from which no attribute can be removed without losing this property.
Note that, if the set F is empty, the only candidate key consists of all the attributes T.
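The procedure above can be sketched in Python. This is a minimal sketch, assuming single-letter attribute names and dependencies written as strings such as "C->AD"; `parse`, `closure` and `candidate_keys` are illustrative helpers, not part of any library. Attributes that appear in no right part form a core contained in every key, attributes that appear only in right parts are skipped, and the remaining attributes are tried in combinations of increasing size, discarding any superset of a key already found:

```python
from itertools import combinations


def parse(spec):
    """Parse 'C->AD, AB->C' into (lhs, rhs) pairs of single-letter attribute sets."""
    return [tuple(frozenset(side.strip()) for side in part.split("->"))
            for part in spec.split(",") if part.strip()]


def closure(attrs, fds):
    """Attribute-set closure: apply the FDs until nothing new is derived."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    attrs = frozenset(attrs)
    lhs_attrs = set().union(*(lhs for lhs, _ in fds))
    rhs_attrs = set().union(*(rhs for _, rhs in fds))
    core = attrs - rhs_attrs          # in no right part: in every candidate key
    never = rhs_attrs - lhs_attrs     # only in right parts: in no candidate key
    others = sorted(attrs - core - never)
    keys = []
    # Smaller combinations first, so a superset of a found key is never kept.
    for size in range(len(others) + 1):
        for combo in combinations(others, size):
            cand = core | frozenset(combo)
            if any(key <= cand for key in keys):
                continue
            if closure(cand, fds) == attrs:
                keys.append(cand)
    return keys


keys = candidate_keys("ABCD", parse("C->AD, AB->C"))
print(sorted("".join(sorted(k)) for k in keys))  # ['AB', 'BC']
```

With an empty F the core is the whole of T, so the function returns T as the only key.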
In practice there are algorithms that are relatively efficient (only relatively, since in the general case the problem of finding all the keys is exponential).
A simple approach is to start from a canonical cover of the functional dependencies, in this case for instance from:
{ A B → C
C → A
C → D }
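A cover of this shape can be computed with the usual three steps: split the right-hand sides, drop extraneous left-hand-side attributes, then drop redundant dependencies. The following is a sketch under the same assumptions (single-letter attributes, hypothetical helper names `minimal_cover` and `show`); it produces the split form shown above. Minimal covers are not unique, so a different processing order may give a different but equivalent cover.

```python
def parse(spec):
    """Parse 'C->AD, AB->C' into (lhs, rhs) pairs of single-letter attribute sets."""
    return [tuple(frozenset(side.strip()) for side in part.split("->"))
            for part in spec.split(",") if part.strip()]


def closure(attrs, fds):
    """Attribute-set closure: apply the FDs until nothing new is derived."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_cover(fds):
    # Step 1: one attribute per right-hand side.
    cover = [(lhs, frozenset(a)) for lhs, rhs in fds for a in sorted(rhs)]
    # Step 2: remove left-hand-side attributes that are extraneous.
    reduced = []
    for lhs, rhs in cover:
        lhs = set(lhs)
        for b in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {b}, cover):
                lhs.discard(b)
        reduced.append((frozenset(lhs), rhs))
    # Step 3: remove dependencies implied by the remaining ones.
    result = list(dict.fromkeys(reduced))  # drop duplicates, keep order
    for fd in list(result):
        others = [g for g in result if g != fd]
        if fd[1] <= closure(fd[0], others):
            result = others
    return result


def show(fds):
    return sorted("".join(sorted(l)) + "->" + "".join(sorted(r)) for l, r in fds)


print(show(minimal_cover(parse("C->AD, AB->C"))))  # ['AB->C', 'C->A', 'C->D']
```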
and, after finding the attributes that must be present in every candidate key (in this case B), try to add to them the left-hand sides of the dependencies (in this case AB, that is A, since B is already there, and C), in any order and possibly combining them, and compute the closure to see whether the result determines all the attributes. When you discover that some set of attributes determines all the attributes of the relation, you have found a candidate key (and there is no need to add other attributes to it). In your example:
(A B)+ = A B C D
(B C)+ = A B C D
So A B and B C are candidate keys (since you cannot remove any attribute from either of them without losing the property of determining all the other attributes). And since there are no other attributes to try (apart from D, which cannot be present in a candidate key), you know that you have found all the candidate keys.
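The two closures, and the check that neither key can lose an attribute, are easy to verify mechanically. Here is a sketch with hypothetical helpers and single-letter attributes. `is_candidate_key` relies on the fact that adding attributes to a superkey always yields a superkey, so testing the removal of each single attribute is enough to establish minimality:

```python
def parse(spec):
    """Parse 'C->AD, AB->C' into (lhs, rhs) pairs of single-letter attribute sets."""
    return [tuple(frozenset(side.strip()) for side in part.split("->"))
            for part in spec.split(",") if part.strip()]


def closure(attrs, fds):
    """Attribute-set closure: apply the FDs until nothing new is derived."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_candidate_key(key, attrs, fds):
    key, attrs = frozenset(key), frozenset(attrs)
    if closure(key, fds) != attrs:
        return False  # not even a superkey
    # Minimal if no single attribute can be dropped.
    return all(closure(key - {a}, fds) != attrs for a in key)


fds = parse("C->AD, AB->C")
for key in ["AB", "BC", "ABC", "B"]:
    print(key, "".join(sorted(closure(key, fds))), is_candidate_key(key, "ABCD", fds))
# AB ABCD True
# BC ABCD True
# ABC ABCD False
# B B False
```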

Related

find candidate key from functional dependency

I need help finding the candidate keys from the given relation:
R (A,B,C,D,E) with FD:
A -> BE
B -> BE
B -> D
STEPS:
I know all the attributes can identify themselves so: A,B,C,D,E -> {ABCDE}
Now I know A -> BE, so I can cross out BE and get this: ACD -> {ABCDE}
The prime attributes are (A,C,D)
D would now be replaced by B giving me my other candidate key of (ABC)
I get ACD and ABC as the answer, but apparently it's AC. What am I doing wrong?
A simple line of reasoning is the following.
A and C must be present in every candidate key, since they never appear in the right part of any FD.
So, let’s see if they already form a candidate key by computing their closure, AC*.
By applying the rules for computing the closure of a set of attributes, we can easily see that AC* is equal to ABCDE, so AC is a candidate key. The only other attribute that appears in the left side of a FD is B, so we should check whether some other candidate key contains it. But since A and C must be present in every key and already form a candidate key, adding any attribute to them produces only non-minimal superkeys, so we can conclude that AC is the only candidate key.
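The same argument can be checked with a short sketch. It assumes single-letter attributes, and the helper names `core_attributes` and `key_from_core` are hypothetical. It computes the attributes that appear in no right part. If their closure is already the whole schema, they form the only candidate key, because every key must contain them:

```python
def parse(spec):
    """Parse 'A->BE, B->D' into (lhs, rhs) pairs of single-letter attribute sets."""
    return [tuple(frozenset(side.strip()) for side in part.split("->"))
            for part in spec.split(",") if part.strip()]


def closure(attrs, fds):
    """Attribute-set closure: apply the FDs until nothing new is derived."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def core_attributes(attrs, fds):
    """Attributes that appear in no right-hand side: every key contains them."""
    return frozenset(attrs) - set().union(*(rhs for _, rhs in fds))


def key_from_core(attrs, fds):
    """Return the core if it is already a superkey (then it is the only key), else None."""
    core = core_attributes(attrs, fds)
    return core if closure(core, fds) == frozenset(attrs) else None


fds = parse("A->BE, B->BE, B->D")
print("".join(sorted(closure("AC", fds))))           # ABCDE
print("".join(sorted(key_from_core("ABCDE", fds))))  # AC
```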

Determining Candidate Keys from Functional Dependencies

If I have R(E, F, G, H), what would be the candidate keys from these functional dependencies?
FD1: EF -> G
FD2: EF -> H
FD3: G -> E
FD4: H -> F
My thought process was that EF would be considered a candidate key, since EF -> G and EF -> H, therefore EF+ = {E, F, G, H}. Could I say the same in saying that GH is also a candidate key, since G -> E, H -> F, therefore GH -> EF and GH+ = {E, F, G, H}? Would there be any other candidate keys?
The schema has four candidate keys: EF, EH, FG, GH. You can easily verify this fact by computing the closure of each pair of attributes, and noting that it contains all the attributes.
The question is naturally how to find them. The trivial method is simply to compute the closure of every subset of the attributes of the relation, but this is obviously inefficient, being an exponential process.
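For small schemas the trivial method is still handy for checking an answer. This sketch assumes single-letter attributes and uses hypothetical helper names. It enumerates subsets by increasing size (2^n of them in the worst case) and keeps the superkeys that contain no smaller key:

```python
from itertools import combinations


def parse(spec):
    """Parse 'EF->G, G->E' into (lhs, rhs) pairs of single-letter attribute sets."""
    return [tuple(frozenset(side.strip()) for side in part.split("->"))
            for part in spec.split(",") if part.strip()]


def closure(attrs, fds):
    """Attribute-set closure: apply the FDs until nothing new is derived."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def all_candidate_keys(attrs, fds):
    attrs = frozenset(attrs)
    keys = []
    for size in range(len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            cand = frozenset(combo)
            # Skip supersets of keys already found: they are not minimal.
            if not any(k <= cand for k in keys) and closure(cand, fds) == attrs:
                keys.append(cand)
    return keys


fds = parse("EF->G, EF->H, G->E, H->F")
print(["".join(sorted(k)) for k in all_candidate_keys("EFGH", fds)])  # ['EF', 'EH', 'FG', 'GH']
```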
There are more efficient algorithms to find all the candidate keys, but they are quite complex. There are simple heuristics that can help in reducing the complexity of the solution, without using a formal algorithm.
First, you should start from a canonical cover, otherwise these heuristics cannot be applied (in your example you already have a canonical cover). The first step is to exclude any attribute that appears only in the right-hand sides of the dependencies (there are none in this case), and to note that all the attributes appearing only in left-hand sides must always be part of every key (again, none in this case).
Then, you can start from the left-hand sides of the dependencies and compute their closures to see whether those sets of attributes determine all the others. If this is not the case, you can add the other attributes, one at a time, and again compute the closure of the resulting set, no longer extending a set once it is a key or once it includes a set already considered.
For instance, from EF you have found that you can determine all the other attributes, so this is a candidate key. Then, starting from G, you can add E, noting that EG+ = EG, so this is not a candidate key; then add H, noting that GH+ = EFGH, so this is a candidate key; and finally add F, finding that FG is a candidate key. Of course, when a set of attributes is a candidate key you do not add other attributes to it. Another set of tests starts with H: first HE (which is a candidate key), then HF, which is not. At this point we should check whether adding an attribute to EG or to HF yields a candidate key, but we can safely stop here, since we would only obtain a superset of a key already found (EGF, for instance, contains FG).
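The heuristic can be sketched as a level-by-level search. It starts from the left-hand sides, stops extending a set once it is a key, and skips any set that contains a key already found. This is an illustrative sketch, not a formal algorithm from the literature, and the names are hypothetical. A final filter removes non-minimal keys that can slip in when the starting sets have different sizes:

```python
def parse(spec):
    """Parse 'EF->G, G->E' into (lhs, rhs) pairs of single-letter attribute sets."""
    return [tuple(frozenset(side.strip()) for side in part.split("->"))
            for part in spec.split(",") if part.strip()]


def closure(attrs, fds):
    """Attribute-set closure: apply the FDs until nothing new is derived."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def heuristic_keys(attrs, fds):
    attrs = frozenset(attrs)
    keys, seen = [], set()
    level = {lhs for lhs, _ in fds}  # start from the left-hand sides
    while level:
        next_level = set()
        for cand in sorted(level, key=sorted):
            if cand in seen or any(k <= cand for k in keys):
                continue  # already tried, or contains a key
            seen.add(cand)
            if closure(cand, fds) == attrs:
                keys.append(cand)  # a key: do not extend it further
            else:
                next_level |= {cand | {a} for a in attrs - cand}
        level = next_level
    keys = [k for k in keys if not any(o < k for o in keys)]  # keep minimal ones
    return keys or [attrs]  # no usable FD: all attributes form the key


fds = parse("EF->G, EF->H, G->E, H->F")
print(sorted("".join(sorted(k)) for k in heuristic_keys("EFGH", fds)))  # ['EF', 'EH', 'FG', 'GH']
```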

Double functional dependency

I have a question regarding functional dependencies.
I understand that functional dependency means that the value of an attribute can be determined by the value of another attribute.
Suppose we have this table
|A|B|C|D|
Here A and B are the primary keys.
Is it correct to say that both C and D are functionally dependent on both A and B ?
You are saying “A and B are the primary keys”, but this phrase is ambiguous: do you mean “the primary key is A B” or “there are two candidate keys, A and B”? (Note that a relation in a relational database can have only a single primary key but many candidate keys.)
Given the definition of a (candidate) key, that is that it determines all the other attributes and that you cannot remove any attribute without losing this property, in the first case you can say that:
A B -> C D
or, which is equivalent, that:
A B -> C
A B -> D
(so C and D depend on the combination of A and B), while in the second case you have that:
A -> C D
B -> C D
or, which is equivalent, that:
A -> C
A -> D
B -> C
B -> D
(that is, C and D are functionally dependent both on A and on B).
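The claimed equivalences can be checked by showing that each set of dependencies implies every dependency of the other. The check uses attribute closure: X -> Y is implied by F exactly when Y is contained in X+. This is a sketch with hypothetical helper names and single-letter attributes. The last line confirms that the two readings of the question are genuinely different:

```python
def parse(spec):
    """Parse 'AB->CD' into (lhs, rhs) pairs of single-letter attribute sets."""
    return [tuple(frozenset(side.strip()) for side in part.split("->"))
            for part in spec.split(",") if part.strip()]


def closure(attrs, fds):
    """Attribute-set closure: apply the FDs until nothing new is derived."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """fds imply lhs -> rhs iff rhs is contained in the closure of lhs."""
    return frozenset(rhs) <= closure(lhs, fds)


def equivalent(f1, f2):
    return (all(implies(f1, lhs, rhs) for lhs, rhs in f2)
            and all(implies(f2, lhs, rhs) for lhs, rhs in f1))


print(equivalent(parse("AB->CD"), parse("AB->C, AB->D")))                  # True
print(equivalent(parse("A->CD, B->CD"), parse("A->C, A->D, B->C, B->D")))  # True
print(equivalent(parse("AB->CD"), parse("A->CD, B->CD")))                  # False
```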
"S (functionally) determines T" means that all appearances of a particular subtuple value for attribute set S have the same subtuple value for attribute set T. If we say an attribute X is determining or determined then it's understood that we really mean that set {X} is determining/determined.
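This definition refers to a particular relation value, so it can be checked directly on rows. The sketch represents a relation value as a list of Python dictionaries with single-letter attribute names; the data and the name `holds` are made up for illustration. Note that a dependency that happens to hold in one value is not necessarily a constraint on the variable:

```python
def holds(rows, lhs, rhs):
    """True if each subtuple value for lhs always appears with the same subtuple for rhs."""
    seen = {}
    for row in rows:
        s = tuple(row[a] for a in lhs)
        t = tuple(row[a] for a in rhs)
        if seen.setdefault(s, t) != t:
            return False  # same S-subtuple, different T-subtuple
    return True


# Hypothetical relation value, for illustration only.
rows = [
    {"A": 1, "B": 1, "C": "x", "D": 10},
    {"A": 1, "B": 2, "C": "y", "D": 10},
    {"A": 2, "B": 1, "C": "x", "D": 20},
]
print(holds(rows, "AB", "CD"))  # True
print(holds(rows, "A", "C"))    # False: A=1 appears with C=x and with C=y
print(holds(rows, "A", "D"))    # True
```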
A superkey is a set of attributes that determines every attribute. A CK (candidate key) is a superkey that contains no smaller superkey. There can be many CKs. One CK can be chosen as PK (primary key). (PKs play no role in relational theory.)
Since there can only be one PK, it's odd that you talk about a relation value or variable having more than one. Maybe you mean two CKs. Maybe you mean a 2-attribute PK.
It happens that if every subtuple value for a set of attributes appears just once then it is a superkey. (Each single-attribute superkey is a CK unless {} is the CK, which happens when the relation is limited to one tuple.) So it determines all attributes. But in general the dependencies tell us what the superkeys & CKs are.
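The uniqueness test can be run on a relation value in the same way. This sketch uses made-up rows and hypothetical helper names. It lists the minimal attribute sets whose subtuple values are all distinct, that is, the CKs of that particular value. These may include sets that are not CKs according to the declared dependencies. With a single tuple, {} comes out as the only key:

```python
from itertools import combinations


def unique_in(rows, attrs):
    """True if every subtuple value for attrs appears just once in rows."""
    values = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(values)) == len(values)


def keys_of_value(rows, attrs):
    """Minimal attribute sets that are superkeys of this particular value."""
    keys = []
    for size in range(len(attrs) + 1):
        for combo in combinations(attrs, size):
            cand = frozenset(combo)
            if not any(k <= cand for k in keys) and unique_in(rows, combo):
                keys.append(cand)
    return keys


# Hypothetical relation value, for illustration only.
rows = [
    {"A": 1, "B": 1, "C": "x", "D": 10},
    {"A": 1, "B": 2, "C": "y", "D": 10},
    {"A": 2, "B": 1, "C": "x", "D": 20},
]
print(["".join(sorted(k)) for k in keys_of_value(rows, "ABCD")])  # ['AB', 'AC', 'BD', 'CD']
print(keys_of_value(rows[:1], "ABCD"))  # [frozenset()]: one tuple, so {} is the CK
```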
So if each of A and B is a CK then each determines C and D, ie {C} and {D}. And if {A,B} is a PK then it determines C and D, ie {C} and {D}. It happens that if both T1 and T2 are determined by S then their union T1 ∪ T2 is too. So either way, the CK(s) here determine(s) {C,D} also.
PS There is an ambiguity in English where it is not clear whether "both C and D are functionally dependent" means that C is dependent and D is dependent or that {C,D} is dependent. Similarly for "are functionally dependent on both A and B". So it is clearer to say "the set ..." rather than just using "both" and/or "and".

Understanding candidate key

Consider R(A,B,C,D,E)
F = {BC->AE, A->D, D->C, ABD->E}.
I need to find all candidate key of the schema.
I know that BA, BC, BD are the keys, but I want to know how to discover them.
I saw some answers in "candidate keys from functional dependencies", but I didn't fully understand them.
From what they suggest, I got L={B}, M={A,C,D}, R={E}
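The L/M/R classification can be computed mechanically. This sketch assumes single-letter attributes and uses hypothetical names. Note that L also collects attributes that appear in no dependency at all, which likewise must be in every key:

```python
def parse(spec):
    """Parse 'BC->AE, A->D' into (lhs, rhs) pairs of single-letter attribute sets."""
    return [tuple(frozenset(side.strip()) for side in part.split("->"))
            for part in spec.split(",") if part.strip()]


def classify(attrs, fds):
    lhs_attrs = set().union(*(lhs for lhs, _ in fds))
    rhs_attrs = set().union(*(rhs for _, rhs in fds))
    left = {a for a in attrs if a not in rhs_attrs}                      # in every key
    middle = {a for a in attrs if a in lhs_attrs and a in rhs_attrs}     # must be tested
    right = {a for a in attrs if a in rhs_attrs and a not in lhs_attrs}  # in no key
    return left, middle, right


L, M, R = classify("ABCDE", parse("BC->AE, A->D, D->C, ABD->E"))
print(sorted(L), sorted(M), sorted(R))  # ['B'] ['A', 'C', 'D'] ['E']
```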
Now I need to add attributes from M, one at a time, to L.
I start with A and get BA. So BA->A, BA->B (trivial), and because A->D we get BA->D, and because D->C we get BA->C.
But how do we get E?
Adapting the answer from https://stackoverflow.com/a/14595217/3591273:
Since we have the functional dependencies BC->AE, A->D, D->C, ABD->E, we have the following superkeys:
ABCDE (the set of all attributes is always a superkey)
ABCD (We can get attribute E through ABD -> E)
ABC (Just add D through A -> D)
ABD (Just add C through D -> C)
AB (We can get D through A -> D, and then we can get C through D -> C)
BC (We can get A and E through BC -> AE, and then we can get D through A -> D)
BD (We can get C through D -> C, and then we can get AE through BC -> AE)
(One trick to realize here is that, since B never appears on the right side of a functional dependency, every key must include B, i.e. attribute B is independent and cannot be derived from other attributes.)
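To see exactly how E (and every other attribute) is obtained, the closure computation can record which dependency fires at each step. This sketch uses hypothetical helper names and processes the dependencies in the order given in F. A different order can produce a different but equally valid sequence of steps:

```python
def parse(spec):
    """Parse 'BC->AE, A->D' into (lhs, rhs) pairs of single-letter attribute sets."""
    return [tuple(frozenset(side.strip()) for side in part.split("->"))
            for part in spec.split(",") if part.strip()]


def closure_trace(attrs, fds):
    """Closure of attrs, plus the dependencies that added something, in firing order."""
    result = set(attrs)
    steps = []
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            new = rhs - result
            if lhs <= result and new:
                result |= new
                steps.append("".join(sorted(lhs)) + "->" + "".join(sorted(new)))
                changed = True
    return result, steps


fds = parse("BC->AE, A->D, D->C, ABD->E")
for start in ["AB", "BC", "BD"]:
    result, steps = closure_trace(start, fds)
    print(start, "".join(sorted(result)), steps)
# AB ABCDE ['A->D', 'D->C', 'ABD->E']
# BC ABCDE ['BC->AE', 'A->D']
# BD ABCDE ['D->C', 'BC->AE']
```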
Now that we have these superkeys, we can see that only the last three are candidate keys, since the first four can all be trimmed down, while we cannot take any attribute away from the last three superkeys and still have them remain superkeys.
So the minimal keys are AB, BC and BD.
Update:
This was a reduction approach, i.e. successively reducing the trivial superkey by means of the functional dependencies. One can also take the opposite road and use an augmentation approach, i.e. start with single attributes and augment them with other attributes, according to the dependencies, until further additions become superfluous.
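The reduction approach can be sketched as a search that starts from the set of all attributes and repeatedly drops one attribute while the set stays a superkey. A superkey from which no attribute can be dropped is a candidate key, since every subset of it would also have to pass through a one-attribute removal. This is an illustrative sketch with hypothetical names and single-letter attributes; the number of superkeys visited can be exponential:

```python
def parse(spec):
    """Parse 'BC->AE, A->D' into (lhs, rhs) pairs of single-letter attribute sets."""
    return [tuple(frozenset(side.strip()) for side in part.split("->"))
            for part in spec.split(",") if part.strip()]


def closure(attrs, fds):
    """Attribute-set closure: apply the FDs until nothing new is derived."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def keys_by_reduction(attrs, fds):
    attrs = frozenset(attrs)
    keys, seen = set(), set()
    stack = [attrs]  # start from the trivial superkey
    while stack:
        superkey = stack.pop()
        if superkey in seen:
            continue
        seen.add(superkey)
        # Superkeys obtained by dropping a single attribute.
        smaller = [superkey - {a} for a in superkey
                   if closure(superkey - {a}, fds) == attrs]
        if smaller:
            stack.extend(smaller)
        else:
            keys.add(superkey)  # nothing can be dropped: minimal
    return keys


fds = parse("BC->AE, A->D, D->C, ABD->E")
print(sorted("".join(sorted(k)) for k in keys_by_reduction("ABCDE", fds)))  # ['AB', 'BC', 'BD']
```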

Database candidate keys from functional dependencies - specific technicality

I am working with a relational database's set of attributes and set of functional dependencies and have a specific question about which keys would be considered candidate keys of this schema.
The set of attributes I am working with is:
R = (A, B, C, D, E, F, G, H)
And the set of functional dependencies are:
F = { AC -> B, AB -> C, AD -> E, C -> D, BC -> A, E -> G, ABE -> D, FG -> E}
So here's what I am trying to figure out: Would this set of attributes have any candidate keys since H is not determined/mentioned at all in the set of functional dependencies?
By definition, candidate keys determine everything else, correct? If H is not determined by anything but itself, would there still be any candidate keys in this set?
Any insight is appreciated. Thanks!
Recall (Wikipedia) that
In the relational model of databases, a candidate key of a relation is a minimal superkey for that relation; that is, a set of attributes such that:
1. the relation does not have two distinct tuples (i.e. rows or records in common database language) with the same values for these attributes (which means that the set of attributes is a superkey);
2. there is no proper subset of these attributes for which (1) holds (which means that the set is minimal).
Hence,
So here's what I am trying to figure out: Would this set of attributes have any candidate keys since H is not determined/mentioned at all in the set of functional dependencies?
This simply means that H will be contained in every candidate key R might have. For instance, ACFH is a candidate key. You can infer B because of AC->B, D because of C->D, E because of AD->E, and G because of E->G. On the other hand, you cannot infer F from ACH, H from ACF, C from AFH and A from CFH.
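This example can be checked mechanically. The sketch assumes single-letter attributes and uses hypothetical helper names. It confirms that F and H appear in no right part and that ACFH is a key with no removable attribute. It then searches the supersets of FH, which also turns up ABFH and BCFH:

```python
from itertools import combinations


def parse(spec):
    """Parse 'AC->B, C->D' into (lhs, rhs) pairs of single-letter attribute sets."""
    return [tuple(frozenset(side.strip()) for side in part.split("->"))
            for part in spec.split(",") if part.strip()]


def closure(attrs, fds):
    """Attribute-set closure: apply the FDs until nothing new is derived."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


attrs = frozenset("ABCDEFGH")
fds = parse("AC->B, AB->C, AD->E, C->D, BC->A, E->G, ABE->D, FG->E")


def is_key(cand):
    """True if cand is a superkey of R."""
    return closure(cand, fds) == attrs


must = attrs - set().union(*(rhs for _, rhs in fds))  # in no right part
print(sorted(must))  # ['F', 'H']

# ACFH is a superkey, and dropping any one attribute breaks it.
print(is_key("ACFH"), [is_key(set("ACFH") - {a}) for a in "ACFH"])
# True [False, False, False, False]

# Every key contains F and H: try the other attributes by increasing size.
others = sorted(attrs - must)
keys = []
for size in range(len(others) + 1):
    for combo in combinations(others, size):
        cand = must | frozenset(combo)
        if not any(k <= cand for k in keys) and is_key(cand):
            keys.append(cand)
print(["".join(sorted(k)) for k in keys])  # ['ABFH', 'ACFH', 'BCFH']
```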

Resources