Does the union of candidate keys form a candidate key?

Relation R consists of columns {A,B,C,D}. A uniquely identifies a tuple. So does B. A and B are candidate keys, since they are minimal. What about the set {A,B}? {A,B} also uniquely identifies a tuple, but it is not minimal.
What is the term for {A,B}? Usually non-minimal candidate keys are called super keys. Is there a special name for a union of candidate keys?
EDIT:
Apologies for the imprecise question; it could indeed be clearer. As far as I understand, key == candidate key == minimal set of attributes that uniquely identifies a tuple.

The union of candidate keys K1 and K2 yields a candidate key iff K1=K2, that is, if they are in fact the very same key. In all other cases, it will by definition yield a (super)key that isn't irreducible, and if your definition of "candidate key" is that it is an irreducible (super)key then (the result of) that union is obviously no longer a candidate key.
As for key terminology, I think most respectable textbooks (there are others too) stick to this convention:
"superkey" = just any key
"candidate key" = irreducible (= minimal) superkey
"proper superkey" = non-minimal superkey (and not just "superkey" as you stated)
"key" is supposed to be a synonym for "candidate key", but the everyday sense of the word means it is often also used to mean "just any key". Beware!
And no, I don't think there is a special term for the particular kind of proper superkey that happens to be a union of two (or more) candidate keys. There is no useful purpose in ever knowing such a thing about a key.
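To make the distinction concrete, here is a minimal sketch that tests superkey and candidate-key status against a sample set of rows. The rows and the helper names `is_superkey` and `is_candidate_key` are invented for illustration. Keys are really constraints on every legal state of the relation, so checking one sample can only refute a key, not prove it.

```python
from itertools import combinations

# Hypothetical sample rows for R(A, B, C, D); A and B are each unique.
ROWS = [
    {"A": 1, "B": "x", "C": 10, "D": 0},
    {"A": 2, "B": "y", "C": 10, "D": 0},
    {"A": 3, "B": "z", "C": 20, "D": 0},
]

def is_superkey(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = {tuple(row[a] for a in sorted(attrs)) for row in rows}
    return len(seen) == len(rows)

def is_candidate_key(rows, attrs):
    """True if attrs is a superkey and no proper subset of it is one."""
    attrs = set(attrs)
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, set(subset))
        for size in range(len(attrs))
        for subset in combinations(sorted(attrs), size)
    )

print(is_candidate_key(ROWS, {"A"}))       # A alone is irreducible
print(is_superkey(ROWS, {"A", "B"}))       # the union is still unique...
print(is_candidate_key(ROWS, {"A", "B"}))  # ...but it is reducible
```

On this sample, {A} and {B} pass the candidate-key test, while {A,B} passes only the superkey test, which makes it a proper superkey.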

Candidate key:
A candidate key is a combination of attributes that can be uniquely used to identify a database record without referring to any other data.
The word Candidate means that these keys are candidates for Primary key selection, so it is up to you which candidate key you choose as the Primary key. Note that a combination (union) of two different candidate keys still identifies rows uniquely, but it is no longer minimal, so it is a superkey rather than a candidate key.

Related

Is Composite key a candidate key

Is a composite key a subset of the pool of candidate keys? If so, why is there separate terminology for composite keys, given that a candidate key can also be formed from multiple columns?
Update
Simple Key - A simple key is one that has only one attribute.
Candidate Key - a candidate key of a relation is a minimal super key for that relation; that is, a set of attributes such that:
a. The relation does not have two distinct tuples (i.e. rows or records in common database language) with the same values for these attributes (which means that the set of attributes is a super key)
b. There is no proper subset of these attributes for which (a) holds (which means that the set is minimal).
I would like to stress the point that candidate keys are minimal super keys.
Compound key - a compound key is a key that consists of two or more attributes that uniquely identify an entity occurrence.
A composite key contains at least one compound key and one more attribute. Composite keys may also include simple keys and non-key attributes.
Now a composite key does not satisfy the candidate-key condition of being a minimal super key.
I may be wrong but please help me in understanding this concept.

Can trivial superkey be considered as candidate key?

Suppose relation R(A,B,C,D) exists with no functional dependency. So what should be considered as its candidate key? Clearly any individual attribute or proper subset of all attributes cannot be a candidate key because by no means they can identify non prime attributes. So can ABCD be considered as candidate key? Or this relation will not have any candidate key?
"Suppose relation R(A,B,C,D) exists with no functional dependency. So can ABCD be considered as candidate key?"
Yes, the key [1] consists of all attributes together.
This is quite rare in practice, though. It mostly happens with junction/link tables that implement a many-to-many (or many-to-many-to-many etc.) relationship.
"Or this relation will not have any candidate key?"
A relation must have at least one key, otherwise it's not a relation [2].
A relation is a set, and any given object either belongs to a set or doesn't. It cannot belong multiple times (unlike in a multiset). Without at least one key, the same tuple would be able to belong multiple times.
[1] Just saying "key" is synonymous with "candidate key".
[2] At the very least, all attributes, taken together, can be considered a key (as in your case).
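The claim that a relation with no functional dependencies has exactly one key, namely all of its attributes, can be checked mechanically with the standard attribute-closure method. The sketch below is illustrative only: `closure` and `candidate_keys` are hypothetical helper names, and the brute-force enumeration over subsets is only sensible for small schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    schema = frozenset(schema)
    keys = []
    # Enumerate subsets by increasing size so smaller keys are found first.
    for size in range(len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = frozenset(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == schema:
                keys.append(candidate)
    return keys

# No functional dependencies: the only key is the full attribute set.
print(candidate_keys("ABCD", []))

# Assumed FDs for comparison: A -> BCD and B -> A give two keys, A and B.
print(candidate_keys("ABCD", [({"A"}, {"B", "C", "D"}), ({"B"}, {"A"})]))
```

With an empty dependency list, no proper subset of {A,B,C,D} has a closure that reaches every attribute, so the full set is the single candidate key.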

Why is a superkey required when we can identify a tuple uniquely through the primary key?

Definitions of superkey and primary key on Wikipedia:
A superkey is a set of attributes within a table whose values can be used to uniquely identify a tuple.
and
The primary key has to consist of characteristics that cannot be duplicated by any other row. The primary key may consist of a single attribute or a multiple attributes in combination.
I've gone through many books and searched the internet, but all I found was what a primary key is and what a superkey is.
What I want to know is why a superkey is required when we can identify a tuple uniquely through the primary key.
Superkeys are defined for conceptual completeness. You never need a superkey for reference purposes. A reference to a primary key will do just fine.
The concept of superkeys can be useful when you are analyzing a body of data in order to discover all the functional dependencies in it.
Once you have discovered a set of attributes that identifies rows uniquely (a superkey), the next question is whether it can be reduced. If it can, you turn your attention to the candidate key contained in the superkey.
Let's define what these terms mean in the first place:
A "superkey" is any set of attributes that, when taken together, uniquely identify rows in the table.
A minimal [1] superkey is called a "candidate key", or just a "key".
All keys in the same table are logically equivalent, but for historical and practical reasons we choose one of them and call it "primary", while the remaining ones are "alternate" keys.
So, every primary key is a key, but not every key is primary. Every key is a superkey, but not every superkey is a key.
Constraints that physically enforce keys in the database are: the PRIMARY KEY constraint (for the primary key) and the UNIQUE constraint (for an alternate key). These constraints should not be created on all superkeys, only on keys.
It is not unusual to have multiple keys in the same table, depending on the nature of your data. For example, a USER table might have a unique USER_ID and a unique USER_NAME. Since both of them need to be unique on their own, you must create [2] both keys, even though only one of them is strictly needed for identification.
[1] That is, a superkey that would stop being unique (and therefore, being a superkey) if any of the attributes were removed from it.
[2] I.e. create a PRIMARY KEY or UNIQUE constraint.
The word "key" is usually short for "candidate key".
Superkey means a superset of a key (the key attributes and possibly some more).
An irreducible superkey is called a candidate key. (Irreducible means that if you remove any one attribute, it is no longer a key.) In general, there can be more than one candidate key for a given relation (strictly, a relational variable).
The one candidate key that the designer chooses to prefer (for some reason) is called the primary key.
This was on a logical level, keys are defined for relational variables, so called relvars.
In physical implementation:
Relvar maps to a table.
Primary key to the primary key of the table.
Other candidate keys (except PK) map to alternate keys (unique not null).
A primary key is a superkey. Having only one such key constraint and only one way to identify tuples isn't necessarily sufficient.
Firstly, the relational model's versatility derives very much from the fact that it does not predetermine how data can or should be accessed in a table. A user or application is free to query a table based on whatever set of attributes may be necessary or convenient at the time. There is no obligation to use a "primary" key, which may or may not be relevant for some queries.
Secondly, uniqueness constraints (usually on candidate keys) are a data integrity feature. They guarantee data isn't duplicated in the key attributes. That kind of constraint is often useful on more than one set of attributes where business rules dictate that things should be unique. Uniqueness of one thing alone obviously doesn't guarantee uniqueness of another thing.
Thirdly, the query optimiser can take advantage of any and all keys as a way of optimising data access through query rewrites. From the optimiser's point of view the more keys it has to work with in a table the better.
I think superkey is just part of the relational algebra abstraction - your primary key is (likely) to be the minimal superkey but you might have other superkeys whereas you only have one primary key.

Primary key in an entity

If no single column can uniquely identify each row in the table,
then my primary key will be a set of at least two fields.
Is that correct?
If it is correct, then when I draw the relationship diagram, do I have to underline both attributes that form the primary key?
Thank you
Here is some terminology:
A superkey is a set of columns that, taken together, uniquely identify rows.
A candidate key (or just: "key") is a minimal [1] superkey. Sometimes a key contains just one column, sometimes it contains several (in which case it is called "composite").
For practical reasons, we classify keys as either primary or alternate. One table has one primary key and zero or more alternate keys.
A key is "natural" if it arises from the intrinsic properties of data. In other words, it "means" something.
A key is "surrogate" if it doesn't have any meaning by itself and is there only for identification purposes. It's typically implemented as an auto-incrementing integer, but there may be other strategies such as GUIDs (useful for replication). It is quite common for natural keys to be composite, but that almost never happens for surrogates.
If there are no "obvious" natural keys, the whole row can always act as a key [2]. However, this is rarely practical and in such cases you'll typically introduce a surrogate key just for the purpose of identifying rows.
Sometimes, but not always, it is useful to introduce a surrogate in addition to the existing natural key(s).
An ER diagram will clearly identify the PK [3], whether it is natural or surrogate and whether it is composite or not. Exactly how this looks depends on the notation being used, but the PK will typically be drawn in a graphically distinct manner and possibly prefixed with "PK".
[1] I.e. if you were to remove any column from it, it would no longer be unique.
[2] A database table is a physical representation of the mathematical concept of a "relation". Since a relation is a set, there is no purpose in having two identical rows, so at the very least the whole row must be unique (an element is either in the set or isn't; it cannot be "twice" in the set, as opposed to a multiset).
[3] Assuming the diagram is not purely entity-level, in which case no attributes are shown at all.
You are correct, after a fashion. Technically, a primary key and a unique key can be two distinct things. You can have a primary key on a table or entity that uniquely identifies each row, and on the same table you can also have a unique key constraint, which ensures that no two rows share the same values in the columns you choose. So you can have both a primary key and a unique constraint on the same table: simply add a primary key column that is autogenerated by your DB, and then pick the two columns in your table on which you want to enforce the unique key constraint.
If you don't have a primary key you can still identify your data, but it does not perform well.
As a best practice, define a primary key on your table.
The preference is to use an auto-increment column as the primary key.

Normalisation--2NF vs 3NF

We say 2NF is "the whole key" and 3NF "nothing but the key".
Referencing this answer by Smashery:
What are database normal forms and can you give examples?
The example used for 3NF is exactly the same as 2NF--it's a field which is dependent on only one key attribute. How is the example for 3NF different from the one for 2NF?
Suppose that some relation satisfies a non-trivial functional dependency of the form A->B, where B is a nonprime attribute.
2NF is violated if A is not a superkey but is a proper subset of a candidate key
3NF is violated if A is not a superkey
You have spotted that the 2NF requirement is just a special case (but not really so special) of the 3NF requirement. 2NF in itself is not very important. The important issue is whether A is a superkey, not whether A just happens to be some part of a candidate key.
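The two rules above can be turned into a small classifier. This is only a sketch: the function `classify` and the attribute names (`course`, `semester`, `course_name`, `teacher_id`, `teacher_name`, `places`) are hypothetical, and the candidate keys are supplied by the caller rather than derived.

```python
def classify(lhs, rhs, candidate_keys):
    """Classify a non-trivial FD lhs -> rhs whose right side is non-prime."""
    lhs = frozenset(lhs)
    keys = [frozenset(k) for k in candidate_keys]
    prime = set().union(*keys)  # attributes that belong to some candidate key
    if rhs in lhs or rhs in prime:
        raise ValueError("expects a non-trivial FD with a non-prime right side")
    if any(key <= lhs for key in keys):
        return "ok"                    # lhs is a superkey
    if any(lhs < key for key in keys):
        return "violates 2NF and 3NF"  # lhs is a proper subset of a key
    return "violates 3NF only"         # lhs is neither superkey nor part of a key

keys = [{"course", "semester"}]  # assumed single candidate key
print(classify({"course"}, "course_name", keys))
print(classify({"teacher_id"}, "teacher_name", keys))
print(classify({"course", "semester"}, "places", keys))
```

Every dependency that trips the 2NF test also trips the 3NF test, because both start from "the left side is not a superkey".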
Since you ask a very specific question about an answer to an existing SO question, here is an explanation of that (basically I'll say what dportas already said in his answer, but in more words).
The examples of design that is not in 2NF and not in 3NF are not the same.
Yes, the dependency in both cases is on a single field.
However, in the non-2NF example:
the dependency is on a part of the primary key
while in the non-3NF example (which is in 2NF):
the dependency is on a field that is not part of the primary key (also notice that this example does satisfy 2NF; this shows that even if you check for 2NF you should also check for 3NF)
In both cases to normalize you would create additional table which would not exhibit update anomalies (example of update anomaly: in 2NF example, what happens if you update Coursename for IT101|2009-2, but not for IT101|2009-1? You get inconsistent=meaningless=unusable data).
So, if you memorize the key, the whole key and nothing but the key, which covers both 2NF and 3NF, that should work for you in practice when normalizing. The distinction between 2NF and 3NF might seem subtle to you (question if in the additional dependency the attribute(s) on which the data is dependent are part of candidate key or not) - and, well, it is - so just accept it.
2NF allows non-prime attributes to be functionally dependent on non-prime attributes
but
3NF allows non-prime attributes to be functionally dependent only on a superkey
Thus, when a table is in 3NF it is also in 2NF, and 3NF is stricter than 2NF
Hope this helps...
You have achieved the 3rd NF when no non-key column depends on anything other than the key.
Not sure my professor would have said that like this but this is what it is.
If you're "in the field". Forget about the definitions. Look for "best practices". One is DRY : Don't Repeat Yourself.
If you follow that principle, you already master everything you need for NF.
Here is an example.
Your table has the following schema:
PERSONS : id, name, age, car make, car model
Age and name are related to the person entry (=> id), but the model depends on the car and not on the person.
Then, you would split it into three tables:
PERSONS : id, name, age, car_models_id (references CAR_MODELS.id)
CAR_MODELS : id, name, car_makes_id (references CAR_MAKES.id)
CAR_MAKES : id, name
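The split above can be tried out with Python's standard `sqlite3` module and an in-memory database. This is a sketch under assumptions: the column types, the NOT NULL choices, and the sample rows are invented for illustration. Only the table and column names come from the schema above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript(
    """
    CREATE TABLE car_makes (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE car_models (
        id           INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        car_makes_id INTEGER NOT NULL REFERENCES car_makes(id)
    );
    CREATE TABLE persons (
        id            INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        age           INTEGER,
        car_models_id INTEGER REFERENCES car_models(id)
    );
    -- Sample data: the make is stored once, however many people drive it.
    INSERT INTO car_makes  VALUES (1, 'Toyota');
    INSERT INTO car_models VALUES (1, 'Corolla', 1);
    INSERT INTO persons    VALUES (1, 'Ann', 30, 1), (2, 'Bob', 41, 1);
    """
)

# A JOIN reassembles the original wide row when it is needed.
rows = conn.execute(
    """
    SELECT p.name, mk.name, md.name
    FROM persons p
    JOIN car_models md ON md.id = p.car_models_id
    JOIN car_makes  mk ON mk.id = md.car_makes_id
    ORDER BY p.id
    """
).fetchall()
print(rows)
```

Renaming a make now means updating a single row in `car_makes`, instead of every person row that mentions it.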
You can have redundancy in 2NF, but no longer in 3NF.
Normalization is all about non-replication, consistency, and from another point of view foreign keys and JOINs.
The more normalized the better for data but not for performance nor understanding if it gets really too complicated.
2NF deals with partial dependencies, whereas 3NF deals with transitive functional dependencies. It is important to know that a table in 3NF must already be in 2NF and, in addition, must have no transitive dependency of a non-key attribute on a key.
First, we have to know the tools we work with:
candidate key attribute;
non candidate key attribute;
partial dependency;
full dependency;
Candidate Key Attribute
A candidate key attribute is any column or combination of columns that can be or form a primary key. You can have many candidate keys, but you will pick only one of them to be the primary key. Still, any candidate key attribute is important in 2NF. It need not be the primary key; it is enough for it to be a candidate key attribute. 2NF refers to the CANDIDATE KEY. Those who say key or primary key instead of "candidate key" add to the confusion.
Non Candidate Key Attribute
Any column that can't be a primary key and can't be part of any candidate key.
Partial Dependency
Partial dependency arises when there is a candidate key formed by MORE THAN ONE column, AND a non candidate key attribute depends on only SOME of the columns that constitute the candidate key.
Full Dependency
If a non candidate key attribute depends on a candidate key, then it depends on the WHOLE candidate key. If the candidate key is formed by more than one column, then the dependent column must depend on all the columns that form the candidate key taken together, not on just some of them.
Now you have the tools to understand 2NF and 3NF.
2NF does not allow partial dependency. If you find a non candidate key attribute that is partially dependent on a candidate key, you must break the partial dependency to make it a full dependency. So 2NF allows a non candidate key attribute to be fully dependent on a candidate key that is not the primary key. It is just a possible primary key, which you may pick, but you are not forced to pick it. That alone is enough to comply with 2NF.
Let's say you have it in 2NF. All non candidate key attributes are fully dependent on candidate keys. But a non candidate key attribute may still depend on another non candidate key attribute, and only through it on a key (a transitive dependency). 3NF does not allow that. Every non candidate key attribute must depend directly on a candidate key, whether or not you picked that key as the primary key.
