Can trivial superkey be considered as candidate key? - database

Suppose a relation R(A,B,C,D) exists with no functional dependencies. What should be considered its candidate key? Clearly no individual attribute or proper subset of the attributes can be a candidate key, because by no means can they identify the non-prime attributes. So can ABCD be considered a candidate key? Or will this relation not have any candidate key?

Suppose relation R(A,B,C,D) exists with no functional dependency. So can ABCD be considered as candidate key?
Yes, the key [1] comprises all the attributes together.
This is quite rare in practice, though. It mostly happens with junction/link tables that implement a many-to-many (or many-to-many-to-many etc.) relationship.
Or this relation will not have any candidate key?
A relation must have at least one key, otherwise it's not a relation [2].
A relation is a set, and any given object either belongs to a set or doesn't - it cannot belong multiple times (unlike in a multiset). Without at least one key, the same tuple would be able to belong multiple times.
[1] Just saying "key" is synonymous with "candidate key".
[2] At the very least, all attributes, taken together, can be considered a key (as in your case).
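To make the "all attributes together" case concrete, here is a minimal sketch that brute-forces candidate keys from a list of functional dependencies using attribute closure. The helper names `closure` and `candidate_keys` are made up for illustration, and attributes are assumed to be single letters so that a string like "BCD" can stand for a set of attributes.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Brute force, smallest sets first, skipping supersets of keys already found."""
    attributes = set(attributes)
    keys = []
    for size in range(len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys

# R(A,B,C,D) with no functional dependencies at all
no_fd_keys = candidate_keys("ABCD", [])
print([sorted(k) for k in no_fd_keys])  # [['A', 'B', 'C', 'D']]
```

With an empty dependency list, no proper subset can determine the rest, so the search only succeeds at the full attribute set. The brute-force search is exponential in the number of attributes and is meant only for small examples.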

Related

Does union of candidate keys together form a candidate key?

Relation R consists of columns {A,B,C,D}. A uniquely defines a tuple. So does B. A and B are candidate keys, since they are minimal. What about a set {A,B}? {A,B} together uniquely defines a tuple, but it is not minimal.
What is the term for {A,B}? Usually non-minimal candidate keys are called super keys. Is there a special name for a union of candidate keys?
EDIT:
Excuse me for imprecise question. It can indeed be clearer. As far as I understand, key == candidate key == minimal set of attributes that uniquely define a tuple.
The union of candidate keys K1 and K2 yields a candidate key iff K1=K2, that is, if they are in fact the very same key. In all other cases, it will by definition yield a (super)key that isn't irreducible, and if your definition of "candidate key" is that it is an irreducible (super)key then (the result of) that union is obviously no longer a candidate key.
As for key terminology, I think most respectable textbooks (there are others too) stick to this convention:
"superkey" = just any key
"candidate key" = irreducible (=minimal) superkey
non-minimal superkey = "proper superkey" (and not just "superkey" as you stated)
"key" is supposed to be used as a synonym for "candidate key", but the linguistics of the word cause it to often also be used with the meaning of "just any key". Beware!
And no, I don't think there is a special term for the particular kind of proper superkey that happens to be a union of two (or more) candidate keys. There is no useful purpose in ever knowing such a thing about a key.
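The point about unions can be checked mechanically. The following is a small sketch, assuming the relation from the question with the dependencies A -> BCD and B -> ACD (my reading of "A uniquely defines a tuple. So does B."); the function names are hypothetical.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_superkey(attrs, all_attrs, fds):
    return closure(attrs, fds) == set(all_attrs)

def is_candidate_key(attrs, all_attrs, fds):
    """A candidate key is a superkey from which no attribute can be removed."""
    attrs = set(attrs)
    if not is_superkey(attrs, all_attrs, fds):
        return False
    return not any(is_superkey(attrs - {a}, all_attrs, fds) for a in attrs)

R = "ABCD"
FDS = [("A", "BCD"), ("B", "ACD")]  # assumed: A and B each determine everything

print(is_candidate_key("A", R, FDS))   # True
print(is_candidate_key("B", R, FDS))   # True
print(is_superkey("AB", R, FDS))       # True
print(is_candidate_key("AB", R, FDS))  # False: dropping A (or B) still leaves a superkey
```

Testing the removal of single attributes is enough for irreducibility, because any superset of a superkey is itself a superkey.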
Candidate key:
A candidate key is a combination of attributes that can be uniquely used to identify a database record without referring to any other data.
The word "candidate" means that these keys are candidates for primary key selection, so yes, it is up to you which candidate key or combination of candidate keys you want to designate as the primary key.

Is this table normalized to 2NF?

I'm studying normalization and was wondering if this table could be considered to be normalized to 2NF?
Yes, it is:
☑ 1NF: A relation is in first normal form if and only if the domain of each attribute contains only atomic (indivisible) values, and the value of each attribute contains only a single value from that domain.
Since the names of the lakes/creeks, though containing the fishes' names, are not divisible, they are atomic in themselves.
In other words: the first words of the lake/creek names alone are not sufficient to identify the lakes/creeks properly, and neither are the second words.
☑ 2NF: [...] a relation is in 2NF if it is in 1NF and no non-prime attribute is dependent on any proper subset of any candidate key of the relation. A non-prime attribute of a relation is an attribute that is not a part of any candidate key of the relation.
There is no proper subset of the PK attribute Fish, since it is just one attribute (apart from the empty set {}, see comments). Best Lake is non-prime since it doesn't belong to the PK, and it doesn't depend on a subset of the PK (because there is no proper one apart from the empty one) but on the whole PK.

Can a table be in 3NF with no primary keys?

1.
A table is automatically in 3NF if one of the following holds:
(i) If a relation consists of two attributes.
(ii) If 2NF table consists of only one non key attribute.
2.
If X → A is a dependency, then the table is in 3NF, if one of the following conditions exists:
(i) If X is a superkey
(ii) If A is a part of superkey
I got the above claims from this site.
I think that in both the claims, 2nd subpoint is wrong.
The first one says that a table in 2NF will be in 3NF if we have all non-key attributes and the table is in 2NF.
Consider the example R(A,B,C) with dependency A->B.
Here we have no candidate key, so all attributes are non-prime attributes and the relation is not in 3NF but in 2NF.
The second one says that for a dependency of the form X->A if A is part of a super key then it's in 3NF.
Consider the example R(A,B,C) with dependencies A->B, B->C . Here a CK is {A}. Now one of the super keys can be AC and the RHS of FD B->C contains part of AC but still the above relation R is not in 3NF.
I think it should be A should be part of a candidate key and not super key.
Am I correct?
Also can a particular relation be in 1NF, 3NF or 2NF if there are no functional dependencies present?
A CK (candidate key) is a superkey that contains no smaller superkey. A superkey is a unique set of attributes. A relation is a set of tuples. So every relation has a superkey, the set of all attributes. So it has at least one CK.
A FD (functional dependency) holds by definition when each value of a determining set of attributes appears always with the same value for its determined set. Every relation value or variable satisfies "trivial" FDs, the ones where the determined set is a subset of the determining set. Every set of attributes determines {}. So every relation satisfies at least one FD. However, the correct forms of definitions typically specifically talk about non-trivial FDs. Don't use the web, use textbooks, of which dozens are free online, although not all are well-written. Many textbooks also forget about FDs where the determinant and/or determined set is {}.
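As a rough illustration of that definition, the sketch below checks whether an FD holds in one particular relation value, represented here as a list of dicts. The function name `fd_holds` and the sample rows are assumptions made for the example. Note that a single relation value can only refute an FD; whether the FD holds for the relation variable is a matter of business rules.

```python
def fd_holds(rows, lhs, rhs):
    """True if every value of the lhs attributes always appears with the same rhs value."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs value seen for this lhs value
        if seen.setdefault(key, value) != value:
            return False
    return True

rows = [
    {"A": 1, "B": "x"},
    {"A": 2, "B": "x"},
]

print(fd_holds(rows, ["A"], ["B"]))       # True
print(fd_holds(rows, ["B"], ["A"]))       # False: "x" appears with both 1 and 2
print(fd_holds(rows, ["A", "B"], ["A"]))  # True: trivial, rhs is a subset of lhs
print(fd_holds(rows, ["A"], []))          # True: every set of attributes determines {}
print(fd_holds(rows, [], ["B"]))          # True: B is constant, so {} determines B
```

The last call shows an FD with an empty determinant: it holds exactly when the determined attributes have the same value in every tuple.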
Your first point is not a correct definition of 3NF. Since it's phrased "if..." instead of "if and only if", maybe it's not trying to be a definition. However, it is still wrong. (i) is wrong because a relation with two attributes is not in 3NF if one is a CK and the other has the same value in every tuple, i.e. it is determined by {}.
Similarly the second point is not a proper definition and also even if you treat it as only a consequence of 3NF (if...) it's false. It would be a definition if it used if and only if and talked about an FD that holds and it said it was a non-trivial FD and some other things were fixed.
Since those are neither correct definitions nor correct implications, there's an unlimited number of ways to disprove them. Read a book (or my posts) and get correct definitions.
Some comments re your reasoning:
First one says that, a table in 2NF will be in 3NF if we have all non key attributes and table is in 2NF.
I have no idea why you think that.
Here we have no candidate key
There's always one or more CKs. You need to read a definition of CK. There are also non-brute-force algorithms for finding them all.
Second one says that, for the dependency of form X->A if A is part of super key then it's in 3NF.
I have no idea why you think that.
A should be part of candidate key and not super key.
A correct definition like the second point does normally say "... or (ii) A-X is part of a CK". But I can't follow your reasoning.
Sound reasoning involves starting from assumptions and writing new statements that we know are true because we applied a definition, a previously proved statement (theorem) or a sound rule of reasoning, eg from 'A implies B' and 'A' we can derive 'B'. You seem to need to read about how to do that.
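For the two examples in the question, the candidate keys and the 3NF condition can be worked out mechanically. This is only a sketch under the usual textbook formulation (for each non-trivial X -> A, either X is a superkey or A is a prime attribute); it checks the listed FDs only, uses brute force, and assumes single-letter attribute names. All function names are made up.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Brute force, smallest sets first, skipping supersets of keys already found."""
    attributes = set(attributes)
    keys = []
    for size in range(len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys

def violates_3nf(attributes, fds):
    """Return the (X, A) pairs where X is not a superkey and A is not prime."""
    attributes = set(attributes)
    prime = set().union(*candidate_keys(attributes, fds))
    violations = []
    for lhs, rhs in fds:
        for a in sorted(set(rhs) - set(lhs)):  # ignore the trivial part
            if closure(lhs, fds) != attributes and a not in prime:
                violations.append((lhs, a))
    return violations

print(candidate_keys("ABC", [("A", "B")]))            # one key: {A, C}
print(violates_3nf("ABC", [("A", "B")]))              # [('A', 'B')]
print(violates_3nf("ABC", [("A", "B"), ("B", "C")]))  # [('B', 'C')]
```

In the first example the relation does have a candidate key, {A, C}, and A -> B is flagged because A is not a superkey and B is not prime; since A is a proper subset of that key, the same FD is also a partial dependency. In the second example the only candidate key is {A}, and B -> C is the violating FD.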

Normalization Dependencies

I'm just trying to make sure that I'm thinking about it the right way:
1) Full dependencies are when one or more primary keys determine another attribute.
2) Partial dependencies are when one of the primary keys determines another attribute or attributes.
3) Transitive dependencies are when a non-key attribute determines another attribute.
Am I thinking about it right?
This answer is directly from my CS course and obtained from the Connolly and Begg text book.
Full Functional Dependencies
Identify the candidate keys (here, propertyNo, iDate and pAddress). This is because any combination of those 3 can allow you to find what the other attributes are for a given tuple (I can find the staffNo that did the inspection given those three things, I can find the carReg the staffNo used given those 3 things etc.). But note, you need all of those 3 to find the other attributes, not just a subset. Full dependencies always relate to non-candidate keys depending on candidate keys, either depending on all or depending on some.
Partial Dependencies
Given those three candidate keys, look within the candidate keys. Is there any subset of the candidate key which is dependent on the other? Yes, it is pAddress. Given a propertyNo, you can figure out the address of the property. Then look outside of the candidate keys. Do any of these attributes depend on only parts of the candidate key, not all components? In this case there are none. So partial dependencies are always dependencies within the candidate keys, or dependencies of non-candidate keys on only parts of the candidate keys rather than all components.
Transitive Dependencies
Now, look at the non-candidate keys (staffNo, comments, iTime (inspection time), sName, carReg). Within those, is there anything that is functionally dependent on the other? Yes, it is sName - given a staffNo, you can figure out the name of the staff member. But staffNo is functionally dependent on the 3 candidate keys. So by transitivity, propertyNo + iDate + pAddress -> staffNo -> sName, so sName is transitively dependent on staffNo. Transitive dependencies always relate to attributes outside of candidate keys.
Not quite. It would help to be more exact in your terminology: when you say things like "one or more primary keys" you (presumably) really mean "one or more of the columns of the primary key"?
The distinction between a full and a partial dependency only arises when a key consists of more than one column (a composite key):
1) Full dependencies are when the full key is required (all columns of the key) to determine another attribute.
2) Partial dependencies are when the key is composite and some but not all of the columns of the key determine another attribute. (This may still be more than one column.)
3) Transitive dependencies are as you said.
Fully dependent means dependent on all the attributes in question, usually meaning all the attributes of a candidate key. It doesn't have to be a key designated as "primary" because primary keys don't play any special role in dependency theory and normalization.
Partially dependent means dependent on a proper subset of those attributes, usually meaning a proper subset of some candidate key.
Depending on the context, transitive dependency can mean either one of the following:
(1) a dependency of the form A->B, B->C
(2) a dependency of the form A->B, B->C where B isn't a superkey
Almost always the term transitive dependency is used when referring to the situation described by (2) and has become virtually synonymous with that sense even though (1) is the more formally correct meaning.
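The full/partial distinction described above can be sketched in a few lines: an attribute is partially dependent on a key if some proper subset of the key already determines it. The relation, attribute names, and FDs below are invented purely for illustration, and `partial_dependencies` is a hypothetical helper.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return every attribute determined by attrs (attrs must be a collection of names)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def partial_dependencies(key, attr, fds):
    """Return the proper subsets of key that already determine attr."""
    key = sorted(set(key))
    return [
        set(subset)
        for size in range(len(key))            # sizes 0 .. len(key) - 1: proper subsets only
        for subset in combinations(key, size)
        if attr in closure(subset, fds)
    ]

# Hypothetical relation: (student, course) is the key
KEY = ("student", "course")
FDS = [
    (("student", "course"), ("grade",)),  # grade needs the whole key
    (("student",), ("name",)),            # name needs only part of it
]

print(partial_dependencies(KEY, "grade", FDS))  # []  -> fully dependent on the key
print(partial_dependencies(KEY, "name", FDS))   # [{'student'}]  -> partially dependent
```

An empty result means the attribute is fully dependent on the key; a non-empty result lists the proper subsets responsible for the partial dependency.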
Partial Dependency: Where an attribute in a table depends on only a part of the primary key and not on the whole key. (For detail see this link)
https://www.studytonight.com/dbms/second-normal-form.php
Transitive Dependency: When a non-prime attribute depends on other non-prime attributes rather than depending upon the prime attributes or primary key. (For detail see this link)
https://www.studytonight.com/dbms/third-normal-form.php

why superkey is required when we can identify a tuple uniquely through primary key?

Definition of superkey and primary key in Wikipedia:
A superkey is a set of attributes within a table whose values can be used to uniquely identify a tuple.
and
The primary key has to consist of characteristics that cannot be duplicated by any other row. The primary key may consist of a single attribute or a multiple attributes in combination.
I've gone through many books and searched the internet, but all I found in them is what a primary key is and what a superkey is.
What I want to know is: why is a superkey required when we can identify a tuple uniquely through the primary key?
Superkeys are defined for conceptual completeness. You never need a superkey for reference purposes. A reference to a primary key will do just fine.
The concept of superkeys can be useful when you are analyzing a body of data in order to discover all the functional dependencies in it.
Once you have discovered a key, the next question is whether or not it is a superkey. If it is, you turn your attention to the candidate key contained in the superkey.
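One way to get from a superkey to a candidate key contained in it is to drop attributes one at a time for as long as the remainder still determines everything. The sketch below assumes single-letter attributes and an invented set of FDs; `shrink_to_candidate_key` is a hypothetical name, and the result depends on the order in which attributes are tried when several candidate keys exist.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def shrink_to_candidate_key(superkey, all_attrs, fds):
    """Greedily remove attributes while the remainder is still a superkey."""
    key = set(superkey)
    for attr in sorted(superkey):
        if closure(key - {attr}, fds) == set(all_attrs):
            key.discard(attr)  # attr is redundant: the rest still determines everything
    return key

FDS = [("A", "B"), ("B", "C")]  # assumed dependencies on R(A,B,C,D)
print(sorted(shrink_to_candidate_key("ABCD", "ABCD", FDS)))  # ['A', 'D']
```

A single pass is enough here: an attribute that could not be dropped from a larger set cannot be dropped from a smaller one either.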
Let's define what these terms mean in the first place:
A "superkey" is any set of attributes that, when taken together, uniquely identify rows in the table.
A minimal [1] superkey is called "candidate key", or just "key".
All keys in the same table are logically equivalent, but for historical and practical reasons we choose one of them and call it "primary", while the remaining are "alternate" keys.
So, every primary key is a key, but not every key is primary. Every key is a superkey, but not every superkey is a key.
Constraints that physically enforce keys in the database are: PRIMARY KEY constraint (for primary key) and UNIQUE constraint (for alternate key). These constraints should not be created on all superkeys, only on keys.
It is not unusual to have multiple keys in the same table, depending on the nature of your data. For example, a USER table might have unique USER_ID and unique USER_NAME. Since both of them need to be unique on their own, you must create [2] both keys, even though only one of them is strictly needed for identification.
[1] That is, a superkey that would stop being unique (and therefore, being a superkey) if any of the attributes were removed from it.
[2] I.e. create PRIMARY KEY or UNIQUE constraint.
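As a small illustration of the USER_ID / USER_NAME example, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The table name `app_user`, the sample values, and the helper `try_insert` are assumptions made for the example; other DBMSs use the same PRIMARY KEY and UNIQUE constraints but may differ in details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE app_user (
        user_id   INTEGER PRIMARY KEY,   -- the key chosen as primary
        user_name TEXT NOT NULL UNIQUE   -- an alternate key
    )
    """
)
conn.execute("INSERT INTO app_user (user_id, user_name) VALUES (1, 'alice')")

def try_insert(user_id, user_name):
    """Return True if the row was accepted, False if a key constraint rejected it."""
    try:
        conn.execute(
            "INSERT INTO app_user (user_id, user_name) VALUES (?, ?)",
            (user_id, user_name),
        )
        return True
    except sqlite3.IntegrityError:
        return False

duplicate_name = try_insert(2, "alice")  # False: violates UNIQUE on user_name
duplicate_id = try_insert(1, "bob")      # False: violates PRIMARY KEY on user_id
fresh_row = try_insert(2, "bob")         # True: both keys are still unique
print(duplicate_name, duplicate_id, fresh_row)  # False False True
```

No constraint is declared on the superkey (user_id, user_name): it is unique automatically because each of its parts already is.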
The word "key" is usually short for "candidate key".
Superkey means a super-set of a key (key attributes and some more).
An irreducible superkey is called a candidate key. (Irreducible means that if you remove one attribute, it is not a key any more.) In general, there can be more than one candidate key for a given relation (actually a relational variable).
The one candidate key that the designer chooses to prefer (for some reason) is called the primary key.
This was on a logical level; keys are defined for relational variables, so-called relvars.
In physical implementation:
Relvar maps to a table.
Primary key to the primary key of the table.
Other candidate keys (except PK) map to alternate keys (unique not null).
A primary key is a superkey. Having only one such key constraint and only one way to identify tuples isn't necessarily sufficient.
Firstly, the relational model's versatility derives very much from the fact that it does not predetermine how data can or should be accessed in a table. A user or application is free to query a table based on whatever set of attributes may be necessary or convenient at the time. There is no obligation to use a "primary" key, which may or may not be relevant for some queries.
Secondly, uniqueness constraints (usually on candidate keys) are a data integrity feature. They guarantee data isn't duplicated in the key attributes. That kind of constraint is often useful on more than one set of attributes where business rules dictate that things should be unique. Uniqueness of one thing alone obviously doesn't guarantee uniqueness of another thing.
Thirdly, the query optimiser can take advantage of any and all keys as a way of optimising data access through query rewrites. From the optimiser's point of view the more keys it has to work with in a table the better.
I think superkey is just part of the relational algebra abstraction - your primary key is likely to be a minimal superkey, but you might have other superkeys, whereas you only have one primary key.

Resources