1NF Normal Form with Functional Dependency - database

I read the following example: a relation A(X,Y,Z,P,Q,R) with the following functional dependencies.
Why is this in 1NF?
Could anyone help me?

The diagram is not normal notation. I suppose that arrows point to determined attributes of FDs. I suppose that an arrow that doesn't come from a box means a FD with just one determinant attribute and an arrow that comes from a box means a FD with the boxed attributes as determinant attributes. Find out what the diagram notation means.
If so then the functional dependencies are Y → Z, XYZ → QR and P → QRX.
To show what normal form the relation is in we need to know what definitions you were given for normal forms. It happens that this relation is not in 2nd normal form. So it isn't in any higher normal form. So it is only in 1st normal form. So the only normal form definition we need to know is the one you were given for 2NF. That definition usually involves candidate keys. If so then we need to know what definition you were given for candidate key. The definition of 2NF can involve full and partial FDs. If yours does then we need to know what definitions of full and/or partial FD you were given. Give the definitions.
The only CK is PY because it determines every other attribute but no proper subset of it does. It is the only CK because there is no other such set of attributes. To justify this we need to reference the rules you were given for deriving FDs and CKs. (Eg this includes how we went from one FD list to another.) Give the rules.
But there are then also determinants Y (from Y → Z) and P (from P → QRX) that are proper subsets of that candidate key. So each of non-prime attributes Z, Q, R and X is partially dependent on a CK. But to be in 2NF there must be no non-prime attributes partially dependent on a candidate key. Ie every non-prime attribute must be fully functionally dependent on all CKs. So A is not in 2NF. So the highest normal form it is in is 1NF.
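The derivation above can be checked mechanically. The following is a minimal Python sketch, assuming the reading of the diagram given earlier (Y → Z, XYZ → QR, P → QRX). The helper names `closure` and `candidate_keys` are illustrative, not from any library, and the brute-force search is only practical for small attribute sets.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Smallest-first search for minimal attribute sets that determine everything."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys

# Assumed reading of the diagram: Y -> Z, XYZ -> QR, P -> QRX
ATTRS = set("XYZPQR")
FDS = [(set("Y"), set("Z")), (set("XYZ"), set("QR")), (set("P"), set("QRX"))]

keys = candidate_keys(ATTRS, FDS)
y_closure = closure({"Y"}, FDS)   # what Y alone determines
p_closure = closure({"P"}, FDS)   # what P alone determines
print(keys, sorted(y_closure), sorted(p_closure))
```

Under that reading the search finds PY as the only key, while Y alone already determines Z and P alone already determines Q, R and X, which is the partial dependence that rules out 2NF.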

The picture doesn't make its meaning very clear in my opinion because it seems to be mixing two different notations for functional dependencies (FDs). Any answer will depend on how you want to interpret the diagram.
I'd hazard a guess that the diagram is supposed to indicate the following set of FDs: XY->Z, Y->QR, P->QRX. If that's correct then the possible candidate key respecting that set of FDs would be {Y,P}. If my interpretation of the diagram is correct then both Y and P are determinants in their own right. Since Y and P are proper subsets of a candidate key of A we can conclude that A violates 2NF and therefore the highest normal form that A can satisfy is 1NF.
Update: Your new picture specifies some dependencies. Collecting the determinant terms together we can summarize as:
P->XQR
XY->QR
Y->Z
I assume these are supposed to be the dependencies actually satisfied by A. On the left-hand side we have P, X and Y, so PXY will be a superkey of A. P->X, therefore the candidate key (minimal superkey) can only be PY. P->XQR and Y->Z are both FDs with determinants that are proper subsets of the candidate key (PY), and that means those dependencies both violate 2NF. Recap: 2NF prohibits any FD where the left-hand side is a proper subset of a candidate key and the right-hand side is a non-prime attribute. So 1NF is the highest normal form of A.
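As a rough illustration of the 2NF test described here, the sketch below lists every non-prime attribute that is determined by a proper subset of a candidate key. It assumes the FDs from the update (P->XQR, XY->QR, Y->Z) and takes PY as the only candidate key, as argued above; `partial_dependencies` is a hypothetical helper name.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(keys, fds):
    """Pairs (key subset, non-prime attribute) where the subset is a proper part of a key."""
    prime = set().union(*keys)
    found = []
    for key in keys:
        for size in range(1, len(key)):          # proper, non-empty subsets only
            for combo in combinations(sorted(key), size):
                for attr in sorted(closure(set(combo), fds) - prime):
                    found.append(("".join(combo), attr))
    return found

FDS = [(set("P"), set("XQR")), (set("XY"), set("QR")), (set("Y"), set("Z"))]
KEYS = [set("PY")]   # assumed: the single candidate key derived in the answer

violations = partial_dependencies(KEYS, FDS)
print(violations)
```

An empty result would mean no partial dependencies of this kind were found; here the list is non-empty, so A is not in 2NF under these assumptions.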

As per 1NF, no two rows of a relation may be identical and no column may hold more than one value in a row. 1NF on its own still allows redundancy, as a column can have the same data repeating in many rows.
Name  ID  Course
A     1   Computer
B     2   Arts
C     3   Computer
Here the Course column has repeated values, but no row has a column holding two values. Hence it is in 1NF.
1NF has the least number of restrictions. So a relation in any higher form such as 2NF or 3NF is by default also in 1NF.
Consider this analogy
1NF = Living Beings
2NF = Mammals
3NF = Humans
All mammals/2NF are by default living beings/1NF, and so on.
To satisfy the functional dependency X → Y, it is essential that each X value be associated with only one Y value. And thus it satisfies the 1NF criteria which does not allow multiple values in a row for a column.

Related

BCNF: Looking for example that actually uses superkey instead of candidate key

The definition of the Boyce–Codd normal form states that the determinants of all non-trivial functional dependencies have to be superkeys.
All the examples for relations in BCNF I found make use of candidate keys. I am looking for an example that actually has a superkey as determinant which is not a candidate key.
I fail to come up with a relation that only uses superkeys which can't be transformed to use candidate keys.
Let's say we have a relation with a candidate key and an additional functional dependency with a superkey as determinant.
R1(A,B,C)
{A}
A,B -> C
This additional FD is redundant because it contains a candidate key that obviously determines the other attribute (A -> C).
Trying to build another example with two candidate keys is also useless.
R2(A,B,C,D)
{A,B},{B,C}
A,B,C -> D
This has the exact same problem as above.
I am actually wondering if there even is an example without candidate keys. But why would the definition be broader than necessary? Or are the definitions equivalent as the dependencies can always be transformed?
The point is that, when defining a normal form, we must express it in a general form, as a property of all the functional dependencies holding on a certain relation.
Instead, when we reason about a particular relation schema, we usually have only a subset of all the functional dependencies (since their number can be too large, being possibly exponential with the number of attributes). The particular set of dependencies used, denoted usually by the letter F, has a special property: it is a cover of all the dependencies holding in the relation, that is from it we can derive all the dependencies of the relation by applying, in all the possible ways, a set of axioms, called Armstrong’s axioms.
F, the set of dependencies specified together with the attributes in a relational schema, can be given in different ways: for instance in an exercise they can be given as input to the exercise, in real database design they can describe a set of constraints considered important for modelling a certain real-world domain, etc.
Even if they are extracted from the knowledge about a situation to be modeled through a database, they could contain dependencies implied by others already given, or maybe can contain redundant attributes, etc.
For these reasons, it is considered an important first step in normalization to find the canonical cover of the set of dependencies given, that is a cover constituted by a set of dependencies that: a) have only a single attribute on the right part; b) do not have superfluous attributes on the left part (i.e. attributes that can be removed while maintaining the property of being a cover); c) do not have redundant dependencies (i.e. dependencies that can be derived from others through Armstrong’s axioms).
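The three conditions a), b) and c) can be sketched as three passes over the dependency set. This is a minimal Python sketch, not a definitive implementation: the names `closure` and `canonical_cover` and the sample set F = {A → BC, AB → C} are assumptions made for illustration, and the code assumes every determinant is non-empty.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def canonical_cover(fds):
    def as_sets(pairs):
        return [(set(lhs), {attr}) for lhs, attr in pairs]

    # (a) split every right-hand side into single attributes
    split = [(frozenset(lhs), attr) for lhs, rhs in fds for attr in sorted(rhs)]

    # (b) drop left-hand attributes that are not needed to derive the right side
    reduced = []
    for lhs, attr in split:
        lhs = set(lhs)
        for extra in sorted(lhs):
            if attr in closure(lhs - {extra}, as_sets(split)):
                lhs.discard(extra)
        reduced.append((frozenset(lhs), attr))
    reduced = list(dict.fromkeys(reduced))  # remove exact duplicates, keep order

    # (c) drop dependencies that follow from the remaining ones
    cover = list(reduced)
    for fd in reduced:
        rest = [other for other in cover if other != fd]
        if fd[1] in closure(fd[0], as_sets(rest)):
            cover = rest
    return cover

# Hypothetical input: A -> BC and AB -> C
F = [(set("A"), set("BC")), (set("AB"), set("C"))]
cover = canonical_cover(F)
print(cover)
```

For this input, B is superfluous in AB → C, which then coincides with A → C, so the cover reduces to A → B and A → C.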
Now let’s consider the general definition of BCNF:
A relation schema R<T,F> is in BCNF if and only if for each non-trivial dependency X → Y of F+, X is a superkey.
Note that we are talking about the dependencies in F+, which is the closure of F, in other words, which contains all the dependencies holding in R and derived in some way from F. So if the relation R has a candidate key XK, obviously not only XK → T holds, for instance, but for all the supersets S of XK we will have that S → T holds, and so the definition of the normal form must allow those dependencies.
Now, it is possible to prove, from the general definition of BCNF, the following theorem, that in some way simplifies it (and makes efficient the test to check if a relation is already in BCNF):
Theorem: A relation schema R<T,F> is in BCNF if and only if for each non-trivial dependency X → Y of F, X is a superkey.
See the difference? We are now talking about F and not F+.
And this theorem has the following corollary:
Corollary: A relation schema R<T,F> in which F is a canonical cover, is in BCNF if and only if for each non-trivial dependency X → A of F, X is a candidate key.
Since the dependencies in a canonical cover do not have superfluous attributes, if the relation is in BCNF every determinant (left hand side of a functional dependency) is obviously a candidate key (not a generic superkey), and this explains the difference between the definition and the examples that we sometimes find in books.
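To make the distinction concrete, here is a small Python sketch of the test from the theorem (checking only the dependencies in F). It reuses the R1(A,B,C) example from the question with the assumed set F = {A → BC, AB → C}; the function names are illustrative.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_superkey(x, attributes, fds):
    return closure(x, fds) == attributes

def is_candidate_key(x, attributes, fds):
    # A superkey is minimal when removing any single attribute breaks it.
    return is_superkey(x, attributes, fds) and not any(
        is_superkey(x - {attr}, attributes, fds) for attr in x
    )

def is_bcnf(attributes, fds):
    """Test from the theorem: every non-trivial FD in F has a superkey on the left."""
    return all(rhs <= lhs or is_superkey(lhs, attributes, fds) for lhs, rhs in fds)

ATTRS = set("ABC")
F = [(set("A"), set("BC")), (set("AB"), set("C"))]  # F is not a canonical cover

in_bcnf = is_bcnf(ATTRS, F)
ab_is_superkey = is_superkey(set("AB"), ATTRS, F)
ab_is_candidate_key = is_candidate_key(set("AB"), ATTRS, F)
print(in_bcnf, ab_is_superkey, ab_is_candidate_key)
```

Here the relation passes the BCNF test even though the determinant AB is a superkey that is not a candidate key, which is the situation the general definition has to allow for.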

Can a table be in 3NF with no primary keys?

1.
A table is automatically in 3NF if one of the following holds:
(i) If a relation consists of two attributes.
(ii) If 2NF table consists of only one non key attribute.
2.
If X → A is a dependency, then the table is in 3NF, if one of the following conditions exists:
(i) If X is a superkey
(ii) If A is a part of superkey
I got the above claims from this site.
I think that in both the claims, 2nd subpoint is wrong.
The first one says that a table in 2NF will be in 3NF if we have all non-key attributes and the table is in 2NF.
Consider the example R(A,B,C) with dependency A->B.
Here we have no candidate key, so all attributes are non-prime attributes and the relation is not in 3NF but in 2NF.
The second one says that for a dependency of the form X->A if A is part of a super key then it's in 3NF.
Consider the example R(A,B,C) with dependencies A->B, B->C . Here a CK is {A}. Now one of the super keys can be AC and the RHS of FD B->C contains part of AC but still the above relation R is not in 3NF.
I think it should be A should be part of a candidate key and not super key.
Am I correct?
Also can a particular relation be in 1NF, 3NF or 2NF if there are no functional dependencies present?
A CK (candidate key) is a superkey that contains no smaller superkey. A superkey is a unique set of attributes. A relation is a set of tuples. So every relation has a superkey, the set of all attributes. So it has at least one CK.
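The argument that a CK always exists can be turned into a short procedure: start from the set of all attributes, which is always a superkey, and drop attributes while the remainder is still a superkey. The Python sketch below assumes the asker's example R(A,B,C) with A->B; `one_candidate_key` is a hypothetical helper that returns one CK, not all of them.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def one_candidate_key(attributes, fds):
    """Shrink the trivial superkey (all attributes) until nothing more can be removed."""
    key = set(attributes)
    for attr in sorted(attributes):
        if closure(key - {attr}, fds) == set(attributes):
            key.discard(attr)  # still a superkey without attr
    return key

ATTRS = set("ABC")
key_with_fd = one_candidate_key(ATTRS, [(set("A"), set("B"))])  # R(A,B,C), A -> B
key_without_fds = one_candidate_key(ATTRS, [])                  # no non-trivial FDs
print(sorted(key_with_fd), sorted(key_without_fds))
```

With A->B the procedure ends at {A, C}; with no non-trivial FDs at all it ends at the full attribute set, so a CK exists in both cases.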
A FD (functional dependency) holds by definition when each value of a determining set of attributes appears always with the same value for its determined set. Every relation value or variable satisfies "trivial" FDs, the ones where the determined set is a subset of the determining set. Every set of attributes determines {}. So every relation satisfies at least one FD. However, the correct forms of definitions typically specifically talk about non-trivial FDs. Don't use the web, use textbooks, of which dozens are free online, although not all are well-written. Many textbooks also forget about FDs where the determinant and/or determined set is {}.
Your first point is not a correct definition of 3NF. Since it's phrased "if..." instead of "if and only if", maybe it's not trying to be a definition. However, it is still wrong. (i) is wrong because a relation with two attributes is not in 3NF if one is a CK and the other has the same value in every tuple, ie it is determined by {}.
Similarly the second point is not a proper definition and also even if you treat it as only a consequence of 3NF (if...) it's false. It would be a definition if it used if and only if and talked about an FD that holds and it said it was a non-trivial FD and some other things were fixed.
Since those are neither correct definitions nor correct implications, there's an unlimited number of ways to disprove them. Read a book (or my posts) and get correct definitions.
Some comments re your reasoning:
"First one says that, a table in 2NF will be in 3NF if we have all non key attributes and table is in 2NF."
I have no idea why you think that.
"Here we have no candidate key"
There's always one or more CKs. You need to read a definition of CK. There are also non-brute-force algorithms for finding them all.
"Second one says that, for the dependency of form X->A if A is part of super key then it's in 3NF."
I have no idea why you think that.
"A should be part of candidate key and not super key."
A correct definition like the second point does normally say "... or (ii) A-X is part of a CK". But I can't follow your reasoning.
Sound reasoning involves starting from assumptions and writing new statements that we know are true because we applied a definition, a previously proved statement (theorem) or a sound rule of reasoning, eg from 'A implies B' and 'A' we can derive 'B'. You seem to need to read about how to do that.

2nd normal form violation with lhs of the dependency is composite (prime and non-prime together)

I was studying functional dependencies and normalization and I've come across a question. The original question is below:
"Given the relation R = {v,w,x,y,z} and functional dependency set {v->w,y->z,yz->v,wx->z} find BCNF composition and check if dependency preservation holds."
First I tried to find minimal cover and came up with this:
Minimal Cover:
v -> w
y -> z
y -> v
wx -> z
Then I tried to find candidate keys, and came up with only one candidate key:
Candidate Keys:
xy
Then I started to check normal forms:
1st Normal Form: check
2nd Normal Form:
I thought the below dependencies are violating 2nd normal form:
1) y -> z
2) y -> v
3) wx -> z
The first two were easy to solve. However, I've never seen an example of the 3rd where the left-hand side is a composite of prime and non-prime attributes. How do we solve this kind of situation? Do we make a new relation for the 3rd making w and x primary key?
If I solve that part, the 3rd and BC normal forms will be easy I guess.
Whether one considers a FD (functional dependency) to "violate 2NF" depends on one's definition of 2NF. A common definition of 2NF is, no FDs hold where a non-prime attribute is partially functionally dependent on a CK (candidate key). So are the violating FDs the ones where a non-prime attribute is partially functionally dependent on a CK? Or the ones where a non-prime attribute is functionally dependent on a proper subset of a CK, by which the preceding FDs are partial? Or both? And/or others? Or what? The fact is that it isn't individual FDs that violate NFs but the set of all FDs that hold. If you want to talk about individual FDs violating then you need to give a definition for 2NF & then give & justify a definition of violating FD based on how the definition talks about such FDs.
The following uses the 2NF definition above & talks about "bad" FDs explicitly disallowed by that definition, where a non-prime attribute is partially functionally dependent on a CK.
Those three FDs are not bad. A FD is partial when its right hand side is functionally determined by a proper/smaller subset of its left hand side. None of those three FDs are partial dependencies on a CK (candidate key). None of them are even partial, because none has a right hand side that is determined by a subset of the left hand side (determinant). And none of them are even on a CK, because none of them have a CK as their left hand side.
You might consider the first two to "violate 2NF" per a 2NF definition that there are no FDs with left side a proper subset of a CK & right side a non-prime attribute. That definition explicitly disallows those FDs. So we do not have 2NF.
However the FDs xy->z & xy->v are partial, because proper/smaller subsets of xy determine z & v. And they are bad: xy is a CK and z & v are non-prime attributes so both have a non-prime attribute partially dependent on a CK. So we do not have 2NF.
wx->z isn't bad. And it doesn't "violate 2NF" per a 2NF definition that there are no FDs with left side a proper subset of a CK & right side a non-prime attribute.
It doesn't matter whether "the left-hand side is a composite of prime and non-prime attributes". What matters is what is mentioned in your definitions. (It happens that you will never see such "an example" of a bad or "violating" FD. Because both those require left-hand sides with only CK attributes.)
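The points above can be checked with attribute closures. This is a small Python sketch assuming the FD set from the question ({v->w, y->z, yz->v, wx->z}) and the asker's candidate key xy; the variable names are illustrative.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

ATTRS = set("vwxyz")
FDS = [
    (set("v"), set("w")),
    (set("y"), set("z")),
    (set("yz"), set("v")),
    (set("wx"), set("z")),
]
KEY = set("xy")  # the candidate key found in the question

key_determines_all = closure(KEY, FDS) == ATTRS

# Non-prime attributes determined by each single attribute of the key.
# (The key has two attributes, so its proper non-empty subsets are single attributes.)
determined_by_part = {a: sorted(closure({a}, FDS) - KEY) for a in sorted(KEY)}

# wx is not made of key attributes only, so it cannot be a proper subset of the CK.
wx_inside_key = set("wx") < KEY

print(key_determines_all, determined_by_part, wx_inside_key)
```

Under these assumptions y alone determines v, w and z, so those non-prime attributes are partially dependent on xy, while the determinant wx never enters the 2NF test because it is not a subset of the key.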
Read some academic definitions for partial FD & 2NF. (Many textbooks/presentations/courses are free online.) Memorize and apply definitions, theorems and algorithms exactly. You seem to not understand numerous things:
Being in BCNF implies being in all lower NFs. Getting to BCNF does not require going through lower NFs.
Examples of decompositions you have seen are not presentations of decomposition algorithms.
We don't normalize via successive NFs. We use an algorithm for the NF we want. (Going through lower NFs can even mean good higher-NF designs become unavailable.)
When some FDs hold, all the ones implied by them by Armstrong's axioms also hold.
To determine CKs & NFs it's not enough to know that some FDs hold, we need to know what FDs hold & what FDs don't hold. You need to know a closure or cover of FDs.
Each time we decompose we get new relations & sets of FDs & CKs for each.
The FDs that hold in a component are all those of the original whose attributes are in it. (Those of a closure, not just those of a cover.)
A FD is partial when its right hand side is functionally determined by a proper/smaller subset of its left hand side.
A common 2NF definition explicitly disallows partial FDs of non-prime attributes on CKs.
"Violating FD" is not a helpful term, refer to the things that definitions mention.

2NF and 3NF Normalization

I seem to have a strange problem when doing normalization problems. When I'm given relations with actual names I can figure these out easily but when I'm given letters it seems to be a lot harder.
For the following problem I don't know why it's not 3NF and why it is 2NF.
Given R (A, B, C, D, E, F)
FDs = {AB->C, DBE->A, BC->D, BE->F, F->D}
So for 2NF all the right hand side attributes must be fully functionally dependent on the left hand side attributes. For 3NF either all the left hand side attributes must be superkeys or the right hand attributes must be prime attributes.
I tried drawing this out, but I can't even find a candidate key. Can anyone help me determine why this is not 3NF? Also, what is the candidate key here? Since I don't see any attribute that has a closure equal to the original relation.
"I seem to have a strange problem when doing normalization problems. When I'm given relations with actual names I can figure these out easily but when I'm given letters it seems to be a lot harder."
Yes, it's less intuitive with letters. I will tell you a neat method which you can follow to determine the candidate keys in such situations:
Make three columns, left (L), middle (M) and right (R). The left column consists of all the attributes that appear only on the left side of the given functional dependencies. In our case these are B and E, since they never appear on the right side of any given FD. Similarly, the middle column contains the attributes that appear on both the left and the right side of the given FDs, so we have A, C, D and F in the middle column. The right column contains the attributes which occur only on the right-hand side of the FDs (never on the LHS of any given FD); here there are none. So we have:
L   | M       | R
B,E | A,C,D,F | -
Now that you have this table remember the following rules: (these are very intuitive)
Attributes in the left(L) column are always part of the candidate keys
Attributes in the right(R) column are never part of the candidate keys
Attributes in the middle(M) column may or may not be a part of the candidate keys.
So in our case we start by checking whether BE is a candidate key. We find that the closure of BE consists of all the attributes of the relation R (BE->F gives F, F->D gives D, DBE->A gives A, AB->C gives C), so it is a candidate key. (Note: if BE had not been a candidate key, we would have taken attributes from the middle (M) column one by one, combined them with BE and checked the closure, eg. BEA, BEC, BED ...)
So now we have only 1 candidate key BE. So our prime attributes are {B,E} and non-prime attributes are {A,C,D,F}.
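The L/M/R method can be sketched in Python as follows. This is only an illustration under the assumptions stated in the answer: `classify` and `candidate_keys` are hypothetical helper names, and the sketch assumes every attribute appears in at least one FD.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def classify(fds):
    """Split attributes into left-only, both-sides (middle) and right-only."""
    on_left = set().union(*(lhs for lhs, _ in fds))
    on_right = set().union(*(rhs for _, rhs in fds))
    return on_left - on_right, on_left & on_right, on_right - on_left

def candidate_keys(attributes, fds):
    left, middle, _ = classify(fds)
    keys = []
    # Start from the left-only attributes and add middle attributes as needed.
    for size in range(len(middle) + 1):
        for extra in combinations(sorted(middle), size):
            candidate = left | set(extra)
            if any(key <= candidate for key in keys):
                continue  # a smaller key is already contained in it
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys

ATTRS = set("ABCDEF")
FDS = [
    (set("AB"), set("C")),
    (set("DBE"), set("A")),
    (set("BC"), set("D")),
    (set("BE"), set("F")),
    (set("F"), set("D")),
]

left, middle, right = classify(FDS)
keys = candidate_keys(ATTRS, FDS)
print(sorted(left), sorted(middle), sorted(right), keys)
```

Because the search starts from the left column and only then adds middle attributes, BE is tested first and every larger combination is skipped once it is found to be a key.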
We know that 3NF is violated if the RHS is a non-prime attribute and the LHS is not a superkey. The given FDs are:
(1) AB->C
(2) DBE->A
(3) BC->D
(4) BE->F
(5) F->D
We note that in all these FDs the RHS is a non-prime attribute. So in all of them the LHS should be a superkey for the relation to be in 3NF. We see that (1), (3) and (5) violate this, so it is not in 3NF. (Note: in (2), D on the LHS is an extraneous attribute, so it reduces to BE->A and hence (2) does not violate the 3NF rule.) The relation is still in 2NF because neither B nor E alone determines any non-prime attribute.
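The same check can be written as a short Python sketch. It assumes the prime attributes {B, E} found above and tests only the given FDs; `third_nf_violations` is an illustrative name.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def third_nf_violations(attributes, fds, prime):
    """FDs whose left side is not a superkey and whose right side has a non-prime attribute."""
    bad = []
    for lhs, rhs in fds:
        if closure(lhs, fds) == attributes:
            continue  # the left side is a superkey, so this FD is fine
        for attr in sorted(rhs - lhs):  # ignore the trivial part of the FD
            if attr not in prime:
                bad.append(("".join(sorted(lhs)), attr))
    return bad

ATTRS = set("ABCDEF")
FDS = [
    (set("AB"), set("C")),
    (set("DBE"), set("A")),
    (set("BC"), set("D")),
    (set("BE"), set("F")),
    (set("F"), set("D")),
]
PRIME = set("BE")  # assumed: attributes of the single candidate key BE

violations = third_nf_violations(ATTRS, FDS, PRIME)
# 2NF check: what B and E determine on their own, beyond themselves
alone = {a: sorted(closure({a}, FDS) - {a}) for a in sorted(PRIME)}
print(violations, alone)
```

The three FDs reported are AB->C, BC->D and F->D, and neither B nor E determines anything by itself, which is consistent with the relation being in 2NF but not in 3NF.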

Boyce-Codd Normal Form explain

According to the Boyce-Codd Normal Form definition,
Reln R with FDs F is in BCNF if, for all X -> A in F+
-A is subset of X (called a trivial FD), or
-X is a superkey for R.
“R is in BCNF if the only non-trivial FDs over R are key constraints.”
If R is in BCNF, then every field of every tuple records information that cannot be inferred using FDs alone.
What I don't understand is the above two statements about the normal form.
Can someone give me an example?
Thanks!
Some Pre-requisite terms before I try to Explain:
• Non-key attribute: An attribute that is not part of any candidate key is known as non-key /non-prime attribute.
• Superkey: A set of attributes within a table whose values can be used to uniquely identify a tuple. A candidate key is a minimal set of attributes necessary to identify a tuple; this is also called a minimal superkey.
Now, BCNF is a stricter version of 3NF.
A table is in BCNF if, for every non-trivial functional dependency X → Y, X is a superkey of the table.
Consider a relation : R(A,B,C,D)
The dependencies are:
A->BCD
BC->AD
D->B
So, the candidate keys (or minimal superkeys) are A, BC and CD (CD is a key because D->B gives BC, and BC->AD gives the rest).
But in the dependency D->B, D is not a superkey.
Hence the relation violates BCNF.
We can break this relation into R1 and R2 as:
R1(A,D,C) and R2(D,B) to get BCNF.
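A brute-force Python sketch can confirm both the violation and the decomposition. It projects the dependencies onto a schema by computing closures in the original relation and keeping only the schema's attributes. The name `bcnf_violators` is illustrative, and the approach is exponential in the number of attributes, so it only suits small examples.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violators(schema, fds):
    """Determinants inside `schema` that determine something new but not all of `schema`."""
    bad = []
    for size in range(1, len(schema)):
        for combo in combinations(sorted(schema), size):
            x = set(combo)
            determined = closure(x, fds) & schema  # projection onto the schema
            if determined != x and determined != schema:
                bad.append("".join(combo))
    return bad

FDS = [(set("A"), set("BCD")), (set("BC"), set("AD")), (set("D"), set("B"))]

whole = bcnf_violators(set("ABCD"), FDS)   # original relation R
r1 = bcnf_violators(set("ADC"), FDS)       # R1(A,D,C)
r2 = bcnf_violators(set("DB"), FDS)        # R2(D,B)
print(whole, r1, r2)
```

Only D shows up as a violating determinant in R, and neither R1 nor R2 has one, so both components are in BCNF under these dependencies.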
