How do you tell if a relation R is in BCNF and 3NF?
I'm reading a textbook, and it's telling me that there are 3 main attributes you're looking at, but I'm having trouble understanding what they're saying, or at least applying what they're saying when given a relation and FD's.
The 3 attributes:
Given a relation R with the attribute A, and X a subset of attributes of R, for every FD X⟶A in F, one of the following statements is true:
1. A ∈ X; that is, it is a trivial FD (∈ meaning "is found in X")
2. X is a superkey
3. A is part of some key for R
If every FD satisfies one of the first two conditions, R is in BCNF; 3NF is weaker in that it also accepts the third.
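To make the three conditions concrete, here is a minimal Python sketch of how they could be checked mechanically. The function names and the sample relation R(A, B, C) with FDs AB -> C and C -> B are assumptions made up for this illustration, not something from the textbook. It finds candidate keys by brute force, so it is only suitable for small schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs_set, rhs_set)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(all_attrs, fds):
    """Brute-force search, smallest sets first, so every hit is minimal."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so not minimal
            if closure(candidate, fds) == all_attrs:
                keys.append(candidate)
    return keys

def normal_form_violations(all_attrs, fds):
    """Return (bcnf_violations, third_nf_violations) as lists of (X, A)."""
    prime = set().union(*candidate_keys(all_attrs, fds))
    bcnf, third = [], []
    for lhs, rhs in fds:
        for attr in rhs - lhs:                      # condition 1: skip trivial parts
            if closure(lhs, fds) == all_attrs:      # condition 2: X is a superkey
                continue
            bcnf.append((lhs, attr))
            if attr not in prime:                   # condition 3: A is part of some key
                third.append((lhs, attr))
    return bcnf, third

# Hypothetical relation R(A, B, C) with AB -> C and C -> B.
fds = [({"A", "B"}, {"C"}), ({"C"}, {"B"})]
bcnf_violations, third_nf_violations = normal_form_violations({"A", "B", "C"}, fds)
print(bcnf_violations)      # C -> B: C is not a superkey
print(third_nf_violations)  # empty: B is part of the key AB
```

In this made-up relation the keys are AB and AC, so C -> B fails the first two conditions but passes the third: the relation is in 3NF but not in BCNF.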
The book SQL Antipatterns by Bill Karwin has a nice example about BCNF and 3NF on page 303 that is a little complicated but I believe points out the difference more succinctly than any description of the difference I've read so far.
For example, suppose we have three tag types: tags that describe the
impact of the bug, tags for the subsystem the bug affects, and tags
that describe the fix for the bug. We decide that each bug must have
at most one tag of a specific type. Our candidate key could be bug_id plus
tag, but it could also be bug_id plus tag_type. Either pair of
columns would be specific enough to address every row individually.
bug_id  tag       tag_type
--------------------------
1234    crash     impact
3456    printing  subsystem
3456    crash     impact
5678    report    subsystem
5678    crash     impact
5678    data      fix
The book then changes this single table (which satisfies 3NF) into two tables that satisfy BCNF:
bug_id  tag
----------------
1234    crash
3456    printing
3456    crash
5678    report
5678    crash
5678    data
tag       tag_type
-------------------
crash     impact
printing  subsystem
report    subsystem
data      fix
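If it helps, the decomposition can be checked against the sample rows with a few lines of Python. This is only a sketch: the helper fd_holds and the tuple layout (bug_id, tag, tag_type) are my own assumptions, and checking sample data can only refute an FD, never prove that it holds in general.

```python
# Sample rows from the single-table design: (bug_id, tag, tag_type).
rows = [
    (1234, "crash", "impact"),
    (3456, "printing", "subsystem"),
    (3456, "crash", "impact"),
    (5678, "report", "subsystem"),
    (5678, "crash", "impact"),
    (5678, "data", "fix"),
]

def fd_holds(rows, lhs, rhs):
    """True if rows that agree on the lhs column indexes also agree on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in lhs)
        value = tuple(row[i] for i in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True

# tag -> tag_type holds although tag alone is not a key: the BCNF violation.
print(fd_holds(rows, [1], [2]))

# Project onto the two BCNF tables, then join them back on tag.
bug_tags = {(bug_id, tag) for bug_id, tag, _ in rows}
tag_types = {(tag, tag_type) for _, tag, tag_type in rows}
rejoined = {
    (bug_id, tag, tag_type)
    for bug_id, tag in bug_tags
    for other_tag, tag_type in tag_types
    if tag == other_tag
}
print(rejoined == set(rows))  # the join gives back exactly the original rows
```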
Suppose we have a table with 3 columns A,B and C
A    B    C
---------------
1    2    3
2    4    5
4    6    7
n    4    n
Here 'n' means null.
Can we say that A -> B and A -> C? I know the definition of functional dependencies but I'm just confused in the case of null values.
If null is considered a value, then the answer is yes. A -> B, C holds in the given data. However, to be a value imposes certain requirements. All operators applicable to the domain (e.g. integers) like equality, addition, less than, and so on, must be well-defined in the presence of nulls.
If null is not a value, then the answer is more complicated. Functional dependencies, strictly speaking, apply to relations. If a table represents a relation, then we can refer to functional dependencies in the table. However, a symbol that represents the absence of a value is metadata, not data. It allows multiple union-incompatible relations to be represented by a single table. In this case, we can't apply the concept of functional dependency to the table since it's not clear which relation we're talking about.
Further confusing things, SQL DBMSs don't handle nulls consistently. In some cases, they're handled like values, in others like the absence of values. If you want to understand and describe a table logically, the best option is to decompose it into a set of null-free relations, and then to analyze each of those parts independently.
In the case of your example table, we run into a problem if null isn't a value. The last row has no unique identifier (it can't be B:4 since another row has B:4 as well) and we can't determine anything from a lack of information. The example can't be decomposed into a set of relations without discarding that row.
If we change the last row to have B:5 instead, then we decompose it into two relations: R1 = {(A:1, B:2, C:3), (A:2, B:4, C:5), (A:4, B:6, C:7)} and R2 = {(B:2), (B:4), (B:6), (B:5)}. We can say A -> B, C holds in R1 but not in R2.
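Here is a small Python sketch of the two readings, using the version of the table whose last row has B:5. The helper fd_holds and the use of None for null are assumptions for illustration. Note that Python compares None == None as true, which matches the "null is a value" reading, whereas SQL's NULL = NULL evaluates to unknown.

```python
# Columns are (A, B, C); None stands in for null.
table = [(1, 2, 3), (2, 4, 5), (4, 6, 7), (None, 5, None)]

def fd_holds(rows, lhs, rhs):
    """True if rows that agree on the lhs column indexes also agree on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in lhs)
        value = tuple(row[i] for i in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True

# Reading 1: null is just another value, so check the table as it stands.
null_as_value = fd_holds(table, [0], [1, 2])

# Reading 2: decompose into null-free relations and check each separately.
r1 = [row for row in table if None not in row]   # rows with A, B and C all known
r2 = {(row[1],) for row in table}                # every known B value
holds_in_r1 = fd_holds(r1, [0], [1, 2])

print(null_as_value, holds_in_r1)
print(sorted(r2))
```

R2 has no A attribute at all, so A -> B, C is simply not a statement about R2.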
This is an example from a textbook:
Consider the relation R (A ,B ,C ,D ,E ) with FD’s AB -> C,
C -> B, and A -> D.
We get that the keys are ABE and ACE, since their closures are ABE+ = ACE+ = ABCDE.
How do you check minimality? I know that AB+=ABD, and the textbook says that because AB+ does not include C, it is minimal. C+=AB and A+=AD are also minimal, but I do not know why. How do you check minimality?
Also, do we have to find all the FD's besides the ones given to check whether to perform 3-NF or not?
We then check if AB -> C can be split into A -> C and B -> C, we notice that these do not stand on their own so AB -> C is not splittable.
We are left with the final relations S1(ABC), S2(BC), S3(AD) and, since no relation contains a key, S4(ABE) (or S4(ACE)). We then remove S2 because it's a subset of S1.
If it is in 3NF and there are no violations, then why do they split the original relation into: S1(A, B, C), S2(A, D), and S4(A, B, E).
Book name and page: Ullman's Database Systems page 103
How do you check minimality?
The authors don't use the word minimality here. To check for the minimal basis, follow the procedure in the first two paragraphs of example 3.27. It boils down to
". . . verify that we cannot eliminate any of the given dependencies."
". . . verify that we cannot eliminate any attributes from a left side."
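Those two checks can be sketched in Python roughly as follows. The names closure and basis_problems are mine, not the book's, and the code assumes every FD is written as a pair of attribute sets.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs_set, rhs_set)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def basis_problems(fds):
    """List every FD or left-side attribute that could be eliminated."""
    problems = []
    for i, (lhs, rhs) in enumerate(fds):
        others = fds[:i] + fds[i + 1:]
        # Check 1: is this FD already implied by the remaining FDs?
        if rhs <= closure(lhs, others):
            problems.append(("redundant FD", lhs, rhs))
        # Check 2: does the FD still follow after dropping a left-side attribute?
        for attr in lhs:
            reduced = lhs - {attr}
            if reduced and rhs <= closure(reduced, fds):
                problems.append(("extra left-side attribute " + attr, lhs, rhs))
    return problems

# AB -> C, C -> B, A -> D
fds = [(set("AB"), set("C")), (set("C"), set("B")), (set("A"), set("D"))]

# Closure of AB using only C -> B and A -> D: it is ABD, which lacks C,
# so AB -> C cannot be eliminated.
print(closure(set("AB"), fds[1:]))
print(basis_problems(fds))  # prints [] : nothing can be eliminated
```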
Also, do we have to find all the FD's besides the ones given to check whether to perform 3-NF or not?
That question doesn't really make sense. 3NF isn't something you perform. The example in the textbook has to do with the synthesis algorithm for 3NF schemas. The synthesis algorithm decomposes a relation R into relations that are all in at least 3NF.
The synthesis algorithm operates on the FDs you've been given. In an academic setting, as you might find in a textbook, the assumption is that you've been given enough information to solve the problem. In real-world applications, you might be given a set of FDs from a business analyst. Don't assume the analyst has given you enough information; look for more FDs.
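For what it's worth, the synthesis steps can be sketched in a few lines of Python. This is a simplified illustration under my own assumptions, not the book's exact presentation: it expects the FDs to be a minimal basis already, and the function name synthesize_3nf is invented.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs_set, rhs_set)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def synthesize_3nf(all_attrs, fds):
    """3NF synthesis sketch; fds is assumed to be a minimal basis."""
    # Step 1: one schema per FD, holding its left and right sides.
    schemas = [lhs | rhs for lhs, rhs in fds]
    # Step 2: if no schema is a superkey of R, add a schema for some key.
    if not any(closure(schema, fds) == all_attrs for schema in schemas):
        key = set(all_attrs)
        for attr in sorted(all_attrs):
            if closure(key - {attr}, fds) == all_attrs:
                key -= {attr}   # attr is not needed, shrink towards a key
        schemas.append(key)
    # Step 3: drop duplicates and schemas contained in another schema.
    result = []
    for schema in schemas:
        if schema not in result and not any(schema < other for other in schemas):
            result.append(schema)
    return result

# R(A, B, C, D, E) with AB -> C, C -> B, A -> D
fds = [(set("AB"), set("C")), (set("C"), set("B")), (set("A"), set("D"))]
decomposition = synthesize_3nf(set("ABCDE"), fds)
print(decomposition)
```

With this attribute order the sketch happens to pick the key ACE rather than ABE. Either key is acceptable, so the result is ABC, AD and ACE.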
We then check if AB -> C can be split into A -> C and B -> C, we notice that these do not stand on their own so AB -> C is not splittable.
No. You verify (not notice) that you can't eliminate any attributes from a left side. Eliminating A leaves B->C; eliminating B leaves A->C. Neither of these is implied by the three original FDs. So you can't eliminate any attributes from a left side.
If [the original relation] is in 3NF and there are no violations . . .
The original relation is not in 3NF. It's not even in 2NF: A->D makes the non-prime attribute D depend on only part of a key (ABE or ACE).
I have a problem with the second normal form. The rule says: “A relation is in second normal form when it is in 1NF and there is no such non-key attribute that depends on part of the candidate key, but on the entire candidate key.” (Neeraj Sharma, 2010) My problem is about the candidate key. Does the rule refer only to the primary key of a relation, or to all possible candidate keys?
Thank you for your help
It counts for any candidate key. If it counted only for the primary key, simply adding a surrogate id would be enough to put any table into 2NF. However, that wouldn't help to ensure that each fact is recorded once only and independent of other facts.
Trying to clear your doubt by an example:
According to 2NF "Partial Dependencies are not allowed in a relation."
Assume this relation: R(A,B,C,D)
Let's suppose this relation has 2 candidate keys (CKs): AB and AC. (A set such as AB cannot be a CK alongside its own subset B, because a CK must be minimal.)
First, write down all the attributes that appear in any of the CKs; these are called prime attributes. All other attributes are called non-prime attributes.
Here:
Prime Attributes (3)= {A,B,C}
Non Prime Attributes (1)={D}
According to 2NF, no FD may have this form:
"Proper part of any candidate key (partial dependency) ---> non-prime attribute"
For example, C ---> D is not allowed in 2NF, because C is part of the CK "AC" and D is a non-prime attribute.
Hope this helps. For more detail, you can also refer to: Detailed explanation of Normal forms
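The same check can be written as a short Python sketch. It takes AB and AC as the candidate keys of R(A, B, C, D); the function name partial_dependencies is made up for this illustration, and the keys are supplied by hand rather than derived from the FDs.

```python
def partial_dependencies(candidate_keys, fds):
    """Return (prime_attributes, violations) for the 2NF rule.

    A violation is an FD whose left side is a proper part of some
    candidate key and whose right side contains a non-prime attribute.
    """
    prime = set().union(*candidate_keys)
    violations = []
    for lhs, rhs in fds:
        part_of_key = any(lhs < key for key in candidate_keys)
        for attr in sorted(rhs - prime):     # non-prime attributes only
            if part_of_key:
                violations.append((lhs, attr))
    return prime, violations

candidate_keys = [set("AB"), set("AC")]
fds = [(set("C"), set("D"))]                 # C -> D

prime, violations = partial_dependencies(candidate_keys, fds)
print(sorted(prime))   # ['A', 'B', 'C']
print(violations)      # [({'C'}, 'D')] : C is a proper part of the key AC
```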
I am reading the book Database Management Systems by Ramakrishnan, and in the chapter on schema refinement and normal forms, I saw a sentence saying:
K is a candidate key for R means that K ----> R , where R is the relation.
We also have the decomposition rule:
If X ----> YZ, then X ----> Y and X ----> Z
Then, my question is: for example, let R = XABCDE and let X be the key. Since X ----> XABCDE, using the second rule repeatedly, we can say X ----> A, X ----> B, and so on. That means X determines all of the attributes. But I am confused here: then we cannot have a row in the table such that for the same X value, there is a different A value. For example, let X be the id number of a person attribute, and A be the model of the car that person has. Then a person cannot have two cars, but we do not have such a constraint, it must be able to have two or more cars.
What am I doing wrong here? Can anyone help?
Thanks
For example, let X be the id number of a person attribute, and A be the model of the car that person has. Then a person cannot have two cars, but we do not have such a constraint, it must be able to have two or more cars.
What am I doing wrong here? Can anyone help?
You went wrong before you started normalizing R.
Part of the job of a database designer is to decide what the database is supposed to store. This has nothing to do with normalization. In textbook problems, this part is done before the problem is presented to you.
If you start with R{XABCDE}, where "X" is a person's ID number, and "A" is a kind of car, sample data for R might look like this.
person_id  car_model                     B C D E
------------------------------------------------
1          Buick Wildcat                 ...
2          Toyota Corolla                ...
3          Honda Accord                  ...
Or it might look like this.
person_id  car_model                     B C D E
------------------------------------------------
1          Buick Wildcat, Nissan Sentra  ...
2          Toyota Corolla                ...
3          Honda Accord                  ...
Or it might look like this.
person_id  car_model                     B C D E
------------------------------------------------
1          Buick Wildcat                 ...
1          Nissan Sentra                 ...
2          Toyota Corolla                ...
3          Honda Accord                  ...
The first example suggests that you want to store only one car per person. That's a defensible design decision (unless the database needs to know how many cars each person has). Universities rarely care how many cars you have; they just want to know which one is supposed to have a parking sticker.
Deciding what to store has nothing to do with normalization.
The other examples suggest that you want to store more than one car per person, in which case you need to do some normalization at the very least (in the second example) or reconsider your choice of primary key at the very least (in the third example).
Once you've decided what to store, you can start normalizing. Really, how could you start normalizing before you decide what to store? That would be impossible.
In relation R(XABCDE), if X is a key then for any value of X the relation only permits one value for A,B,C,D and E at any point in time. If that constraint doesn't match the reality you intended to model then maybe X was the wrong choice of key.
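One quick way to test a choice of key against sample data is sketched below in Python. The helper is_key and the dict layout are assumptions for illustration; the rows mirror the one-car-per-person and one-row-per-car samples, with only person_id and car_model shown.

```python
def is_key(rows, key_columns):
    """True if no two rows share the same values in key_columns."""
    seen = set()
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        if key in seen:
            return False
        seen.add(key)
    return True

one_car_per_person = [
    {"person_id": 1, "car_model": "Buick Wildcat"},
    {"person_id": 2, "car_model": "Toyota Corolla"},
    {"person_id": 3, "car_model": "Honda Accord"},
]
several_cars_per_person = one_car_per_person + [
    {"person_id": 1, "car_model": "Nissan Sentra"},
]

print(is_key(one_car_per_person, ["person_id"]))                    # True
print(is_key(several_cars_per_person, ["person_id"]))               # False
print(is_key(several_cars_per_person, ["person_id", "car_model"]))  # True
```

Sample data can only show that a proposed key fails. Whether person_id should be a key is still a decision about what the database is meant to store.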
I was learning database normalization and join dependencies and
5NF. I had a hard time. Can anyone give me some practical examples of the multivalue dependency rule:
MVD3: (transitivity) If X ↠ Y and Y ↠ Z, then X ↠ (Z − Y).
Functional dependency / Normalization theory and the normal forms up to and including BCNF, were developed on the hypothesis of all data attributes (columns/types/...) being "atomic" in a certain sense. That "certain sense" has long been deprecated by now, but essentially it boiled down to the notion that "a single cell value in a table could not itself hold a multiplicity of values". Think, a textual CSV list of ISBN numbers, a table appearing as a value in a cell in a table (truly nested tables), ...
Now imagine an example with courses, professors, and study books used as course material. Imagine all of that modeled in a single 3-column table which says that "Professor (P) teaches course (C) and uses book (B) as course material." If there can be more than one book (B) used for any given course (Cn) and there can be more than one course (C) taught by any given professor (Pn) and there can be more than one professor (P) teaching any given course (Cn), then this table is clearly all-key (key is the full set of attributes {P,C,B} ).
This means that this table satisfies BCNF.
But now imagine that there is a rule to the effect that "the set of books used for any given course (Cn) must be the same, regardless of which professor teaches it.".
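That rule is the multivalued dependency C ↠ B. Here is a rough Python sketch of what it demands from the rows of the table. The professor names, the course name and the ISBN placeholders are invented sample data, and the function name is my own.

```python
def course_determines_book_set(rows):
    """Check C ->> B on rows of (professor, course, book).

    Whenever a professor teaches a course and some book is used for that
    course, the row combining that professor with that book must exist.
    """
    present = set(rows)
    for professor, course, _ in rows:
        for _, other_course, book in rows:
            if course == other_course and (professor, course, book) not in present:
                return False
    return True

# Invented sample data: two professors share one course and its two books.
consistent = [
    ("Smith", "Databases", "isbn-1"),
    ("Smith", "Databases", "isbn-2"),
    ("Jones", "Databases", "isbn-1"),
    ("Jones", "Databases", "isbn-2"),
]
# Dropping one row means Jones uses a different set of books than Smith.
inconsistent = consistent[:-1]

print(course_determines_book_set(consistent))    # True
print(course_determines_book_set(inconsistent))  # False
```

The table is all-key in both cases, so BCNF says nothing about the missing row. Only the MVD catches it.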
In the days when normalization was developed to the form in which it is now commonly known, it was not allowed to have table columns (relation attributes) that were themselves tables (relations). (Because such a design was considered a violation of 1NF, a notion which is now considered suspect.)
Imagine for a moment that we are indeed allowed to model relation attributes to be of type relation. Then we could model our 3-column table (/relation) as follows : "Professor (P) teaches course (C) and uses THE SET OF BOOKS (SB) as course material.". Attribute SB would no longer be an ISBN number, as in the previous and more obvious design, but it would be a (probably unary) RELATION holding the entire set of ISBN numbers. If we draw our design like that, and we then consider our rule that "all professors use the same set of books for the same course", then we see that this rule is now expressible as an FD from (C) to (SB) !!! And this means that we have a violation of a lower NF on our hand !!!
4NF and 5NF arose out of this kind of problem, where the appearance of a single attribute value (courseID (C)) requires the appearance of A MULTITUDE of rows (multiple (B) ISBN numbers). The problem was recognised quite early on, but the solution that is currently regarded as the best (RVAs, relation-valued attributes) was not recognised as a valid one at the time. So 4NF and 5NF were created as "new and further normal forms", where the then-existing definitions of 2NF, 3NF and BCNF would already have been sufficient for dealing with the situation at hand, provided RVAs had been recognised as a valid design approach.
To support that claim, let's look at what would be done to eliminate the NF violation in our {P,C,SB} design with the FD C->SB :
We would split the table into two separate tables {P,C} and {C,SB} with keys {P,C} and {C}, respectively. Both tables satisfy BCNF.
But we still have this SB attribute that holds a set of ISBN numbers. Dealing with this can be done by applying a technique like "UNGROUPING". Applying this to our {C,SB} table would get us a {C,B} table, where B are the ISBN book numbers (or whatever identifier you like to use in your database), and the key to the table is {C,B}. This is exactly the same design we would get if we eliminated the 4/5 NF violation !!!
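The split and the ungrouping step can be illustrated with a short Python sketch. The sample rows are invented, a dict of sets stands in for the relation-valued attribute SB, and the variable names are my own.

```python
# Invented sample rows of (professor, course, book) that obey the rule
# "every professor uses the same set of books for a given course".
rows = {
    ("Smith", "Databases", "isbn-1"),
    ("Smith", "Databases", "isbn-2"),
    ("Jones", "Databases", "isbn-1"),
    ("Jones", "Databases", "isbn-2"),
    ("Jones", "Compilers", "isbn-3"),
}

# {P,C}: who teaches what.
teaches = {(professor, course) for professor, course, _ in rows}

# {C,SB}: one entry per course, whose value is the whole set of books.
course_book_sets = {}
for _, course, book in rows:
    course_book_sets.setdefault(course, set()).add(book)

# UNGROUP {C,SB} into {C,B}: one row per (course, book) pair.
course_books = {
    (course, book)
    for course, books in course_book_sets.items()
    for book in books
}

# Joining {P,C} with {C,B} on course gives back the original table.
rejoined = {
    (professor, course, book)
    for professor, course in teaches
    for other_course, book in course_books
    if course == other_course
}
print(rejoined == rows)  # True
```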
You might also want to take a look at Multivalue Dependency violation?