I'm studying normalization and was wondering if this table could be considered to be normalized to 2NF?
Yes, it is:
☑ 1NF: A relation is in first normal form if and only if the domain of each attribute contains only atomic (indivisible) values, and the value of each attribute contains only a single value from that domain.
Since the names of the lakes/creeks, though containing the fishes' names, are not dividable, they are atomic in themselves.
In other words: the first words of the lake/creek names alone are not sufficient to identify the lakes/creeks properly and so are the second words.
☑ 2NF: [...] a relation is in 2NF if it is in 1NF and no non-prime attribute is dependent on any proper subset of any candidate key of the relation. A non-prime attribute of a relation is an attribute that is not a part of any candidate key of the relation.
There is no proper subset of the PK attribute Fish , since it is just one (apart from the empty set {}, see comments). Best Lake is non-prime since it doesn't belong to the PK and it doesn't depend on a subset (because there is no proper one apart from the empty one) but on the whole PK.
I'm interested in database design and now reading the corresponding literature.
Through the book, i have faced a strange example that makes me feel uncertain.
There is a relation
In this table we have a composite primary key (StudentID, Activity). But ActivityFee is partially dependent on the key of the table (Activity -> ActivityFee), so the author suggests to divide this relation into two other relations:
Now if we take a look at the STUDENT_ACTIVITY, Activity becomes a foreign key and relation still has a composite primary key.
We've got the table in which the whole columns defines a composite primary key, is it OK?
If it is not, what should we do in this case? (probably define a surrogate key?)
What is a good way to deal with multivalued attribute (Activity in our case) in order eliminate possible data anomalies?
A table which consists only of a composite key is perfectly OK if that matches your business requirement.
Activity is not a multivalued attribute. There is a single value for activity for each tuple.
There is nothing wrong with a composite candidate key. (If your reference doesn't talk in terms of candidate keys, ie if it talks about primary keys in any other case than when there just happens to be only one candidate key, get a new reference.)
Your text will tell you what is good and bad design. There is no point in worrying about every property you notice about a relation that it might be "bad". The kind of "good" it is currently addressing is that given by "normalization".
"Activity" is not a "multivalued attribute". A "multivalued" attribute is a non-relational notion. The term is frequently but incorrectly used to mean either an "attribute" in a non-relational "table" that somehow (which is never explained) has more than one entry per "row", or for a column in a relational table that has a value with multiple similar parts (set, list, bag, table, etc) that somehow (which is never explained) doesn't apply to say, strings & numerals, or for a column in a relational table that has a value with multiple different parts (record, tuple, etc) that somehow (which is never explained) doesn't apply to, say, dates. (Sometimes it is even misapplied to mean a bunch of attributes with similar names and values, which ought to be replaced by a single attribute with a row for each original name.) (These are just cases of unwanted designs.) "Multivalued" gets used as an antonym of the similarly misused/abused term "atomic".
Having the same (value or) subrow of values appear more than once in a column or table is, again, neither good or bad per se. Again, your reference will tell you what is good design.
I have been learning Normalization from "Fundamentals of Database Systems by Elmasri and Navathe (6th edition)" and I am having trouble understanding the following part about 2NF.
The following image is an example given under 2NF in the textbook
The candidate key is {SSN,Pnumber}
The dependencies are
SSN,Pnumber -> hours, SSN -> ename, pnumber->pname, pnumber -> plocation
The formal Definition:
A relation schema R is in 2NF if every nonprime attribute A in R is
fully functionally dependent on the primary key of R.
for example in the above picture:
if suppose, I define an additional functional dependency SSN -> hours, then taking the two functional dependencies,
{SSN,Pnumber} -> hours and SSN -> hours
the relation wont be in 2NF, because now SSN ->hours is now a partial functional dependency as SSN is a proper subset for the given candidate key {SSN,Pnumber}.
Looking at the relation and its general definition on 2NF, i presume that the above relation is in 2NF
As far as my understanding goes and how i understand what 2NF is,
A relation is in 2NF if one cannot find a proper subset (prime attributes)
of the on the left hand side (candidate key) of a functional dependency
which defines the NPA(non prime attribute).
My first question is, Why is the above relation not in 2NF? (The textbook has considered the above relation as not in 2NF)
There is, however, a informal ways(steps as per the textbook where a normal person not knowing normalization can take to reduce redundancy) being defined at the beginning of this chapter which are:
■ Making sure that the semantics of the attributes is clear in the schema
■ Reducing the redundant information in tuples
■ Reducing the NULL values in tuples
■ Disallowing the possibility of generating spurious tuples
The guideline mentioned is as follows:
My second question is, If the above steps described are taken into account, and consider why the following relation is not in 2NF, do you assume the following functional dependencies, which are,
{SSN,Pnumber} -> Pname
{SSN,Pnumber} -> Plocation
{SSN,Pnumber} -> Ename
making the decomposition of the relation correct? If the functional dependencies assumed are incorrect, then what are the factors leading for the relation to not satisfy 2NF condition?
When looked at a general point of view ... because the table contains more than one primary attributes and the information stored is concerned with both employee and project information, one can point out that those need to be separated, as Pnumber is a primary attribute of the composite key, the redundancy can somehow be intuitively guessed. This is because the semantics of the attributes are known to us.
what if the attributes were replaced with A,B,C,D,E,F
My Third question is, Are functional dependencies pre-determined based on "functionalities of database and a database designer having domain knowledge of the attributes" ?
Because based on the data and relation state at a given point the functional dependencies can change which was valid in one state can go invalid at a certain state.In general this can be said for any non primary attribute determining non primary attribute.
The formal definition :
A functional dependency, denoted by X → Y, between two sets of
attributes X and Y that are subsets of R specifies a constraint on the
possible tuples that can form a relation state r of R. The constraint is
that, for any two tuples t1 and t2 in r that have t1[X] = t2[X], they must
also have t1[Y] = t2[Y].
So won't predefining a functional dependency be wrong as on cannot generalize relation state at any given point?
Pardon me if my basic understanding of things is flawed to begin with.
Why is the above relation not in 2NF?
Your original/first/informal "definition" of 2NF is garbled and not helpful. Even the quote from the textbook is wrong since 2NF is not defined in terms of "the PK (primary key)" but rather in terms of all the CKs (candidate keys). (Their definition makes sense if there is only one CK.)
A table is in 2NF when there are no partial dependencies of non-prime attributes on CKs. Ie when no determinant of a non-prime attribute is a proper/smaller subset of a CK. Ie when every non-prime attribute is fully functionally dependent on every CK.
Here the only CK is {Ssn, Pnumber}. But there are FDs (functional dependencies) out of {Ssn} and {Pnumber}, both of which are smaller subsets of the CK. So the original table is not in 2NF.
If the above statement is taken into account, do you assume the following functional dependencies
so won't the same process of the decomposition shown based on the informal way alone be difficult each time such a case arrives?
A table holds the rows that make some predicate (statement template parameterized by column names) into a true proposition (statement). Given the business rules, only certain business situations can arise. Then given the table predicates, which give table values from a business situation, only certain database values can arise. That leads to certain tables having certain FDs.
However, given some FDs that hold, we can formally use Armstrong's axioms to get all other FDs that must also hold. So we can use both informal and formal ways to find which FDs hold and don't hold.
There are also shorthand rules that follow from the axioms. Eg if a set of attributes has a different subrow value in each tuple then so does every superset of it. Eg if a FD holds then every superset of its determinant determines every subset of its determined set. Eg every superset of a superkey is a superkey & no proper subset of a CK is a CK. There are also algorithms.
Are functional dependencies pre-determined based on "functionalities of database and a database designer having domain knowledge of the attributes" ?
When normalizing we are concerned with the FDs that hold no matter what the business situation is, ie what the database state is. Each table for each business can have its own particular FDs per the table predicate & the possible business situations.
PS Do "make sense" of formal things in terms of the real world when their definitions are in terms of the real world. Eg applying a predicate to all possible situations to get all possible table values. But once you have the necessary formal information, only use formal definitions and procedures. Eg determining that a FD holds for a table because it holds in every possible table value.
so would any general table be in 2NF based on a solo condition of a table having a composite primary key?
There are tables in 5NF (hence too all lower NFs) with all sorts of mixes of composite & non-composite CKs. PKs don't matter.
It is frequently wrongly said that having no composite CKs guarantees 2NF. A table without composite keys and where {} does not determine any attribute is in 2NF. But if {} determines an attribute then it's a proper/smaller subset of any/every CK with any attributes. {} determines an attribute when every row has to have the same value for that attribute.
Why is the above relation in 2NF?
EP1, EP2, and EP3 are in 2NF because, for each one, the key identifies the non-key. No part of any key identifies any part of any non-key. That is what is meant by for any two tuples t1 and t2 in r that have t1[X] = t2[X], they must also have t1[Y] = t2[Y].
By contrast, you might say EMP_PROJ is over-specified. If ssn identifies, ename (as the text says it does), then the combination of {ssn, pnumber} is too much. There exists a subset of the key {ssn,pnumber} that identifies a part of the non-key, {ename}. That situation does not occur in a table conforming to 2NF, as EP1, EP2, and EP3 illustrate.
Are functional dependencies ... based on ... domain knowledge of the attributes?
Emphatically, yes! That's all they're based on. The DBMS is just a logic machine. The ideas of "employee" and "hours" don't exist for it. The database designer chooses to define tables that model some real-world universe of discourse, and imposes meaning on the columns. He gives names to the attributes (above) in X and Y. He decides which columns serve to identify a row based on what is true about the universe being modeled.
if a table has a composite primary key, regardless of the functional dependencies is not in 2NF?
No. Remember, 2NF is defined in terms of FDs. What could it mean to speak of conforming to 2NF "regardless" of them?
The number of columns in the key is immaterial. It's some set, X, identifying the complement, Y.
I'm not sure if I thoroughly understand your questions, but I'll give a try to explain.
Your first statement about 2NF:
a relation is in 2NF if one cannot find a proper subset on the left hand side of a functional dependency which defines the NPA
is correct, as well as your supposition
if {SSN,Pnumber} -> hours and SSN -> hours then this relation wont be in 2NF
because what that means that you could determine 'hours' from 'SSN' alone, so using the composite key {SSN,Pnumber} to determine 'hours' will be redundant, and thus violates the 2NF requirements.
What you call the left hand side of an FD is usually called a key. You use the key to find the related data. In order to save space (and reduce complexity), you should always try to find a minimal key, and break up larger tables into smaller ones if possible, so you do not have to save information in more places than necessary. This is what normalization to the normal forms is all about, and being studied for about half a century now, substantial theory on the matter has been developed, and some rules chrystalized from it, like 1NF, 2NF, 3NF etc.
Your second question confuses me a lot, because from what you are saying, it seems you already understands this.
Could there be some confusion about the FD's? From the figure, it seems to me as they are defined like this:
{SSN,Pnumber} -> hours
{SSN} -> ename
{Pnumber} -> Pname,Plocation
Just like the three lower tables are modeled, together they add up to the relation (table) modeled above.
So, in the first table, you would need the composite key {SSN,Pnumber} to access any data in the relation (search in the table), while that clearly is not necessary for most of the fields.
Now, I'm not sure about what purpose that table would fulfill in real life. While that is not formally necessary, as long as the FD's are given, it might be easier to imagine why the design will benefit from normalization.
So let's day it's about recording workhours per emplyee per project in some organization. SSN identifies the employee, (whose name also is stored as ename because it is easier to remember, but could be duplicate), Pnumber identifies the project, which name and location is also stored much for the same reason.
Then if you as a manager need to register that an employee worked another few hours on some project, you would use your manager app on your device, which in turn will update the tables seamlessly (you cannot expect managers to understand the logics of normalization)
Behind the scenes, however, it would amount to some query, in SQL that would be an 'INSERT' statement which added another row to the relevant table(s).
Now you can see that in the above table, you would have to insert all the six attributes, while with the normalized tables below, you will only need to add a row to table EP1,consisting of three attributes. In a large organization with thousands of employees delivering their worksheets every week, that will quickly become huge differences in storage requirements. That has a number of benefits, perhaps the most significant beeing search speed.
Your third question I don't understand at all, I'm afraid. In a way you could say FD's are predetermined once you have decided what data you will save in your database. The FD's are not dupposed to change. When modeled in the DB, they will not change. If you later find you will alter the design, then that will be new relations with new FD's.
The text you seem to be quoting from somewhere simply says that if you have the FD X -> Y (X gives or determines Y) then if you have any two tuples (records) in that relation (table) that have the same value of X, they must also hve the same value of Y. Or in our example, if Pnumber somewhere is given the value of 888, Pname is 'Battleship' and Plocation is 'Kitchen Sink', then if somewhere else (some other record) the Pnumber 888 is used then also there Pname must be 'Battleship' and Plocation must be 'Kitchen Sink' because Pname and Plocation is functionally dependant on Pnumber.
Now that was almost another chapter in your textbook, or what? Hope it helps, because it took me some time to write :-)
A table can be said to be in 2NF, if the primary key is composed of multiple columns, and that if for each row these columns were concatenated together into a single string, then the resulting column would qualify as the primary key. Alternatively a single column primary key will also qualify as 2NF.
In this case the same employee could have multiple phone numbers (PNUMBER), so a you cannot have a compound primary key that includes the phone number.
According to 2nd normalization form "All non-key attributes are fully functional dependent on the primary key". It's mean all non-key attributes cannot be dependent on a subset of the primary key.
In facebook we can login by email_id, user_name, or mobile_number (so email_id, user_name, or mobile_number are primary keys). And after login by using any of these methods we access the whole account.
My question is "Is not it partially dependency of non-key attributes on subset of primary key?".
I posted this question in facebook community but didn't get any answer.
For 2NF non-key attributes cannot be dependent on a proper subset of a candidate key. (Every set is a subset of itself. A proper subset is a smaller subset.) (There can be multiple candidate keys. One can be picked as primary.)
If all the keys of a relation variable are single columns then the only proper subsets would be the empty set. A column is functionally dependent on no columns if and only if all the values in that column are the same. There are no other proper subsets for columns to be functionally dependent on. So if all columns can have different values and all candidate keys have just one column then a relation variable must be in 2NF.
Functional dependencies and normal forms apply to a particular relation variable. You have to hypothesize a particular design to ask about its particular tables.
A "whole account" is typically not going to be represented (as part of either implementation state or user description) by only one relation variable just because it would have a lot of update anomalies that normalization would get rid of.
Im just trying to make sure that im thinking of it the right way
1)full dependencies are when one or more primary keys determine another attribute
2)partial dependencies are when one of the primary keys determines another attribute or attributes
3)transitive dependencies are when a nonkey attribute determines another attribute
am i thinking of it right?
This answer is directly from my CS course and obtained from the Connolly and Begg text book.
Full Functional Dependencies
Identify the candidate keys (here, propertyNo, iDate and pAddress). This is because any combination of those 3 can allow you to find what the other attributes are for a given tuple (I can find the staffNo that did the inspection given those three things, I can find the carReg the staffNo used given those 3 things etc.). But note, you need all of those 3 to find the other attributes, not just a subset. Full dependencies always relate to non-candidate keys depending on candidate keys, either depending on all or depending on some.
Partial Dependencies
Given those three candidate keys, look within the candidate keys. Is there any subset(s) of the candidate key which is dependent on the other? Yes, it is pAddress. Given a propertyNo, you can figure out what the address of the property. Then look outside of the candidate keys. Is there any of these keys that depend on only parts of the candidate key, not all components? In this case there are not. So partial dependencies are always dependencies within the candidate keys or dependencies of non-candidate keys on only parts of the candidate keys rather than all components
Transitive Dependencies
Now, look at the non-candidate keys (staffNo, comments, iTime (inspection time), sName, carReg). Within those, is there anything that is functionally dependent on the other? Yes, it is sName - given a staffNo, you can figure out the name of the staff member. But staffNo is functionally dependent on the 3 candidate keys. So by transitivity, propertyNo + iDate + pAddress -> staffNo -> sName, so sName is transitively dependent on staffNo. Transitive dependencies always relate to attributes outside of candidate keys.
Not quite. It would help to be more exact in your terminology: when you say things like "one or more primary keys" you (presumably) really mean "one or more of the columns of the primary key"?
The distinction between a full and a partial dependency only arises when a key consists of more than one column (a composite key):
1) Full dependencies are when the full key is required (all columns of the key) to determine another attribute.
2) Partial dependencies are when the key is composite and some but not all of the columns of the key determine another attribute. (This may still be more than one column.)
3) Transitive dependencies are as you said.
Fully dependent means dependent on all the attributes in question, usually meaning all the attributes of a candidate key. It doesn't have to be a key designated as "primary" because primary keys don't play any special role in dependency theory and normalization.
Partially dependent means dependent on a proper subset of those attributes, usually meaning a proper subset of some candidate key.
Depending on the context, transitive dependency can mean either one of the following:
(1) a dependency of the form A->B, B->C
(2) a dependency of the form A->B, B->C where B isn't a superkey
Almost always the term transitive dependency is used when referring to the situation described by (2) and has become virtually synonymous with that sense even though (1) is the more formally correct meaning.
Partial Dependency: Where an attribute in a table depends on only a part of the primary key and not on the whole key. (For detail see this link)
https://www.studytonight.com/dbms/second-normal-form.php
Transitive Dependency: When a non-prime attribute depends on other non-prime attributes rather than depending upon the prime attributes or primary key. (For detail see this link)
https://www.studytonight.com/dbms/third-normal-form.php