Consider the following table:
The primary key is a composite key consisting of PatID and PhysName. My professor says this table is in 3rd normal form. I thought it's not even in second normal form because the non-key attribute, Name, is not dependent on the entire primary key. You can identify the Name simply by looking at PatID. It is not dependent on PhysName.
In order to really know whether the table is in 2NF or not, you would have to have the functional dependencies explicitly laid out for you.
Inferring the FDs from a small sample of data is a risky business. The smaller the sample, the greater the risk.
We would have to see a patient with two physicians here to see whether the name is the same. I expect it would be, but that's only common sense.
When you move on from classroom exercises to million dollar projects, you'll find that common sense is an unreliable substitute for data analysis.
Given a table value we can see what FDs (functional dependencies) hold in it, hence what its CKs (candidate keys) are and what NFs (normal forms) it satisfies (up to BCNF). (We can't know the CKs & NFs without knowing the FDs.)
A FD (or any constraint) holds in a variable when it holds in every value that can arise. Then its CKs and satisfied NFs are based on those FDs. So for a variable, example data tells us that certain FDs don't hold, and the "trivial" FDs must hold, but for the other FDs example data just doesn't tell us whether they hold.
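To make the value-vs-variable point concrete, here is a minimal sketch that tests whether an FD holds in one particular table value. The rows are made-up sample data (only the column names and the two example names come from this discussion), and `fd_holds` is a hypothetical helper, not part of any library.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs holds in this one table value."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Two rows agreeing on lhs must also agree on rhs.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical sample: every patient appears with only one physician.
sample = [
    {"PatId": 1, "PhysName": "Scholl, F.", "PatName": "Gore, Z."},
    {"PatId": 2, "PhysName": "Scholl, F.", "PatName": "Able, A."},
]

# A later (equally hypothetical) value of the same variable.
extended = sample + [
    {"PatId": 1, "PhysName": "Other, P.", "PatName": "Gore, Zed"},
]

print(fd_holds(sample, ["PatId"], ["PatName"]))    # holds in this value
print(fd_holds(extended, ["PatId"], ["PatName"]))  # refuted by the new row
```

The first check passing says nothing about the variable: the FD merely has not been contradicted yet. Only the business rules can tell you whether a row like the one added in `extended` is allowed to arise.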
Since the table value doesn't have {PatId, PhysName} as a CK, your instructor must mean that some variable with that value has that CK. (Of course, you should have got value vs variable straight anyway.) For that variable to have that CK, they must have decided something like:
the table holds rows that make a true statement from "a physician named PhysName tends a patient they identify as PatId and know by name PatName"
the physicians with a given name each only knows their patients with a given id by one name
(we don't know that it's false that) two different physicians could identify two different patients by the same id
likely that each physician has a unique name
likely that each physician identifies every one of their patients by an id
likely that a physician identifies just one patient via a given id
likely that a physician identifies a patient via only one id
likely that "identifies" always means a 1:1 correspondence of entities & ids
likely that each patient has only one name
etc
You need to know whether it's value vs variable, and it's pointless to argue about a variable and constraints (including FDs) until you agree on the predicate and the BRs (Business rules).
PS Re BRs, predicates & constraints:
A proposition is a statement about a situation: "a physician named 'Scholl, F.' tends a patient they identify as 99999 and know by name 'Gore, Z.'". A predicate is a statement template mapping from a row of column names & values to a proposition: "a physician named PhysName tends a patient they identify as PatId and know by name PatName". A table variable holds the rows that form true propositions in a situation.
BRs (business rules) give variable predicates and characterize what situations can arise. Hence what table variable values can arise, hence what FDs hold, hence the CKs, etc.
Related
The AP database example provided online by Murach's SQL Server 2016 for developers has an Invoice table with the surrogate key InvoiceID but with no natural keys. Most of the other tables have natural keys that uniquely identify each row, so I was curious: why would they provide a table without a natural key to identify what each row represents?
I got the AP database creation script from here:
https://www.murach.com/shop/murach-s-sql-server-2016-for-developers-detail
I think they made a mistake. The natural key here presumably ought to be (VendorID, InvoiceNumber). I have never seen a real accounts payable system that allowed duplicate invoice numbers for the same vendor. Paying the same invoice twice obviously isn't a good idea!
The most common motivation for creating a surrogate is to reduce the impact of having to change other key values. Natural key (AKA business key) values sometimes need to change. Surrogate keys need to change much less frequently because fewer people ever see them and so there's much less reason to change them. That relative stability may have some technical advantages in situations where the business key values are expected to change. Even in the presence of a surrogate, business keys are still critically important because they are the things that users and business processes depend on.
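As a rough illustration of keeping the business key enforced alongside a surrogate, here is a sketch using Python's built-in `sqlite3` module. The table is a simplified, hypothetical stand-in for the AP Invoice table (only `InvoiceID`, `VendorID` and `InvoiceNumber` are taken from the discussion above); the real script may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Invoice (
        InvoiceID     INTEGER PRIMARY KEY,        -- surrogate key
        VendorID      INTEGER NOT NULL,
        InvoiceNumber TEXT    NOT NULL,
        UNIQUE (VendorID, InvoiceNumber)          -- presumed business key
    )
""")

insert = "INSERT INTO Invoice (VendorID, InvoiceNumber) VALUES (?, ?)"
conn.execute(insert, (1, "A-100"))

try:
    # Same vendor, same invoice number: the business key rejects it,
    # even though a fresh surrogate value would have been generated.
    conn.execute(insert, (1, "A-100"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Invoice").fetchone()[0]
print(duplicate_rejected, row_count)
```

Without the `UNIQUE` constraint the second insert would succeed, and the surrogate alone would happily distinguish two copies of the same invoice.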
An invoice is a man made piece of data. That's all it is. It has no natural key, because it has no natural identifier. The person or process who creates a new invoice assigns it a number, call it Invoice Number. But that number is just as artificial as Invoice.Id would be. If you want to consider one of those a surrogate, go ahead.
An automobile is a man made piece of gear, but it isn't just data. It's something to drive around. When a new automobile is made it gets assigned a unique identifier, called the Vehicle Identification Number, or VIN. But that key is ultimately just as artificial as Invoice Number. It's just pulled out of thin air, made so that it will be unique, and assigned to the car. There is nothing more "Natural" about VIN than there is about Invoice Number. And there is nothing less "natural" about identifiers that are chosen by the DBMS, perhaps using an autonumber feature.
Edit in response to comments: VIN is assigned at the business level, but its sole legitimate function is to identify a vehicle. There are rules for its formation, but those rules exist to prevent the same VIN from being assigned to two vehicles. If one of the digits in the VIN says the seating capacity of the vehicle, that's the seating capacity on the day of manufacture. It's possible to change the seating capacity of a vehicle after it's in operation, by ripping out one of the seats.
If all keys that are used by the business domain (alternatively the "conceptual domain") are natural, it must be recognized that in certain businesses a key will be generated inside a computerized system and eventually acquire meaning as it is used at the business layer. Arguments have been made in answers to other questions that surrogate keys should never be revealed to the application user, or perhaps even to the application program, lest it begin to be used in a meaningful way. That's ultimately a philosophical question, and not one of database design.
I have been learning Normalization from "Fundamentals of Database Systems by Elmasri and Navathe (6th edition)" and I am having trouble understanding the following part about 2NF.
The following image is an example given under 2NF in the textbook
The candidate key is {SSN,Pnumber}
The dependencies are
SSN,Pnumber -> hours, SSN -> ename, pnumber->pname, pnumber -> plocation
The formal Definition:
A relation schema R is in 2NF if every nonprime attribute A in R is
fully functionally dependent on the primary key of R.
for example in the above picture:
if suppose, I define an additional functional dependency SSN -> hours, then taking the two functional dependencies,
{SSN,Pnumber} -> hours and SSN -> hours
the relation won't be in 2NF, because SSN -> hours is now a partial functional dependency, as SSN is a proper subset of the given candidate key {SSN,Pnumber}.
Looking at the relation and its general definition on 2NF, i presume that the above relation is in 2NF
As far as my understanding goes and how i understand what 2NF is,
A relation is in 2NF if one cannot find a proper subset (prime attributes)
of the left hand side (candidate key) of a functional dependency
which defines the NPA (non prime attribute).
My first question is, Why is the above relation not in 2NF? (The textbook has considered the above relation as not in 2NF)
There are, however, informal guidelines defined at the beginning of this chapter (steps that, per the textbook, a person who does not know normalization can take to reduce redundancy), which are:
■ Making sure that the semantics of the attributes is clear in the schema
■ Reducing the redundant information in tuples
■ Reducing the NULL values in tuples
■ Disallowing the possibility of generating spurious tuples
The guideline mentioned is as follows:
My second question is, If the above steps described are taken into account, and consider why the following relation is not in 2NF, do you assume the following functional dependencies, which are,
{SSN,Pnumber} -> Pname
{SSN,Pnumber} -> Plocation
{SSN,Pnumber} -> Ename
making the decomposition of the relation correct? If the functional dependencies assumed are incorrect, then what are the factors leading for the relation to not satisfy 2NF condition?
Looked at from a general point of view ... because the table contains more than one prime attribute and the information stored concerns both employees and projects, one can point out that those need to be separated; as Pnumber is a prime attribute of the composite key, the redundancy can somehow be intuitively guessed. This is because the semantics of the attributes are known to us.
what if the attributes were replaced with A,B,C,D,E,F
My Third question is, Are functional dependencies pre-determined based on "functionalities of database and a database designer having domain knowledge of the attributes" ?
Because, based on the data and relation state at a given point, the functional dependencies can change: one that was valid in one state can become invalid in another state. In general this can be said for any non-prime attribute determining a non-prime attribute.
The formal definition :
A functional dependency, denoted by X → Y, between two sets of
attributes X and Y that are subsets of R specifies a constraint on the
possible tuples that can form a relation state r of R. The constraint is
that, for any two tuples t1 and t2 in r that have t1[X] = t2[X], they must
also have t1[Y] = t2[Y].
So won't predefining a functional dependency be wrong, as one cannot generalize the relation state at any given point?
Pardon me if my basic understanding of things is flawed to begin with.
Why is the above relation not in 2NF?
Your original/first/informal "definition" of 2NF is garbled and not helpful. Even the quote from the textbook is wrong since 2NF is not defined in terms of "the PK (primary key)" but rather in terms of all the CKs (candidate keys). (Their definition makes sense if there is only one CK.)
A table is in 2NF when there are no partial dependencies of non-prime attributes on CKs. Ie when no determinant of a non-prime attribute is a proper/smaller subset of a CK. Ie when every non-prime attribute is fully functionally dependent on every CK.
Here the only CK is {Ssn, Pnumber}. But there are FDs (functional dependencies) out of {Ssn} and {Pnumber}, both of which are smaller subsets of the CK. So the original table is not in 2NF.
If the above statement is taken into account, do you assume the following functional dependencies
so won't the same process of the decomposition shown based on the informal way alone be difficult each time such a case arrives?
A table holds the rows that make some predicate (statement template parameterized by column names) into a true proposition (statement). Given the business rules, only certain business situations can arise. Then given the table predicates, which give table values from a business situation, only certain database values can arise. That leads to certain tables having certain FDs.
However, given some FDs that hold, we can formally use Armstrong's axioms to get all other FDs that must also hold. So we can use both informal and formal ways to find which FDs hold and don't hold.
There are also shorthand rules that follow from the axioms. Eg if a set of attributes has a different subrow value in each tuple then so does every superset of it. Eg if a FD holds then every superset of its determinant determines every subset of its determined set. Eg every superset of a superkey is a superkey & no proper subset of a CK is a CK. There are also algorithms.
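One of those algorithms is the attribute closure: starting from a set of attributes, keep adding whatever the given FDs let you derive until nothing changes. A minimal sketch, using the EMP_PROJ dependencies from the question (the function and variable names are mine, chosen for illustration):

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole determinant, we get its dependents.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


FDS = [
    ({"Ssn", "Pnumber"}, {"Hours"}),
    ({"Ssn"}, {"Ename"}),
    ({"Pnumber"}, {"Pname", "Plocation"}),
]
ALL_ATTRS = {"Ssn", "Pnumber", "Hours", "Ename", "Pname", "Plocation"}

print(closure({"Ssn"}, FDS))                          # Ssn and Ename only
print(closure({"Ssn", "Pnumber"}, FDS) == ALL_ATTRS)  # so it is a superkey
```

An FD X -> Y follows from the given ones exactly when Y is a subset of the closure of X, so this one routine answers "does this FD hold?" and "is this set a superkey?" without enumerating every derivable FD.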
Are functional dependencies pre-determined based on "functionalities of database and a database designer having domain knowledge of the attributes" ?
When normalizing we are concerned with the FDs that hold no matter what the business situation is, ie what the database state is. Each table for each business can have its own particular FDs per the table predicate & the possible business situations.
PS Do "make sense" of formal things in terms of the real world when their definitions are in terms of the real world. Eg applying a predicate to all possible situations to get all possible table values. But once you have the necessary formal information, only use formal definitions and procedures. Eg determining that a FD holds for a table because it holds in every possible table value.
so would any general table be in 2NF based on a solo condition of a table having a composite primary key?
There are tables in 5NF (hence too all lower NFs) with all sorts of mixes of composite & non-composite CKs. PKs don't matter.
It is frequently wrongly said that having no composite CKs guarantees 2NF. A table without composite keys and where {} does not determine any attribute is in 2NF. But if {} determines an attribute then it's a proper/smaller subset of any/every CK with any attributes. {} determines an attribute when every row has to have the same value for that attribute.
Why is the above relation in 2NF?
EP1, EP2, and EP3 are in 2NF because, for each one, the key identifies the non-key. No part of any key identifies any part of any non-key. That is what is meant by for any two tuples t1 and t2 in r that have t1[X] = t2[X], they must also have t1[Y] = t2[Y].
By contrast, you might say EMP_PROJ is over-specified. If ssn identifies ename (as the text says it does), then the combination of {ssn, pnumber} is too much. There exists a subset of the key {ssn,pnumber} that identifies a part of the non-key, {ename}. That situation does not occur in a table conforming to 2NF, as EP1, EP2, and EP3 illustrate.
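The comparison between EMP_PROJ and EP1 can be sketched as a small check. Note the simplification: this only inspects the FDs as explicitly listed and does not derive implied ones, so treat it as an illustration rather than a complete 2NF test. The function name is hypothetical.

```python
def listed_2nf_violations(candidate_keys, fds):
    """Return listed FDs whose determinant is a proper subset of a
    candidate key and which determine at least one non-prime attribute."""
    prime = set().union(*candidate_keys)
    violations = []
    for lhs, rhs in fds:
        nonprime_rhs = set(rhs) - prime
        # "<" on sets means proper subset.
        if nonprime_rhs and any(set(lhs) < set(ck) for ck in candidate_keys):
            violations.append((set(lhs), nonprime_rhs))
    return violations


KEY = {"ssn", "pnumber"}
EMP_PROJ = [
    ({"ssn", "pnumber"}, {"hours"}),
    ({"ssn"}, {"ename"}),
    ({"pnumber"}, {"pname", "plocation"}),
]
EP1 = [({"ssn", "pnumber"}, {"hours"})]

print(listed_2nf_violations([KEY], EMP_PROJ))  # two partial dependencies
print(listed_2nf_violations([KEY], EP1))       # none
```

EP2 and EP3 each have a single-attribute key in this example, so no listed determinant can be a proper subset of it and the check comes back empty for them too.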
Are functional dependencies ... based on ... domain knowledge of the attributes?
Emphatically, yes! That's all they're based on. The DBMS is just a logic machine. The ideas of "employee" and "hours" don't exist for it. The database designer chooses to define tables that model some real-world universe of discourse, and imposes meaning on the columns. He gives names to the attributes (above) in X and Y. He decides which columns serve to identify a row based on what is true about the universe being modeled.
if a table has a composite primary key, regardless of the functional dependencies is not in 2NF?
No. Remember, 2NF is defined in terms of FDs. What could it mean to speak of conforming to 2NF "regardless" of them?
The number of columns in the key is immaterial. It's some set, X, identifying the complement, Y.
I'm not sure if I thoroughly understand your questions, but I'll give a try to explain.
Your first statement about 2NF:
a relation is in 2NF if one cannot find a proper subset on the left hand side of a functional dependency which defines the NPA
is correct, as well as your supposition
if {SSN,Pnumber} -> hours and SSN -> hours then this relation wont be in 2NF
because what that means that you could determine 'hours' from 'SSN' alone, so using the composite key {SSN,Pnumber} to determine 'hours' will be redundant, and thus violates the 2NF requirements.
What you call the left hand side of an FD is usually called a determinant; when it determines every attribute of the table, it is a key. You use the key to find the related data. In order to save space (and reduce complexity), you should always try to find a minimal key, and break up larger tables into smaller ones if possible, so you do not have to save information in more places than necessary. This is what normalization to the normal forms is all about. Having been studied for about half a century now, the matter has a substantial body of theory, and some rules have crystallized from it, like 1NF, 2NF, 3NF etc.
Your second question confuses me a lot, because from what you are saying, it seems you already understand this.
Could there be some confusion about the FD's? From the figure, it seems to me as they are defined like this:
{SSN,Pnumber} -> hours
{SSN} -> ename
{Pnumber} -> Pname,Plocation
Just like the three lower tables are modeled, together they add up to the relation (table) modeled above.
So, in the first table, you would need the composite key {SSN,Pnumber} to access any data in the relation (search in the table), while that clearly is not necessary for most of the fields.
Now, I'm not sure about what purpose that table would fulfill in real life. While that is not formally necessary, as long as the FD's are given, it might be easier to imagine why the design will benefit from normalization.
So let's say it's about recording work hours per employee per project in some organization. SSN identifies the employee (whose name is also stored as ename because it is easier to remember, but could be duplicated); Pnumber identifies the project, whose name and location are also stored for much the same reason.
Then if you as a manager need to register that an employee worked another few hours on some project, you would use your manager app on your device, which in turn will update the tables seamlessly (you cannot expect managers to understand the logics of normalization)
Behind the scenes, however, it would amount to some query, in SQL that would be an 'INSERT' statement which added another row to the relevant table(s).
Now you can see that in the above table, you would have to insert all six attributes, while with the normalized tables below, you will only need to add a row to table EP1, consisting of three attributes. In a large organization with thousands of employees delivering their worksheets every week, that quickly adds up to a huge difference in storage requirements. That has a number of benefits, perhaps the most significant being search speed.
Your third question I don't understand at all, I'm afraid. In a way you could say FD's are predetermined once you have decided what data you will save in your database. The FD's are not supposed to change. When modeled in the DB, they will not change. If you later find you need to alter the design, then that will be new relations with new FD's.
The text you seem to be quoting from somewhere simply says that if you have the FD X -> Y (X gives or determines Y) then if you have any two tuples (records) in that relation (table) that have the same value of X, they must also have the same value of Y. Or in our example, if Pnumber somewhere is given the value of 888, Pname is 'Battleship' and Plocation is 'Kitchen Sink', then if somewhere else (some other record) the Pnumber 888 is used, then there too Pname must be 'Battleship' and Plocation must be 'Kitchen Sink', because Pname and Plocation are functionally dependent on Pnumber.
Now that was almost another chapter in your textbook, or what? Hope it helps, because it took me some time to write :-)
A table can be said to be in 2NF, if the primary key is composed of multiple columns, and that if for each row these columns were concatenated together into a single string, then the resulting column would qualify as the primary key. Alternatively a single column primary key will also qualify as 2NF.
In this case the same employee could have multiple phone numbers (PNUMBER), so a you cannot have a compound primary key that includes the phone number.
Thank you for your knowledge in advance. I am studying for the Microsoft Technology Exam and one of the practice questions is :
Creating a primary key satisfies the first normal form. True or False?
I personally think it is False because the first normal form is to get rid of duplicate groups. But there is a sentence in the text (Database Fundamentals, Exam 98-364 by Microsoft Press) that says the following:
"The first normalized form (1NF) means the data is in an entity format, which basically means that the following three conditions must be met:
• The table must have no duplicate records. Once you have defined a primary key for the table, you have met the first normalized form criterion."
Please help me understand this, please explain like I am five. Thanks.
I can't explain this stuff to a five year old. I've tried. But I may be able to shed a little light on the subject. The first thing you need to know is that there have been multiple definitions of 1NF over the years, and these definitions sometimes conflict with each other. This may well be the source of your confusion, or at least some of it.
A useful thing to know is what purpose Ed Codd had in mind when he first defined it. Ed Codd defined First Normal Form, which he called Normal Form, back in the paper he published in 1970. His purpose in that paper was to demonstrate that a database built along relational lines would have all the expressive power that existing databases had. Existing databases often dealt with a parent that owns a set of children. For example, if the parent data item contains data about a student, each child might contain data about one course the student is taking.
You can actually define such a structure in terms of mathematical relations by allowing one of the attributes of a relation to be itself a relation. I'm going to call that "nesting" relations, although I don't recall what Ed Codd called it. In defining the relational data model, which is closely patterned after mathematical relations, Ed Codd wanted, for a variety of reasons, to forbid such a structure. His reasons were mostly practical, to make it more feasible to build the first relational database.
So he devoted some of his paper to proving that you could limit attributes to "simple" values without reducing the expressive power of the relational data model. I'm going to sidestep what "simple" means for the moment, although it's worth coming back to. He called this limitation "normal form". Once a second normal form was discovered, normal form got renamed to first normal form.
When it came time to build a relational database the engineers decided on a data structure called a "table". (I don't know the actual history, but this is approximate). A table is a logical structure made up of rows and columns. It can be thought of as an array of records, where each record represents a row, and all the records have the same header.
Now, if you want such a structure to represent a relation, you have to throw in a restriction that will prevent two rows with exactly the same values. If you had such duplicates, this would not represent a relation. A relation, by definition, has distinct elements. This is where primary keys come in. A table with a primary key can't have duplicate rows, because it can't have duplicate keys.
But I'm not done yet. You didn't ask this, but it has come up a thousand times on Stack Overflow, so it's worth putting in here. A designer can defeat Ed Codd's original intent by creating a column that contains text that, in turn, contains comma separated values. In Codd's original formulation, a list of values is not "simple".
This is enormously appealing to the neophyte because it looks simpler and more efficient, to store a table with comma separated values than to create two tables one for parent records and the other for child records, and to join them when they are both needed for one query. Joins are not simple to the neophyte, and they do take some computer resources.
The CSV in a column design turns out to be an unfortunate design in nearly every case. The reason is that certain queries that could have been done real fast via an index now require a full table scan. This can turn seconds into minutes or minutes into hours. It's much more expensive than a join.
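Here is a small sketch of the difference, using Python's built-in `sqlite3` module and the student/course example from earlier. All table names, column names and data are invented for illustration. The CSV version can only be searched by pattern matching over every row, whereas the child table supports a plain equality lookup that an index can serve.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Design 1: comma separated course list in one column.
    CREATE TABLE student_csv (student_id INTEGER PRIMARY KEY, courses TEXT);
    INSERT INTO student_csv VALUES (1, 'MATH1,ART2'), (2, 'MATH10,ART2');

    -- Design 2: one row per (student, course), with keyed access.
    CREATE TABLE enrollment (
        student_id INTEGER NOT NULL,
        course     TEXT    NOT NULL,
        PRIMARY KEY (student_id, course)
    );
    CREATE INDEX enrollment_by_course ON enrollment (course);
    INSERT INTO enrollment VALUES
        (1, 'MATH1'), (1, 'ART2'), (2, 'MATH10'), (2, 'ART2');
""")

# Who takes MATH1? The CSV design needs a substring search.
csv_hits = [r[0] for r in conn.execute(
    "SELECT student_id FROM student_csv "
    "WHERE courses LIKE '%MATH1%' ORDER BY student_id")]

# The keyed design asks the question directly.
keyed_hits = [r[0] for r in conn.execute(
    "SELECT student_id FROM enrollment "
    "WHERE course = 'MATH1' ORDER BY student_id")]

print(csv_hits)    # also picks up the student taking MATH10
print(keyed_hits)
```

Besides the scan, the naive substring search returns a wrong answer here because 'MATH10' contains 'MATH1'. You can patch that with more elaborate patterns, but each patch moves further away from anything an ordinary index can help with.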
So you have to teach the newbies why keyed access to all data is a good thing, and this means you have to teach them what 1NF is really all about. And this can be as hard as teaching a five year old. Newbies are typically less ignorant than five year olds, but they tend to be more stubborn.
First Normal Form is mostly a matter of definition rather than design. In a relational system, the data structures are relation variables. Since a relation always consists of unique tuples a relation variable will always have at least one candidate key. By convention we call one key per relation a "primary" key so in a relational database the primary key requirement is always satisfied.
Similarly, in a relational database all attributes contain values which are identifiable by name, not by positional index and so the issue of "repeating groups" does not apply. The concept of a "repeating group" exists in some non-relational systems and that was what Codd was referring to when he originally defined 1NF.
However, problems of the interpretation of 1NF arise because most modern DBMSs are not truly relational even though people try to use them like relational systems. Since SQL DBMSs are not relational, how are we to interpret relational concepts like 1NF in a SQL DBMS?
The essence of 1NF is that each table must have a key and that tuples consist of single values for each attribute. Most SQL-based systems don't support the concept of "repeating groups" (multiple values in a single attribute position) so it is usually safe to say that if a SQL table has a key and does not permit nulls in any attribute position then it is "relational" and satisfies the spirit of 1NF.
A primary key must be completely unique. So once this is part of a record, it is distinct from any other record.
eg.
Record 1
---------
KEY = 1
Name = Fred Boggs
Age = 84
Record 2
--------
KEY = 2
Name = Fred Boggs
Age = 84
These 2 records are different because the field KEY is different.
Therefore although the rest of the data is the same, it meets the requirements for 1NF.
You are only quoting a fragment of the text Database Administration Fundamentals. A more complete quote is:
The first normalized form (1NF) means the data is in an entity format,
which basically means that the following three conditions must be met:
• The table must have no duplicate records. [...]
• The table also must not have multivalued attributes, meaning that
you can't combine in a single column multiple values that are
considered valid for a column. [...]
• The entries in the column or attribute must be of the same data
type.
(The history of the term "1NF" is full of confusions, vagueness and changes. But here's what this text says.)
Let me join the party ;)
For a question "is this relation in 1NF" to have a meaning, you first need a relation. And for your table to be a relation, you need a key. A table without any keys is not a relation.
Why? Because a relation is a set (of tuples/rows) and a set cannot contain the same element more than once (otherwise it would be a multiset), which is ensured by a key.
Once you have a relation by having a key, you can see if all your attributes are atomic, and if they are, you have yourself a 1NF.
So the answer to...
Creating a primary key satisfies the first normal form. True or False?
...is False. You do need a key, but you also need atomicity.
I was thinking about this problem. In database design surrogate keys are used most of the time, but how do you prevent data duplication and inconsistent data? I mean, one could have a customer table made of customer_id, name, surname. What would prevent me from inserting the same customer twice with a different customer_id? Sure, I could add a unique index on name and surname, but if one does so, then what's the purpose of the surrogate primary key?
You're asking a business question, not a technical one.
"How do I know whether two people with the same name are the same person or not?"
Well typically customers are not identified by a name alone, there is also one of:
An account number
An email address
A postal address
A credit card number
A passport number
A date of birth
... etc.
The name is simply not a uniquely identifying characteristic, it's just an attribute of a customer that is probably non-unique, so you need something else to help identify them. Within the database this is the primary key of the customer table, but for business purposes it could be any number of attributes.
If there is a natural key, you cannot replace it with a surrogate key. You can only add the surrogate without removing the natural. This has pros and cons, as I described here.
Unfortunately, there is no good natural key in the case you described, since two different human beings can easily have the same combination of first and last name. Therefore, you'll have to come up with some additional attributes that represent a better criterion for judging whether two people are "identical" or not, and then create the corresponding natural key. Discovering such criteria is part of requirements gathering and therefore impossible for me to do without knowing more about your domain.
If you are unable to identify such natural key, then you can just leave customer_id alone. That means you made a decision to make it acceptable for two people to be identical in every aspect (except in their customer_id) and still be considered "different". Arguably, such customer_id may no longer be called "surrogate", since its value now has a meaning in your data model, is potentially visible in the UI etc.
What you have said is perfectly logical and correct. Surrogate keys are not any kind of substitute for a natural key (AKA business key or domain key, i.e. the set of attributes used to identify information in the database and relate it to the real world things the database is supposed to model). If you care about data integrity then natural keys are essential, whereas surrogates by definition are optional and supplemental. Add surrogate keys only when and where you find they have a useful benefit.
The only purpose of the id (or "surrogate key" as you call it) is to uniquely identify a record.
First, say you will use name as a key. What will you do if:
the customer changes their name (in some countries women change their surname to their husband's);
you make a typo in the customer's name and have to correct it afterwards?
Then you are in big trouble because, despite the fact that you can change it,
id should never be changed!
Otherwise, you can make a big mess not only in your database (consistency across backups, logs etc.) but also in all the external sources referring to it.
Second, how do you know you won't get two customers with the same name?
You cannot stop people from describing the world wrongly in the database. You can only stop them from describing the world wrongly in the database if the way they described it can't ever happen.
When there is no previous "natural" identifying property used in the business outside the database being stored in the database then we have to pick a "surrogate" distinguishing identifier after the system starts. (Some people wouldn't use "natural" for such an identifier picked after the system starts even though it is used in the business outside the database. And some people wouldn't use "surrogate" for such a distinguishing identifier used in the business system outside the database.)
Someone told me the following table doesn't satisfy second normal form, but I don't know why. I am a newbie at database design. I have read some tutorials on 3NF, but I can't understand 2NF and 3NF well. I hope someone can explain it for me. Thank you.
+------------+-----------+-------------------+
| pk         | pk        | row               |
+------------+-----------+-------------------+
| A          | B         | C                 |
+------------+-----------+-------------------+
| A          | D         | C                 |
+------------+-----------+-------------------+
| A          | E         | C                 |
+------------+-----------+-------------------+
Your question cannot be answered properly unless you state what dependencies are supposed to be satisfied here. You appear to have two attributes with the same name (pk), in which case this table doesn't even satisfy 1NF because it doesn't qualify as a relation.
About your example: that table isn't in second normal form (from your sample data, I presume that C depends only on A). Second normal form requires that:
No non-prime attribute in the table is functionally dependent on a
proper subset of a candidate key
(Wikipedia)
So C depends on "A", which is a proper subset of your primary key. Your primary key is a special superkey. (dportas points out that it can't be called a candidate key, since it's not minimal.)
Let's say more about second normal form. Transforming your example a little for easier understanding, presume that there's a table CUSTOMER(customer_id, customer_name, address). A superkey is a subset of your attributes which uniquely determines a tuple. In this case, there are 4 superkeys: (customer_id); (customer_id, customer_name); (customer_id, address); (customer_id, customer_name, address). (Customer name may be the same for 2 people.)
In your case, you have chosen (customer_id, customer_name) to be the primary key. That violates the rules of second normal form, since customer_id alone is enough to uniquely determine a tuple in your database. For the sake of theoretical accuracy, the problem here arises from the choice of primary key (it's not a candidate key), though the same argument can be applied to show the redundancy. You may find some useful examples here.
The third normal form states that:
Every non-prime attribute is
non-transitively dependent on every
candidate key in the table
Let's look at an example. Changing the previous table to fit second normal form, we now have the table CUSTOMER(customer_id, customer_name, city, postal_code), with customer_id as the primary key.
Clearly enough, "postal_code" depends on the "city" of the customer. This is where it violates third normal form: postal_code depends on city, and city depends on customer_id. That means postal_code transitively depends on customer_id, so that table isn't in third normal form.
To correct it, we need to eliminate the transitive dependency. So we split the table into 2 tables: CUSTOMER(customer_id, customer_name, city) and CITY(city, postal_code). This prevents the redundancy of having too many tuples with the same city & postal_code.
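A quick way to convince yourself that the split loses nothing is to project some rows onto the two new tables and join them back. The rows below are invented sample data, and the sketch takes for granted the assumption made above that city determines postal_code.

```python
# Hypothetical rows of CUSTOMER(customer_id, customer_name, city, postal_code).
rows = [
    (1, "Ann", "Springfield", "11111"),
    (2, "Bob", "Springfield", "11111"),
    (3, "Cy", "Shelbyville", "22222"),
]

# Projections onto the two decomposed tables (sets drop duplicate tuples).
customer = {(cid, name, city) for cid, name, city, _ in rows}
city = {(c, pc) for _, _, c, pc in rows}

# Natural join on city rebuilds the original table.
rejoined = {
    (cid, name, c, pc)
    for cid, name, c in customer
    for c2, pc in city
    if c == c2
}

print(rejoined == set(rows))  # the decomposition is lossless for this data
print(len(city))              # each city/postal_code pair is stored once
```

The postal code for Springfield now lives in exactly one tuple of CITY, so it cannot disagree with itself across customers.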