Does this change match 3NF on northwind database? - database

I'd like to make these tables match Inmon's 3NF approach (I'm working on the Northwind database):
Before
I noticed that the address columns keep repeating and are not even atomic, so I decided to add another table to the diagram called "Address", like this:
After
Is this a valid approach?
Since the Address table stores all the addresses anyway, can I share it among all the other tables?
Thanks

Your solution is valid and is definitely more normalized.
However, it is not in 3NF yet. Strictly speaking, for a table to be in 3NF, no non-key attribute may depend on another non-key attribute. In your example there are such dependencies, between city and country for example: every time someone enters Paris, they also need to enter France. This could lead to anomalies if someone accidentally enters Paris, Germany. For 3NF you would have to create an additional City table, which stores cities and their respective countries. City would be the key, country a non-key attribute. The Address table would have a foreign key to City. I omitted Postcode and Region for brevity, but they would also need to be included in the normalization. So, to make this 3NF you'd need a few more entities.
This complexity is one reason Kimball's star schema is often preferred over Inmon's 3NF approach.
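A minimal sketch of the City/Address split described above, using Python's built-in sqlite3 module. The table and column names (City, Address, street, address_id) and the sample rows are assumptions made for illustration, not the actual Northwind schema. Keying City on the city name alone follows the answer; data with two cities of the same name would need a different key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys unless asked
conn.executescript("""
CREATE TABLE City (
    city    TEXT PRIMARY KEY,   -- the key
    country TEXT NOT NULL       -- non-key attribute, stored once per city
);
CREATE TABLE Address (
    address_id INTEGER PRIMARY KEY,
    street     TEXT NOT NULL,
    city       TEXT NOT NULL REFERENCES City(city)
);
""")

# The country is entered once, with the city, and never again per address.
conn.execute("INSERT INTO City VALUES ('Paris', 'France')")
conn.executemany(
    "INSERT INTO Address (street, city) VALUES (?, ?)",
    [("1 Rue de Rivoli", "Paris"), ("5 Avenue Foch", "Paris")],
)

rows = conn.execute(
    "SELECT a.street, c.country FROM Address a "
    "JOIN City c ON c.city = a.city ORDER BY a.address_id"
).fetchall()
countries = {country for _street, country in rows}

# An address in a city that has no City row is rejected by the foreign key.
try:
    conn.execute(
        "INSERT INTO Address (street, city) VALUES ('Unter den Linden 1', 'Berlin')"
    )
    unknown_city_rejected = False
except sqlite3.IntegrityError:
    unknown_city_rejected = True
```

With this layout there is no column in Address where "Germany" could be typed next to "Paris", so the Paris, Germany anomaly cannot occur.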

Related

Database - Trying to solidify understanding of normalization, how normalized is this?

how normalized is this table?
Example SqlFiddle
So I know that the topic and definition of normalization itself has been pretty well discussed, but I was hoping I could get some clarification on my understanding of normalization. An example is a diagram I drew in Access real quick; I believe that these relationships and tables all fit the 3NF criteria. There is a Projects table with the fields ProjNumber(PK), ProjName, and ProjDesc. Then there is an Assignments table with a compound key consisting of EmpID/ProjNumber, with the fields HourlyBillingRate, NumOfHours, and TeamNum. And lastly there is the Teams table, which consists of the fields TeamNum(PK), TeamName, and ProjNumber.
The ProjNumber from Assignments and Teams are both foreign keys that relate back to the Projects table, and the TeamNum field from Assignments is a foreign key relating back to the Teams table primary key. I'm not too sure if it's necessary to directly relate back to the Teams table, if I have the ProjNumber foreign key because that project would have an associated TeamNum.
The context of these tables is that there is a project that has to be done, a team associated with carrying out that project, and then the employees on that team, who are paid an hourly billing rate for the project they are working on.
The reason I use a compound key is that I wanted to answer the question "What if the employee works on multiple projects?", so I couldn't make EmpID the sole primary key. I chose a compound key because even if the employee works on multiple projects, the combination of the two will always be unique. I believe that each field is necessary and fully dependent on its respective primary key.
Thoughts? Does it in fact fulfill 3NF criteria?
It depends. Your diagram and discussion appear to assume that the primary key is the only candidate key in each of the tables. That appears not to be the case.
In the Assignments table, it looks as though (EmpID, TeamNum) is another candidate key, provided that TeamNum may not be NULL.
If we look at this table with (EmpID, TeamNum) as the key, then it is not in 2NF: ProjNumber is determined by TeamNum, which is not the whole key.
So now the answer to your question turns on whether FDs are analyzed with respect to all candidate keys or just the declared primary key. I have seen tutorials on normalization that go both ways. I follow the one that considers all candidate keys, so the table is not in 2NF.
Unless I've misconstrued the FDs in your case, or Assignments.TeamNum can be NULL.
HOWEVER, your SQL Fiddle presentation is different. Now, if there are several teams on one project, and an employee is assigned to one project for a few hours, there isn't any way to tell what team the employee was on. The FDs are not the same in the SQL Fiddle example and in the implications I take from your diagram.
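One way to sanity-check a suspected functional dependency is to test it against sample rows. The sketch below is only that: the helper name fd_holds and the sample Assignments rows are made up for illustration. Sample data can refute an FD but never prove it, since the FD is a statement about every legal state of the table.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if, in rows, equal lhs values always come with equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores val the first time key is seen, then returns the stored value
        if seen.setdefault(key, val) != val:
            return False
    return True


# Hypothetical rows: project 10 has two teams (100 and 101), project 20 has one.
assignments = [
    {"EmpID": 1, "ProjNumber": 10, "TeamNum": 100},
    {"EmpID": 2, "ProjNumber": 10, "TeamNum": 100},
    {"EmpID": 1, "ProjNumber": 20, "TeamNum": 200},
    {"EmpID": 3, "ProjNumber": 10, "TeamNum": 101},
]

# TeamNum -> ProjNumber is consistent with the data (each team serves one project)...
team_determines_project = fd_holds(assignments, ["TeamNum"], ["ProjNumber"])
# ...but ProjNumber -> TeamNum is refuted, because project 10 has two teams.
project_determines_team = fd_holds(assignments, ["ProjNumber"], ["TeamNum"])
```

If TeamNum -> ProjNumber really is a rule of the business, then with (EmpID, TeamNum) as a key ProjNumber depends on only part of that key, which is the 2NF problem described above.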

Should I make a foreign key that can be null or make a new table?

I have a small question concerning how I should design my database. I have a table dogs for an animal shelter and I have a table owners. The table dogs holds all dogs that are or once were in the shelter. Now I want to make a relation between the table dogs and the table owners.
The problem is, in this example not all dogs have an owner, and since an owner can have more than one dog, a possible foreign key should be put in the table dogs (a dog can't have more than one owner, at least not in the administration of the shelter). But if I do that, some dogs (the ones in the shelter) will have null as a foreign key. Reading some other topics taught me that that is allowed. (Or I might have read some wrong topics)
However, another possibility is putting a table in between the two tables - 'dogswithowners' for example - and put the primary key of both tables in there if a dog has an owner.
Now my question is (as you might have guessed): which of these two methods is best, and why?
The only solution that is in keeping with the principles of the Relational Model is the extra table.
Moreover, it's hard to imagine how you are going to find any hardware so slow that the difference in performance when you start querying will be noticeable. After all, it's not a mission-critical tens-of-thousands-of-transactions-per-second application, is it?
I agree with Philip and Erwin that the soundest and most flexible design is to create a new table.
One further issue with the null-based approach is that different software products disagree over how SQL's nullable foreign keys work. Even many IT professionals don't understand them properly, so the general user is even less likely to understand them.
The nullable foreign key is a typical solution.
The most straightforward one is just to have another table of owners and dogs, with foreign keys to the owner and dog tables and with the dog column UNIQUE NOT NULL. Then if you only want owners or owned dogs, you do not have to involve IS NOT NULL in your queries, and the DBMS does not need to pick them out from among all owners and dogs. NULLs can simplify certain situations like this one, but they also complicate things compared to having a separate table and just joining when you want that data.
However, if it could become possible for a dog to have multiple owners, then you might need the extra table anyway, as a many:many relationship: drop UNIQUE from the dog column and make the owner-dog column pair UNIQUE NOT NULL instead. You can always start with the one constraint and move to the other if things change.
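A rough sketch of the separate-table design with the dog column UNIQUE NOT NULL, using Python's sqlite3 module. The table name dog_ownership, the column names and the sample rows are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys unless asked
conn.executescript("""
CREATE TABLE owners (owner_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE dogs   (dog_id   INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- One row per owned dog. UNIQUE on dog_id keeps it to one owner per dog.
CREATE TABLE dog_ownership (
    dog_id   INTEGER NOT NULL UNIQUE REFERENCES dogs(dog_id),
    owner_id INTEGER NOT NULL REFERENCES owners(owner_id)
);

INSERT INTO owners VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO dogs   VALUES (1, 'Rex'), (2, 'Fido');
INSERT INTO dog_ownership VALUES (1, 1);   -- Rex belongs to Ann; Fido has no row
""")

# Owned dogs: a plain join, with no IS NOT NULL anywhere.
owned = conn.execute(
    "SELECT d.name, o.name FROM dog_ownership x "
    "JOIN dogs d ON d.dog_id = x.dog_id "
    "JOIN owners o ON o.owner_id = x.owner_id"
).fetchall()

# Dogs still in the shelter: those with no ownership row.
unowned = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM dogs "
        "WHERE dog_id NOT IN (SELECT dog_id FROM dog_ownership)"
    )
]

# A second owner for the same dog violates the UNIQUE constraint.
try:
    conn.execute("INSERT INTO dog_ownership VALUES (1, 2)")
    second_owner_rejected = False
except sqlite3.IntegrityError:
    second_owner_rejected = True
```

To allow several owners per dog later, the UNIQUE on dog_id would be replaced by a UNIQUE constraint on the (dog_id, owner_id) pair.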
In the olden days of newsgroups, we had this guy called -CELKO- who would pop up and say, "There is a design rule of thumb that says a relational table should model either an entity or a relationship between entities but never both." Not terribly formal but it is a good rule of thumb in my opinion.
Is 'owner' (person) really an attribute of a dog? It seems to me more like you want to model the relationship 'ownership' between a person and a dog.
Another useful rule of thumb is to avoid SQL nulls! Three-valued logic is confusing to most users and programmers, null behavior is inconsistent throughout the SQL Standard, and (as sqlvogel points out) SQL DBMS vendors implement things in different ways. The best way of modelling missing data is by the omission of a tuple from a relvar (a.k.a. don't insert anything into your table!). For example, if Fido is included in Dog but omitted from DogOwnership, then according to the Closed World Assumption Fido sadly has no owner.
All this points to having two tables and no nullable columns.
I wouldn't add any extra table. If for some reason no nulls are allowed (it's a good question why), I would put some value that can't be a real key in place of the null, e.g. NOT_SET or similar. I know some solutions do the same.
Hope it helps.
A nullable column used for a foreign key relationship is perfectly valid and is used for scenarios exactly like yours.
Adding another table to connect the owners table with the dogs table will create a many-to-many relationship, unless a unique constraint is created on one of its columns (dogs in your case).
Since you describe a one-to-many relationship, I would go with the first option, meaning a nullable foreign key, since I find it more readable.
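For comparison, here is a sketch of the nullable foreign key design using Python's sqlite3 module. The column names and sample rows are assumptions for illustration. The last query shows the kind of three-valued-logic surprise that the answers arguing against nulls have in mind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys unless asked
conn.executescript("""
CREATE TABLE owners (owner_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE dogs (
    dog_id   INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    owner_id INTEGER REFERENCES owners(owner_id)  -- NULL while the dog has no owner
);
INSERT INTO owners VALUES (1, 'Ann');
INSERT INTO dogs VALUES (1, 'Rex', 1), (2, 'Fido', NULL);
""")

# Dogs without an owner are found with IS NULL.
in_shelter = [
    name for (name,) in conn.execute("SELECT name FROM dogs WHERE owner_id IS NULL")
]

# A LEFT JOIN lists every dog, with None where there is no owner.
all_dogs = conn.execute(
    "SELECT d.name, o.name FROM dogs d "
    "LEFT JOIN owners o ON o.owner_id = d.owner_id ORDER BY d.dog_id"
).fetchall()

# Pitfall: "owner is not Ann" does not return Fido, because NULL <> 1 is
# unknown rather than true, and WHERE keeps only rows that evaluate to true.
not_owned_by_ann = [
    name for (name,) in conn.execute("SELECT name FROM dogs WHERE owner_id <> 1")
]
```

Both designs can express the same facts; the difference is whether "no owner" is represented by a NULL in dogs or by the absence of a row in a separate table.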

What is the importance of the 3nf

I've been given my assignment for college and one of the questions is to describe the importance of the 3NF.
I understand normalisation is to eliminate data redundancy.
Any help or resources would be of great help.
Normalization is an important part of database design; it was defined in the 1970s by E. F. Codd. As you already know, it reduces the duplication of data in a table (relation) while keeping the information intact: the information is the same, but it is stored with less redundancy. It has a cost though: more tables, related to each other through foreign keys. This usually adds a layer of abstraction to the database.
Since you are asking specifically about 3NF, the following must hold:
The relation R (table) is in second normal form (2NF)
Every non-prime attribute of R is non-transitively dependent on every superkey of R.
The table should first be normalized to First Normal Form (1NF) and Second Normal Form (2NF), and only after that to 3NF.
Also, every non-key field in a row should depend on "nothing but the key". If the contents of a group of fields apply to more than one primary key value, they should be put in another table.
For example, if you have an employee table that stores the hometown of each employee, you may have several rows that repeat the hometown New York. The hometown can be moved to a separate hometown table with a primary key and a column such as name, related to the employee (through an EmployeeHometown table), so that New York is listed only once (no duplicates).
It will be much easier to check 5 hometowns in a separate table than to go through 100 employees and collect their hometowns.

Database Normalisation (1NF 2NF 3NF)

Currently I'm confused with the whole normalisation thing for databases.
Can anyone help me figure out how to go from 1NF through to 3NF? My 1NF version looks like this, though I'm not sure it is correct:
http://imgur.com/i7JTcXw,qPMtPdq
The link contains both the UNF and my version of the 1NF table.
Having just looked here for the definitions :) : http://www.studytonight.com/dbms/database-normalization.php
1NF requires that each row can be reliably identified. In your table you do not have a clear primary key. Each row can be identified by the flight number, part of the status field (arrival or departure) and the scheduled time.
I can see that your table violates 2NF because your status field seems to contain multiple pieces of information and is not of a single data type, i.e. it tells you two things: arrival/departure and the time. There is also an implied value in the actual status of 'Cancelled', which would not have an associated time.
3NF eliminates dependencies between fields that are not part of the primary key. In your case I would point the finger at the from and to fields: their values could be part of a lookup table, as each flight number is normally dedicated to a particular route, so repeating them in this table is unnecessary duplication. For example, you seem to be going to 'Sidney,' but really you are going to 'Sidney' (no comma), so a query for all flights going to Sidney is not going to find QF431.
Another reason for removing them is that, as it stands, the QF431 departure and destination airports could change between rows, which would violate the rule that each flight number is unique to a flight path. With the current structure this rule cannot be enforced by the DBMS.

database----database normalization

Someone told me the following table is not in second normal form, but I don't know why. I am new to database design and have read some tutorials on 3NF, but I can't understand 2NF and 3NF well. I hope someone can explain it to me. Thank you.
+------------+-----------+-------------------+
| pk         | pk        | row               |
+------------+-----------+-------------------+
| A          | B         | C                 |
+------------+-----------+-------------------+
| A          | D         | C                 |
+------------+-----------+-------------------+
| A          | E         | C                 |
+------------+-----------+-------------------+
Your question cannot be answered properly unless you state what dependencies are supposed to be satisfied here. You appear to have two attributes with the same name (pk), in which case this table doesn't even satisfy 1NF because it doesn't qualify as a relation.
About your example: that table is not in second normal form (from your sample data, I presume that C depends only on A). Second normal form requires that:
No non-prime attribute in the table is functionally dependent on a proper subset of a candidate key (Wikipedia)
So C depends on "A", which is a proper subset of your primary key. Your primary key is a special superkey. (dportas points out that it can't be called a candidate key, since it's not minimal.)
Let's say more about second normal form. To make your example easier to understand, suppose there is a table CUSTOMER(customer_id, customer_name, address). A superkey is a subset of the attributes that uniquely determines a tuple. In this case there are 4 superkeys: (customer_id); (customer_id, customer_name); (customer_id, address); (customer_id, customer_name, address). (Two people may share the same customer name.)
In your case, you have chosen (customer_id, customer_name) as the primary key. That violates the second-form rule, since customer_id alone is enough to uniquely determine a tuple in your database. For the sake of theoretical accuracy, the problem here arises from the choice of primary key (it is not a candidate key), though the same argument can be applied to show the redundancy. You may find some useful examples here.
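Superkeys and candidate keys can be worked out mechanically from the functional dependencies with the attribute-closure algorithm. The sketch below is illustrative only: the function names are made up, and the single dependency customer_id -> customer_name, address is an assumption read off the CUSTOMER example. Note that it also counts (customer_id, address), which determines every attribute as well.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole left side, we gain the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def superkeys(relation, fds):
    """All attribute sets whose closure is the whole relation."""
    return [
        set(combo)
        for size in range(1, len(relation) + 1)
        for combo in combinations(relation, size)
        if closure(combo, fds) == set(relation)
    ]


def candidate_keys(relation, fds):
    """Superkeys that contain no smaller superkey (i.e. minimal ones)."""
    keys = superkeys(relation, fds)
    return [k for k in keys if not any(other < k for other in keys)]


relation = ("customer_id", "customer_name", "address")
# Assumed dependency: customer_id determines the other two attributes.
fds = [({"customer_id"}, {"customer_name", "address"})]

all_superkeys = superkeys(relation, fds)
minimal_keys = candidate_keys(relation, fds)
```

Here minimal_keys contains only {customer_id}, which is why (customer_id, customer_name) is a superkey but not a candidate key.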
The third normal form states that:
Every non-prime attribute is non-transitively dependent on every candidate key in the table
Let's give an example. Changing the previous table to fit the second form, we now have the table CUSTOMER(customer_id, customer_name, city, postal_code), with customer_id as the primary key.
Clearly enough, "postal_code" depends on the "city" of the customer. This is where it violates the third rule: postal_code depends on city, and city depends on customer_id. That means postal_code transitively depends on customer_id, so the table is not in third normal form.
To correct it, we need to eliminate the transitive dependency. So we split the table into 2 tables: CUSTOMER(customer_id, customer_name, city) and CITY(city, postal_code). This prevents the redundancy of having many tuples with the same city and postal_code.
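The split can be illustrated with plain Python tuples. This is a sketch only: the sample rows and postal codes are made up, and it takes the answer's assumption that each city has exactly one postal_code. Projecting the wide table onto the two smaller ones and joining them back on city reproduces the original rows, so no information is lost.

```python
# Wide table: (customer_id, customer_name, city, postal_code); sample data is made up.
customers_wide = [
    (1, "Ann", "Lyon", "69000"),
    (2, "Bob", "Lyon", "69000"),
    (3, "Cyd", "Nice", "06000"),
]

# Projection onto CUSTOMER(customer_id, customer_name, city).
customer = sorted({(cid, name, city) for cid, name, city, _pc in customers_wide})

# Projection onto CITY(city, postal_code); the set removes the repeated Lyon row.
city = sorted({(c, pc) for _cid, _name, c, pc in customers_wide})

# Join the two projections back together on city.
postal_code_of = dict(city)  # valid because city -> postal_code is assumed to hold
rejoined = sorted(
    (cid, name, c, postal_code_of[c]) for cid, name, c in customer
)

lossless = rejoined == sorted(customers_wide)
```

The postal code for Lyon now lives in one CITY row instead of being repeated for every customer in that city.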
