Denormalize one or more tables

Denormalize one or more tables - database

I'm currently revising for an upcoming Database Management exam.
There is a question on denormalising the sample employees database. The question is as follows, the schema is also shown below:
Question:
Use the 'employees' database to de-normalise any two (or more) tables to produce a table in 2NF. you must precisely explain why the table is in 2NF and not 3NF.
Schema:
My Answer:
I would denormalise the 'salaries' table into the 'employees' table to. To normalise I would move {salary, from_date, to_date} into the employees table and remove the 'salaries' table. Note: from_date is no longer part of the primary key in 'employees'.
The 'employees' table is no longer in 3NF and is now in 2NF. This is because a 'transitive dependency' has been introduced into the table.
The transitive dependency is as follows: 'salary' depends on 'from_date'. It is transitive and not partial because 'from_date' is not a component of the primary key. In a partial dependency the determinant must be part of the primary key.
Basically for this question I need to create a transitive dependency. This schema seems a little sparse and I'm also a little thrown off by the fact that dates are part of primary keys.
If the above dependency is wrong could somebody possible point one out for me please?
Another possible solution is to denormalise 'departments' into 'dept_emp'. I could add the 'dept_name' into 'dept_emp'. But from looking at the SQL for this table shows 'dept_no' is part of the primary key.
Any guidance on this would be greatly appreciated.

Related

Database Normalization Process to 3NF CustomerRental for CustNo, PropNo, OwnerNo, etc

I am trying to normalize the following table. I want to go from the UNF form to 3NF form. I want to know, what do you do at the 1NF stage? It says it's where you remove the repetitive columns or groups (ex. ManagerID, ManagerName). This is considered repetitive because it's leads to the same data.
The Unnormalized data table has the following columns
CustomerRental(CustNo,CustName,PropNo,PAddress,RentStart,RentFinish,Rent,OwnerNo,OName)
There are no repeating columns/fields and each cell has a single value, but there is not a primary key. The functional dependencies I see in the table are:
{CustNo}->{Cname}
{PropNo}->{Paddress,RentStart,RentFinish,Rent,OwnerNo,Oname}
{CustNo,PropNo}->
{Paddress,RentStart,RentFinish,Rent,OwnerNo,OName,CustName}
{OwnerNo,PropNo}->{Rent,Paddress,Oname,RentInfo}
The primary key I picked was a composite key, CustNo + PropNo. Since it has a primary key, the table is in 1NF form, correct? This is what I thought, but the answer excludes CustNo and CustName from the table. They are in their own table.
From the above, I normalized it 2NF. At this stage, you are supposed to ensure that all non-prime attributes are fully dependent on the primary key. This is not the case. These are the functional dependencies in the table:
{OwnerNo}->{Oname}
{CustNo}->{CustName}
{PropNo}->{Paddress,Rent,OwnerNo,Oname}
I moved these values out of the table to create three new tables in 2NF form:
Customers(CustNo(PK),CustName)
Property(PropNo(PK),Paddress,City,Rent,OwnerNo,OwnerName)
Rentals(RentalNo(PK),CustNo,OwnerNo,PropNo,RentStart,RentFinish)
Now the main table, Rentals, is in 2NF form. It has a primary key, RentalNo, and each of the non-prime attributes depends on it.
I think that there is a transitive dependency on it. You can find OwnerNo through the PropNo. So, to make it comply with 3NF rules, you have to move the OwnerNo to its own table to create these tables:
Customers(CustNo,CustName)
Property(PropNo,Paddress,City,Rent)
Owners(OwnerNo,OwnerName)
Rentals(RentalNo,CustNo,PropNo,RentStart,RentFinish)
Is this correct? I read that at the 1NF stage, you are supposed to remove repetitive columns (ex. OwnerNo,OwnerName). Is this true? Why or why not?
The picture showing my tables is here:
Normalized Tables

We don't normalize to a NF (normal form) by going through lower NFs between it and 1NF. We use a proven algorithm for the NF we want. Find one in a published academic textbook. (If that doesn't describe the reference(s) you were told to use, find one that it does & quote it.)
Pay close attention to the terms and steps. Details matter. Eg you will need to know all the FDs (functional dependencies) that hold, not just some of them. Eg whenever some FDs hold, all the ones generated by Armstrong's axioms hold. Eg PKs (primary keys) are irrelevant, CKs (candidate keys) matter. Eg every table has a CK. Eg normalization to higher NFs does not change column names. So already your question does not reflect a correct process.
You really need to read & quote the reference(s) you were told to use in order to get to "1NF", because "1NF" is in the eye of the beholder. Normalization to higher NFs works on any relation.

Normalization. Looking for feedback on what should be 3NF

I am creating a database for a super market in my database class. The database needs to be at least 3nf but if possible, BCNF. Could someone let me know if this is satisfactory? I believe it is and I just want to make sure.

The values all look atomic, the only problem I see is in your System.orders table. 3NF should not store calculated values.
Here's a good topic page on calculated values.
The other one I'm not sure about is the lack of a primary key on System.orderItems. I've been told it's good practice for each table to have a Primary Key. On your relational table, you make both orderId and itemId a linked Primary Key.
Here's a good discussion on the need for a Primary Key.
Hope that helps.

Database - Trying to solidify understanding of normalization, how normalized is this?

how normalized is this table?
Example SqlFiddle
So I know that the topic and definition of normalization itself has been pretty well discussed but I was hoping I could get some clarification on my understanding of normalization. An example is a diagram I drew out in Access real quick, from what I think, I believe that these relationships and tables themselves all fit in the 3NF criteria. There is a Projects table with the following fields ProjNumber(PK), ProjName, and ProjDesc. Then there is an Assignments table with a compound key consistent of EmpID/ProjNumber, with the fields HourlyBillingRate, NumOfHours, and TeamNum. And lastly is the Teams table, which consists of the fields TeamNum(PK), TeamName, ProjNumber.
The ProjNumber from Assignments and Teams are both foreign keys that relate back to the Projects table, and the TeamNum field from Assignments is a foreign key relating back to the Teams table primary key. I'm not too sure if it's necessary to directly relate back to the Teams table, if I have the ProjNumber foreign key because that project would have an associated TeamNum.
The context of these tables is that there is a project that has to be done, a team associated with carrying out that team, and then the employees that are on that team which are paid an hourly billing rate for that proj they are working under.
The reason I use a compound key, is I wanted to answer the question of "What is the employee works on multiple projects?", so I couldn't make EmpID the sole primary key, thus I chose to make it a compound key because even if the employee works on multiple projects, the combination of the two will always be unique. I believe that each field is necessary and relevant fully with their respective primary keys.
Thoughts? Does it in fact fulfill 3NF criteria?

It depends. Your diagram and discussion appear to assume that the primary key is the only candidate key in each of the tables. That appears not to be the case.
In the Assignments table, it looks as though EmpID and TeamNumber is another candidate key, provided that TeamNumber may not be NULL.
If we look at this table with EmpId, TeamNumber as the key, then it is not in 2NF. ProjNumber is determined by TeamNumber, which is not the whole key.
So now the answer to your question turns on whether FDs are analyzed with respect to all candidate keys or just the declared prmary key. I have seen tutorials on on normalization that go both ways. I follow the one that considers all candidate keys, so the table is not in 2NF.
Unless I've misconstrued the FDs in your case, or Assigment.TeamNumber can be NULL.
HOWEVER, your SQL Fiddle presentation is different. Now, if there are several teams on one project, and an employee is assigned to one project for a few hours, there isn't any way to tell what team the employee was on. The FDs are not the same in the SQL Fiddle example and in the implicatins I take from your diagram.

How to implement this data structure in SQL tables

I have a problem that can be summarized as follow:
Assume that I am implementing an employee database. For each person depends on his position, different fields should be filled. So for example if the employee is a software engineer, I have the following columns:
Name
Family
Language
Technology
CanDevelopWeb
And if the employee is a business manager I have the following columns:
Name
Family
FieldOfExpertise
MaximumContractValue
BonusRate
And if the employee is a salesperson then some other columns and so on.
How can I implement this in database schema?
One way that I thought is to have some related tables:
CoreTable:
Name
Family
Type
And if type is one then the employee is a software developer and hence the remaining information should be in table SoftwareDeveloper:
Language
Technology
CanDevelopWeb
For business Managers I have another table with columns:
FieldOfExpertise
MaximumContractValue
BonusRate
The problem with this structure is that I am not sure how to make relationship between tables, as one table has relationship with several tables on one column.
How to enforce relational integrity?

There are a few schools of thought here.
(1) store nullable columns in a single table and only populate the relevant ones (check constraints can enforce integrity here). Some people don't like this because they are afraid of NULLs.
(2) your multi-table design where each type gets its own table. Tougher to enforce with DRI but probably trivial with application or trigger logic.
The only problem with either of those, is as soon as you add a new property (like CanReadUpsideDown), you have to make schema changes to accommodate for that - in (1) you need to add a new column and a new constraint, in (2) you need to add a new table if that represents a new "type" of employee.
(3) EAV, where you have a single table that stores property name and value pairs. You have less control over data integrity here, but you can certainly constraint the property names to certain strings. I wrote about this here:
What is so bad about EAV, anyway?

You are describing one ("class per table") of the 3 possible strategies for implementing the category (aka. inheritance, generalization, subclass) hierarchy.
The correct "propagation" of PK from the parent to child tables is naturally enforced by straightforward foreign keys between them, but ensuring both presence and the exclusivity of the child rows is another matter. It can be done (as noted in the link above), but the added complexity is probably not worth it and I'd generally recommend handling it at the application level.

I would add a field called EmployeeId in the EmployeeTable
I'd get rid of Type
For BusinessManager table and SoftwareDeveloper for example, I'll add EmployeeId
From here, you can then proceed to create Foreign Keys from BusinessManager, SoftwareDeveloper table to Employee

To further expand on your one way with the core table is to create a surrogate key based off an identity column. This will create a unique employee id for each employee (this will help you distinguish between employees with the same name as well).
The foreign keys preserve your referential integrity. You wouldn't necessarily need EmployeeTypeId as someone else mentioned as you could filter on existence in the SoftwareDeveloper or BusinessManagers tables. The column would instead act as a cached data point for easier querying.
You have to fill in the types in the below sample code and rename the foreign keys.
create table EmployeeType(
EmployeeTypeId
, EmployeeTypeName
, constraint PK_EmployeeType primary key (EmployeeTypeId)
)
create table Employees(
EmployeeId int identity(1,1)
, Name
, Family
, EmployeeTypeId
, constraint PK_Employees primary key (EmployeeId)
, constraint FK_blahblah foreign key (EmployeeTypeId) references EmployeeType(EmployeeTypeId)
)
create table SoftwareDeveloper(
EmployeeId
, Language
, Technology
, CanDevelopWeb
, constraint FK_blahblah foreign key (EmployeeId) references Employees(EmployeeId)
)
create table BusinessManagers(
EmployeeId
, FieldOfExpertise
, MaximumContractValue
, BonusRate
, constraint FK_blahblah foreign key (EmployeeId) references Employees(EmployeeId)
)

No existing SQL engine has solutions that make life easy on you in this situation.
Your problem is discussed at fairly large in "Practical Issues in Database Management", in the chapter on "entity subtyping". Commendable reading, not only for this particular chapter.
The proper solution, from a logical design perspective, would be similar to yours, but for the "type" column in the core table. You don't need that, since you can derive the 'type' from which non-core table the employee appears in.
What you need to look at is the business rules, aka data constraints, that will ensure the overall integrity (aka consistency) of the data (of course whether any of these actually apply is something your business users, not me, should tell you) :
Each named employee must have exactly one job, and thus some job detail somewhere. iow : (1) no named employees without any job detail whatsoever and (2) no named employees with >1 job detail.
(3) All job details must be for a named employee.
Of these, (3) is the only one you can implement declaratively if you are using an SQL engine. It's just a regular FK from the non-core tables to the core table.
(1) and (2) could be defined declaratively in standard SQL, using either CREATE ASSERTION or a CHECK CONSTRAINT involving references to other tables than the one the CHECK CONSTRAINT is defined on, but neither of those constructs are supported by any SQL engine I know.
One more thing about why [including] the 'type' column is a rather poor choice to make : it changes how constraint (3) must be formulated. For example, you can no longer say "all business managers must be named employees", but instead you'd have to say "all business managers are named employees whose type is <type here>". Iow, the "regular FK" to your core table has now become a reference to a VIEW on your core table, something you might want to declare as, say,
CREATE TABLE BUSMANS ... REFERENCES (SELECT ... FROM CORE WHERE TYPE='BM');
or
CREATE VIEW BM AS (SELECT ... FROM CORE WHERE TYPE='BM');
CREATE TABLE BUSMANS ... REFERENCES BM;
Once again something SQL doesn't allow you to do.

You can use all fields in the same table, but you'll need an extra table named Employee_Type (for example) and here you have to put Developer, Business Manager, ... of course with an unique ID. So your relation will be employee_type_id in Employee table.
Using PHP or ASP you can control what field you want to show depending the employee_type_id (or text) in a drop-down menu.

You are on the right track. You can set up PK/FK relationships from the general person table to each of the specialized tables. You should add a personID to all the tables to use for the relationship as you do not want to set up a relationship on name because it cannot be a PK as it is not unique. Also names change, they are a very poor choice for an FK relationship as a name change could cause many records to need to change. It is important to use separate tables rather than one because some of those things are in a one to many relationship. A Developer for instnce may have many differnt technologies and that sort of thing should NEVER be stored in a comma delimted list.
You could also set up trigger to enforce that records can only be added to a specialty table if the main record has a particular personType. However, be wary of doing this as you wil have peopl who change roles over time. Do you want to lose the history of wha the person knew when he was a developer when he gets promoted to a manager. Then if he decides to step back down to development (A frequent occurance) you would have to recreate his old record.

Tips for build foreign keys into a legacy database

I've got a database that doesn't have any foreign keys. I've done some checks and there are a a fair few orphaned records.
Its a pretty large database 500 + tables and I'm looking at the possibility of building the foreign keys back in.
Other than trawling though every single table over time?
Has anybody ever been through this process before and can maybe offer some insights or tips on how to make the process a little easier.
Any help advice appreciated.

I assume you mean "doesn't have any foreign key constraints"...if there were no foreign keys, you wouldn't know which records matched at all.
Do the primary and foreign key fields have the same name? As in, the PK table has a "CustomerId" field and the FK table(s) also have a "CustomerId" field? If so, you might be able to query the column properties (perhaps using INFORMATION_SCHEMA, you didn't mention an RDBMS) to figure out some implied relationships. Just query for all the tables that have a field called "CustomerId" that is not a PK and there's a good (but not certain) bet that those tables should have an FK constraint to the Customer table. You could even use the output of the query to generate the DDL to create the constraints.

You can work from the largest to smallest tables, or start with the least performant area of the database. Adding keys should help your performance significantly, but you'll have to resolve the orphan rows first. You may need input from the business for that. Expect them to be very confused about what's going on.