Is it possible to find 4 distinct functional dependencies in this table? - database

My professor gave us a task: find 4 distinct functional dependencies in the following table:
Company(Company_Name, Street_Address, City, Zip, State, CEO_Name)
He also gave a note: "Each company has a different (unique) address, meaning (Street_Address, City, Zip, State) together form a key. Different companies may have the same name. Each company has exactly one CEO, and one person cannot be the CEO of more than one company. CEO names may not be unique (there may be 2 CEOs with the same name). To count 4 functional dependencies in a table with attributes (A, B, C, D): if A -> B, then obviously A, C -> B as well. This should not count as 2 separate dependencies. On the other hand, A -> B and A -> C should be counted as 2 distinct functional dependencies."
But in my opinion, there aren't 4 functional dependencies. I only see these two:
CEO_Name, Company_Name -> (Street_Address, City, Zip, State)
Zip -> State
Also, since two companies can have the same name, there should be a primary key like "Company_Number". But creating new tables is not the task...

Functional dependencies answer the question, "Given a single value for X, do we know one and only one value for Y?" Either X or Y may be a set of attributes, not just a single attribute. Keep this in mind when you're reading through this answer.
Each company has a different (unique) address meaning (Street_Address, City, Zip, State) together form a key.
By definition, that key means that
Street_Address, City, Zip, State -> CEO_Name
Street_Address, City, Zip, State -> Company_Name
Those are all the possible FDs with the candidate key {Street_Address, City, Zip, State} on the left-hand side. Two of four: halfway home.
You identified {CEO_Name, Company_Name} as the left-hand side of a functional dependency. In this particular case, you also identified it as a candidate key. Let's look at some made-up data.
Company_Name  CEO_Name    Street_Address  City      State  Zip
------------  ----------  --------------  --------  -----  -----
Wibble, Inc.  Mary Smith  123 E Main St   Anytown   PA     00001
Wibble, Inc.  Mary Smith  456 S Darn St   Sometown  WY     10000
That data describes two different companies that happen to have the same name, having two different CEOs who happen to have the same name. This fits the description of the FDs, but clearly shows that {Company_Name, CEO_Name} is not a candidate key. The faked data also clearly shows that {Company_Name, CEO_Name} can't be the left-hand side of a functional dependency. Given a single value for {Company_Name, CEO_Name}, we don't have one and only one value for any of the other attributes.
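The same argument can be checked mechanically against sample rows. Below is a minimal Python sketch (the helper name fd_holds is hypothetical, not from the answer): it reports False as soon as two rows agree on the left-hand side but disagree on the right-hand side. Sample data can only refute an FD; a True result just means these particular rows don't contradict it.

```python
def fd_holds(rows, lhs, rhs):
    """Return False if some pair of rows agrees on lhs but differs on rhs.

    True only means these rows do not refute lhs -> rhs; it does not prove
    the dependency holds for every legal instance of the table.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(key, value) != value:
            return False
    return True


# The made-up Wibble, Inc. rows from the table above.
rows = [
    {"Company_Name": "Wibble, Inc.", "CEO_Name": "Mary Smith",
     "Street_Address": "123 E Main St", "City": "Anytown", "State": "PA", "Zip": "00001"},
    {"Company_Name": "Wibble, Inc.", "CEO_Name": "Mary Smith",
     "Street_Address": "456 S Darn St", "City": "Sometown", "State": "WY", "Zip": "10000"},
]

print(fd_holds(rows, ["Company_Name", "CEO_Name"], ["Street_Address"]))         # False
print(fd_holds(rows, ["Street_Address", "City", "Zip", "State"], ["CEO_Name"]))  # True
```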
Having eliminated the attributes Company_Name and CEO_Name as possibilities for the left-hand side, the only way to "manufacture" two more functional dependencies is to find them within the candidate key {Street_Address, City, Zip, State}. Not because there's anything special about the candidate key, but because those are the only attributes left.
My guess is that your teacher expects you to say
Zip -> City
Zip -> State
In the USA (in the "real" world), "Zip -> City, State" doesn't hold. ZIP codes have to do with how carriers drive their routes and deliver mail; ZIP codes aren't concerned with geography. A few cities (and ZIP codes) straddle state borders. Quite a lot of ZIP codes straddle adjoining cities within a single state. As the USPS cuts its budget, I expect the number of such ZIP codes to increase.
But in academia, this real-world behavior is often ignored for pedagogical reasons. That's why I'll bet your teacher expects {Zip -> City, State}.


How to normalize the schema to BCNF

I am having some issues with normalization. I have a schema REPAYMENT which looks like this:
Now, from what I've gathered, the functional dependencies that hold in the schema are:
{borrower_id} --> {name, address, request_date, loan_amount}
{request_date} --> {repayment_date, loan_amount}
{loan_amount} --> {repayment_amount}
(correct me if I'm wrong?)
I'm supposed to normalise the schema to BCNF, but I'm a bit confused. Is the candidate key request_date and borrower_id?
It can be used to register information on the repayments of micro loans. A borrower, with his name and address, is identified by a unique borrower_id. Borrowers can have multiple loans at the same time, but each of those loans (specified by loan_amount, repayment_date and repayment_amount) has a different request date. Thus a loan can be identified by the borrower ID and the request date of the loan. The borrower can repay multiple (different) loans on the same date, but each loan can only be repaid once (on one date, with one amount). There is a system which, for each request date and loan amount, determines the repayment date and the amount to be repaid. The loan amount requested and the repaid amount are not the same, since an interest rate applies.
From the definition of candidate key:
In the relational model of databases, a candidate key of a relation is a minimal superkey for that relation; that is, a set of attributes such that:
1. The relation does not have two distinct tuples (i.e. rows or records in common database language) with the same values for these attributes (which means that the set of attributes is a superkey).
2. There is no proper subset of these attributes for which (1) holds (which means that the set is minimal).
Now, your question:
Is the candidate key request_date and borrower_id?
It is a superkey, but not a minimal one. Here's how we compute the candidate key.
Which attribute occurs only on the left-hand side, considering all the FDs?
It's borrower_id. This means that it must be part of every key of this schema. Now let us compute its closure.
Because of {borrower_id} --> {name, address, request_date, loan_amount}:
closure(borrower_id) = borrower_id, name, address, request_date, loan_amount.
Because of {request_date} --> {repayment_date, loan_amount} and closure(borrower_id) has request_date, this means
closure(borrower_id) = borrower_id, name, address, request_date, loan_amount, repayment_date
And finally, because of {loan_amount} --> {repayment_amount} and closure(borrower_id) has loan_amount, this means
closure(borrower_id) = borrower_id, name, address, request_date, loan_amount, repayment_date, repayment_amount
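Because the closure of borrower_id contains all the attributes, borrower_id is a superkey, and since a single attribute is trivially minimal, it is a candidate key. It is also the only one: every key must contain borrower_id, so any other superkey is a proper superset of it and therefore not minimal.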
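Both steps, finding the attributes that never appear on a right-hand side and computing an attribute closure, fit in a few lines. This is a minimal Python sketch, not the answer's own tooling; the helper names are hypothetical, and it assumes the REPAYMENT schema consists of exactly the seven attributes named in the FDs (the schema itself did not survive in the question).

```python
# FDs as stated in the question, each written as (left-hand side, right-hand side).
FDS = [
    ({"borrower_id"}, {"name", "address", "request_date", "loan_amount"}),
    ({"request_date"}, {"repayment_date", "loan_amount"}),
    ({"loan_amount"}, {"repayment_amount"}),
]

# Assumption: REPAYMENT has exactly the attributes named in the FDs.
ATTRIBUTES = {"borrower_id", "name", "address", "request_date",
              "loan_amount", "repayment_date", "repayment_amount"}


def never_on_right(attributes, fds):
    """Attributes on no right-hand side: nothing derives them, so every key needs them."""
    on_right = set()
    for _, rhs in fds:
        on_right |= rhs
    return attributes - on_right


def closure(attrs, fds):
    """Attribute closure: keep applying any FD whose left-hand side is already covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


print(never_on_right(ATTRIBUTES, FDS))              # {'borrower_id'}
print(closure({"borrower_id"}, FDS) == ATTRIBUTES)  # True
```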
Now let us decompose the schema into BCNF. The algorithm is:
1. Given a schema R, compute the keys for R.
2. Repeat until all relations are in BCNF:
   a. Pick any R' having an F.D. A --> B that violates BCNF.
   b. Decompose R' into R1(A, B) and R2(A, rest of attributes).
   c. Compute the F.D.'s for R1 and R2.
   d. Compute the keys for R1 and R2.
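Spotting a BCNF violation amounts to asking, for each non-trivial FD, whether its left-hand side is a superkey, i.e. whether its closure covers every attribute. Here is a minimal Python sketch under the same assumptions as before (the FDs exactly as stated in the question, hypothetical helper names). It only examines the original schema; running it on R1 and R2 would first require projecting the FDs onto their attributes, which is the "compute the F.D.'s for R1 and R2" step.

```python
FDS = [
    ({"borrower_id"}, {"name", "address", "request_date", "loan_amount"}),
    ({"request_date"}, {"repayment_date", "loan_amount"}),
    ({"loan_amount"}, {"repayment_amount"}),
]
# Assumption: REPAYMENT has exactly the attributes named in the FDs.
ATTRIBUTES = {"borrower_id", "name", "address", "request_date",
              "loan_amount", "repayment_date", "repayment_amount"}


def closure(attrs, fds):
    """Attribute closure: keep applying any FD whose left-hand side is already covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Non-trivial FDs whose left-hand side is not a superkey of the relation."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != attributes]


for lhs, rhs in bcnf_violations(ATTRIBUTES, FDS):
    print(sorted(lhs), "->", sorted(rhs))
# ['request_date'] -> ['loan_amount', 'repayment_date']
# ['loan_amount'] -> ['repayment_amount']
```

Either violation may be picked first; the answer starts with request_date.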
Since {request_date} --> {repayment_date, loan_amount} holds and request_date is not a key, it violates BCNF, so we split the schema into two relations:
R1(request_date,repayment_date,loan_amount)
R2(borrower_id,name,address,request_date,repayment_amount)
Clearly R1 is in BCNF. But R2 is NOT in BCNF, because we missed the following F.D.:
address --> name
and address is not a key of R2, so we split R2 further into:
R3(borrower_id,address,request_date,repayment_amount)
R4(address,name)
Now both R3 and R4 are clearly in BCNF. Had we not split R2 further, we would end up storing the same combination of address and name for every loan the person takes, which is redundant.

Is it good practice to assign ranges to userid?

I'm building a database schema for users of my app, and I am thinking of setting the userid value according to user type. So,
buyers: 10001 to 19999
sellers: 20001 to 29999
shippers: 30001 to 39999
Next, I assign unique email addresses to the userid:
login_table
Email         password  userid
aaaaa#yy.com  password  10005   ---> this email belongs to user 10005 (a buyer)
bbbbb#yy.com  password  20008   ---> this email belongs to user 20008 (a seller)
ccccc#yy.com  password  30187   ---> this email belongs to user 30187 (a shipper)
I then have 3 tables for buyers, sellers, and shippers because each may have different attributes:
buyer_table
buyerid   name    mother
10005     John    Mary
10006     Chris   Nancy

seller_table
sellerid  name    pet
20008     Adam    Dog
20018     Tony    cat

shipper_table
shipperid  name    car
30187      George  GMC
30188      Larry   Honda
The advantage here is that I have a single login_table for all user types; I do not want 3 login tables, one for each type. Based on the userid value, I know what type of user it is. Keeping a separate table for each user type (buyer_table, seller_table, and shipper_table) makes the schema more understandable, in addition to letting me assign different attributes to each user type.
Sounds good? Maybe.
However, I have a problem: the login_table refers to “userid”, while each of the three user tables uses a different id name for the user. In buyer_table, buyerid is the primary key; in seller_table, it is sellerid; and in shipper_table, it is shipperid.
How can I link these three primary keys to the login_table? The login_table has userid as a foreign key to one of those three tables, but it is called “userid”, not buyerid, or sellerid, or shipperid!
1) Is it a good idea to classify the userid value according to ranges?
2) If so, how can I resolve the PK-FK issue as described above?
3) Am I off completely?
Having ranges of values for different kinds of similar objects is not bad. If you feel like doing so, you could use sequences which support value ranges. This way, you could have a buyer sequence which goes from 0 to 1000, a seller one from 1001 to 2000, and so on. That would also help you keep track of the increasing index of the different kinds!
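As a rough illustration of range-limited sequences, here is a minimal Python sketch that models one sequence per user type and recovers the type from a userid. It is an in-memory stand-in, not a database feature: in a real database you would use the engine's own sequences (PostgreSQL's CREATE SEQUENCE, for example, accepts START WITH, MINVALUE and MAXVALUE). The class and function names are hypothetical; the ranges are the ones from the question.

```python
class RangeSequence:
    """In-memory stand-in for a database sequence limited to [start, end]."""

    def __init__(self, name, start, end):
        self.name = name
        self.start = start
        self.end = end
        self._next = start

    def next_value(self):
        # Like a non-cycling sequence: fail loudly instead of wrapping around.
        if self._next > self.end:
            raise OverflowError(f"{self.name} sequence exhausted at {self.end}")
        value = self._next
        self._next += 1
        return value

    def owns(self, value):
        return self.start <= value <= self.end


# Ranges taken from the question.
SEQUENCES = {
    "buyer": RangeSequence("buyer", 10001, 19999),
    "seller": RangeSequence("seller", 20001, 29999),
    "shipper": RangeSequence("shipper", 30001, 39999),
}


def user_type(userid):
    """Recover the user type from the range a userid falls in."""
    for name, seq in SEQUENCES.items():
        if seq.owns(userid):
            return name
    raise ValueError(f"userid {userid} is outside every range")


print(SEQUENCES["buyer"].next_value())  # 10001
print(user_type(20008))                 # seller
```

One consequence of this design is a hard cap: each range holds 9,999 ids, after which the sketch raises rather than spilling into the next type's range.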

The optimal way to enter multiple addresses for the same record within a form?

So I've been developing a sort of data entry platform within Access, using forms and subforms.
I have a form titled PHYSICIAN. Each physician will have basic data like first/last name, DOB, title, contract dates, etc. The aspect I want to cover is addresses: a physician may have several, since they may work/practice at 2 or 3 or even 10 different locations.
Instead of having our data entry team key in a full record each time they need to add an address, I'd like a way for the form to retain ALL information not related to the address.
So if Ken Bone works at 7 places, I want to allow them to key all of those addresses a bit more efficiently than creating a new record.
There's one main issue I'm running into: a subform or autopopulate option doesn't necessarily increment the autonumber ID (primary key) for the record. All of the information is being stored in 1 master table.
Is there a way around this or a more logical approach that you folks might suggest?
I recommend that you have a couple of tables, perhaps even three.
tblDoctorInfo
- Dr_ID
- Name
- DOB
- Title
tblAddresses
- AddressID
- Address1
- Address2
- City
- State
- Zip
- Country
tblDr_Sites
- DrSites_ID
- Dr_ID
- AddressID
The tables might have data like this.
tblDoctorInfo
1, Bob Smith, 12/3/1989, Owner
2, Carl Jones, 1/2/1977, CEO
3, Carla Smith, 5/3/1980, ER Surgeon
tblAddresses
1, 123 Elm St, Fridley, MN 55038
2, 234 7th St, Brookdale, MN 55412
3, 345 Parl Ave, Clinton, MN 55132
Then you could associate the tables with the third table. (Note that each of the three tables has an ID field that increments.)
tblDr_Sites
1,1,1 This record means Dr. Bob works in Fridley
2,1,2 This record means Dr. Bob works in Brookdale
3,3,1 This record means Dr. Carla works in Fridley
4,2,3 This record means Dr. Carl works in Clinton
5,2,2 This record means Dr. Carl works in Brookdale
6,2,1 This record means Dr. Carl works in Fridley
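To make the three-table design concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for Access (an assumption: Access would use AutoNumber fields where SQLite uses INTEGER PRIMARY KEY columns). It loads the sample rows above, adds one more site for an existing doctor, and lists each doctor's cities with a join. The helper cities_for is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces REFERENCES when this is on

conn.executescript("""
CREATE TABLE tblDoctorInfo (
    Dr_ID INTEGER PRIMARY KEY,  -- auto-assigned when omitted, like an AutoNumber
    Name TEXT, DOB TEXT, Title TEXT
);
CREATE TABLE tblAddresses (
    AddressID INTEGER PRIMARY KEY,
    Address1 TEXT, Address2 TEXT, City TEXT, State TEXT, Zip TEXT, Country TEXT
);
CREATE TABLE tblDr_Sites (
    DrSites_ID INTEGER PRIMARY KEY,
    Dr_ID INTEGER NOT NULL REFERENCES tblDoctorInfo(Dr_ID),
    AddressID INTEGER NOT NULL REFERENCES tblAddresses(AddressID)
);
""")

conn.executemany("INSERT INTO tblDoctorInfo VALUES (?, ?, ?, ?)", [
    (1, "Bob Smith", "12/3/1989", "Owner"),
    (2, "Carl Jones", "1/2/1977", "CEO"),
    (3, "Carla Smith", "5/3/1980", "ER Surgeon"),
])
conn.executemany(
    "INSERT INTO tblAddresses (AddressID, Address1, City, State, Zip) VALUES (?, ?, ?, ?, ?)", [
        (1, "123 Elm St", "Fridley", "MN", "55038"),
        (2, "234 7th St", "Brookdale", "MN", "55412"),
        (3, "345 Parl Ave", "Clinton", "MN", "55132"),
    ])
conn.executemany("INSERT INTO tblDr_Sites VALUES (?, ?, ?)", [
    (1, 1, 1), (2, 1, 2), (3, 3, 1), (4, 2, 3), (5, 2, 2), (6, 2, 1),
])

# Adding another practice location for Dr. Carla is one new junction row;
# her doctor record is untouched, and DrSites_ID is assigned automatically.
conn.execute("INSERT INTO tblDr_Sites (Dr_ID, AddressID) VALUES (?, ?)", (3, 2))


def cities_for(doctor_name):
    """Cities where the named doctor works, in the order the sites were entered."""
    rows = conn.execute("""
        SELECT a.City
        FROM tblDr_Sites AS s
        JOIN tblDoctorInfo AS d ON d.Dr_ID = s.Dr_ID
        JOIN tblAddresses AS a ON a.AddressID = s.AddressID
        WHERE d.Name = ?
        ORDER BY s.DrSites_ID
    """, (doctor_name,)).fetchall()
    return [city for (city,) in rows]


print(cities_for("Carl Jones"))   # ['Clinton', 'Brookdale', 'Fridley']
print(cities_for("Carla Smith"))  # ['Fridley', 'Brookdale']
```

In a form, this maps naturally to a main form bound to tblDoctorInfo with a subform bound to tblDr_Sites, so entering another address never re-keys the doctor's details.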

Determining the functional dependencies of a relationship and their normal forms

I'm studying for a database test, and in the study guide there are many exercises on normalization and functional dependencies, but the teacher did not work through any similar exercise, so I would like someone to help me understand this so I can tackle the other 16 problems.
1) Given the following logical schema:
Relation product_sales
POS    Zone    Agent  Product_Code  Qualification  Quantity_Sold
123-A  Zone-1  A-1    P1            8              80
123-A  Zone-1  A-1    P1            3              30
123-A  Zone-1  A-2    P2            3              30
456-B  Zone-1  A-3    P1            2              20
456-B  Zone-1  A-3    P3            5              50
789-C  Zone-2  A-4    P4            2              20
Assuming that:
• Points of Sale are grouped into Zones.
• Each Point of Sale has agents.
• Each agent operates in a single POS.
• Two agents of the same points of sale can not market the same product.
• For each product sold by an agent, it is assigned a Qualification depending on the product and the quantity sold.
a) Indicate 4 functional dependencies present.
b) What is the normal form of this structure?
To get you started finding the 4 functional dependencies, think about which attributes depend on another attribute:
e.g. does the Zone depend on the POS? (If so, POS -> Zone.) Or does the POS depend on the Zone? (In that case, Zone -> POS.)
Four of your five statements tell you something about the dependencies between attributes (or combinations of several attributes).
As for normalisation, there's a (relatively) clear tutorial here. The phrase "the key, the whole key, and nothing but the key" is also a good way to remember the 1st, 2nd and 3rd normal forms.
In your comment, you said
Well, according to the theory I've read, I think it may be, but I have many doubts: POS → Zone, {POS, Agent} → Zone, Agent → POS, {Agent, Product_code, Quantity_Sold} → Qualification
I think that's a good effort.
I think POS->Zone is right.
I don't think {POS, Agent} → Zone is quite right. If you look at the sample data, and you think about it a bit, I think you'll find that Agent->POS, and that Agent->Zone.
I don't think {Agent, Product_code, Quantity_Sold} → Qualification is quite right. The requirement states "For each product sold by an agent, it is assigned a Qualification depending on the product and the quantity sold." The important part of that is "a Qualification depending on the product and the quantity sold". Qualification depends on product and quantity, so {Product_code, Quantity}->Qualification. (Nothing in the requirement suggests to me that the qualification might be different for identical orders from two different agents.)
So based on your comment, I think you have these functional dependencies so far.
POS->Zone
Agent->POS
Agent->Zone
Product_code, Quantity->Qualification
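Agent -> Zone doesn't have to be spotted independently: it follows from Agent -> POS and POS -> Zone by transitivity. A minimal Python sketch of attribute closure (the closure helper is illustrative, not from the answer) confirms this, using the column name Quantity_Sold from the table. It deliberately leaves out the dependency the question below asks about.

```python
# Three of the FDs listed above; Agent -> Zone is deliberately left out
# so we can check that it is implied by the others.
FDS = [
    ({"POS"}, {"Zone"}),
    ({"Agent"}, {"POS"}),
    ({"Product_Code", "Quantity_Sold"}, {"Qualification"}),
]


def closure(attrs, fds):
    """Attribute closure: keep applying any FD whose left-hand side is already covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


agent_closure = closure({"Agent"}, FDS)
print(sorted(agent_closure))    # ['Agent', 'POS', 'Zone']
print("Zone" in agent_closure)  # True, so Agent -> Zone is implied
```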
But you're missing at least one that has a significant effect on determining keys. Here's the requirement.
Two agents of the same points of sale can not market the same product.
How do you express the functional dependency implied in that requirement?

How many address fields would you use for a UK database?

Address records are probably used in most databases, but I've seen a number of slightly different sets of fields used to store them. The number of fields seems to vary from 3 to 7; sometimes the fields are simply labelled address1..addressN, other times they are given specific meanings (town, city, etc.).
This is UK specific, though I'm open to comments about the rest of the world too. Here you need the first line of the address (actually just the number) and the post code to identify the address - everything else is mostly an added bonus.
I'm currently favouring:
Address 1
Address 2
Address 3
Town
County
Post Code
We could add Country if we ever needed it (unlikely).
What do you think? Is this too little, too much?
The Post Office suggests (http://www.postoffice.co.uk/portal/po/content1?catId=19100182&mediaId=19100267) 7 lines:
Addressee's Name
Company/Organisation
Building Name
Number of building and name of thoroughfare
Locality Name
Post Town
Post Code
They then say you do not need to include a County name provided the Post Town and Postcode are used.
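If you follow this seven-line layout, one field per line plus a printer that skips empty lines is a straightforward mapping. Here is a minimal Python sketch; the field names are hypothetical (not from any Royal Mail schema), the sample address is made up, and County is deliberately absent, per the note above.

```python
from dataclasses import dataclass, fields


@dataclass
class PostalAddress:
    # One field per suggested line, in the suggested order (names are illustrative).
    addressee_name: str = ""
    organisation: str = ""
    building_name: str = ""
    number_and_thoroughfare: str = ""
    locality: str = ""
    post_town: str = ""
    postcode: str = ""

    def lines(self):
        """Printable lines in field order, skipping elements that are blank."""
        values = (getattr(self, f.name).strip() for f in fields(self))
        return [v for v in values if v]


# Made-up example: no organisation, building name or locality.
addr = PostalAddress(addressee_name="A N Other",
                     number_and_thoroughfare="1 High Street",
                     post_town="Anytown",
                     postcode="AB1 2CD")
print("\n".join(addr.lines()))
```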
The BSI have BS 7666, which covers all addressing. I recommend you look there.
The 2000 version recommends
An address shall be based upon a logical data model comprising the following entities:
addressable object, with sub-types:
primary addressable object;
secondary addressable object;
street;
locality;
town;
administrative area, a.k.a. district;
county;
postcode.
See: http://landregistry.data.gov.uk/def/common/BS7666Address
I don't know whether this is minimal (I doubt it) but the heading on my cheque book says something pretty close to:
Lloyds TSB
Isle of Man Offshore Centre
Peveril Buildings
Peveril Square
Douglas
Isle of Man
IM99 0XX
United Kingdom
This causes fits when I try to enter it into the US banking system.
If I were you, I'd call Royal Mail and ask them... or look on their website for postcode lookup as a best practice.
There are different types of addresses, and each type has a slightly different structure. Forward sorting offices have a different postal address structure than a residential home with a street number. What if the house has a name instead of a number? There are so many factors to consider.
Since I moved to Canada I've had to do something similar, and it's far more complicated than it looks. A straightforward residential address generally has:
Street Number if applicable
Street Number Suffix if applicable
House Name
Street Name
Street Type
Street Direction if applicable
Unit Number for flats, townhouses or other types of building/location
Minor Municipality (Village)
Major Municipality (Major Town/City)
County
PostCode
Country if you include Scotland, Wales, Northern Ireland (and now I noticed Eire)
Then you get businesses that have their own Delivery Route, PO Boxes, Forward Sortation Offices...
It gets complicated in a real hurry.
Best bet - give Royal Mail a call and they should be able to give you information on their standard address templates.
EDIT: Your 3-field method isn't a bad one, particularly. However, data sanitization may be a significant issue with the field setup you have, and you may need a fairly complex strategy for making sure that the address entered is valid. It's far easier to sanitize single dedicated fields to make sure input is correct than it is to parse various address tokens out of combined fields.
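As an example of sanitizing a dedicated field, here is a minimal Python sketch that normalizes case and spacing in a postcode and then checks its overall shape. The regular expression is a simplified approximation of the UK format (an assumption, not Royal Mail's official rules): it accepts IM99 0XX from the cheque-book example but rejects special cases such as GIR 0AA, and it cannot tell whether a well-formed postcode actually exists. A lookup against Royal Mail's address data, as other answers suggest, is the stronger check.

```python
import re

# Simplified shape: outward code (1-2 letters, a digit, optional letter or digit),
# optional space, inward code (a digit and two letters).
_POSTCODE = re.compile(r"[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}")


def normalise_postcode(raw):
    """Return the postcode upper-cased with a single space, or None if malformed."""
    candidate = " ".join(raw.upper().split())  # collapse runs of whitespace
    if not _POSTCODE.fullmatch(candidate):
        return None
    compact = candidate.replace(" ", "")
    # The inward code is always the last three characters.
    return compact[:-3] + " " + compact[-3:]


print(normalise_postcode("im99 0xx"))  # IM99 0XX
print(normalise_postcode("sw1a1aa"))   # SW1A 1AA
print(normalise_postcode("12345"))     # None
```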
Another simpler way to gain this info is to go on the Royal Mail website and check their postcode lookup page.
On their main postcode lookup, they use 4 fields, and I guess they have some form of validation on the street name/type field. They separate the house number and name, and I guess they only allow major municipality. I'm guessing the county/country are inferred. If you open their advanced search, they give you two extra fields for flat number and business name.
Given that some fields are combined on their site, you have to assume that there's some amount of validation to make sure that data entered can be gainfully used.
Premises elements
Sub Building Name
Building Name
Building Number
Organisation Name
Department Name
PO Box Number
Thoroughfare elements
Dependent Thoroughfare Name
Dependent Thoroughfare Descriptor
Thoroughfare Name
Thoroughfare Descriptor
Locality elements
Double Dependent Locality
Dependent Locality
Post Town
Postcode element
Postcode
This answer may be a few years late, but it's aimed at those like myself looking for guidance on how to correctly format postal addresses, both for storing in a database (or the like) and for printing.
Taken from the Royal Mail documentation (link below), conveniently titled the 'Programmers Guide':
Technical specification for users of PAF
Pages 27-42 were the most helpful for me.
It's very likely that a "UK" database will be opened to Eire as well, and in some lines of business there will be legal differences, generally between Scotland / NI / the Channel Islands and England and Wales.
In short, I would add country to the list. Otherwise it's fine (no fewer certainly), though of course any address is traceable from a building reference, a post code and a country alone.
Where we live in France, it's just 3 lines:
myname
village/location name
6 digit postcode followed by post town name in uppercase
Even from the UK, that's all that is required.
