3NF functional dependency in wiki's example - database

I read the Wikipedia article about 3NF:
https://en.wikipedia.org/wiki/Third_normal_form
This is the example the article gives:
Tournament Winners
Tournament            Year  Winner          Winner Date of Birth
Indiana Invitational  1998  Al Fredrickson  21 July 1975
Cleveland Open        1999  Bob Albertson   28 September 1968
Des Moines Masters    1999  Al Fredrickson  21 July 1975
Indiana Invitational  1999  Chip Masterson  14 March 1977
It says that the non-prime attribute Winner Date of Birth is transitively dependent on the candidate key {Tournament, Year} via the non-prime attribute Winner.
I think functional dependency means that, for two rows X1 and X2, if X1.col1 = X2.col1 and
X1.col2 = X2.col2, then col1 -> col2.
I cannot understand how Winner Date of Birth -> Winner (couldn't there be people with the same birthday and the same name?),
or how Winner -> candidate key {Tournament, Year} (given the winner name Al Fredrickson, it may be Indiana Invitational 1998 or Des Moines Masters 1999).
So how does the article reach that conclusion?

Informally, a functional dependency means that one value on the left side cannot produce multiple values on the right, even when the left side appears in more than one row. (1)
So, in the Wikipedia example, there is a functional dependency Winner -> Winner Date of Birth, simply because the same winner cannot have different dates of birth, even when he or she appears in multiple rows (having won multiple tournaments).
Since...
{Tournament, Year} -> Winner (since one tournament cannot have multiple winners)
and Winner -> Winner Date of Birth (as explained above)
and not Winner -> {Tournament, Year} (since one person can win multiple tournaments)
...then by definition there is a transitive dependency.
"I cannot understand that Winner Date of Birth -> Winner (there may be someone who has the same birthday and the same name?)"
You flipped the direction. The functional dependency is not "from" the single value, it's "toward" it. Therefore Winner -> Winner Date of Birth holds, but Winner Date of Birth -> Winner does not.
BTW, there cannot be two people with the same name in this model, because Winner alone determines the date of birth. A better (more realistic) model would probably use a surrogate key to identify people, allowing for duplicated names.
(1) This is consistent with the mathematical concept of a "function". No matter how many times you "call" a function (i.e. how many rows contain the f.d. left side), it always produces the same "result" (the f.d. right side). If it could produce multiple results, it would not be a function, it would be a "relation".
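As a rough illustration of the "one left value, one right value" rule, here is a minimal Python sketch that tests a dependency against the rows of the Wikipedia table. The helper name fd_holds is made up for this example. Keep in mind that sample rows can only refute a dependency; whether it really holds is a statement about the model, not about the rows that happen to be stored.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # setdefault remembers the first right-hand value seen for this left-hand value
        if seen.setdefault(left, right) != right:
            return False
    return True


COLUMNS = ("Tournament", "Year", "Winner", "Winner Date of Birth")
DATA = [
    ("Indiana Invitational", 1998, "Al Fredrickson", "21 July 1975"),
    ("Cleveland Open", 1999, "Bob Albertson", "28 September 1968"),
    ("Des Moines Masters", 1999, "Al Fredrickson", "21 July 1975"),
    ("Indiana Invitational", 1999, "Chip Masterson", "14 March 1977"),
]
winners = [dict(zip(COLUMNS, values)) for values in DATA]

key_to_winner = fd_holds(winners, ["Tournament", "Year"], ["Winner"])
winner_to_dob = fd_holds(winners, ["Winner"], ["Winner Date of Birth"])
winner_to_key = fd_holds(winners, ["Winner"], ["Tournament", "Year"])

print(key_to_winner)   # True
print(winner_to_dob)   # True
print(winner_to_key)   # False: Al Fredrickson appears with two different keys
```

The last check fails because Al Fredrickson won two tournaments, which is exactly the "not Winner -> {Tournament, Year}" condition in the definition of a transitive dependency.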

From what I understand:
For any {Tournament, Year} there is only one winner, and each winner has only one date of birth. Wikipedia points out that keeping both facts in one table leaves it vulnerable to inconsistencies:
Assume you enter a new row: {"Stupid tournament", "2013", "Al Fredrickson", "21 July 2012"}. You've entered an incorrect date of birth, and nothing in the table stops you.
If you keep a separate table {WinnerID, WinnerBirthday}, you prevent that.
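The effect of the split can be tried out with Python's built-in sqlite3 module. This is only a sketch under assumptions: the table and column names (winner_dob, tournament_winner) are invented here, and the winner's name is used as the key, as in the Wikipedia model, rather than a separate WinnerID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
    CREATE TABLE winner_dob (
        winner        TEXT PRIMARY KEY,   -- one row, and so one birthday, per winner
        date_of_birth TEXT NOT NULL
    );
    CREATE TABLE tournament_winner (
        tournament TEXT NOT NULL,
        year       INTEGER NOT NULL,
        winner     TEXT NOT NULL REFERENCES winner_dob (winner),
        PRIMARY KEY (tournament, year)
    );
""")

conn.execute("INSERT INTO winner_dob VALUES ('Al Fredrickson', '21 July 1975')")
conn.execute("INSERT INTO tournament_winner VALUES ('Indiana Invitational', 1998, 'Al Fredrickson')")
conn.execute("INSERT INTO tournament_winner VALUES ('Des Moines Masters', 1999, 'Al Fredrickson')")


def try_second_birthday():
    """Attempt to record a conflicting date of birth for an existing winner."""
    try:
        conn.execute("INSERT INTO winner_dob VALUES ('Al Fredrickson', '21 July 2012')")
    except sqlite3.IntegrityError:
        return "rejected"
    return "accepted"


result = try_second_birthday()
print(result)  # rejected
```

Because the date of birth is stored once per winner, adding another tournament win never gives a chance to type the birthday again, and a conflicting second birthday is refused by the primary key.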

What if a row is entered for the same winner with a different date of birth? That is possible, so how do we prevent it?
From the article:
"Because each row in the table needs to tell us who won a particular
Tournament in a particular Year, the composite key {Tournament, Year}
is a minimal set of attributes guaranteed to uniquely identify a row. That is, {Tournament, Year} is a candidate key for the table."
If a row is added to relation R with the same winner name but a different date of birth, it is still a unique row as far as the key {Tournament, Year} is concerned, yet it should not be allowed: the table would then show the same winner with two different dates of birth.
To avoid this, and to avoid repeating each winner's date of birth, we can
split the table and store {Winner, Winner Date of Birth} in a separate
table, as the article shows.
The article's reasoning:
"as there is nothing to stop the same person from being shown with
different dates of birth on different records."
That is why a separate table is needed.

Related

Can a relationship type attribute be a derived attribute?

Is it possible to have this? Assuming there are many attributes in the relations A and B
First, remember that a derived attribute is one whose value can change and can be computed from other attributes, such as your age (current date - date of birth).
Yes. Imagine a many-to-many relationship between students and courses.
Say a student takes a course that starts on January 3rd and lasts 8 weeks of effective classes. We should not store the attribute "end date" directly, because that date could change (for example because of a strike or a pandemic), but we can store the start date and the duration and calculate it.
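A small Python sketch of that idea follows. The class name Enrollment, the student and course values, and the year 2024 are all made up for illustration; the point is only that end_date is computed from stored attributes instead of being stored itself.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Enrollment:
    """Attributes of the student-course relationship (hypothetical names)."""
    student: str
    course: str
    start_date: date
    duration_weeks: int

    @property
    def end_date(self) -> date:
        # Derived attribute: recomputed on every access, never stored
        return self.start_date + timedelta(weeks=self.duration_weeks)


enrollment = Enrollment("Ana", "Databases", date(2024, 1, 3), 8)
planned_end = enrollment.end_date
print(planned_end)  # 2024-02-28

# A strike adds two weeks: only the stored duration changes
enrollment.duration_weeks = 10
revised_end = enrollment.end_date
print(revised_end)  # 2024-03-13
```

Since the end date is derived on demand, there is no second copy of it that could fall out of step when the schedule changes.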

iterating over MultiIndex - a groupby.value_counts() object is only through values and not through original date index

I want to know the percentage of males in the ER (emergency room) on days that I defined as overcrowded.
I have a DataFrame named eda whose rows each represent one entry to the ER. One column states whether the entry occurred on an overcrowded day (1 means overcrowded), and another column states the gender of the person who entered.
So far I have managed to get a Series indexed by overcrowded day, with a sub-index for gender, holding the number of entries for that gender.
I used this code:
eda[eda.over_crowd==1].groupby(eda[eda.over_crowd==1].index.date).gender.value_counts()
and got the following result:
My question is: what is the most 'pandas-ian' way to get the percentage of males/females in general? Or, how do I continue from the point where I stopped?
As can be seen at the bottom of the screenshot, when I iterate over the elements, the values alternate between the male and the female count. I want to iterate over dates instead, so I can write a cleaner loop that produces another column with the male percentage.
I found a pretty elegant solution. I'm sure there are more, but maybe it can help someone else.
I built a multi-index Series with the female and male counts for all dates, then used .loc to take each gender's counts across all dates and compute the percentage of males for each day. Finally, I extract only the days where over_crowd==1.
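import numpy as np
# Count entries per (date, gender) for all days; the code assumes gender 1 = male, 2 = female.
temp = eda.groupby(eda.index.date).gender.value_counts()
# crowding is assumed to be a per-day DataFrame, indexed by date, with an over_crowd column.
crowding['male_percent'] = np.divide(100 * temp.loc[:, 1], temp.loc[:, 2] + temp.loc[:, 1])
crowding.loc[crowding.over_crowd == 1, 'male_percent']  # keep only the overcrowded days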

How to enforce unique 2-tuple on oracle table?

I am trying to enforce the property that table Match should have all unique tuples (Team 1, Team 2). However, let Team 1 = Detroit Pistons and Team 2 = Chicago Bulls. I do not want to allow (Detroit Pistons, Chicago Bulls) to be inserted into the table if (Chicago Bulls, Detroit Pistons) already exists.
How can I enforce this constraint?
A) The tuples are semantically identical. (I think this is your case.)
That means the tuple {Chicago Bulls, Detroit Pistons} means exactly the same thing as the tuple {Detroit Pistons, Chicago Bulls}. Use a CHECK constraint to impose an order on the two columns.
CHECK (column_1 < column_2)
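That kind of constraint would allow {Chicago Bulls, Detroit Pistons}, but it would reject {Detroit Pistons, Chicago Bulls}. Combined with a unique constraint on the pair of columns, it guarantees that each matchup is stored at most once. This is kind of like imposing a canonical form on otherwise free-form data.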
B) The tuples are semantically distinct.
That means the tuple {Chicago Bulls, Detroit Pistons} means one thing, and the tuple {Detroit Pistons, Chicago Bulls} means something else. For example, the first attribute might mean "home team", and the second might mean "visiting team". In this case, all you need is some kind of unique constraint on the pair of columns.
You can create a unique function-based index:
CREATE UNIQUE INDEX unq_match ON match ( LEAST(team1,team2), GREATEST(team1,team2) );
LEAST() will get the "lesser" of the two teams (whether by ID or name, it doesn't matter) while GREATEST will get the "greater" of the two. Unfortunately this particular solution doesn't scale up to 3-or-more-tuples.
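The CHECK approach from option A, paired with a unique constraint, can be tried out quickly with Python's built-in sqlite3 module. This is a sketch only: it runs against SQLite rather than Oracle, and the table name match_pair and the helper insert_pair are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE match_pair (
        team1 TEXT NOT NULL,
        team2 TEXT NOT NULL,
        CHECK (team1 < team2),     -- canonical order: the "smaller" name goes first
        UNIQUE (team1, team2)      -- each ordered pair at most once
    )
""")


def insert_pair(a, b):
    """Return True if the row was stored, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO match_pair VALUES (?, ?)", (a, b))
    except sqlite3.IntegrityError:
        return False
    return True


first = insert_pair("Chicago Bulls", "Detroit Pistons")
swapped = insert_pair("Detroit Pistons", "Chicago Bulls")
repeated = insert_pair("Chicago Bulls", "Detroit Pistons")

print(first)     # True
print(swapped)   # False: violates the CHECK constraint
print(repeated)  # False: violates the UNIQUE constraint
```

With this design the application has to put the two teams in order before inserting. The function-based index with LEAST() and GREATEST() avoids that by doing the ordering inside the index.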

Is it possible to find 4 distinct functional dependencies in this table?

My professor gave a task to find 4 distinct functional dependencies in the following table:
Company(Company_Name, Street_Address, City, Zip, State, CEO_Name)
He also gave a note: "Each company has a different (unique) address meaning (Street_Address, City, Zip, State) together form a key. Different companies may have the same name. Each company has exactly one CEO, and one person cannot be the CEO of more than one company. CEO names may not be unique (there may be 2 CEOs with the same name). To count 4 functional dependencies in a table with attributes (A, B, C, D): If A -> B then obviously A, C -> B as well. This should not count as 2 separate dependencies. On the other hand, A -> B and A -> C should be counted as 2 distinct functional dependencies."
But in my opinion, there are not 4 functional dependencies. I only see:
CEO, Company Name -> (Street_address, city, zip, state)
zip -> state
But since two companies can have the same name, there should also be a primary key like "Company_Number". But creating new tables is not the task...
Functional dependencies answer the question, "Given a single value for X, do we know one and only one value for Y?" Either X or Y may be a set of attributes, not just a single attribute. Keep this in mind when you're reading through this answer.
"Each company has a different (unique) address meaning (Street_Address, City, Zip, State) together form a key."
By definition, that key means that
Street_Address, City, Zip, State -> CEO_Name
Street_Address, City, Zip, State -> Company_Name
That's all the possible FDs for the candidate key {Street_Address, City, Zip, State}. Two of four--halfway home.
You identified {CEO_Name, Company_name} as the left-hand side of a functional dependency. In this particular case, you also identify it as a candidate key. Let's look at some made-up data.
Company_Name  CEO_Name    Street_Address  City      State  Zip
--
Wibble, Inc.  Mary Smith  123 E Main St   Anytown   PA     00001
Wibble, Inc.  Mary Smith  456 S Darn St   Sometown  WY     10000
That data describes two different companies that happen to have the same name, having two different CEOs who happen to have the same name. This fits the description of the FDs, but clearly shows that {Company_Name, CEO_Name} is not a candidate key. The faked data also clearly shows that {Company_Name, CEO_Name} can't be the left-hand side of a functional dependency. Given a single value for {Company_Name, CEO_Name}, we don't have one and only one value for any of the other attributes.
Having eliminated the attributes Company_Name and CEO_Name as possibilities for the left-hand side, the only way to "manufacture" two more functional dependencies is to find them within the candidate key {Street_Address, City, Zip, State}. Not because there's anything special about the candidate key, but because those are the only attributes left.
My guess is that your teacher expects you to say
Zip -> City
Zip -> State
In the USA (in the "real" world), "Zip -> City, State" doesn't hold. ZIP codes have to do with how carriers drive their routes and deliver mail; ZIP codes aren't concerned with geography. A few cities (and ZIP codes) straddle state borders. Quite a lot of ZIP codes straddle adjoining cities within a single state. As the USPS cuts their budget, I expect the number of such ZIP codes to increase.
But in academia, this real-world behavior is often ignored for pedagogical reasons. That's why I'll bet your teacher expects {Zip -> City, State}.
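One way to sanity-check a set of dependencies like this is to compute attribute closures. The sketch below is a minimal Python version of the usual closure loop. The function name closure is made up here, and the FD list simply encodes the four dependencies discussed above, including the Zip ones that are only a guess at what the teacher expects.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD when its whole left side is already known
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


ALL_ATTRS = {"Company_Name", "Street_Address", "City", "Zip", "State", "CEO_Name"}
ADDRESS = ("Street_Address", "City", "Zip", "State")

fds = [
    (ADDRESS, ("CEO_Name",)),
    (ADDRESS, ("Company_Name",)),
    (("Zip",), ("City",)),    # the classroom assumption, not real-world behavior
    (("Zip",), ("State",)),
]

address_closure = closure(ADDRESS, fds)
name_closure = closure(("Company_Name", "CEO_Name"), fds)

print(address_closure == ALL_ATTRS)  # True: the address determines every attribute
print(sorted(name_closure))          # ['CEO_Name', 'Company_Name']
```

The closure of {Company_Name, CEO_Name} contains nothing beyond those two attributes, which agrees with the made-up Wibble, Inc. rows: that pair determines no other attribute.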

Determining the functional dependencies of a relationship and their normal forms

I'm studying for a database test. The study guide has some (many) exercises on database normalization and functional dependencies, but the teacher did not work through any similar exercise, so I would like someone to help me understand this one so I can tackle the other 16 problems.
1) Given the following logical schema:
Relation product_sales
POS    Zone    Agent  Product_Code  Qualification  Quantity_Sold
123-A  Zone-1  A-1    P1            8              80
123-A  Zone-1  A-1    P1            3              30
123-A  Zone-1  A-2    P2            3              30
456-B  Zone-1  A-3    P1            2              20
456-B  Zone-1  A-3    P3            5              50
789-C  Zone-2  A-4    P4            2              20
Assuming that:
• Points of sale are grouped into zones.
• Each point of sale has agents.
• Each agent operates in a single point of sale.
• Two agents of the same point of sale cannot market the same product.
• For each product sold by an agent, a Qualification is assigned depending on the product and the quantity sold.
a) Indicate 4 functional dependencies present.
b) What is the normal form of this structure?
To get you started finding the 4 functional dependencies, think about which attributes depend on another attribute:
eg: does the Zone depend on the POS? (if so, POS -> Zone) or does the POS depend on the Zone? (in which case Zone -> POS).
Four of your five statements tell you something about the dependencies between attributes (or combinations of several attributes).
As for normalisation, there's a (relatively) clear tutorial here. The phrase "the key, the whole key, and nothing but the key" is also a good way to remember the 1st, 2nd and 3rd normal forms.
In your comment, you said:
"Well, according to the theory I've read I think it may be, but I have
many doubts: POS → Zone, {POS, Agent} → Zone, Agent → POS, {Agent,
Product_code, Quantity_Sold} → Qualification"
I think that's a good effort.
I think POS->Zone is right.
I don't think {POS, Agent} → Zone is quite right. If you look at the sample data, and you think about it a bit, I think you'll find that Agent->POS, and that Agent->Zone.
I don't think {Agent, Product_code, Quantity_Sold} → Qualification is quite right. The requirement states "For each product sold by an agent, it is assigned a Qualification depending on the product and the quantity sold." The important part of that is "a Qualification depending on the product and the quantity sold". Qualification depends on product and quantity, so {Product_code, Quantity}->Qualification. (Nothing in the requirement suggests to me that the qualification might be different for identical orders from two different agents.)
So based on your comment, I think you have these functional dependencies so far.
POS -> Zone
Agent -> POS
Agent -> Zone
{Product_Code, Quantity_Sold} -> Qualification
But you're missing at least one that has a significant effect on determining keys. Here's the requirement.
"Two agents of the same points of sale can not market the same product."
How do you express the functional dependency implied in that requirement?
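The four dependencies listed so far can be checked against the sample rows with a few lines of Python. This is a sketch, not a proof: the helper name violations is made up, the "Zona-1" entry in the sample data is treated as a typo for "Zone-1", and six sample rows can only refute a dependency, never establish one.

```python
from collections import defaultdict


def violations(rows, lhs, rhs):
    """Map each lhs value to its set of rhs values, keeping only the conflicts."""
    groups = defaultdict(set)
    for row in rows:
        groups[tuple(row[a] for a in lhs)].add(tuple(row[a] for a in rhs))
    return {left: rights for left, rights in groups.items() if len(rights) > 1}


COLUMNS = ("POS", "Zone", "Agent", "Product_Code", "Qualification", "Quantity_Sold")
DATA = [
    ("123-A", "Zone-1", "A-1", "P1", 8, 80),
    ("123-A", "Zone-1", "A-1", "P1", 3, 30),
    ("123-A", "Zone-1", "A-2", "P2", 3, 30),
    ("456-B", "Zone-1", "A-3", "P1", 2, 20),  # "Zona-1" in the source, assumed typo
    ("456-B", "Zone-1", "A-3", "P3", 5, 50),
    ("789-C", "Zone-2", "A-4", "P4", 2, 20),
]
rows = [dict(zip(COLUMNS, values)) for values in DATA]

candidates = [
    (["POS"], ["Zone"]),
    (["Agent"], ["POS"]),
    (["Agent"], ["Zone"]),
    (["Product_Code", "Quantity_Sold"], ["Qualification"]),
]
conflicts = [violations(rows, lhs, rhs) for lhs, rhs in candidates]
print(conflicts)  # [{}, {}, {}, {}]

# The reverse direction of the first one is refuted by the data
zone_to_pos = violations(rows, ["Zone"], ["POS"])
print(sorted(zone_to_pos))  # [('Zone-1',)]
```

None of the four candidates is contradicted by the sample rows, while Zone -> POS is, because Zone-1 contains two points of sale. The same helper can be used to test a candidate for the remaining requirement.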
