I am reading the book Database Management Systems by Ramakrishnan, and in the chapter on schema refinement and normal forms I saw a sentence saying:
K is a candidate key for R means that K → R, where R is the relation.
We also have the decomposition rule:
If X → YZ, then X → Y and X → Z
My question is this. Let R = XABCDE and let X be the key. Since X → XABCDE, applying the decomposition rule repeatedly gives X → A, X → B, and so on, so X determines all of the attributes. But this confuses me: it means we cannot have two rows in the table with the same X value and different A values. For example, let X be a person's ID number and A be the model of the car that person has. Then a person cannot have two cars. But we have no such constraint; a person must be able to have two or more cars.
What am I doing wrong here? Can anyone help?
Thanks
You went wrong before you started normalizing R.
Part of the job of a database designer is to decide what the database is supposed to store. This has nothing to do with normalization. In textbook problems, this part is done before the problem is presented to you.
If you start with R{XABCDE}, where "X" is a person's ID number, and "A" is a kind of car, sample data for R might look like this.
person_id  car_model       B C D E
----------------------------------
1          Buick Wildcat   ...
2          Toyota Corolla  ...
3          Honda Accord    ...
Or it might look like this.
person_id  car_model                     B C D E
------------------------------------------------
1          Buick Wildcat, Nissan Sentra  ...
2          Toyota Corolla                ...
3          Honda Accord                  ...
Or it might look like this.
person_id  car_model       B C D E
----------------------------------
1          Buick Wildcat   ...
1          Nissan Sentra   ...
2          Toyota Corolla  ...
3          Honda Accord    ...
The first example suggests that you want to store only one car per person. That's a defensible design decision (unless the database needs to know how many cars each person has). Universities rarely care how many cars you have; they just want to know which one is supposed to have a parking sticker.
Deciding what to store has nothing to do with normalization.
The other examples suggest that you want to store more than one car per person. In that case you need, at the very least, to do some normalization (in the second example) or to reconsider your choice of primary key (in the third example).
Once you've decided what to store, you can start normalizing. Really, how could you start normalizing before you decide what to store? That would be impossible.
In relation R(XABCDE), if X is a key, then for any value of X the relation permits only one value each for A, B, C, D, and E at any point in time. If that constraint doesn't match the reality you intended to model, then maybe X was the wrong choice of key.
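As a minimal sketch of that constraint, the Python below checks whether a set of rows violates a functional dependency. The helper name `violates_fd` and the sample rows are assumptions made for illustration (the rows mirror the first and third sample tables above); it is not code from the textbook.

```python
def violates_fd(rows, lhs, rhs):
    """Return True if two rows agree on `lhs` but differ on `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(key, value) != value:
            return True
    return False


# Hypothetical rows mirroring the first and third sample tables.
one_car_per_person = [
    {"person_id": 1, "car_model": "Buick Wildcat"},
    {"person_id": 2, "car_model": "Toyota Corolla"},
    {"person_id": 3, "car_model": "Honda Accord"},
]
several_cars = one_car_per_person + [
    {"person_id": 1, "car_model": "Nissan Sentra"},
]

print(violates_fd(one_car_per_person, ["person_id"], ["car_model"]))
print(violates_fd(several_cars, ["person_id"], ["car_model"]))
```

With the second data set, person_id alone can no longer be a key; the pair {person_id, car_model} would be one candidate.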
We have been assigned a project on a family relationship system. We have to input data in the form NAME1 RELATION NAME2, and for each input we have to work out the gender of NAME1 from the RELATION it has with the other member.
That is not the problem we are facing. Our trouble is resolving relationships across the family. Suppose this data is entered:
A FATHER B
B BROTHER C
From this I want the computer to identify the relationship between A and C. I was thinking of doing it with a linear search, but our instructor believes that searching linearly will be very slow and advises us to use binary search or a hash table instead.
Can anybody please help us solve this?
You can see all of the work I have done here: https://github.com/Jorker22/project
Allocate an array for each person. Each index represents a relation, for example index 0 = MOTHER, index 1 = FATHER, index 3 = SON, and you insert the connection at the matching index.
With the right indexing you will be able to use binary search.
Example: array a for A: a[FATHER]=B, a[UNCLE]=C; array b for B: b[BROTHER]=C.
With some helper functions, you need to record C as A's UNCLE when b[BROTHER]=C is added.
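The idea above can also be sketched with hash tables instead of fixed arrays. The Python below is only an illustration under stated assumptions: it reads "A FATHER B" the way this answer does (A's father is B), and the rule table `COMPOSE` and the helper names `add_relation` and `derive` are hypothetical, not taken from the linked project.

```python
# Hypothetical composition rules: (relation of X to Y, relation of Y to Z)
# gives the relation of X to Z. Only a couple of rules are listed.
COMPOSE = {
    ("FATHER", "BROTHER"): "UNCLE",
    ("FATHER", "FATHER"): "GRANDFATHER",
}


def add_relation(table, person, relation, other):
    """Record that `person`'s `relation` is `other` (dict-of-dicts-of-sets)."""
    table.setdefault(person, {}).setdefault(relation, set()).add(other)


def derive(table):
    """Do one pass over the table and add every relation implied by COMPOSE."""
    derived = []
    for person, rels in table.items():
        for first, middles in rels.items():
            for middle in middles:
                for second, others in table.get(middle, {}).items():
                    result = COMPOSE.get((first, second))
                    if result is not None:
                        derived.extend(
                            (person, result, o) for o in others if o != person
                        )
    # Insert after the scan so the dicts are not mutated while iterating.
    for person, relation, other in derived:
        add_relation(table, person, relation, other)


family = {}
add_relation(family, "A", "FATHER", "B")
add_relation(family, "B", "BROTHER", "C")
derive(family)
print(family["A"]["UNCLE"])
```

Each lookup such as `family["A"]["UNCLE"]` is a hash lookup, so no linear scan over all entered facts is needed to answer a query.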
I followed this intro youtube.com/playlist?list=PLea0WJq13cnAfCC0azrCyquCN_tPelJN1 to create the ontology. Mine is slightly reduced http://prntscr.com/bo4l3w and I added a canBeTutor object property (meaning somebody can become a tutor for somebody) on my own. As far as I understand, I can add SWRL rules and then launch a reasoner to create new knowledge. So I added one prntscr.com/bo4lk7 . I started the HermiT reasoner prntscr.com/bo4lqx . But I got an inconsistent ontology warning prntscr.com/bo4lu0 . I clicked the Explain button and got the following explanation http://prntscr.com/bo4lyg . My ontology is here synoparser.ru/onto/protege.owl
1. Could you please tell me what that means?
2. Just for general understanding: I read that a reasoner can create new knowledge. Does that mean just relations, or also individuals and classes?
3. Where can I find the knowledge added by the reasoner in Protege 5?
The explanation in one of the figures you provided explains the inconsistency. The ontology says that
the classes Student and Lecturer are disjoint (that is, no individual can be both a Student and a Lecturer)
the domain of studies is Student, which means that if x studies y, then x is a Student
the domain of firstname is Lecturer, which means that if x firstname y, then x is a Lecturer
Now, since Student1 has firstname Andrew, Student1 must be a Lecturer. Since Student1 studies cs101, then Student1 must be a Student. But Student and Lecturer are disjoint; no individual can be both. But Student1 is both. That's an inconsistency.
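To make the chain of inference concrete, here is a toy Python sketch of the same reasoning. It is not how HermiT or Protege work internally; the data structures and the helper names `inferred_types` and `inconsistencies` are assumptions made for illustration, and only domain axioms and disjointness are modelled.

```python
# Axioms as described in the explanation above.
domains = {"studies": "Student", "firstname": "Lecturer"}
disjoint = [("Student", "Lecturer")]

# Property assertions about the individual Student1.
assertions = [
    ("Student1", "firstname", "Andrew"),
    ("Student1", "studies", "cs101"),
]


def inferred_types(assertions, domains):
    """Apply each domain axiom: if x P y and domain(P) = C, then x is a C."""
    types = {}
    for subject, prop, _value in assertions:
        if prop in domains:
            types.setdefault(subject, set()).add(domains[prop])
    return types


def inconsistencies(types, disjoint):
    """List individuals that end up in two classes declared disjoint."""
    return [
        (individual, a, b)
        for individual, classes in sorted(types.items())
        for a, b in disjoint
        if a in classes and b in classes
    ]


types = inferred_types(assertions, domains)
print(inconsistencies(types, disjoint))
```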
I have a problem with two relations, one containing the attributes song_id, song_name, and album_id, and the other containing album_id and album_name. I need to find the names of all the albums that do not have songs in the song relation. The catch is that I can only use Rename, Projection, Selection, Grouping (with sum, min, max, count), Cartesian Product, and Natural Join. I have spent a good amount of time on this and would appreciate any help that points me in the right direction.
As @ErwinSmout pointed out, difference is generally the easy way to do it. But since you can't use it, there is a tricky workaround using counts. I'm assuming that every album_id present in the songs relation is also present in the albums relation.
PROJECT album_id from the songs relation (note that relational algebra's PROJECT is equivalent to SQL's SELECT DISTINCT). I'll call this relation song_albums. Now let's take the count of the albums relation, call this m, and take the count of the new relation, call this n.
Take the Cartesian product of the albums relation and the song_albums relation. This new relation has m*n rows. Now if you do a count, grouped by album_name, each of the m album_name values will have a count of n. Not very helpful.
But now, we SELECT from the relation rows where albums.album_id != song_albums.album_id. Now, if you do a count grouped by album_name, the count for those albums that were not in the original songs relation will be n, while those that were originally in there will have a count less than n, since rows would have been removed based on how many songs with that album were in the original songs relation.
Edit: As it turns out, this isn't a strictly relational-algebra solution: In SQL, a 1 x 1 table, such as the one containing n can simply be treated as an integer and used in an equality comparison. However, according to Wikipedia, selection must make a comparison between either two attributes of a relation, or an attribute and a constant value.
Another obstacle which will be dealt with by another ill-recommended Cartesian product: we can take the Cartesian product of the 1 x 1 relation containing n with our most recent relation. Now we can make a proper relational-algebra selection since we have an attribute that is always equal to n.
Since this has gotten rather complex, here is a relational-algebra expression capturing the above English explanation:
Note that n is a 1 x 1 relation with an attribute named "count".
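The counting workaround can also be traced step by step in Python. This is only a sketch of the prose steps, not the answer's relational-algebra expression: the function name `albums_without_songs` and the sample rows are hypothetical, the grouping is done on (album_id, album_name) so that two albums sharing a name are not merged, and the comparison against n uses a plain integer instead of the extra Cartesian product. Note that when the songs relation is empty the product is empty too, so this approach returns nothing in that case.

```python
from collections import Counter


def albums_without_songs(songs, albums):
    """songs: (song_id, song_name, album_id) rows; albums: (album_id, album_name) rows."""
    # PROJECT album_id from songs (duplicates removed, as in relational algebra).
    song_albums = {album_id for _song_id, _song_name, album_id in songs}
    n = len(song_albums)

    # Cartesian product of albums and song_albums.
    product = [(aid, name, sid) for aid, name in albums for sid in song_albums]

    # SELECT rows where the two album ids differ.
    kept = [(aid, name) for aid, name, sid in product if aid != sid]

    # GROUP BY album and count; albums with no songs keep all n rows.
    counts = Counter(kept)
    return sorted(name for (_aid, name), count in counts.items() if count == n)


# Hypothetical sample data.
songs = [(1, "Intro", 10), (2, "Outro", 10), (3, "Single", 20)]
albums = [(10, "First"), (20, "Second"), (30, "Empty")]
print(albums_without_songs(songs, albums))
```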
It's impossible. The problem includes a negation, and in relational algebra that can only be expressed using relational difference, which you're seemingly not allowed to use.
I'm curious to see what your teacher presents as the solution to this problem.
I was learning database normalization, join dependencies, and 5NF, and I had a hard time. Can anyone give me some practical examples of the multivalued dependency rule:
MVD3: (transitivity) If X ↠ Y and Y ↠ Z, then X ↠ (Z − Y).
Functional dependency / Normalization theory and the normal forms up to and including BCNF, were developed on the hypothesis of all data attributes (columns/types/...) being "atomic" in a certain sense. That "certain sense" has long been deprecated by now, but essentially it boiled down to the notion that "a single cell value in a table could not itself hold a multiplicity of values". Think, a textual CSV list of ISBN numbers, a table appearing as a value in a cell in a table (truly nested tables), ...
Now imagine an example with courses, professors, and study books used as course material. Imagine all of that modeled in a single 3-column table which says that "Professor (P) teaches course (C) and uses book (B) as course material." If there can be more than one book (B) used for any given course (Cn) and there can be more than one course (C) taught by any given professor (Pn) and there can be more than one professor (P) teaching any given course (Cn), then this table is clearly all-key (key is the full set of attributes {P,C,B} ).
This means that this table satisfies BCNF.
But now imagine that there is a rule to the effect that "the set of books used for any given course (Cn) must be the same, regardless of which professor teaches it.".
In the days when normalization was developed to the form in which it is now commonly known, it was not allowed to have table columns (relation attributes) that were themselves tables (relations). (Because such a design was considered a violation of 1NF, a notion which is now considered suspect.)
Imagine for a moment that we are indeed allowed to model relation attributes to be of type relation. Then we could model our 3-column table (/relation) as follows : "Professor (P) teaches course (C) and uses THE SET OF BOOKS (SB) as course material.". Attribute SB would no longer be an ISBN number, as in the previous and more obvious design, but it would be a (probably unary) RELATION holding the entire set of ISBN numbers. If we draw our design like that, and we then consider our rule that "all professors use the same set of books for the same course", then we see that this rule is now expressible as an FD from (C) to (SB) !!! And this means that we have a violation of a lower NF on our hand !!!
4NF and 5NF arose out of this kind of problem, where the appearance of a single attribute value (course ID, C) requires the appearance of A MULTITUDE of rows (multiple ISBN numbers, B). The problem was recognised quite early on, but the solution currently regarded as the best, relation-valued attributes (RVAs), was not recognised as a valid one. So 4NF and 5NF were created as "new and further normal forms", where the then-existing definitions of 2NF, 3NF, and BCNF were already sufficient for dealing with the situation at hand, provided RVAs had been recognised as a valid design approach.
To support that claim, let's look at what would be done to eliminate the NF violation in our {P,C,SB} design with the FD C->SB :
We would split the table into two separate tables {P,C} and {C,SB} with keys {P,C} and {C}, respectively. Both tables satisfy BCNF.
But we still have this SB attribute that holds a set of ISBN numbers. Dealing with this can be done by applying a technique like "UNGROUPING". Applying this to our {C,SB} table would get us a {C,B} table, where B are the ISBN book numbers (or whatever identifier you like to use in your database), and the key to the table is {C,B}. This is exactly the same design we would get if we eliminated the 4/5 NF violation !!!
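A small Python sketch may help make the rule "every professor uses the same set of books for a course" concrete. The professor, course, and book names are made up, and the helper names `satisfies_course_books_mvd`, `decompose`, and `join` are assumptions for illustration. The check below tests the multivalued dependency C ↠ B on rows of the form (P, C, B), and the decomposition mirrors the {P,C} and {C,B} tables described above.

```python
def satisfies_course_books_mvd(rows):
    """True if, per course, every professor is paired with every book."""
    by_course = {}
    for prof, course, book in rows:
        profs, books, pairs = by_course.setdefault(course, (set(), set(), set()))
        profs.add(prof)
        books.add(book)
        pairs.add((prof, book))
    return all(
        pairs == {(p, b) for p in profs for b in books}
        for profs, books, pairs in by_course.values()
    )


def decompose(rows):
    """Split {P,C,B} into the projections {P,C} and {C,B}."""
    teaches = {(p, c) for p, c, _b in rows}
    uses = {(c, b) for _p, c, b in rows}
    return teaches, uses


def join(teaches, uses):
    """Natural join of {P,C} and {C,B} on the course."""
    return {(p, c, b) for p, c in teaches for c2, b in uses if c == c2}


rows = {
    ("Smith", "DB", "Date"), ("Smith", "DB", "Ullman"),
    ("Jones", "DB", "Date"), ("Jones", "DB", "Ullman"),
    ("Smith", "OS", "Tanenbaum"),
}
print(satisfies_course_books_mvd(rows))
print(join(*decompose(rows)) == rows)
```

If one row is dropped so that Jones no longer uses one of the DB books, the check fails and the join of the two projections no longer reproduces the table, which is the anomaly the decomposition is meant to rule out.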
You might also want to take a look at Multivalue Dependency violation?
How do you tell if a relation R is in BCNF and 3NF?
I'm reading a textbook, and it's telling me that there are 3 main attributes you're looking at, but I'm having trouble understanding what they're saying, or at least applying what they're saying when given a relation and FD's.
The 3 attributes:
Given a relation R with the attribute A, and X a subset of attributes of R, for every FD X⟶A in F, one of the following statements is true:
A ∈ X; that is, it is a trivial FD (∈ meaning "is found in X")
X is a superkey
A is part of some key for R
The first two conditions correspond to BCNF; 3NF additionally allows the third.
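The three conditions can be checked mechanically using attribute closure. The Python below is a minimal sketch, not the textbook's algorithm: the function names and the toy relation R(A, B, C) with FDs AB → C and C → B are assumptions chosen for illustration, and candidate keys are found by brute force, which is only practical for small relations.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes determined by `attrs` under the FDs (lhs, rhs pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(relation, fds):
    """Minimal attribute sets whose closure is the whole relation (brute force)."""
    keys = []
    for size in range(1, len(relation) + 1):
        for combo in combinations(sorted(relation), size):
            attrs = set(combo)
            if closure(attrs, fds) == relation and not any(k <= attrs for k in keys):
                keys.append(attrs)
    return keys


def normal_forms(relation, fds):
    """Check every FD X -> A against the three conditions listed above."""
    keys = candidate_keys(relation, fds)
    prime = set().union(*keys)          # attributes that are part of some key
    bcnf = third = True
    for lhs, rhs in fds:
        if closure(lhs, fds) == relation:
            continue                    # X is a superkey
        for attr in rhs - lhs:          # attributes in X make the FD trivial
            bcnf = False
            if attr not in prime:
                third = False
    return {"keys": keys, "BCNF": bcnf, "3NF": third}


relation = {"A", "B", "C"}
fds = [({"A", "B"}, {"C"}), ({"C"}, {"B"})]
print(normal_forms(relation, fds))
```

In this toy relation, C → B fails the first two conditions (it is not trivial and C is not a superkey) but passes the third, because B is part of the key {A, B}; so the relation is in 3NF but not in BCNF.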
The book SQL Antipatterns by Bill Karwin has a nice example about BCNF and 3NF on page 303 that is a little complicated but I believe points out the difference more succinctly than any description of the difference I've read so far.
For example, suppose we have three tag types: tags that describe the
impact of the bug, tags for the subsystem the bug affects, and tags
that describe the fix for the bug. We decide that each bug must have
at most one tag of a specific type. Our candidate key could be bug_id plus
tag, but it could also be bug_id plus tag_type. Either pair of
columns would be specific enough to address every row individually.
bug_id  tag       tag_type
--------------------------
1234    crash     impact
3456    printing  subsystem
3456    crash     impact
5678    report    subsystem
5678    crash     impact
5678    data      fix
The book then changes this single table (which satisfies 3NF) into two tables that satisfy BCNF:
bug_id  tag
----------------
1234    crash
3456    printing
3456    crash
5678    report
5678    crash
5678    data

tag       tag_type
-------------------
crash     impact
printing  subsystem
report    subsystem
data      fix
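As a quick check that nothing is lost by the split, the Python sketch below joins the two smaller tables on tag and compares the result with the original three-column table. The variable names and the helper `natural_join_on_tag` are assumptions for illustration; the rows are the ones shown above.

```python
# Rows of the original single table: (bug_id, tag, tag_type).
original = {
    (1234, "crash", "impact"),
    (3456, "printing", "subsystem"),
    (3456, "crash", "impact"),
    (5678, "report", "subsystem"),
    (5678, "crash", "impact"),
    (5678, "data", "fix"),
}

# The two tables after the split.
bug_tags = {
    (1234, "crash"), (3456, "printing"), (3456, "crash"),
    (5678, "report"), (5678, "crash"), (5678, "data"),
}
tag_types = {
    ("crash", "impact"), ("printing", "subsystem"),
    ("report", "subsystem"), ("data", "fix"),
}


def natural_join_on_tag(bug_tags, tag_types):
    """Join (bug_id, tag) with (tag, tag_type) on the shared tag column."""
    return {
        (bug_id, tag, tag_type)
        for bug_id, tag in bug_tags
        for other_tag, tag_type in tag_types
        if tag == other_tag
    }


print(natural_join_on_tag(bug_tags, tag_types) == original)
```

The tag_type of "crash" is now stored once in the second table instead of three times, which is the redundancy the split removes.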