I want to know which normal form a canonical cover is in. I know before doing normalization we find the canonical cover so I think it is in first normal form. But it may be in no normal form as the definition of 1NF by Wikipedia is no row should have a duplicate:
First normal form enforces below criteria:
Eliminate repeating groups in individual tables.
Create a separate table for each set of related data.
Identify each set of related data with a primary key
Normal forms apply to relations (values and variables), not canonical covers. A cover is a set of FDs (functional dependencies) from which all the FDs in a relation follow. A canonical cover is a cover in a certain form. If you have a canonical cover and a relation's attributes then you can find what normal form the relation is in.
No row can have a duplicate by definition of "relation". A relation has a set of rows. (SQL tables, though, can have duplicate rows.) No column/attribute can be "multivalued" or a "repeating group". An attribute has a value.
Wikipedia has a lot of nonsense in its relational articles including the one you quote. But from this answer by me:
By definition, a relation's tuple's attribute has a value from a domain. Re: "repeating groups": It can't have any, that's something from pre-relational databases. Re "non-atomic": Codd defined relations as able to have relation-valued domains. He pointed out that the only way that a value could be considered (in the everyday sense) non-atomic in a relational context was to be relation-valued. Ie he defined "atomic" in a relational context to mean not a relation. He defined "normalized" to mean having no relation-valued (ie non-atomic) attributes. (All this in 1970.) Later he defined "1NF" as normalized. And developed "2NF" & "3NF". Then (after Kent & with Boyce) "BCNF". So his use of these terms assumed no relation-valued domains.
But normalization theory is presented independent of domains. Ie it's considered to just be decomposition per problematic JDs [join dependencies]. So "1NF" also gets used for just being a relation. And the other "NFs" get used without regard to domains. (Although if there are relation-valued domains then there can be constraints different from but similar to FDs & JDs that cause different but similar anomalies, and that cause constraints and anomalies in components even after decomposition per all problematic JDs.) Regardless of whether a relation has relation-valued domains and regardless of what one means by "1NF" or "normalized" or "normalization", the decomposition procedure you're following to remove problematic JDs from problematic FDs to what you're calling [a ]NF is independent of domains.
From another:
There is no such thing as a "multivalued attribute" in a relation. A tuple has an attribute value for each attribute name. [...] If you have an attribute that you consider to contain multiple parts, ie you want to generically query about the parts without using operators with parameters of their types, then it is usually good design to have a separate table with attributes for those parts. But that's not addressed by normalization. Any value can be considered to have multiple parts in multiple ways and it is your application/queries that determine when you stop making tables whose attributes are the values of parts of other values and just have an attribute for a value. Similarly, if you have a bunch of attributes that play a similar role (often with similar names) then it is usually good design to have a separate table with just one attribute for the role. But that's not addressed by normalization.
Candidate keys matter to FDs, MVDs, JDs and normalization. PKs [primary keys] don't. You can pick one CK as "the PK" but its primariness is irrelevant to the relational model. It might be relevant to some information modeling method or product.
If you have particular concerns about your relation(s), attributes, attribute types, canonical cover or normalization process then you should explain in your question.
Related
This is more of a notation and 'proper procedure' type of question than anything.
Please see below an image of a few relations in my Enhanced ERD logical model. A patient can be an OUTPATIENT or a RESIDENT, but there are no attributes which are specific to OUTPATIENTS or RESIDENTS. There are relationships which are specific to the subtypes though, as only OUTPATIENTS can be associated with visits and only RESIDENTs can be associated with beds.
I am in the process of converting this to a physical data model. Obviously it makes sense to not have OUTPATIENT or RESIDENT tables and only a PATIENT table which contains a discriminator for the type of patient.
But what is the proper way to model this?
How do I now model the relationships to VISITS and BEDS while still maintaining the constraint that the discriminator must be of a certain value to qualify for those relationships?
Do I just forget about representing this constraint in the physical data model and make sure its implemented in the code when the tables are created?
Or is there a notation for physical data models which represents this type of constraint?
Section of CareCenter schema in Extended ERD
I have done much searching and cannot seem to find anything about this. All of the material I have found talks about creating subtypes for the purpose of isolating attributes specific to a subtype and not relationships specific to a subtype.
Advice or reference to data you have found that I was not able to is greatly appreciated!
(If you are really trying to make sense of my section of EERD it may be helpful to know that PATIENT is a subtype of a PERSON supertype.)
1 Modelling & Notation
1.1 ERD
is pre-Relational, 1960’s. It cannot handle Relational Keys, which means it is hopeless for Relational Data Modelling. In the Relational paradigm, the Relational Key (which is composite) is central, therefore the identity of each entity cannot be analysed, or modelled, or defined, in ERD.
There is no definition in ERD for the Relational concepts of Independent/Dependent tables, or Identifying/Non-Identifying relations, as it is meaningless without a Relational Key, which leads to much confusion when extending ERD and attempting to add those. Further, as you have found, it has no notation of Domain/Datatype; Subtype; etc.
ERD never was a Standard. Since it is un-useable, each person who attempts to use it for an SQL implementation has to “extend” ERD, and that results in a million notations, all of which are different and incomplete. And which have to be explained to the reader. Whereas a Standard needs no explanation because it is complete and documented, once.
Technically, ERD is not a model (which implies a mathematical, logical basis). The semantics are primitive and nowhere near complete. In fact, it is hopeless for modelling, period, even for pre-Relational filing systems.
1.2 IDEF1X
is the Standard for Relational Data Modelling, available since the 1980's, a Standard since 1993. As such it is complete, whereas an extended ERD will never be complete, no matter how much you extend it.
The academics and authors of "textbooks" are clueless: as evidenced, they are 50 years behind the industry (definition) and 40 years behind (implementation on SQL platforms). They are stuck in 1960's Record Filing Systems, which is physical, characterised by a RecordID, and they market it as "relational".
Whereas Codd's Relational Model is completely logical, with a mathematical foundation, and provides far more Integrity; Power; and Speed.
To use ERD at all, you have to extend it, using some private notation, as you have done. Instead of moving incrementally and painfully in the direction of IDEF1X, I suggest you just switch to it, and obtain the full benefit. You may find this IDEF1X Introduction useful.
1.3 Logical vs Physical Data Model
There is a lot of nonsense written about the distinction.
The Logical model simply progresses, in iterations, to the point where it is stable, and then it is the Physical, which can be implemented on a specific SQL platform. That is, there is no “convert” process.
In good Data Modelling tools, such as ERwin, it is one file, not two or three, and the Logical vs Physical is simply different views of that one file. Eg. Domain in the Logical is DataType in the Physical. The Physical is of course specific to the target platform, eg. BOOLEAN in one is BIT in another. If you are not using a Data Modelling tool, or using a poor one, sure, you will have separate files and you have to deal with the attendant synchronisation problems.
But what is the proper way to model this? How do I now model the relationships to visits and beds while still maintaining the constraint that the discriminator must be of a certain value to qualify for those relationships?
In this regard, the question is not about Logical vs Physical DM, all aspects re the question are implemented in both.
Yes, it is about notation. There is no notation problem, or difference (Logical vs Physical) in IDEF1X, because it is complete.
Do I just forget about representing this constraint in the physical data model
No, they are drawn in both, they are implemented in the DDL.
and make sure its implemented in the code when the tables are created?
If you use a Data Modelling tool, it squirts out SQL that is specific to the target platform. Otherwise, sure, you have to write your own DDL and make sure it is correct. In any case, the SQL is the same (not counting the difference in SQL flavours).
Caveat. The pretend SQLs (all freeware “sqls” and Oracle) are not SQL compliant, their use of the term is not correct. They cannot implement ordinary SQL features such as Constraints for Subtypes or ACID Transactions; etc.
Or is there a notation for physical data models which represents this type of constraint?
No, there is no difference in the notation in IDFE1X. Your question appears to be due to your extensions to ERD. First, the ERD is not useable for Relational data modelling, and cannot cope with Relational Keys or Subtypes. Second, your extensions, good as they may be, do not have the ordinary Relational notation that IDEF1X has. Again, just switch to IDEF1X.
2 Codd’s Relational Model
As distinct from the variety of primitive nonsense written by the academics and in textbooks, misleadingly marketed as “relational”.
2.1 Subtype
I have done much searching and cannot seem to find anything about this. All of the material I have found talks about creating subtypes for the purpose of isolating attributes specific to a subtype and not relationships specific to a subtype.
There is no problem at all with a Subtype that has no attributes, same as there is no problem at all with a row that has no attributes. Keep in mind that each entity is a Fact (one fact in one place), and the Fact is established by the Relational Key, to which the attributes are quite secondary (Codd’s 3NF properly understood). Thus Resident and OutPatient are discrete Facts, whether each Subtype has attributes or not; whether the Fact exists for supporting a Foreign Key or not, is a separate issue.
Advice or reference to data you have found that I was not able to is greatly appreciated
You may find this Subtype document useful. For examples, go to my profile, and look up any answers that interest you.
If you require even further detail, there is a long discourse regarding Subtypes and notation, that I had with the single academic who is trying to cross the great chasm between academia and reality in this field, who recently "found" IDEF1X from my data models. I use a corrected form of IDEF1X (it was written by an academic), using the pre-existing IEEE notation when it is more precise. The discourse goes into the whys and wherefores of the original IDEF1X vs the corrected form. It is long at 70 posts, and there is a document that summarises it. Just ask.
Obviously it makes sense to not have OUTPATIENT or RESIDENT tables and only a PATIENT table which contains a discriminator for the type of patient.
No. Each Subtype is a separate table, in the Logical models (first) and Physical (last), and the DDL. The physical is merely the implementation level of the Logical, you should not have anything in the Physical that is not in the Logical (you do not want to implement a thing that is not logical, not semantic; not Relational (which is absolutely logical, and unlimited).
Consider that the database may be expanded in the future, and you may have attributes in the Subtypes.
- If the cluster is Exclusive, the Basetype table must have a Discriminator.
- If it is Non-Exclusive, there is no Discriminator.
Supertype means something quite different, the academics use terms loosely and incorrectly. Eg. the notion of Superkey is hysterical, and anti-Relational.
2.2 Data Model
Here is the logical model in IDEF1X notation, showing attributes, not domains.
I have corrected a few errors: given the level of modelling that you have demonstrated, I don't think they need a full explanation.
Person Subtype is Non-Exclusive (no Discriminator)
Patient Subtype is Exclusive (needs a Discriminator)
That is to be used in your code to determine the Subtype, otherwise JOIN to the Subtype
Since Resident::Bed is 1::1, the attributes (Bed FK) can be located in Resident.
This treatment ensures that the Bed that a Patient may be assigned to, exists.
Consider:
When an OutPatient visits the CareCenter, is not the purpose to obtain a treatment of some kind, which must be recorded ?
Is not the treatment obtained under a Physician’s control, and shouldn’t the treatment details be recorded ?
Therefore an OutPatient obtains a Treatment, same as a Resident, and it is common, in the Basetype.
Visit can be eliminated
(again, whether the treatment is received by a Resident or OutPatient regards the Subtype).
The data model in a PDF.
2.3 Predicate
The Predicates can be read directly from the graphic model, the evaluation of such provides an excellent feedback loop to the modelling process. Please read them and verify.
Eg. the Predicate Each Bed accommodates 0-to-n Residents would cause a brawl that can be avoided.
Again, the academics and authors do not understand the Relational Model, and thus they are clueless about Predicates. For a good introduction, refer to Relational Table Naming Convention, the Relationship, Verb Phrase section at the top, and the Predicate section at the end.
2.4 Null
Nulls in a Relational database are a clear indication of a Normalisation error. I have removed them.
3 Outstanding
The academics and authors understand only 1960's physical Record Filing Systems (placed in an SQL container for convenience), thus they understand only Referential Integrity. They do not understand Codd's Relational Model, thus they cannot understand, and they cannot teach, Relational Integrity, which is logical, and provides far more data integrity than 50-year-obsolete filing systems.
Your model allows any Physician to treat any Patient, which is typical for a RFS, if you follow the literature, but sub-normal for Relational.
I doubt that that is what you want in a database. I think you want only the treating Physician, the ProviderNo to treat the Patient.
As the model progresses, you may wish to ensure that a Bed is assigned to one Resident only. I didn’t model it because I need to be told: is admission and bed assignment two administrative steps or one ?
Do you not require lookup tables for Speciality and TreatmentName ?
Data Modelling is an iterative exercise: it is only when a model is erected, and contemplated, that the issues are exposed, which leads to the next iteration.
In ER diagrams, is it possible to relate two weak entities each other? If possible, how can uniquely identify records in them?
It's certainly possible. Consider the following ER diagram in which invoices are composed of lines, and receipts are decomposed into corresponding lines which are allocated to invoice lines. Multiple receipt lines can be allocated to the same InvoiceLine. It's perhaps a bit contrived but it'll serve as an example.
The InvoiceLine entity set is identified by (InvoiceNumber, LineNumber). Similarly, the ReceiptLine entity set is identified by (ReceiptNumber, LineNumber).
The determinant of a relationship between any entity sets is a combination of the determinants of the entity sets in many-roles. It doesn't matter whether the entity sets are weak or regular, or whether you have two or more entity sets involved in the relationship. In the case of 1:1 (or 1:1:1, etc) relationship, any of the entity sets involved can be used as a determinant.
In our example, ReceiptLine is the only entity set in a many-role (indicated by an N next to the Paid relationship diamond). This means the relationship is determined by the determinant of ReceiptLine, which is (ReceiptNumber, LineNumber).
If we translate our ER diagram to a tabular model, we get the following:
I translated it directly to help you see the correspondence between the diagrams, but in practice we could denormalize the Paid relationship relation into the ReceiptLine entity relation for a simpler physical model. That can only be done for relationships with a single determining entity set, so it's important that you understand the general approach first.
Possible duplicate of Can I avoid a relation loop in my database design?, but I'd like to get a broader answer than for that specific design.
The goal in this case is to store automated testing data as it’s generated. A portion of the relationship diagram is shown below.
A variable number of tests may be run on each build, hence the direct one-to-many relationship between Builds and Sessions.
Each build is made of several hundred parts, and each part number may be used on several hundred builds, hence the many-to-many relationship between Builds and DT_Parts, associated through LT_HeaderParts.
If an assembly error is found during testing, a part or parts may be switched out and the unit retested. Instead of duplicating hundreds of part records on each retest, I implement PartsChangeLog to document any changes made after a given session.
PartsChangeLog uses DT_Parts as a dictionary to save memory by storing integers instead of the varchar(20) part_number.
LT_HeaderParts and PartsChangeLog both have appear to have valid, non-redundant reasons for using DT_Parts, yet this setup creates a reference loop and poses the danger of creating a false many-to-many bridge from build_id to session_id that would yield incorrect relationships.
Is this an okay structure? Why or why not?
Trying to answer the actual title question "When is it okay to have a relationship loop in my database?".
One part of the answer is that it depends on the intended usage of the schema/diagram per se. Is it intended as a conceptual model, with the purpose of illustrating business concepts ? Then basically you can highlight just any relationship you like. By which I mean you can highlight anything in the form of a relationship if you think that relationship is of interest to the intended business audience. Or is it intended as a logical db schema ?
In that case it mostly depends on the precise "semantics" of the relationships. If two relationships are saying things that are semantically distinct, then you can bet your ... that both will be relevant to the business being modeled and that you should be keeping both.
The simplest example of such a loop is a bill-of-materials structure. Such structures have a single "parts" entity, with a many-to-many relationship of "containment". This "containment" relationship gets instantiated as a "containment" entity with two relationships to the "parts" entity. Each of these two relationships has different semantics (one saying "the containing part must be a known part" and the other saying "the contained part must be a known part") and so they should definitely be kept both.
What you have is two sets of parts associated with a session: build parts (session -> build ->> part) and changed parts (session ->> partschangelog -> part). As the answer to the question linked by JJ32 explains, consistency is the main concern in these situations. In this case, I suspect the set of changed parts should be a subset of the build parts, but your schema doesn't enforce this.
One way of enforcing it is via controlled redundancy. If you include build_id in PartsChangeLog as a non-prime attribute (and modifying the foreign key reference to Sessions accordingly), you can create two composite foreign key constraints referencing LT_HeaderParts (for build_id, part_added and build_id, part_removed).
This eliminates the possibility of associating inconsistent session_id and build_id via the many-to-many bridge; though if no parts were changed, there won't be such a bridge. That's understandable, our goal is not to replace the direct mapping between session_id and build_id, only to ensure consistency. The rest is up to the query developer.
I have read different tutorials and seen different examples of normalization, specially the notion of "repeating groups" in the first normal form. From them I have have gathered that repeating groups are "kind-of" multi-valued attributes (e.g. here and here).
But we already make separate tables for each multi-valued attribute by including foreign keys from the parent table during the process of mapping an ERM (Entity relationship Model) to a RDM (Relational Data Model)? Reference: this
Secondly, are those "repeating groups" essentially laid out horizontally in the same row, or can the same value occurring in the same column again and again, i.e. the same value of an attribute again and again, also a repeating group and should be eliminated?
In this example, the value English is repeating again and again. Is this a repeating group?
If I eliminate it to make another table SUBJECT with Subject Name and Module_ID(Foreign key), this is what I get. Sure it gets rid of the repeating value, but I am not sure if this is the right thing. Is it right?
The term "repeating group" originally meant the concept in CODASYL and COBOL based languages where a single field could contain an array of repeating values. When E.F.Codd described his First Normal Form that was what he meant by a repeating group. The concept does not exist in any modern relational or SQL-based DBMS.
The term "repeating group" has also come to be used informally and imprecisely by database designers to mean a repeating set of columns, meaning a collection of columns containing similar kinds of values in a table. This is different to its original meaning in relation to 1NF. For instance in the case of a table called Families with columns named Parent1, Parent2, Child1, Child2, Child3, ... etc the collection of Child N columns is sometimes referred to as a repeating group and assumed to be in violation of 1NF even though it is not a repeating group in the sense that Codd intended.
This latter sense of a so-called repeating group is not technically a violation of 1NF if each attribute is only single-valued. The attributes themselves do not contain repeating values and therefore there is no violation of 1NF for that reason. Such a design is often considered an anti-pattern however because it constrains the table to a predetermined fixed number of values (maximum N children in a family) and because it forces queries and other business logic to be repeated for each of the columns. In other words it violates the "DRY" principle of design. Because it is generally considered poor design it suits database designers and sometimes even teachers to refer to repeated columns of this kind as a "repeating group" and a violation of the spirit of the First Normal Form.
This informal usage of terminology is slightly unfortunate because it can be a little arbitrary and confusing (when does a set of columns actually constitute a repetition?) and also because it is a distraction from a more fundamental issue, namely the Null problem. All of the Normal Forms are concerned with relations that don't permit the possibility of nulls. If a table permits a null in any column then it doesn't meet the requirements of a relation schema satisfying 1NF. In the case of our Families table, if the Child columns permit nulls (to represent families who have fewer than N children) then the Families table doesn't satisfy 1NF. The possibility of nulls is often forgotten or ignored in normalization exercises but the avoidance of unnecessary nullable columns is one very good reason for avoiding repeating sets of columns, whether or not you call them "repeating groups".
See also this article.
the value English is repeating again and again. Is this a repeating
group?
No. The multiple appearances of English in SUBJECT_MODULE are not a repeating group or even either of the two things that people mistakenly mean by a repeating group. They are also not evidence of redundancy or lack of normalization. Such multiple appearances might be connected to redundancy or normalization, but they appear all the time when there is no redundancy and various levels of normalization.
If SUBJECT_MODULE is rows where "[SUBJECT_NAME] has [MODULE_NAME] identified by [MODULE_ID]" and a subject might have more than one module then somewhere you must have multiple mentions of that subject (perhaps via its name) with mentions of different modules (perhaps by name or id). That would not involve redundancy.
Student Age Subject
Adam 15 Biology
Adam 15 Maths
Alex 14 Maths
Stuart 17 Maths
The redundancy in this example from your question's second "this" link is not that Adam appears in two rows or that Adam appears with 15 in two rows. It is that if the table is rows where "[Student] is [Age] years old and takes [Subject]" then Student (eg Adam) can appear in multiple rows but always appears with the same Age (eg 15). But if the table were rows where "[Student] has a friend [Age] years old in [Subject]" then the table could be fully normalized already.
Sure it gets rid of the repeating value, but I am not sure if this is
the right thing.
It does for your example data, but it might not for other example data. You haven't told us enough. (Anyway as I said above the multiple appearances might not even need normalizing.)
Whether there are any normalization-relevant redundancies in SUBJECT_MODULE or even whether there are any valid decompositions including the one you gave depends on the usual information necessary to normalize to above 1NF. Namely whether some of its columns are functions of others (functional dependencies) and whether its rows are also those where "..." AND "..." (join dependencies).
By giving a possible decomposition you have said that it is also rows where "...[Subject_Name]...[Module_ID]..." AND "...[Module_Name]...[Module_ID]..." And you have given some example decomposition data. But we only know that it could be so decomposed because you added the decomposition. And the decomposition plus data still isn't enough for us to know whether it should be so decomposed.
I have read different tutorials and seen different examples of
normalization, specially the notion of "repeating groups" in the first
normal form.
"Repeating groups" are something from pre-relational databases and cannot possibly appear in a relational table (relation). They are like a named set of values that is like a field of a record but is not quite. A relational table is always in 1NF. Each column of a row has a single value of the column's type. A non-relational database is "normalized" to tables ie 1NF (first sense of "normalized") which gets rid of repeating groups. Then those tables/relations are "normalized" to higher normal forms (second sense of "normalized").
A relational table having multiple similar columns or having a column type with multiple similar parts are each just reminiscent of having a repeating group in a non-relational database. And the multiple columns and parts should become multiple rows in a separate table, just like the multiple members of a repeating group. But these problems have to do with relational quality of design, not repeating groups or normalization (in either sense) or being relational (ie being in 1NF).
Note that a non-relational database might itself have similar problems with multiple similar fields and/or named sets or with multiple similar parts of values of fields. Normalization to tables does not get rid of these when it gets rid of repeating groups.
Regardless of how they got into a relational design, removing them gives a "better" design. It is just because these design problems are reminiscent of repeating groups that people get confused and imagine that somehow a table could contain a repeating group. So the multiple similar columns and values with multiple similar parts (or the parts) get incorrectly called "repeating groups".
See this answer re "atomicity".
What is a KISS (Keep it Simple, Stupid) way to remember what Boyce-Codd normal form is and how to take a unnormalized table and BCNF it?
Wikipedia's info: not terribly helpful for me.
Chris Date's definition is actually quite good, so long as you understand what he means:
Each attribute
Your data must be broken into separate, distinct attributes/columns/values which do not depend on any other attributes. Your full name is an attribute. Your birthdate is an attribute. Your age is not an attribute, it depends on the current date which is not part of your birthdate.
must represent a fact
Each attribute is a single fact, not a collection of facts. Changing one bit in an attribute changes the whole meaning. Your birthdate is a fact. Is your full name a fact? Well, in some cases it is, because if you change your surname your full name is different, right? But to a genealogist you have a surname and a family name, and if you change your surname your family name does not change, so they are separate facts.
about the key,
One attribute is special, it's a key. The key is an attribute that must be unique for all information in your data and must never change. Your full name is not a key because it can change. Your Social Insurance Number is not a key because they get reused. Your SSN plus birthdate is not a key, even if the combination can never be reused, because an attribute cannot be a combination of two facts. A GUID is a key. A number you increment and never reuse is a key.
the whole key,
The key alone must be sufficient [and necessary!] to identify your values; you cannot have the same data represented by different keys, nor can a subset of the key columns be sufficient to identify the fact.
Suppose you had an address book with a GUID key, name and address values. It is OK to have the same name appearing twice with different keys if they represent different people and are not the "same data".
If Mary Jones in accounting changes her name to Mary Smith, Mary Jones in Sales does not change her name as well.
On the other hand, if Mary Smith and John Smith have the same street address and it really is the same place, this is not allowed. You have to create a new key/value pair with the street address and a new key.
You are also not allowed to use the key for this new single street address as a value in the address book since now the same street address key would be represented twice.
Instead, you have to make a third key/value pair with values of the address book key and the street address key; you find a person's street address by matching their book key and address key in this group of values.
and nothing but the key
There must be nothing other than the key that identifies your values. For example, if you are allowed an address of "The Taj Mahal" (assuming there is only one) you are not allowed a city value in the same record,
since if you know the address you would also know the city. This would also open up the possibility of there being more than one Taj Mahal in a different city.
Instead, you have to again create a secondary Location key with unique values like the Taj, the White House in DC, and so on, and their cities.
Or forbid "addresses" that are unique to a city.
So help me, Codd.
Here are some helpful excerpts from the Wikipedia page on Third Normal Form:
Bill Kent defines Third Normal Form this way:
Each non-key attribute "must provide
a fact about the key, the whole key,
and nothing but the key."
Requiring that non-key attributes be
dependent on "the whole key" ensures
that a table is in 2NF; further
requiring that non-key attributes be
dependent on "nothing but the key"
ensures that the table is in 3NF.
Chris Date adapts Kent's mnemonic to define Boyce-Codd Normal Form:
"Each attribute must represent a fact
about the key, the whole key, and
nothing but the key." Here the
requirement is concerned with every
attribute in the table, not just
non-key attributes.
This comes into play when a table has multiple compound candidate keys, and an attribute within one candidate keys has a dependency on a part of another candidate key. Third Normal Form wouldn't prohibit this, because it excludes key attributes. But BCNF applies the rule to key attributes as well.
As for how to make a table satisfy BCNF, you need to represent the extra dependency, with another attribute and possibly by splitting attributes into another table.
I googled "boyce codd normal form" and after wikipedia this is the second result. My textbook gives a very simple definition in terms of relational database management systems:
The left side of every nontrivial FD must be a superkey.
-"Database Systems The Complete Book" by Garcia-Molina, Ullman and Widom.
The best informal answer I've read is that, in BCNF, every "arrow" in every functional dependency is an "arrow" out of a candidate key. I don't recall the source, but it was probably something Chris Date wrote.
Basically Boyce-Codd is "fifth normal form". It is visually recognizable by the existance of "Attributive entities" in the data model, for things like Types (e.g. roles, status, process state, location-type, phone-type, etc).
The attributive entities (sub-subtypes) are lists of finite sets of values that further categorize a class level entity. So you may have a phone-type ('mobile', ' desk', 'VOIP') email account type ('business', 'personal', 'gaming'), role (project manager, data modeler, super model) etc.
Another morphological clue is the existance of super-types, (aka. master-classes, super-classes, meta-entities) such as Parties (subtypes being company, person, etc.).
It's basically Taxonomy gone wild (..no the video is not that exciting) to the atomic or leaf-level; see Bill Karwin's comment above for a more technical explanation.
Boyce-Codd level models are essentially highly detailed logical models, derived from more simplistic business-based conceptual models. **They are typically NOT implemented ver batim in the PHYSICAL model, because PDM optimization for performance (or functional simplicity) may result in the super-types and attributive entities being managed as drop-down lists in UIs, or in behind the scenes logic in the application, or in database constraints and methods to enforce referential integrity. (i.e. they may end up as look-up tables in the PDM schema, or they may be handled by code and not represented in the database).
So - why do them if they may not end up in the PDM? For the same reason you build a good 3NF model before you 'optimize', so that the database structure reflects the real world and is hence more stable than the typical kludges we inherit and have to do heroic acts to make work as our business/clients requirements change.
Often times it is easiest to listen to your gut and this will come naturally. Generally speaking, if you meet 3NF you have met BCNF. This doesn't cover detailed analysis of an ERD or have examples but there are thirteen rules according to Codd. I find it best to follow these rules but always remember there is no one correct way to do things so follow them loosely. So regarding the RDBMS, here are the rules:
http://www.87android.com/12-rules-of-relational-database-model-by-codd/
This may not answer the question directly, but if you are asking about how to get to BCNF or an easy way to remember it then you don't understand normalization well enough. That is of no concern though. Relational databases take many forms and very few are done well. The best thing you can do is know what it means to be relational, follow the rules above, and do not worry about the level of normalization. The process of normalization eliminates the duplication of data. Each level more so by moving into migration of functional dependencies. Keep that in mind and you will be fine, your gut and intellect will do the rest.