ER Diagram explained - database

Can anybody explain the part with the multivalued attribute middleNames? How would that look in the database tables?

The diagram isn't valid. Multivalued attributes don't have weak keys, otherwise they would be weak entities. Also, multivalued attributes don't need cardinality indicators, since they're supposed to be multivalued. Optionality can be indicated with a dashed line.
My preferred way of correcting the diagram would be to drop the mnId and middleName component attributes, like so:
Physically, this would be implemented as:
Note the composite primary key on the two columns of the MiddleNames table.
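Since the picture of the physical tables is not reproduced here, the following is a minimal sketch of what a two-column MiddleNames table could look like, using Python's built-in sqlite3 module. The Person table and the personId column are assumptions made for illustration (the owning entity's real name and key are in the missing diagram), and the sample names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs unless asked
conn.executescript("""
CREATE TABLE Person (
    personId INTEGER PRIMARY KEY          -- hypothetical key of the owning entity
);
CREATE TABLE MiddleNames (
    personId   INTEGER NOT NULL REFERENCES Person (personId),
    middleName TEXT    NOT NULL,
    PRIMARY KEY (personId, middleName)    -- composite key on both columns
);
INSERT INTO Person VALUES (1);
INSERT INTO MiddleNames VALUES (1, 'Anne'), (1, 'Marie');
""")

def middle_names(person_id):
    """Return every middle name recorded for one person, sorted."""
    rows = conn.execute(
        "SELECT middleName FROM MiddleNames WHERE personId = ? ORDER BY middleName",
        (person_id,),
    )
    return [row[0] for row in rows]

def try_insert(person_id, name):
    """Insert one middle name; return False if a key constraint rejects it."""
    try:
        conn.execute("INSERT INTO MiddleNames VALUES (?, ?)", (person_id, name))
        return True
    except sqlite3.IntegrityError:
        return False
```

A person with no middle names simply has no rows in MiddleNames, which is how the optionality shows up physically. The composite key rejects recording the same name twice for the same person, and the foreign key rejects names for a person that does not exist.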

Related

Is this notation correct in Database Design ERD?

I'm creating an ERD and in this m:n relationship I'm trying to indicate that there is a composite key in the LOCATION entity (by combining Location_ID and Department_ID). I realise that this will involve a joining table when it comes to creating a table relationship diagram, but in the ERD, is this notation correct to indicate a composite key?
Your PK/FK demonstration is technically not wrong, but in an ERD you ultimately want to remove all many-to-many relationships; otherwise they will cause you more problems down the line. You especially want to remove relationships like this if you have a composite key.
Here's a quick, rough example of how I would do it. (I could do a better one if I knew more about your scenario and the other tables.)
You ideally want to create another entity which holds both primary keys from the other tables and therefore creates a composite key. Notice, this also removes the many to many relationships.
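As a rough sketch of that extra entity in table form (the example image is not reproduced here), the following uses Python's built-in sqlite3 module. Location_ID and Department_ID come from the question; the Department_Location table name and the sample ids are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs unless asked
conn.executescript("""
CREATE TABLE Location   (Location_ID   INTEGER PRIMARY KEY);
CREATE TABLE Department (Department_ID INTEGER PRIMARY KEY);

-- Hypothetical joining entity: one row per (location, department) pairing.
CREATE TABLE Department_Location (
    Location_ID   INTEGER NOT NULL REFERENCES Location (Location_ID),
    Department_ID INTEGER NOT NULL REFERENCES Department (Department_ID),
    PRIMARY KEY (Location_ID, Department_ID)   -- the composite key
);

INSERT INTO Location   VALUES (1), (2);
INSERT INTO Department VALUES (10), (20);
INSERT INTO Department_Location VALUES (1, 10), (1, 20), (2, 10);
""")

def departments_at(location_id):
    """Departments paired with one location."""
    rows = conn.execute(
        "SELECT Department_ID FROM Department_Location "
        "WHERE Location_ID = ? ORDER BY Department_ID",
        (location_id,),
    )
    return [row[0] for row in rows]

def locations_of(department_id):
    """Locations paired with one department."""
    rows = conn.execute(
        "SELECT Location_ID FROM Department_Location "
        "WHERE Department_ID = ? ORDER BY Location_ID",
        (department_id,),
    )
    return [row[0] for row in rows]
```

Each original table now has a one-to-many relationship with the joining table, and the composite key lives in the joining table rather than in LOCATION itself.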
I hope this gives you more of an understanding :)

How to avoid problems with composite entity keys and constraints (Products, Options, Option Groups, Special Orders)

I am modeling a database for a webshop and have come across an issue. Basically the question is whether to ignore database normalization rules for simplicity's sake.
Below is the relevant part of my diagram prior to the issue.
Database diagram
Basically, the product can have options (size, flavor, color) but only from one option group. Since an option group can have many options and a product that uses it can take a subset, a ProductOption table is created. Next we have a SpecialOffers table. Next, a special offer can have many products and products can belong to many special offers, hence the association table SpecialOfferProducts. All this works fine until the special offer includes a product that has options. This is where I run into problems. I have a couple of ideas.
First idea:
Create an association table between SpecialOfferProducts and ProductOptions. I don't like this idea since both tables have composite primary keys and creating a table that has a composite primary key composed of two composite primary keys seems really weird and I have never seen anything like it.
Second idea:
Create an association table between SpecialOfferProducts and Options. This seems wrong since Options is not directly tied to Product. Still this would work and the primary key would be a little simpler.
Third idea:
This is the one that I like the most but it violates a few rules. Change the SpecialOfferProducts table. Make it have its own primary key and have SpecialOffers, Products and Options as foreign keys. Simply make the Options foreign key nullable and problem solved. Of course the problems are that I am not making an association table where I should and am making a foreign key nullable. This would slightly complicate my code to deal with all of this but I still feel that this is much simpler than the other approaches since I reduce the number of composite keys and I don't have to add another table in the case where the product in a special offer uses an option.
My question is, which one of these options is best? Is there a better option I have not mentioned?
Using Martin style notation
OptionGroups has (0,n) relationship with the table Options. Options has (1,1) relationship with the table OptionGroups. The purpose of these tables is to store information like color, size, etc. An example would be OptionGroups entry color that has Option entries black, white, etc.
Product table has (0,1) relationship with table OptionGroups. OptionGroups has (0,n) relationship with table Product. Product table has a (0,n) relationship with the table Options. Options table has a (0,n) relationship with the table Product. Many-to-many relation produces association table ProductOptions. ProductOptions has a composite PK ProductID, OptionsID. The purpose of these tables is to allow product to have (but does not have to have) options from a certain option group but does not need to have all options from that group.
Example 1. Product does not have any options, hence FK Product_OptionGroups is null. In this case the product does not have any entries in the ProductOptions table.
Example 2. Product has options (let's say color) and so the FK Product_OptionGroups is not null (has the ID of the corresponding option group). Option group color can have many colors and the product is allowed to use one or many of those colors. The colors in use by the product are entries in the table ProductOptions.
SpecialOffer table has a (1,n) relation to the table Products. Products table has a (0,n) relation to the table SpecialOffer. Many-to-many relation creates the association table SpecialOfferProducts. This table has a PK SpecialOfferID, ProductID. The table has a Quantity attribute indicating the quantity of the product.
Example. SpecialOffer A includes one instance of Product A and two instances Product B.
Let's say that Product A has options. Now the SpecialOfferProducts table must reference the correct option (maybe the product can be blue and red and the special offer only includes the red product). This is where the current schema does not work and either an additional table must be introduced (idea 1 and 2) or the existing tables changed (idea 3).
Maybe you have some relation(ship)/association not representable in terms of your first three:
-- special offer S offers the pairing of P and option O
SpecialOfferProductOption(S, P, O)
-- PK (S, P, O)
-- FK (S, P) to SpecialOfferProducts, FK (P, O) to ProductOptions
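The outline above can be sketched as runnable DDL with Python's built-in sqlite3 module. This is only an illustration under assumptions: the column names S, P and O are taken from the outline, the parent tables SpecialOffers, Products and Options are left out to keep it short, and the sample ids are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs unless asked
conn.executescript("""
-- special offer S includes product P
CREATE TABLE SpecialOfferProducts (
    S INTEGER NOT NULL, P INTEGER NOT NULL, Quantity INTEGER NOT NULL,
    PRIMARY KEY (S, P)
);
-- product P is available with option O
CREATE TABLE ProductOptions (
    P INTEGER NOT NULL, O INTEGER NOT NULL,
    PRIMARY KEY (P, O)
);
-- special offer S offers the pairing of P and option O
CREATE TABLE SpecialOfferProductOption (
    S INTEGER NOT NULL, P INTEGER NOT NULL, O INTEGER NOT NULL,
    PRIMARY KEY (S, P, O),
    FOREIGN KEY (S, P) REFERENCES SpecialOfferProducts (S, P),
    FOREIGN KEY (P, O) REFERENCES ProductOptions (P, O)
);
INSERT INTO SpecialOfferProducts VALUES (1, 100, 1);  -- made-up sample ids
INSERT INTO ProductOptions VALUES (100, 7);
""")

def offer_option(s, p, o):
    """Record that offer s includes product p with option o, if the keys allow it."""
    try:
        conn.execute(
            "INSERT INTO SpecialOfferProductOption VALUES (?, ?, ?)", (s, p, o)
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

Because P appears once and takes part in both foreign keys, the product in the offer and the product that has the option are forced to agree. `offer_option(1, 100, 7)` is accepted, while an option the product does not have, or an offer that does not include the product, is rejected by the database itself.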
You don't seem to understand the use of composite keys, CKs (candidate keys), FKs (foreign keys) & constraints. Constraints (PKs, UNIQUE, FKs, etc) arise after you design relation(ship)s/associations sufficient to clearly describe your business situations (represented by tables), per the situations that can arise.
From an ER point of view, you are not properly applying the notions of participating entity (type), entity (type) key & associative entity (type).
You are needlessly & vaguely afraid of composite CKs. Even if you wanted to reduce use of composite keys, you should first find a straightforward design. If you don't want to use composite keys, introduce id PKs along with other CKs. But note that when you use ids as FKs that doesn't drop the obligation to properly constrain the tables that they appear in to agree where necessary with other ids or columns per the constraints you would have needed if you had used the composite CKs instead.
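To illustrate the last point, here is a hedged sketch (Python's built-in sqlite3 module) of an id-based variant. The sop_id and po_id surrogate columns and the sample ids are assumptions made for illustration, not part of the question's schema. The foreign keys on the ids are all satisfied, yet the database accepts a row pairing an offer's product with an option that belongs to a different product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs unless asked
conn.executescript("""
CREATE TABLE SpecialOfferProducts (
    sop_id INTEGER PRIMARY KEY,            -- hypothetical surrogate id
    S INTEGER NOT NULL, P INTEGER NOT NULL,
    UNIQUE (S, P)                          -- the composite CK is still declared
);
CREATE TABLE ProductOptions (
    po_id INTEGER PRIMARY KEY,             -- hypothetical surrogate id
    P INTEGER NOT NULL, O INTEGER NOT NULL,
    UNIQUE (P, O)
);
CREATE TABLE SpecialOfferProductOption (
    sop_id INTEGER NOT NULL REFERENCES SpecialOfferProducts (sop_id),
    po_id  INTEGER NOT NULL REFERENCES ProductOptions (po_id),
    PRIMARY KEY (sop_id, po_id)
);
INSERT INTO SpecialOfferProducts VALUES (1, 1, 100);  -- offer 1 includes product 100
INSERT INTO ProductOptions       VALUES (1, 200, 7);  -- product 200 has option 7
INSERT INTO SpecialOfferProductOption VALUES (1, 1);  -- accepted by both FKs
""")

# Rows where the offer's product and the option's product disagree.
mismatches = conn.execute("""
    SELECT sop.S, sop.P, po.P, po.O
    FROM SpecialOfferProductOption AS x
    JOIN SpecialOfferProducts AS sop ON sop.sop_id = x.sop_id
    JOIN ProductOptions       AS po  ON po.po_id   = x.po_id
    WHERE sop.P <> po.P
""").fetchall()
```

Here `mismatches` contains the row for offer 1, so the "products must agree" rule that the composite foreign keys gave for free now has to be enforced some other way (a trigger, a carried-along product column with composite FKs, or application code).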
First idea:
Create an association table between SpecialOfferProducts and ProductOptions. I don't like this idea since both tables have composite primary keys and creating a table that has a composite primary key composed of two composite primary keys seems really weird and I have never seen anything like it.
It's not clear what you mean by this. Maybe you mean the above (good) design. Maybe you mean having duplicate product columns; but that's not what good design suggests.
From an ER perspective: You may be thinking of this as a relation(ship)/association on special orders & products. But then the entity keys would not be composite, they would identify special orders & products, and also options would participate. Or we can use the ER concept of reifying relation(ship)s/associations SpecialOfferProducts & ProductOptions to associative entities that are the two participants. That would use composite keys. (If options weren't considered entities then ER would call this a weak relation(ship)/association entity with special orders & products as identifying entities.) Regardless, special orders & products must agree on options, and if that isn't enforced via FKs then it still needs constraining.
If you have (been) read(ing) some published text(s) on information modeling & database design (as you should) you will see many uses of composite keys.
Second idea:
Create an association table between SpecialOfferProducts and Options. This seems wrong since Options is not directly tied to Product. Still this would work and the primary key would be a little simpler.
It's not clear what you mean by "directly tied", "seems" or "wrong".
Relational table relation(ship)s/associations are among values, certain subrows of which may identify certain entities. Just use the relevant columns & declare the relevant constraints.
From an ER perspective: Considering that you seem to be confused about participant entities (special offer vs SpecialOfferProduct), maybe this is moot, but: Maybe if you tried to express yourself only using technical terms & without the confusion then you would be trying to say that this design needs a constraint that product-option pairs appear in ProductOptions and that it's messy that the constraint involves a relation(ship)/association whose associative entity ProductOption isn't one of the participating entities. I'd agree, but such a design is not "wrong".
Third idea:
This is the one that I like the most but it violates a few rules. Change the SpecialOfferProducts table. Make it have its own primary key and have SpecialOffers, Products and Options as foreign keys. Simply make the Options foreign key nullable and problem solved.
Besides just being needlessly complex, this design is bad. It involves a complex table meaning & complex constraints. When setting the table value you need to decide when to use & not use nulls. When reading you need to figure out what a row means based on whether it has a null. Introducing an id or nulls, possibly while dropping columns, does not remove the obligation to constrain remaining columns if that's not handled by remaining FK constraints. Normally we combine tables while introducing nulls in columns that are not part of every CK--not your case. Here your adding ids doesn't even obviate the need to constrain pairs of products and non-null option column values to be in ProductOptions. And when there is a NULL option column value there should still exist certain rows in ProductOptions and sometimes not certain rows in SpecialOfferProducts. Also this design must be used with complex queries dealing with the presence of NULL. (Which you address.) Justifying this as an ER design is similarly problematic.
PS 1 Please explain your business relation(ship)s/associations with less generic terms than the essentially meaningless "has", "with", "uses", "in" & "belong to"--as you would with a client buying your products & special offers. They refer to relation(ship)s/associations & sets, but they don't explain them. (Similarly, cardinalities are properties of relation(ship)s/associations, but don't explain/characterize them.)
PS 2 ER reasoning about designs involves what (possibly associative) entities are participating in relationships, whereas in the relational model view tables just capture n-ary relation(ship)s/associations for any n. So the ER view is adding needless distinctions. That is why ER-based information modeling & database design approaches are not as effective as fact-based approaches:
This leads to inadequate normalization and constraints, hence redundancy and loss of integrity. Or when those steps are adequately done it leads to the E-R diagram not actually describing the application, which is actually described by the relational database predicates, tables and constraints. Then the E-R diagram is both vague, redundant and wrong.
PS 3 We don't need SpecialOfferProducts if it holds rows where "special offer S offers the pairing of P and some option", because it is select S, P from SpecialOfferProductOption. (This seems to be the case since your option 3 involves having only one table that you call SpecialOfferProducts but is like this table with an added id.) But if it holds rows where say "special offer S offers product P" and that can be so when not all of S's product-option pairs have been recorded then you need it. (Something similar arises re deciding when something is an entity, eg when there should be a table "S is a special option".)
PS 4
seems really weird and I have never seen anything like it
This is the story of life. But in a technical context if we learn and apply clearly defined basic definitions, rules & procedures then we "see" more, and more clearly. (And don't vaguely think we vaguely see things that aren't there.) And "weird" is a rare case where we can explicitly justify that our tools don't apply.

How to design table with primary key and multivalued attribute?

I'm interested in database design and now reading the corresponding literature.
In the book, I have come across a strange example that makes me feel uncertain.
There is a relation
In this table we have a composite primary key (StudentID, Activity). But ActivityFee is partially dependent on the key of the table (Activity -> ActivityFee), so the author suggests to divide this relation into two other relations:
Now if we take a look at the STUDENT_ACTIVITY, Activity becomes a foreign key and relation still has a composite primary key.
We've got a table in which all the columns together make up the composite primary key. Is that OK?
If it is not, what should we do in this case? (Probably define a surrogate key?)
What is a good way to deal with a multivalued attribute (Activity in our case) in order to eliminate possible data anomalies?
A table which consists only of a composite key is perfectly OK if that matches your business requirement.
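As a small sketch of the decomposed design (Python's built-in sqlite3 module): STUDENT_ACTIVITY, StudentID, Activity and ActivityFee come from the question, while the name ACTIVITY for the second relation and the sample rows are assumptions, since the book's figures are not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs unless asked
conn.executescript("""
CREATE TABLE ACTIVITY (                    -- assumed name for the second relation
    Activity    TEXT PRIMARY KEY,
    ActivityFee INTEGER NOT NULL           -- Activity -> ActivityFee, stored once
);
CREATE TABLE STUDENT_ACTIVITY (
    StudentID INTEGER NOT NULL,
    Activity  TEXT    NOT NULL REFERENCES ACTIVITY (Activity),
    PRIMARY KEY (StudentID, Activity)      -- every column is part of the key
);
INSERT INTO ACTIVITY VALUES ('Golf', 65), ('Swimming', 50);
INSERT INTO STUDENT_ACTIVITY VALUES (100, 'Golf'), (100, 'Swimming'), (200, 'Golf');
""")

def fees_for(student_id):
    """List (activity, fee) pairs for one student by joining the two tables."""
    rows = conn.execute(
        "SELECT sa.Activity, a.ActivityFee "
        "FROM STUDENT_ACTIVITY AS sa JOIN ACTIVITY AS a ON a.Activity = sa.Activity "
        "WHERE sa.StudentID = ? ORDER BY sa.Activity",
        (student_id,),
    )
    return rows.fetchall()

# A fee change is a single-row update; no student row has to be touched.
conn.execute("UPDATE ACTIVITY SET ActivityFee = 70 WHERE Activity = 'Golf'")
```

After the update, every student taking Golf sees the new fee, which is the update anomaly the decomposition removes. The all-key STUDENT_ACTIVITY table needs no surrogate key for this to work.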
Activity is not a multivalued attribute. There is a single value for activity for each tuple.
There is nothing wrong with a composite candidate key. (If your reference doesn't talk in terms of candidate keys, ie if it talks about primary keys in any other case than when there just happens to be only one candidate key, get a new reference.)
Your text will tell you what is good and bad design. There is no point in worrying about every property you notice about a relation that it might be "bad". The kind of "good" it is currently addressing is that given by "normalization".
"Activity" is not a "multivalued attribute". A "multivalued" attribute is a non-relational notion. The term is frequently but incorrectly used to mean either an "attribute" in a non-relational "table" that somehow (which is never explained) has more than one entry per "row", or for a column in a relational table that has a value with multiple similar parts (set, list, bag, table, etc) that somehow (which is never explained) doesn't apply to say, strings & numerals, or for a column in a relational table that has a value with multiple different parts (record, tuple, etc) that somehow (which is never explained) doesn't apply to, say, dates. (Sometimes it is even misapplied to mean a bunch of attributes with similar names and values, which ought to be replaced by a single attribute with a row for each original name.) (These are just cases of unwanted designs.) "Multivalued" gets used as an antonym of the similarly misused/abused term "atomic".
Having the same (value or) subrow of values appear more than once in a column or table is, again, neither good nor bad per se. Again, your reference will tell you what is good design.

Can a multivalued attribute have a primary key?

Functional dependencies are the attributes that their values are determined in a unique way by another attribute. Given that, can a multivalued attribute be dependent upon a primary key?
"FDs are the attributes that their values are determined in a unique way by another attribute" is unintelligible. Find a way to say it correctly or how can you understand it?
An attribute (or set of attributes) is functionally determined by a set of attributes.
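To make "functionally determined" concrete, here is a minimal Python sketch that checks whether a set of sample rows is consistent with a dependency lhs -> rhs. The function name fd_holds and the sample rows are made up for illustration. Note the limitation: an FD is a statement about every value the relation can ever hold, so sample data can refute an FD but can never prove it.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs value seen for this lhs value
        if seen.setdefault(key, value) != value:
            return False
    return True

# Made-up sample rows
rows = [
    {"StudentID": 100, "Activity": "Golf", "ActivityFee": 65},
    {"StudentID": 100, "Activity": "Swimming", "ActivityFee": 50},
    {"StudentID": 200, "Activity": "Golf", "ActivityFee": 65},
]

activity_determines_fee = fd_holds(rows, ["Activity"], ["ActivityFee"])
student_determines_activity = fd_holds(rows, ["StudentID"], ["Activity"])
```

In this sample, Activity determines ActivityFee (each activity has one fee), while StudentID does not determine Activity, because student 100 appears with two different activities.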
There is no such thing as a "multivalued attribute" in a relation. A tuple has an attribute value for each attribute name. (Maybe you mean, a set of attributes is being determined? Maybe you mean, a multi-valued dependency?) If you have an attribute that you consider to contain multiple parts, ie you want to generically query about the parts without using operators with parameters of their types, then it is usually good design to have a separate table with attributes for those parts. But that's not addressed by normalization. Any value can be considered to have multiple parts in multiple ways and it is your application/queries that determine when you stop making tables whose attributes are the values of parts of other values and just have an attribute for a value. Similarly, if you have a bunch of attributes that play a similar role (often with similar names) then it is usually good design to have a separate table with just one attribute for the role. But that's not addressed by normalization.
Candidate keys matter to FDs, MVDs, JDs and normalization. PKs don't. You can pick one CK as "the PK" but its primariness is irrelevant to the relational model. It might be relevant to some information modeling method or product.
Superkeys are sets of columns that determine every column. Since every set of attributes always determines the attributes in it, superkeys are sets of columns that determine every other column. CKs are superkeys that contain no smaller superkey. (So CKs are sets of columns that are unique but contain no smaller set of columns that are unique.)
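These definitions can be turned into a short Python sketch: compute the closure of a set of attributes under the given FDs, call the set a superkey if the closure is every attribute, and keep only the superkeys that contain no smaller one. The function names and the example attributes are assumptions for illustration, and the brute-force search over subsets is only sensible for small relations.

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes determined by attrs under fds (a list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_superkey(attrs, all_attrs, fds):
    """A superkey determines every attribute of the relation."""
    return closure(attrs, fds) == set(all_attrs)

def candidate_keys(all_attrs, fds):
    """Superkeys that contain no smaller superkey, found by increasing size."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            candidate = set(combo)
            if is_superkey(candidate, all_attrs, fds) and not any(
                key <= candidate for key in keys
            ):
                keys.append(candidate)
    return keys

# Made-up example relation with one FD: Activity -> ActivityFee
attrs = {"StudentID", "Activity", "ActivityFee"}
fds = [({"Activity"}, {"ActivityFee"})]
keys = candidate_keys(attrs, fds)
```

With that single FD, the only candidate key is {StudentID, Activity}. The full set of three attributes is a superkey too, but not a candidate key, since it contains the smaller one.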
You don't know all the CKs until you find all the FDs. But you might know that a particular set of attributes is unique and has no smaller unique set, so that you know that it is a CK and you can call it "the PK". (Eg an id attribute in a relation variable that can have more than one row.)
can a multivalued attribute be dependent upon a primary key?
Every attribute is dependent on every CK by definition of CK. So every attribute is dependent on every PK by definition of PK. (But you must clarify what you mean by "multivalued attribute" and "dependent".)

Tables with a common primary key

What's the term describing the relationship between tables that share a common primary key?
Here's an example:
Table 1
property(property_id, property_location, property_price, ...);
Table 2
flat(property_id, flat_floor, flat_bedroom_count, ...);
What you have looks like table inheritance. If your table structure is that all flat records represent a single property but not all property records refer to a flat, then that's table inheritance. It's a way of modeling something close to object-oriented relationships (in other words, flat inherits from property) in a relational database.
If I understand your example correctly, the data modeling term is Supertype/Subtype. This is a modeling technique where you define a root table (the supertype) containing common attributes, and one or more referencing tables (subtypes) that contain varying attributes based on the entities being modeled.
For example, you could have a Person table (the supertype) containing columns for attributes pertaining to all people, such as Name. You could then have an Employee table (the subtype) containing attributes specific to employees only, such as rate of pay and hire date. You could then continue this process with additional tables for other specializations of Person, such as Contractor. Each of the subtype tables would have a PersonID key column, which could be the primary key of the subtype table, as well as a foreign key referencing the Person table.
For additional info, search Google for "supertype and subtype entities", and see the links below.
http://www.learndatamodeling.com/dm_super_type.htm
http://technet.microsoft.com/en-us/library/cc505839.aspx
There isn't a good name for this relationship in common database terminology (as far as I know). It's not a one-to-one relationship because there isn't guaranteed to be a record in the "extending" table for each record in the main table. It's not a one-to-many relationship because there is a maximum of one record allowed on what would otherwise be the "many" side of the relationship.
The best I can do is a one-to-one-or-none or a one-to-one-at-most relationship. (I will admit to sloppy terminology myself — I just call it a one-to-one relationship.)
Whatever you decide to call it, you can model it properly and maintain integrity in your database by making the property_id column in property a PK and the property_id column in flat a PK and also an FK back to property.
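A minimal sketch of that arrangement with Python's built-in sqlite3 module follows. The table and column names are the ones from the question; the column types and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs unless asked
conn.executescript("""
CREATE TABLE property (
    property_id       INTEGER PRIMARY KEY,
    property_location TEXT,
    property_price    INTEGER
);
CREATE TABLE flat (
    -- PK and FK at once: at most one flat row per property row
    property_id        INTEGER PRIMARY KEY REFERENCES property (property_id),
    flat_floor         INTEGER,
    flat_bedroom_count INTEGER
);
INSERT INTO property VALUES (1, 'Leeds', 200000), (2, 'York', 150000);
INSERT INTO flat VALUES (1, 3, 2);
""")

def add_flat(property_id, floor, bedrooms):
    """Add flat details for a property; return False if a key constraint rejects it."""
    try:
        conn.execute("INSERT INTO flat VALUES (?, ?, ?)", (property_id, floor, bedrooms))
        return True
    except sqlite3.IntegrityError:
        return False

def listing():
    """Every property with its flat floor, or None when it is not a flat."""
    return conn.execute(
        "SELECT p.property_id, f.flat_floor "
        "FROM property AS p LEFT JOIN flat AS f ON f.property_id = p.property_id "
        "ORDER BY p.property_id"
    ).fetchall()
```

Property 2 exists without a flat row, a second flat row for property 1 is rejected by the primary key, and a flat row for a property that does not exist is rejected by the foreign key. That gives the "one-to-one-at-most" behaviour described above.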
"Logic and Databases" advances the term "at most one to at most one" for this kind of relationship. (Note that it is insane to assign names to tables on account of which relationships they participate in.)
Beware of the people who have suggested things like "foreign key", "table inheritance", in brief, all the other answers given here. Those people are making assumptions that you have not explicitly stated to be valid, namely that one of your two tables will be guaranteed to contain all key values that appear in the other.
(Dysfunctionality of the site prevents me from adding this as a comment in the proper place.)
"How would you interpret '...that share a common primary key'?"
I interpret that in the only reasonable sense possible: that within table1, attribute values for the primary key are guaranteed to be unique, and that within table2, attribute values for the primary key are guaranteed to be unique. And that furthermore, the primary key in both tables has the same [set of] attribute names, and that the types corresponding to the primary key attribute[s] are also pairwise the same. Nothing more and nothing less.
In particular, "sharing a primary key" means "having a primary key in common", and that means in turn "having a certain 'internal uniqueness rule' in common", but that commonality guarantees in no way that a primary key value appearing in one table must also appear in the second table.
"Can you give an example involving two tables with shared primary keys where one table wouldn't contain all the key values that appear in the other?"
Table1: column A of type integer, primary key A
Table2: column A of type integer, primary key A
Rows in table1: {A:1}. Satisfies the primary key for table1.
Rows in table2: {A:2}. Satisfies the primary key for table2.
Convinced?
"Foreign key"?
