This is the ERD, and this is the actual model (most likely from Erwin). Can anybody explain the notations I marked in blue and red, respectively?
Blue : As Dependents have total participation / can have many Employees,
shouldn’t it just have | and ⩛? Why is there O?
Red : As Employees don’t need to participate in Policy relationship,
shouldn’t it just have only O? Why is there |?
I thought what should be expressed on a line between an entity and a relationship is...
on an entity side : participation constraint (partial, total) is marked
on a relationship side : key constraint (1 to many... etc) is marked.
If I was wrong, I would appreciate if anyone can clarify this.
For the model you linked, while I can't explain why it's like that (I didn't design it), I can at least tell you what is being conveyed:
A Dependent can have zero or more records in the Policy table, and each record in the Policy table is related to exactly one record in the Dependent table. The relationship between Dependent and Policy is identifying (both because the PK in Policy contains all the FK columns from Dependent, and also because the relationship line between them is a solid line)
A Policy can be associated with at most 1 Employee, and an Employee can have at most 1 Policy. The relationship is non-identifying, both because the FK from Employee is below the line (not part of the PK) on Policy, and because the relationship line is dashed.
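To make the two readings above concrete, here is a minimal sketch using Python's built-in sqlite3 module. The column names (dependent_id, policy_no, employee_id) are hypothetical, since the diagram's actual columns are not reproduced here; only the key structure follows the description. The foreign key to dependent is part of policy's primary key (identifying), while the foreign key to employee is a nullable, unique non-key column (non-identifying, at most one on each side).

```python
import sqlite3

# Column names are hypothetical; only the key structure follows the description.
SCHEMA = """
CREATE TABLE employee  (employee_id  INTEGER PRIMARY KEY);
CREATE TABLE dependent (dependent_id INTEGER PRIMARY KEY);
CREATE TABLE policy (
    -- Identifying: the FK to dependent is part of policy's primary key.
    dependent_id INTEGER NOT NULL REFERENCES dependent(dependent_id),
    policy_no    INTEGER NOT NULL,
    -- Non-identifying: the FK to employee sits outside the key. It is nullable
    -- (a policy may have no employee) and UNIQUE (an employee has at most one).
    employee_id  INTEGER UNIQUE REFERENCES employee(employee_id),
    PRIMARY KEY (dependent_id, policy_no)
);
INSERT INTO employee  VALUES (1);
INSERT INTO dependent VALUES (1);
"""


def make_policy_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.executescript(SCHEMA)
    return conn


def add_policy(conn, dependent_id, policy_no, employee_id=None):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO policy VALUES (?, ?, ?)",
            (dependent_id, policy_no, employee_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this sketch, one dependent can hold several policies, a policy without an employee is accepted, and a second policy for the same employee is rejected by the UNIQUE constraint.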
The model looks like it might be from ERwin, but I'm not so sure about the ERD. The ERD might be in Korth notation, but it still doesn't look quite right... As it is, the arrow from Dependent to Policy seems backwards. If it is denoting cardinality, it should point the other way. I am assuming this is meant to be a many-to-one relationship. In that case, I think it would be correct if you swap the line from Employee to Policy with the arrow from Dependent to Policy, with the arrow going from Policy and pointing to Employee.
Is this meant to be a logical/physical model split? That's my guess, and the logical modeler got their arrows mixed up for the relationship.
I would like to convert this segment of an ER-Diagram to a relational model. We have a ternary relationship and what it says is the following:
1 Customer gives 1 Project to -> multiple Developers
1 Customer assigns 1 Developer with -> multiple Projects
1 Developer is assigned 1 Project by -> ONE Customer
A proposed solution would be this:
Assignment(EmployeeID, CustomerID, ProjectID)
where the primary key is composed of EmployeeID, CustomerID and ProjectID.
And all of those attributes are foreign keys, each one referring to its respective entity.
But this solution is plain wrong as it doesn't express the same thing as the ER-Diagram. We have a composed primary key, so that means that the COMBINATION of those three things is UNIQUE. That implies that I can have the same ProjectID, with the same EmployeeID but given by a different CustomerID (which I do not want).
How do I resolve this?
EDIT: As many users find that the bullet points haven't clarified anything, I will give a short textual description of the concept of the relation:
A single customer can give away one or more projects
A single project can be given by ONE SINGLE CUSTOMER
Each project can be finished by one or more developers
Each developer can work on multiple projects (regardless of the customer by which the project was given)
For that purpose, I have concluded that it would be better to use two separate binary relations instead of a single ternary. See my answer below.
When ternary relationships are expressed in a relational model, each of the entity sets with a "many" cardinality indicator becomes part of the primary key. In other words, I read your relationship as expressing the functional dependency (EmployeeID, ProjectID) -> CustomerID which will be physically represented as Assignment (EmployeeID PK/FK, ProjectID PK/FK, CustomerID FK).
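As a rough illustration of that mapping, the sketch below builds Assignment with the primary key on (EmployeeID, ProjectID) only and CustomerID as a plain foreign key. It uses Python's sqlite3 module; the helper name try_assign and the sample ids are made up for the example.

```python
import sqlite3


def make_assignment_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.executescript("""
        CREATE TABLE Employee (EmployeeID INTEGER PRIMARY KEY);
        CREATE TABLE Project  (ProjectID  INTEGER PRIMARY KEY);
        CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY);
        CREATE TABLE Assignment (
            EmployeeID INTEGER NOT NULL REFERENCES Employee(EmployeeID),
            ProjectID  INTEGER NOT NULL REFERENCES Project(ProjectID),
            CustomerID INTEGER NOT NULL REFERENCES Customer(CustomerID),
            -- Only the "many" sides form the key, so each (employee, project)
            -- pair determines exactly one customer.
            PRIMARY KEY (EmployeeID, ProjectID)
        );
        INSERT INTO Employee VALUES (1), (2);
        INSERT INTO Project  VALUES (1), (2);
        INSERT INTO Customer VALUES (1), (2);
    """)
    return conn


def try_assign(conn, employee_id, project_id, customer_id):
    """Return True if the assignment is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO Assignment VALUES (?, ?, ?)",
            (employee_id, project_id, customer_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

Here the same employee on the same project with a different customer is rejected, which is the case the three-column key could not prevent.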
the COMBINATION of those three things is UNIQUE. That implies that I can have the same ProjectID, with the same EmployeeID but given by a different CustomerID (which I do not want).
The triplets being unique does not imply that--clearly, the triplets can be unique at the same time as certain combinations of rows are absent. On the other hand it doesn't enforce their absence. But the cardinality constraints do. What they say is what the bullets (try to) say--that only certain situations/states can arise. The bullets are not "what the relationship says"--either in the sense of what rows actually form the relationship/table in a given situation/state or in the sense of what a row says about the situation when it is in the relationship/table.
In this kind of diagram a diamond denotes an n-ary business or application relation(ship) or association and its corresponding table. A line in such a diagram represents a participation by an entity type and its corresponding FK (foreign key) (sadly, called a "relationship" in pseudo-ER methods.) A constraint is a restriction on what instances/rows can appear in a relationship/table. Each instance/row in a relationship/table "says" that that row of values satisfies the relationship. Constraints "say" there are limitations on what values can be so related over all situations/states. Cardinalities are constraints that say something about how many times values and/or combinations of values can appear in a relationship.
There are two main cardinality conventions, look-across & look-here. In look-across a number/range says how many of the entities of the type it is near can participate with one subrow of entities of the other entity types, ie how many times some subrow of the others can participate/be in the relationship/table. (Chen's original ER meaning.) In look-here a number/range says how many subrows of the other entity types can appear with an entity of the type it is near, ie how many times a nearby entity can participate/be in the relationship/table. (Look-here isn't very useful for relationships with arity > 2.)
We have a ternary relationship and what it says is the following:
What the relationship diamond says is that you are recording the rows (EmployeeID, CustomerID, ProjectID) where (something like) developer EmployeeID is assigned by customer CustomerID to project ProjectID. What the cardinalities say is that only certain sets of instances/rows can satisfy that relationship in any given situation/state.
1 Customer gives 1 Project to -> multiple Developers
1 Customer assigns 1 Developer with -> multiple Projects
1 Developer is assigned 1 Project by -> ONE Customer
Your bulleted constraints are not clear. Numbers have been stuck in front of entity types--almost as one would put id values in to get what that row of id values says when in the relationship/table--but the almost-sentences produced, which also have unexplained arrows, don't mean anything. Maybe you are trying to say, for a given customer-project subrow value there can be multiple developer values, etc? That would give the look-across cardinalities in the diagram. But you haven't said that.
As I have already mentioned in the question in the first place, the description of the relation is the following:
A single customer can give away one or more projects
A single project can be given by ONE SINGLE CUSTOMER
Each project can be finished by one or more developers
Each developer can work on multiple projects (regardless of the customer by which the project was given)
The problem is in the ER-Diagram itself: it does not exactly represent the description above. The problem lies in the constraint that a single project can be given by one single customer. That's why it would make more sense to model that with two separate binary relationships instead using a ternary one.
That being said, the relationship between Customer and Project should be a 1:n relationship, while the relationship between Project and Developer should be a m:n relationship. Mapping those relationships gives us the following:
Customer(CustomerID) with Primary Key=CustomerID
Project (ProjectID, CustomerID) with Primary Key=ProjectID and Foreign Key=CustomerID referencing the Customer
Developer(DeveloperID) with PK=DeveloperID
ProjectDevelopment (ProjectID, DeveloperID) with PK={ProjectID, DeveloperID}
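A small sketch of this two-relationship mapping, again with Python's sqlite3 module. It takes ProjectID as the key of Project, which the 1:n relationship requires (one row per project, each naming a single customer). The sample ids and the helper names accepts and customers_of_developer are illustrative only.

```python
import sqlite3


def make_projects_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.executescript("""
        CREATE TABLE Customer  (CustomerID  INTEGER PRIMARY KEY);
        CREATE TABLE Developer (DeveloperID INTEGER PRIMARY KEY);
        -- 1:n, one row per project, so a project names exactly one customer.
        CREATE TABLE Project (
            ProjectID  INTEGER PRIMARY KEY,
            CustomerID INTEGER NOT NULL REFERENCES Customer(CustomerID)
        );
        -- m:n between projects and developers.
        CREATE TABLE ProjectDevelopment (
            ProjectID   INTEGER NOT NULL REFERENCES Project(ProjectID),
            DeveloperID INTEGER NOT NULL REFERENCES Developer(DeveloperID),
            PRIMARY KEY (ProjectID, DeveloperID)
        );
        INSERT INTO Customer  VALUES (1), (2);
        INSERT INTO Developer VALUES (1), (2);
        INSERT INTO Project   VALUES (10, 1), (11, 1), (12, 2);
        INSERT INTO ProjectDevelopment VALUES (10, 1), (10, 2), (11, 1), (12, 1);
    """)
    return conn


def accepts(conn, sql, params=()):
    """Return True if the statement succeeds, False on a constraint violation."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


def customers_of_developer(conn, developer_id):
    """Customers a developer works for, derived through the projects."""
    rows = conn.execute(
        """
        SELECT DISTINCT p.CustomerID
        FROM ProjectDevelopment AS pd
        JOIN Project AS p ON p.ProjectID = pd.ProjectID
        WHERE pd.DeveloperID = ?
        ORDER BY p.CustomerID
        """,
        (developer_id,),
    ).fetchall()
    return [r[0] for r in rows]
```

The customer of an assignment is no longer stored with it; it is derived by joining through Project, so a project cannot end up with two customers.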
So I'm making an E/R diagram based on drugs. It states that each drug is produced by a given pharmaceutical company and the trade name of the drug is identified among the products of the given pharmaceutical company. So here's the E/R diagram I drew up:
Now the biggest question I have about this is, are these relationships supposed to be one to many or many to many? Each one relationship is represented by an arrow (where the pointed arrow means at most one and the rounded arrow means exactly one). I first assumed that a single drug identified by a single trade name would come from just one pharmaceutical company, but would it be possible for a single drug to come from multiple pharmaceutical companies? I'm also not sure if it's supposed to be a 3-way relationship or not.
Not sure if this is really a technical question you can find the answer to here. It would probably be wise to clarify further with your client, but from the wording alone I would assume:
1.) 1 Drug - 1 Trade Name - 1 Company
2.) 1 Company has Many Drugs
From general knowledge of US drugs, different companies have their unique versions of drugs with the same active ingredient, but these are all filed under different trade names, maintaining 1 trade name - 1 company relationship.
For example, ibuprofen (generic) is sold under both Advil and Motrin (separate trade names).
In this style of ER diagram, Chen's original, the diamond denotes a ternary
"relationship" type, aka association type, among/on the three participant "entity" types symbolized by the boxes. As in an application relationship/association, as in "Entity-Relationship Model". The lines showing participations correspond to FKs (foreign keys).
In such a diagram each line gets labeled by the number or range giving the number of entities in each entity set which is allowed in a relationship set. The table for the relationship would have a FK for each line. Per Chen it would be described as (in order company-name-drug) (at-most-1)-to-(exactly-1)-to-N relationship (assuming the unlabeled line means any number). There is a style with a cardinality at each end of a line.
Misunderstandings/misrepresentations/misappropriations of Chen style by older & newer methods & products (although quite mainstream) lead to different so-called ER diagrams.
One such style only shows entity type boxes with relationships shown by connecting lines labeled by relationship names. The 1:many relationships can be implemented by a FK attribute in one of the entity type tables, although they needn't be, and although that's contrary to Chen ER modeling, which would use a table. Typically, for n-ary relationships for n>2, instead of just having three line segments connect at a point the point is replaced by a box for what in Chen is an "associative entity" type. The lines would then be participations/FKs under Chen. All lines now represent 1:many relationships. Other so-called ER diagrams just have boxes for tables and lines for FKs and don't even have relationships on entities in the Chen sense. The use of lines that only ever denote 1:many relationships and/or FKs lead to lines and FKs being (wrongly but ubiquitously) called "relationships". (Which seems to be how you understand the word.)
The wikipedia entry on E-R modeling (and E-R diagrams) is currently reasonable.
When we say each department is managed by an employee, does that imply that each department must be managed by an employee, and hence a total participation constraint?
Does that imply that each department must be managed by an employee
and hence a total participation constraint ?
Yes in other words it's a one to one relationship
In my observation (based on question body and comments):
The relation is one-to-many, showing that an employee can be the manager of many departments.
None of the predicates shows a one-to-one relation, since there is no predicate saying that an employee can be the manager of only one department.
The difference: (whether there is any difference is opinion-based, as the comments on this answer show)
Each department must be managed by an employee
Emphasizes a mandatory one-to-many relation (is-managed-by)
Each department is managed by an employee
Emphasizes an optional one-to-many relation.
Hint:
Documenting data integrity constraints is most widely done using natural language, which often produces a quick dive into ambiguity. If you use plain English to express
data integrity constraints, you’ll inevitably hit the problem of how the English sentence maps,
unambiguously, into the table structures. Different programmers (and users alike) will interpret such sentences differently, because they all try to convert these into something that will
map into the database design. Programmers then code their perception of the constraint (not
necessarily the specifier’s).
A more formal approach is to use logic and set theory.
Full disclosure...Trying feverishly here to learn more about databases so I am putting in the time and also tried to get this answer from the source to no avail.
Barry Williams from databaseanswers has this schema posted.
Clients and Fees Schema
I am trying to understand the split of address tables in this schema. Its clear to me that the Addresses table contains the details of a given address. The Client_Addresses and Staff_Addresses tables are what gets me.
1) I understand the use of Primary Foreign Keys as shown but I was under the assumption that when these are used you don't have a resident Primary Key in that same table (date_address_from in this case). Can someone explain the reasoning for both and put it into words how this actually works out?
2) Why would you use date_address_from as the primary key instead of something like client_address_id as the PK? What if someone enters two addresses in one day would there be conflicts in his design? If so or if not, what?
3) Along the lines of normalization...Since both date_address_from and date_address_to are the same in the Client_Addresses and Staff_Addresses table should those fields just not be included in the main Address table?
Evaluation
First an Audit, then the specific answers.
This is not a Data Model. This is not a Database. It is a bucket of fish, with each fish drawn as a rectangle, and where the fins of one fish are caught in the gills of another, there is a line. There are masses of duplication, as well as masses of missing elements. It is completely unworthy of using as an example to learn anything about database design from.
There is no Normalisation at all; the files are very incomplete (see Mike's answer, there are a hundred more problems like that). The other_details and eg.s crack me up. Each element needs to be identified and stored: StreetNo, ApartmentNo, StreetName, StreetType, etc. not line_1_number_street, which is a group.
Customer and Staff should be normalised into a Person table, with all the elements identified.
And yes, if Customer can be either a Person or an Organisation, then a supertype-subtype structure is required to support that correctly.
So what this really is, the technically accurate terms, is a bunch of flat files, with descriptions for groups of fields. Light years distant from a database or a relational one. Not ready for evaluation or inspection, let alone building something with. In a Relational Data Model, that would be approximately 35 normalised tables, with no duplicated columns.
Barry has (wait for it) over 500 "schemas" on the web. The moment you try to use a second "schema", you will find that (a) they are completely different in terms of use and purpose (b) there is no commonality between them (c) let's say there was a customer file in both; they would be different forms of customer files.
He needs to Normalise the entire single "schema" first,
then present the single normalised data model in 500 sections or subject areas.
I have written to him about it. No response.
It is important to note also, that he has used some unrecognisable diagramming convention. The problem with these nice interesting pictures is that they convey some things but they do not convey the important things about a database or a design. It is no surprise that a learner is confused; it is not clear to experienced database professionals. There is a reason why there is a standard for modelling Relational databases, and for the notation in Data Models: they convey all the details and subtleties of the design.
There is a lot that Barry has not read about yet: naming conventions; relations; cardinality; etc, too many to list.
The web is full of rubbish, anyone can "publish". There are millions of good- and bad-looking "designs" out there, that are not worth looking at. Or worse, if you look, you will learn completely incorrect methods of "design". In terms of learning about databases and database design, you are best advised to find someone qualified, with demonstrated capability, and learn from them.
Answer
He is using composite keys without spelling it out. The PK for client_addresses is (client_id, address_id, date_address_from). That is not a bad key; evidently he expects to record addresses forever.
The notion of keeping addresses in a separate file is a good one, but he has not provided any of the fields required to store normalised addresses, so the "schema" will end up with complete duplication of addresses; in which case, he could remove addresses, and put the lines back in the client and staff files, along with their other_details, and remove three files that serve absolutely no purpose other than occupying disk space.
You are thinking about Associative Tables, which resolve the many-to-many relations in Databases. Yes, there, the columns are only the PKs of the two parent tables. These are not Associative Tables or files; they contain data fields.
It is not the PK, it is the third element of the PK.
The notion of a person being registered at more than one address in a single day is not reasonable; just count the one address they slept the most at.
Others have answered that.
Do not expect to identify any evidence of databases or design or Normalisation in this diagram.
1) In each of those tables the primary key is a compound key consisting of three attributes: (staff_id, address_id, date_address_from) and (client_id, address_id, date_address_from). This presumably means that the mapping of clients/staff to addresses is expected to change over time and that the history of those changes is preserved.
2) There's no obvious reason to create a new "id" attribute in those tables. The compound key does the job adequately. Why would you want to create the same address twice for the same client on the same date? If you did then that might be a reason to modify the design but that seems like an unlikely requirement.
3) No. The apparent purpose is that they are the applicable dates for the mapping of address to client/staff - not dates applicable to the address alone.
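The compound key described in these three points can be sketched with Python's sqlite3 module. The column names client_id, address_id, date_address_from and date_address_to come from the schema under discussion; the parent table names Clients and Addresses, the sample ids and the helper record_move are assumptions made for the example.

```python
import sqlite3


def make_client_address_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.executescript("""
        CREATE TABLE Clients   (client_id  INTEGER PRIMARY KEY);
        CREATE TABLE Addresses (address_id INTEGER PRIMARY KEY);
        CREATE TABLE Client_Addresses (
            client_id         INTEGER NOT NULL REFERENCES Clients(client_id),
            address_id        INTEGER NOT NULL REFERENCES Addresses(address_id),
            date_address_from TEXT    NOT NULL,
            date_address_to   TEXT,
            -- The key is all three columns together, not the date alone.
            PRIMARY KEY (client_id, address_id, date_address_from)
        );
        INSERT INTO Clients   VALUES (1);
        INSERT INTO Addresses VALUES (10), (20);
    """)
    return conn


def record_move(conn, client_id, address_id, date_from, date_to=None):
    """Return True if the mapping row is accepted, False if the key rejects it."""
    try:
        conn.execute(
            "INSERT INTO Client_Addresses VALUES (?, ?, ?, ?)",
            (client_id, address_id, date_from, date_to),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

Two different addresses on the same day do not conflict, and the same address can recur on a later date; only an exact repeat of client, address and start date is rejected.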
3) Along the lines of
normalization...Since both
date_address_from and date_address_to
are the same in the Client_Addresses
and Staff_Addresses table should those
fields just not be included in the
main Address table?
No. But you did find a problem.
The designer has decided that clients and staff are two utterly different things. By "utterly different", I mean they have no attributes in common.
That's not true, is it? Both clients and staff have addresses. I'm sure most of them have telephones, too.
Imagine that someone on staff is also a client. How many places is that person's name stored? That person's address? Can you hear Mr. Rogers in the background saying, "Can you spell 'update anomaly'? . . . I knew you could."
The problem is that the designer was thinking of clients and staff as different kinds of people. They're not. "Client" describes a business relationship between a service provider (usually, that is, not a retailer) and a customer, which might be either a person or a company. "Staff" describes a employment relationship between a company and a person. Not different kinds of people--different kinds of relationships.
Can you see how to fix that?
These 2 extra tables enable you to keep an address history per person.
You can have them both in one table, but since staff and client are separated, it is better to separate them as well (because client id = 1 and staff id = 1 can't both be used in the same address table).
There is no "single" solution to a design problem. You could use 1 person table and then add a column to differentiate between staff and client. But the major idea is that the DB should be clear, readable and efficient, not that it should save on tables.
About 2: the PK is combined from clientID, AddressID and the from date.
So if someone lives 6 months in the States, then 6 months in Israel, and then goes back to the States, to the same address, you need only 2 addresses in the address table, and 3 rows in client_address.
The idea of having the from date as part of the key is right, although it doesn't guarantee data integrity, as you also need to check manually that there are no overlapping dates between records of the same person.
about 3 - no (look at 2).
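The overlap check mentioned under point 2 can be written as a self-join. The sketch below uses Python's sqlite3 module and the States / Israel / States example, with made-up dates stored as ISO text and treated as half-open ranges (an assumption: a row ending on the day the next one starts does not count as an overlap).

```python
import sqlite3


def make_history_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Client_Addresses (
            client_id         INTEGER NOT NULL,
            address_id        INTEGER NOT NULL,
            date_address_from TEXT    NOT NULL,  -- ISO dates sort correctly as text
            date_address_to   TEXT,              -- NULL means "still current"
            PRIMARY KEY (client_id, address_id, date_address_from)
        );
        -- Address 1 (the States), address 2 (Israel), then address 1 again:
        -- 2 distinct addresses, 3 history rows.
        INSERT INTO Client_Addresses VALUES
            (1, 1, '2020-01-01', '2020-07-01'),
            (1, 2, '2020-07-01', '2021-01-01'),
            (1, 1, '2021-01-01', NULL);
    """)
    return conn


# Pair each row (a) with every later-starting row (b) of the same client and
# report the pair when a has not ended by the time b starts.
OVERLAP_SQL = """
SELECT a.client_id, a.date_address_from, b.date_address_from
FROM Client_Addresses AS a
JOIN Client_Addresses AS b
  ON a.client_id = b.client_id
 AND (a.date_address_from < b.date_address_from
      OR (a.date_address_from = b.date_address_from
          AND a.address_id < b.address_id))
 AND (a.date_address_to IS NULL OR a.date_address_to > b.date_address_from)
ORDER BY a.date_address_from, b.date_address_from
"""


def find_overlaps(conn):
    return conn.execute(OVERLAP_SQL).fetchall()
```

The primary key accepts an overlapping row without complaint, so a query like this (or an equivalent trigger) has to be run separately.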
Viewing the data model, i think:
1) PF means that the field is both part of the primary key of the table and foreign key with other table.
2) In the same way, the primary key of Staff_Addresses is {staff_id, address_id, date_address_from}, not just date_address_from
3) The same as 2)
In reference to Staff_Addresses table, the Primary Key on date_address_from basically prevents a record with the same staff_id/address_id entered more than once. Now, i'm no DBA, but i like my PKs to be integers or guids for performance reasons/faster indexing. If i were to do this i would make a new column, say, Staff_Address_Id and make it the PK column and put a unique constraint on staff_id/address_id/date_address_from.
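A minimal sketch of that alternative, using Python's sqlite3 module: Staff_Address_Id becomes a surrogate integer key and the original three columns move into a unique constraint. The helper add_staff_address and the sample values are hypothetical, and foreign keys are left out to keep the example short.

```python
import sqlite3


def make_staff_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Staff_Addresses (
            staff_address_id  INTEGER PRIMARY KEY,  -- surrogate key, auto-assigned
            staff_id          INTEGER NOT NULL,
            address_id        INTEGER NOT NULL,
            date_address_from TEXT    NOT NULL,
            date_address_to   TEXT,
            -- The former compound key survives as a uniqueness rule.
            UNIQUE (staff_id, address_id, date_address_from)
        );
    """)
    return conn


def add_staff_address(conn, staff_id, address_id, date_from):
    """Return the new surrogate id, or None if the unique constraint rejects the row."""
    try:
        cur = conn.execute(
            "INSERT INTO Staff_Addresses (staff_id, address_id, date_address_from) "
            "VALUES (?, ?, ?)",
            (staff_id, address_id, date_from),
        )
        return cur.lastrowid
    except sqlite3.IntegrityError:
        return None
```

The duplicate protection is the same as with the compound primary key; what changes is that other tables can refer to a row by a single integer.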
As for your last concern, Addresses table is really a generic address storage structure. It shouldn't care about date ranges during which someone resided there. It's better to be left to specific implementations of an address such as Client/Staff addresses.
Hope this helps a little.
There are a couple of questions around asking for the difference between, or an explanation of, identifying and non-identifying relationships in relational databases.
My question is, can you think of a simpler term for these jargons? I understand that technical terms have to be specific and unambiguous though. But having an 'alternative name' might help students relate more easily to the concept behind.
We actually want to use a more layman term in our own database modeling tool, so that first-time users without much computer science background could learn faster.
cheers!
I often see child table or dependent table used as a lay term. You could use either of those terms for a table with an identifying relationship
Then say a referencing table is a table with a non-identifying relationship.
For example, PhoneNumbers is a child of Users, because a phone number has an identifying relationship with its user (i.e. the primary key of PhoneNumbers includes a foreign key to the primary key of Users).
Whereas the Users table has a state column that is a foreign key to the States table, making it a non-identifying relationship. So you could say Users references States, but is not a child of it per se.
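The distinction in this example can be checked mechanically: a relationship is identifying when the child's foreign key columns are all part of the child's primary key. Below is a rough sketch with Python's sqlite3 module. Table names follow the example, while the column names (user_id, phone_number, state_code) and the helper names are assumptions.

```python
import sqlite3


def make_users_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE States (state_code TEXT PRIMARY KEY);
        CREATE TABLE Users (
            user_id INTEGER PRIMARY KEY,
            state   TEXT REFERENCES States(state_code)   -- FK outside the key
        );
        CREATE TABLE PhoneNumbers (
            user_id      INTEGER NOT NULL REFERENCES Users(user_id),
            phone_number TEXT    NOT NULL,
            PRIMARY KEY (user_id, phone_number)          -- FK inside the key
        );
    """)
    return conn


def pk_columns(conn, table):
    # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk);
    # pk is the column's 1-based position in the primary key, or 0.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return {name for _, name, _, _, _, pk in rows if pk > 0}


def relationship_kind(conn, child, parent):
    # PRAGMA foreign_key_list rows start with: (id, seq, table, from, to, ...)
    rows = conn.execute(f"PRAGMA foreign_key_list({child})").fetchall()
    fk_cols = {row[3] for row in rows if row[2] == parent}
    if not fk_cols:
        return None  # child does not reference parent at all
    if fk_cols <= pk_columns(conn, child):
        return "identifying"
    return "non-identifying"
```

Under these assumptions, PhoneNumbers to Users comes out as identifying (a child), and Users to States as non-identifying (a plain reference).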
I think belongs to would be a good name for the identifying relationship.
A "weak entity type" does not have its own key, just a "partial key", so each entity instance of this weak entity type has to belong to some other entity instance so it can be identified, and this is an "identifying relationship". For example, a landlord could have a database with apartments and rooms. A room can be called kitchen or bathroom, and while that name is unique within an apartment, there will be many rooms in the database with the name kitchen, so it is just a partial key. To uniquely identify a room in the database, you need to say that it is the kitchen in this particular apartment. In other words, the rooms belong to apartments.
I'm going to recommend the term "weak entity" from ER modeling.
Some modelers conceptualize the subject matter as being made up of entities and relationships among entities. This gives rise to Entity-Relationship Modeling (ER Modeling). An attribute can be tied to an entity or a relationship, and values stored in the database are instances of attributes.
If you do ER modeling, there is a kind of entity called a "weak entity". Part of the identity of a weak entity is the identity of a stronger entity, to which the weak one belongs.
An example might be an order in an order processing system. Orders are made up of line items, and each line item contains a product-id, a unit-price, and a quantity. But line items don't have an identifying number across all orders. Instead, a line item is identified by {item number, order number}. In other words, a line item can't exist unless it's part of exactly one order. Item number 1 is the first item in whatever order it belongs to, but you need both numbers to identify an item.
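The line item example could be sketched like this with Python's sqlite3 module. The table and column names are made up from the description, and ON DELETE CASCADE is one possible way (an assumption, not something the text prescribes) to express that a line item cannot outlive its order.

```python
import sqlite3


def make_orders_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # needed for the cascade to fire in SQLite
    conn.executescript("""
        CREATE TABLE orders (order_no INTEGER PRIMARY KEY);
        CREATE TABLE line_items (
            order_no   INTEGER NOT NULL
                       REFERENCES orders(order_no) ON DELETE CASCADE,
            item_no    INTEGER NOT NULL,   -- partial key: unique only within an order
            product_id INTEGER NOT NULL,
            unit_price REAL    NOT NULL,
            quantity   INTEGER NOT NULL,
            PRIMARY KEY (order_no, item_no)
        );
        INSERT INTO orders VALUES (1), (2);
        INSERT INTO line_items VALUES
            (1, 1, 500, 9.99, 2),
            (1, 2, 501, 4.50, 1),
            (2, 1, 500, 9.99, 5);
    """)
    return conn


def items_of(conn, order_no):
    """Item numbers belonging to one order, in ascending order."""
    rows = conn.execute(
        "SELECT item_no FROM line_items WHERE order_no = ? ORDER BY item_no",
        (order_no,),
    ).fetchall()
    return [r[0] for r in rows]
```

Item number 1 appears in both orders without conflict, and deleting an order removes its line items with it.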
It's easy to turn an ER model into a relational model. It's also easy for people who are experts in the data but know nothing about databases to get used to an ER model of the data they understand.
There are other modelers who argue vehemently against the need for ER modeling. I'm not one of them.
Nothing, absolutely nothing in the kind of modeling where one encounters things such as "relationships" (ER, I presume) is "technical", "precise" or "unambiguous". Nor can it be.
A) ER modeling is always and by necessity informal, because it can never be sufficient to capture/express the entire definition of a database.
B) There are so many different ER dialects out there that it is just impossible for all of them to use exactly the same terms with exactly the same meaning. Recently, I even discovered that some UK university that teaches ER modeling, uses the term "entity subtype" for the very same thing that I always used to name "entity supertype", and vice-versa !
One could use connection.
You have Connection between two tables, where the IDs are the same.
That type of thing.
how about
Association
Link
Correlation