How to put this in an E/R diagram?

Have a simple question, but I think I am overthinking it. I need to make an E/R diagram out of this:
Substantial fees are due every calendar year. Fee payments must be
made via a bank transfer, mentioning the member number and the
membership year it applies to. The database should store the date of
payment.
I am ignoring calendar year, as I think it is not relevant for the E/R diagram. I have an entity called "Members" which I would like to link to "Fee" via the relationship (diamond symbol) "paid via a bank transfer".
Now, my question is: should "member number" and "membership" be part of the "fee" entity or the "member" entity? Or both? Because I am thinking to add a new relationship to "fee" giving it the name "consists of" and then link "member number" and "membership", but I don't know whether that's good or not.
And what to do with the last sentence? "The database should store the date of payment."? Can I ignore it?

From your description I got:
You have entity sets Members and Payments
Members are identified by a member_number
Payments have attributes date and membership_year
Obviously, we also need:
Payments have an attribute amount
How are we going to identify Payments? No combination of the listed attributes is uniquely identifying, in my opinion. A Member could make two identical Payments on the same date with the same amount, for the same membership year, e.g. if they accidentally paid only half of the annual fee at first and then made a second payment to correct it.
Let's introduce a surrogate key:
Payments are identified by a payment_id
We also need a relationship between the two entity sets:
Each Payment is associated with a single Member
Each Member can make multiple Payments
We can put this info into an ER diagram:
To derive a table diagram, Chen's original method implemented every entity relation (entity key and attributes) and relationship relation (relationship keys (i.e. related entity keys) and relationship attributes) as separate tables:
However, it's common practice to denormalize tables with the same primary key:
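As a rough illustration of that merged form, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (members, payments, payment_date and so on) and the sample values are my own assumptions, not part of the assignment. The relationship is folded into the payments table as a member_number foreign key, since it shares the payment_id key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE members (
    member_number INTEGER PRIMARY KEY
);
CREATE TABLE payments (
    payment_id      INTEGER PRIMARY KEY,  -- surrogate key
    member_number   INTEGER NOT NULL REFERENCES members(member_number),
    membership_year INTEGER NOT NULL,
    payment_date    TEXT    NOT NULL,
    amount          NUMERIC NOT NULL
);
""")

INSERT_PAYMENT = (
    "INSERT INTO payments (member_number, membership_year, payment_date, amount) "
    "VALUES (?, ?, ?, ?)"
)

def try_payment(member_number, membership_year, payment_date, amount):
    """Insert a payment; return False if a constraint rejects it."""
    try:
        conn.execute(INSERT_PAYMENT, (member_number, membership_year, payment_date, amount))
        return True
    except sqlite3.IntegrityError:
        return False

conn.execute("INSERT INTO members (member_number) VALUES (1)")

# Two identical half payments are both accepted thanks to the surrogate key.
first_half = try_payment(1, 2024, "2024-01-15", 50)
second_half = try_payment(1, 2024, "2024-01-15", 50)

# A payment for a member that does not exist is rejected by the foreign key.
unknown_member = try_payment(99, 2024, "2024-01-15", 50)

payment_count = conn.execute("SELECT COUNT(*) FROM payments").fetchone()[0]
```

With a natural key such as (member_number, membership_year, payment_date, amount) the second half payment would have been rejected, which is the reason for introducing payment_id.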
I recommend you study Chen's paper The Entity-Relationship Model - Toward a Unified View of Data. Codd's paper A Relational Model of Data for Large Shared Data Banks provides valuable background.

Related

Relational Database: When do we need to add more entities?

We had a discussion today related to W3 lecture case study about how many entities we need for each situation. And I have some confusion as below:
Case 1) An employee is assigned to be a member of a team. A team with more than 5 members will have a team leader. The members of the team elect the team leader. List the entity(s) which you can identify in the above statement. In this case, if we don't create 2 entities for the above requirement, we need to add two more attributes to each employee, which can lead to anomaly issues later. Therefore, we need to have 2 entities as below:
EMPLOYEE (PK is employeeId) (0-M)----------------(0-1) TEAM (PK teamId&employeeId) -> 2 entities
Case 2) The company also introduced a mentoring program, whereby a new employee will be paired with someone who has been in the company longer. How many entities do you need to model the mentoring program?
The answer from the lecturer is 1. With that, we have to add 2 more attributes for each Employee, mentorRole (Mentor or Mentee) and pairNo (to distinguish between different pairs and to know who mentors whom), don't we?
My question is why can't we create a new Entity named MENTORING which will be similar to TEAM in Q1? And why we can only do that if this is a many-many relationship?
EMPLOYEE (PK is employeeId) (0-M)----------------(0-1) MENTORING (PK is pairNo&employeeId) -> 2 entities
Thank you in advance
First of all, about terminology: I use entity to mean an individual person, thing or event. You and I are two distinct entities, but since we're both members of StackOverflow, we're part of the same entity set. Entity sets are contrasted with value sets in the ER model, while the relational model has no such distinction.
While you're right about the number of entity sets, there are some issues with your implementation. TEAM's PK shouldn't be (teamId, employeeId), it should be only teamId. The EMPLOYEE table should have a teamId foreign key (not part of the PK) to indicate team membership. The employeeId column in the TEAM table could be used to represent the team leader and is dependent on the teamId (since each team can have at most one leader).
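A minimal sketch of that two-table layout, using Python's sqlite3 module. The column name leaderEmployeeId is my own label for the employeeId column in TEAM, and the sample rows are made up. Note that this sketch does not enforce the "more than 5 members" rule or that the leader is a member of the team.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE team (
    teamId           INTEGER PRIMARY KEY,
    -- one column per team, so a team has at most one leader (NULL: none yet)
    leaderEmployeeId INTEGER REFERENCES employee(employeeId)
);
CREATE TABLE employee (
    employeeId INTEGER PRIMARY KEY,
    -- membership is a plain foreign key, not part of the PK (NULL: no team)
    teamId     INTEGER REFERENCES team(teamId)
);
""")

# Create the team first, then its six members, then record the elected leader.
conn.execute("INSERT INTO team (teamId) VALUES (1)")
conn.executemany(
    "INSERT INTO employee (employeeId, teamId) VALUES (?, 1)",
    [(n,) for n in range(1, 7)],
)
conn.execute("UPDATE team SET leaderEmployeeId = 3 WHERE teamId = 1")

leader = conn.execute(
    "SELECT leaderEmployeeId FROM team WHERE teamId = 1"
).fetchone()[0]
member_count = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE teamId = 1"
).fetchone()[0]
```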
With only one entity set, we would probably represent team membership and leadership as:
EMPLOYEE(employeeId PK, team, leader)
where team is some team name or number which has to be the same for team members, and leader is a true/false column to indicate whether the employee in that row is the leader of his/her team. A problem with this model is that we can't ensure that a team has only one leader.
Again, there are some issues with the implementation. I don't see the need to identify pairs apart from the employees involved, and having a mentorRole (mentor or mentee) indicates that the association will be recorded for both mentor and mentee. This is redundant and creates an opportunity for inconsistency. If the goal was to represent a one-to-one relationship, there are better ways. I suggest a separate table MENTORING(menteeEmployeeId PK, mentorEmployeeId UQ) (or possibly a unique but nullable mentorEmployeeId in the EMPLOYEE table, depending on how your DBMS handles nulls in unique indexes).
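The MENTORING table suggested above could be sketched like this with Python's sqlite3 module. The employee IDs are made up for the example, and try_pair is a hypothetical helper that only exists to show which inserts the constraints accept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE employee (employeeId INTEGER PRIMARY KEY);
CREATE TABLE mentoring (
    -- PK: each mentee has at most one mentor
    menteeEmployeeId INTEGER PRIMARY KEY REFERENCES employee(employeeId),
    -- UNIQUE: each mentor has at most one mentee
    mentorEmployeeId INTEGER NOT NULL UNIQUE REFERENCES employee(employeeId)
);
INSERT INTO employee (employeeId) VALUES (1), (2), (3);
""")

def try_pair(mentee, mentor):
    """Record a mentoring pair; return False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO mentoring (menteeEmployeeId, mentorEmployeeId) VALUES (?, ?)",
            (mentee, mentor),
        )
        return True
    except sqlite3.IntegrityError:
        return False

new_pair = try_pair(2, 1)        # employee 1 mentors employee 2
second_mentee = try_pair(3, 1)   # rejected: employee 1 already mentors someone
second_mentor = try_pair(2, 3)   # rejected: employee 2 already has a mentor
```

Each pair is stored once, so there is no mentorRole column to keep consistent and no pairNo to maintain.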
The difference between the two cases is that teams can have any number of members and one leader, which is most effectively implemented by identifying teams separately from employees, whereas mentorship is a simpler association that is sufficiently identified by either of the two people involved (provided you consistently use the same role as identifier). You could create a separate entity set for mentoring, with relationships to the employees involved - it might look like my MENTORING table but with an additional surrogate key as PK, but there's no need for the extra identifier.
And why we can only do that if this is a many-many relationship?
What do you mean? Your examples don't contain a many-to-many relationship and we don't create additional entity sets for many-to-many relationships. If you're thinking of so-called "bridge" tables, you've got some concepts mixed up. Entity sets aren't tables. An entity set is a set of values, a table represents a relation over one or more sets of values. In Chen's original method, all relationships were represented in separate tables. It's just that we've gotten used to denormalizing simple one-to-one and one-to-many relationships into the same tables as entity attributes, but we can't do the same for many-to-many binary relationships or ternary and higher relationships in general.

What is the right way to use Associative Entity?

This is the description:
Draw an Entity-Relationship diagram for Poke-Hospital which provides
medical service to pokemon.
Each pokemon has an appointment with one of the nurse Joys. In
addition to recording the name, type and trainer of each pokemon, the
system needs to keep track of the multiple types of sickness being
diagnosed for the pokemon. During an appointment, the nurse will
always prescribe medicine. It is required to record the date, time and
dosage of the medicine. A pokemon may need to take more than one
medicine at a time. Each medicine is stored with its name, brand and
cost of purchase. There is no restriction on the amount of medicine to
be prescribed by any nurse.
Within an appointment, a pokemon may need to undergo procedures such
as a surgery and/or diagnosis. Each procedure requires different type
of rooms and a list of equipment. The date, time and the actual room
of the procedure need to be recorded.
A procedure may be performed by more than one nurse. A nurse is
involved in the procedure based on the training skills that she has
completed. Not all nurses are qualified to perform procedures.
Name, pager number as well as office number for each nurse must be
known. Your diagram should show the entities, relationships and their
attributes, and the cardinality of any relationships. Mark the best
primary key for each entity by underlining it.
This is my solution:
Here are my questions:
Should I use Have Appointment as associative entity?
Should I remove 2 relationships Undergo and Prescribe and connect 2
entities Procedure and Appointment Medicine directly to Have
Appointment associative entity? Will the ERD still be right then?
If it's wrong, what about the same as question 2 and I turn the Have
Appointment associative entity into a relationship?
I feel really confused about the difference between using associative entity with a relationship (like in this post Enrollment with Teach and Teacher: When to use Associative entities?) and using ternary relationship (connect Teacher directly to Enrollment relationship instead of changing Enrollment to an associative entity and have the Teach relationship).
Should I use Have Appointment as associative entity?
No, I believe it should be a regular entity set. You gave it its own identity - the ID primary key - which I agree with, but that should've corresponded with a change in element type. Associative entity sets (AES) are relationships first, which means they're identified by the (keys of the) entity sets that they relate.
This is a topic that's widely confused, since AES in the entity-relationship model are different than in the network data model. The latter is intuitively more familiar to developers, since it's essentially a model based on records and pointers, but since it only supports directed binary relationships, anything more complicated - many-to-many relationships as well as ternary and higher relationships - need to be represented as AES. In this model, AES are identified by a surrogate ID, since composite keys generally aren't supported either.
The entity-relationship model supports n-ary relationships and composite keys, and so doesn't need AES nearly as frequently. One situation that can't be represented by regular entity sets and n-ary relationships is when a relationship needs to be the subject of a further relationship.
For example, let's look at the relationship between Procedure and Nurse to represent the nurses involved in a procedure.
I prefer the look-across convention for cardinality indicators - a nurse can perform 0 or more procedures, while a procedure requires 1 or more nurses. Anyway, the relationship Perform here is identified by the composite primary key (ProcedureID, NurseID).
Now, if we wanted to track the equipment used by each nurse in the performance of the procedure, we might think a simple ternary relationship would do the trick:
but that relationship would be identified by (ProcedureID, NurseID, EquipmentID), preventing us from recording nurses that assisted in the procedure without using any equipment. What we need is two separate relationships:
(ProcedureID, NurseID)
((ProcedureID, NurseID), EquipmentID)
with an FK constraint from the second to the first to prevent nurses not assisting in the procedure from handling the equipment.
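Here is a minimal sketch of those two relationships as tables, using Python's sqlite3 module. The table names (perform, perform_equipment and the plural entity tables) and the sample IDs are my own assumptions. try_use is a hypothetical helper that reports whether an insert was accepted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE procedures (ProcedureID INTEGER PRIMARY KEY);
CREATE TABLE nurses     (NurseID     INTEGER PRIMARY KEY);
CREATE TABLE equipment  (EquipmentID INTEGER PRIMARY KEY);

-- First relationship: which nurses took part in which procedure.
CREATE TABLE perform (
    ProcedureID INTEGER NOT NULL REFERENCES procedures(ProcedureID),
    NurseID     INTEGER NOT NULL REFERENCES nurses(NurseID),
    PRIMARY KEY (ProcedureID, NurseID)
);

-- Second relationship: relates a row of perform to a piece of equipment.
CREATE TABLE perform_equipment (
    ProcedureID INTEGER NOT NULL,
    NurseID     INTEGER NOT NULL,
    EquipmentID INTEGER NOT NULL REFERENCES equipment(EquipmentID),
    PRIMARY KEY (ProcedureID, NurseID, EquipmentID),
    FOREIGN KEY (ProcedureID, NurseID) REFERENCES perform(ProcedureID, NurseID)
);

INSERT INTO procedures VALUES (1);
INSERT INTO nurses     VALUES (1), (2), (3);
INSERT INTO equipment  VALUES (1);
INSERT INTO perform    VALUES (1, 1), (1, 2);  -- nurse 3 is not involved
""")

def try_use(procedure_id, nurse_id, equipment_id):
    """Record equipment use; return False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO perform_equipment VALUES (?, ?, ?)",
            (procedure_id, nurse_id, equipment_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False

assisting_nurse = try_use(1, 1, 1)  # accepted: nurse 1 performs procedure 1
outside_nurse = try_use(1, 3, 1)    # rejected: nurse 3 is not in perform

# Nurse 2 assisted without using any equipment and is still on record.
without_equipment = [row[0] for row in conn.execute("""
    SELECT p.NurseID FROM perform p
    WHERE NOT EXISTS (
        SELECT 1 FROM perform_equipment e
        WHERE e.ProcedureID = p.ProcedureID AND e.NurseID = p.NurseID
    )
""")]
```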
Back to Have Appointment - it's not a relationship between pokemon and nurses (a pokemon can see the same nurse multiple times), it's an event that involves pokemon, nurses, procedures and medicine. It's best handled as a regular entity set with relationships to the other four. As for identity, I imagine a pokemon or nurse can only have one appointment at a time, so we could choose (PokemonID, DateTime) or (NurseID, DateTime) as a natural key. However, in practice we usually identify events by a surrogate ID since events span an interval which most DBMSs can't handle effectively as a primary key.
Should I remove 2 relationships Undergo and Prescribe and connect 2 entities Procedure and Appointment Medicine directly to Have Appointment associative entity? Will the ERD still be right then?
No, I think you should add relationships between Pokemon and Have Appointment, and between Nurse and Have Appointment, after converting the AES to a regular entity set.
If it's wrong, what about the same as question 2 and I turn the Have Appointment associative entity into a relationship?
Answered above.

ER diagram that implements a database for trainee

I edited and remade the ERD. I have a few more questions.
I included participation constraints (between trainee and tutor), cardinality constraints (M means many), weak entities (double line rectangles), weak relationships (double line diamonds), composed attributes, derived attributes (white space with lines circle), and primary keys.
Questions:
Apparently to reduce redundant attributes I should only keep primary keys and descriptive attributes and the other attributes I will remove for simplicity reasons. Which attributes would be redundant in this case? I am thinking start_date, end_date, phone number, and address but that depends on the entity set right? For example the attribute address would be removed from Trainee because we don't really need it?
For the part: "For each trainee we like to store (if any) also previous companies (employers) where they worked, periods of employment: start date and end date."
Isn't "periods of employment: start date, end date" a composed attribute, because the dates are shown with the symbol ":"? Also I believe I didn't make an attribute for "where they worked", which is location?
Also how is it possible to show previous companies (employers) when we already have an attribute employers and different start date? Because if you look at the Question Information it states start_date for employer twice and the second time it says start_date and end_date.
I labeled many attributes as primary keys but how am I able to distinguish from derived attribute, primary key, and which attribute would be redundant?
Is there a multivalued attribute in this ERD? Would salary and job held be multivalued attributes because an employer has many salaries and jobs?
I believe I did the participation constraints (there is one) and cardinality constraints correctly. But there are sentences where for example "An instructor teaches at least a course. Each course is taught by only one instructor"; how can I write the cardinality constraint for this when I don't have a relationship between course and instructor?
Do my relationship names make sense because all I see is "has" maybe I am not correctly naming the actions of the relationships? Also I believe schedules depend on the actual entity so they are weak entities.... so does that make course entity set also a weak entity (I did not label it as weak here)?
For the company address I put a composed attribute, street num, street address, city... would that be correct? Also would street num and street address be primary keys?
Also I added the final mark attribute to courses and course_schedule is this in the right entity set? The statement for this attribute is "Each trainee identified by: unique code, social security number, name, address, a unique telephone number, the courses attended and the final mark for each course."
For this part: "We store in the database all classrooms available on the site", do I make a composed attribute that contains site information?
Question Information:
A trainee may be self-employed or employee in a company
Each trainee identified by:
unique code, social security number, name, address, a unique
telephone number, the courses attended and the final mark for each course.
If the trainee is an employee in a company: store the current company (employer), start date.
For each trainee we like to store (if any) also previous companies (employers) where they worked, periods of employment: start date and end date.
If a trainee is self-employed: store the area of expertise, and title.
For a trainee that works for a company: we store the salary and job
For each company (employer): name (unique), the address, a unique telephone number.
We store in the database all known companies in the
city.
We need also to represent the courses that each trainee is attending.
Each course has a unique code and a title.
For each course we have to store: the classrooms, dates, and times (start time, and duration in minutes) the course is held.
A classroom is characterized by a building name and a room number and the maximum places’ number.
A course is given in at least a classroom, and may be scheduled in many classrooms.
We store in the database all classrooms
available on the site.
We store in the database all courses given at least once in the company.
For each instructor we will store: the social security number, name, and birth date.
An instructor teaches at least a course.
Each course is taught by only one instructor.
All the instructors’ telephone numbers must also be stored (each instructor has at least a telephone number).
A trainee can be a tutor for one or many trainees for a specific
period of time (start date and end date).
For a trainee it is not mandatory to be a tutor, but it is mandatory to have a tutor
The attribute ‘Code’ will be your PK because its only use seems to be that of a unique identifier.
The relationship ‘is’ will work but having a reference to two tables like that can get messy. Also you have the reference to "Employers" in the Trainee table which is not good practice. They should really be combined. See my helpful hints section to see how to clean that up.
Company looks like the complete table of Companies in the area as your details suggest. This would mean table is fairly static and used as a reference in your other tables. This means that the attribute ‘employer’ in Employed would simply be a Foreign Key reference to the PK of a specific company in Company. You should draw a relationship between those two.
It seems as though when an employee is ‘employed’ they are either an Employee of a company or self-employed.
Yes, the address field in Company will be a unique address in your current city, since the question states that the table is a complete list of companies in the city. However, because this is a unique attribute, you must include specifics like the street address: storing only the city name would give every company the same address, which is forbidden in a unique field.
Some other helpful hints:
Stay away from adding fields with plural names to your diagram. A plural field often means you need a separate table with a foreign key reference. For example, in your Trainee table you have ‘Employers’. That should be an Employer table with a foreign key reference to the Trainee Code attribute. In the Employer table you can combine the Self-employed and Employed tables so that there is a single reference from Trainee to Employer.
ERD Link http://www.imagesup.net/?di=1014217878605. Here's a quick ERD I created for you. Note the use of linker tables to avoid many-to-many relationships between tables. There are several ways to solve this schema problem; this is just how I saw your problem laid out. The design is intended to help with normalization of the DB, that is, to prevent redundant data. Hope this helps. Let me know if you need more clarification on the design I provided. It should be fairly self-explanatory when comparing your design parameters to it.
Follow Up Questions:
If you are looking to reduce attributes that might be arbitrary perhaps phone_number and address may be ones to eliminate, but start and end dates are good for sorting and archival reasons when determining whether an entry is current or a past record.
Yes, periods_of_employment does not need to be stored, as you can derive that information from the start and end dates. "Where they worked", I believe, is just meant to say previous employers, so it is not a location; instead it means that you should be able to get a list of all the employers the trainee has had. You can get that with the current schema if you query the employer table for all records where the trainee code equals the requested trainee and sort by start date. The reason it states start_date twice is to let you know that for all ‘previous’ employers the record will have a start and an end date. Hence the "previous". However, for the current employer the employment hasn't ended, which means there will be no end_date, so it will be null. That’s what the problem was stating, in my opinion.
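The query described above could look roughly like this, sketched with Python's sqlite3 module. The employer table layout (trainee_code, company_name, start_date, end_date) and the sample rows are assumptions made for illustration; the columns in the linked ERD may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employer (
    trainee_code TEXT NOT NULL,
    company_name TEXT NOT NULL,
    start_date   TEXT NOT NULL,  -- ISO 8601 text sorts chronologically
    end_date     TEXT            -- NULL while the employment is current
);
""")
conn.executemany(
    "INSERT INTO employer VALUES (?, ?, ?, ?)",
    [
        ("T1", "Globex", "2018-07-01", None),          # current employer
        ("T1", "Acme", "2015-03-01", "2018-06-30"),    # previous employer
        ("T2", "Initech", "2020-01-01", None),
    ],
)

def employment_history(trainee_code):
    """All employers of one trainee, oldest first."""
    return conn.execute(
        "SELECT company_name, start_date, end_date FROM employer "
        "WHERE trainee_code = ? ORDER BY start_date",
        (trainee_code,),
    ).fetchall()

history = employment_history("T1")
# The current employer is the row whose end_date is still NULL.
current = [name for name, _start, end in history if end is None]
```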
To keep it simple PK’s are unique values used to reference a record within another table. Redundant values are values that you essentially don’t need in a table because the same value can be derived by querying another table. In this case most of your attributes are fine except for Final_Mark in the Course table. This is redundant because Course_Schedule will store the Final_Mark that was received. The Course table is meant to simply hold a list of all potential courses to be referenced by Course_Schedule.
There are no multivalued attributes in this design because that is bad practice. Job and salary are singular, and if a job or salary changes you would add a new record to the employer table, not add to that column. Multivalued attributes make querying a DB difficult and I would advise against them. That’s why I mentioned earlier to abstract all attributes with plurals into their own tables and use a foreign key reference.
You essentially do have that written here because Course_Schedule is a linker table meaning that it is meant to simplify relationships between tables so you don’t have many to many relationships.
All your relationships look right to me. Also since the schedules are linker tables and cannot exist without the supporting tables you could consider them weak entities. Course in this schema is a defined list of all courses available so can be independent of any other table. This by definition is not a weak entity. When creating this db you’d probably fill in the course table and it probably wouldn’t change after that, except rarely when adding or removing an available course option.
Yes, you can make address a composite attribute, and that would be right in your diagram. To be clear about your use of primary keys: just because an attribute is unique doesn’t make it a primary key. A table can have one and only one primary key, so you must pick a column that you are certain will not be repeated. In this example you may think street number might be unique, but what if one company leaves an address and another company moves into that spot? That would break that table's primary key. Typically a company name is licensed in a city or state, so it cannot be repeated. That would be a better choice for your primary key. You can also make composite primary keys, but that is a more advanced topic that I would recommend reading about at a later date.
Take final_mark out of courses. That table will contain rows of only courses; those courses won’t be linked to any trainee except by the course_schedule table. The Final_Mark should only be in that table. If you add final_mark to the Course table then, if you have 10 trainees in a course, you’d have 10 duplicate rows in the course table differing only in final_mark. Instead, hold only the course_code and title; that way you can assign different instructors, trainees and classrooms using the linker tables.
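To make the row counts concrete, here is a small sketch with Python's sqlite3 module. The column layout of course and course_schedule (in particular the trainee_code column and the composite key) is an assumption for illustration, not taken from the linked ERD, and the marks are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE course (
    course_code TEXT PRIMARY KEY,
    title       TEXT NOT NULL
);
CREATE TABLE course_schedule (
    course_code  TEXT NOT NULL REFERENCES course(course_code),
    trainee_code TEXT NOT NULL,
    final_mark   INTEGER,  -- one mark per trainee per course
    PRIMARY KEY (course_code, trainee_code)
);
""")

conn.execute("INSERT INTO course VALUES ('DB101', 'Databases')")
# Ten trainees attend the same course, each with their own final mark.
conn.executemany(
    "INSERT INTO course_schedule VALUES ('DB101', ?, ?)",
    [(f"T{n}", 60 + n) for n in range(1, 11)],
)

course_rows = conn.execute("SELECT COUNT(*) FROM course").fetchone()[0]
mark_rows = conn.execute("SELECT COUNT(*) FROM course_schedule").fetchone()[0]
```

The course is stored once, and the ten marks live in the ten linker rows.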
No composite attribute is needed using this schema. You have a Classroom table that will hold all available classrooms and their relevant information. You then use the Classroom_Schedule linker table to assign a given Classroom to a Course_Schedule. No attributes of Classroom can be broken down to simpler attributes.

Schema design: many to many plus additional one to many

I have this scenario and I'm not sure exactly how it should be modeled in the database. The objects I'm trying to model are: teams, players, the team-player membership, and a list of fees due for each player on a given team. So, the fees depend on both the team and the player.
So, my current approach is the following:
**teams**
id
name
**players**
id
name
**team_players**
id
player_id
team_id
**team_player_fees**
id
team_players_id
amount
send_reminder_on
Schema layout ERD
In this schema, team_players is the junction table for teams and players, and the table team_player_fees has records that belong to records in the junction table.
For example, playerA is on teamA and has the fees of $10 and $20 due in Aug and Feb. PlayerA is also on teamB and has the fees of $25 and $25 due in May and June. Each player/team combination can have a different set of fees.
Questions:
Are there better ways to handle such a scenario?
Is there a term for this type of relationship (so I can google it)? Or do you know of any references with similar structures?
This is a perfectly fine design. It is not uncommon for a junction table (AKA intersection table) to have attributes of its own - such as joining_date - and that can include dependent tables. There is, as far as I know, no special name for this arrangement.
One of the reasons why it might feel strange is that these tables frequently don't exist in a logical data model. At that stage they are represented by a many-to-many join notation. It's only when we get to the physical model that we have to materialize the junction table. (Of course many people skip the logical model and go straight to physical.)
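For reference, the schema from the question can be sketched with Python's sqlite3 module as follows. The UNIQUE (player_id, team_id) constraint is my own addition, and the reminder dates are made-up values standing in for "Aug and Feb" and "May and June" from the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE teams   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE players (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- Junction table for the many-to-many relationship.
CREATE TABLE team_players (
    id        INTEGER PRIMARY KEY,
    player_id INTEGER NOT NULL REFERENCES players(id),
    team_id   INTEGER NOT NULL REFERENCES teams(id),
    UNIQUE (player_id, team_id)  -- assumption: one membership per pair
);

-- Dependent table: each fee belongs to one membership row.
CREATE TABLE team_player_fees (
    id               INTEGER PRIMARY KEY,
    team_players_id  INTEGER NOT NULL REFERENCES team_players(id),
    amount           NUMERIC NOT NULL,
    send_reminder_on TEXT
);

INSERT INTO teams   VALUES (1, 'teamA'), (2, 'teamB');
INSERT INTO players VALUES (1, 'playerA');
INSERT INTO team_players VALUES (1, 1, 1), (2, 1, 2);
INSERT INTO team_player_fees (team_players_id, amount, send_reminder_on) VALUES
    (1, 10, '2024-08-01'), (1, 20, '2025-02-01'),
    (2, 25, '2024-05-01'), (2, 25, '2024-06-01');
""")

# Total fees due for playerA, per team.
totals = conn.execute("""
    SELECT t.name, SUM(f.amount)
    FROM team_player_fees f
    JOIN team_players tp ON tp.id = f.team_players_id
    JOIN teams t   ON t.id = tp.team_id
    JOIN players p ON p.id = tp.player_id
    WHERE p.name = 'playerA'
    GROUP BY t.name
    ORDER BY t.name
""").fetchall()
```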

a layman's term for identifying relationship

There are a couple of questions around asking for the difference between, or an explanation of, identifying and non-identifying relationships in relational databases.
My question is, can you think of simpler terms for this jargon? I understand that technical terms have to be specific and unambiguous, but having an 'alternative name' might help students relate more easily to the concept behind them.
We actually want to use a more layman term in our own database modeling tool, so that first-time users without much computer science background could learn faster.
cheers!
I often see child table or dependent table used as a lay term. You could use either of those terms for a table with an identifying relationship.
Then say a referencing table is a table with a non-identifying relationship.
For example, PhoneNumbers is a child of Users, because a phone number has an identifying relationship with its user (i.e. the primary key of PhoneNumbers includes a foreign key to the primary key of Users).
Whereas the Users table has a state column that is a foreign key to the States table, making it a non-identifying relationship. So you could say Users references States, but is not a child of it per se.
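The distinction in this example can be checked mechanically. The sketch below uses Python's sqlite3 module; the column names (user_id, phone, state_code) are assumptions, and is_identifying is a hypothetical helper that applies the rule above: a relationship is identifying when a foreign key column is part of the table's primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE States (state_code TEXT PRIMARY KEY);
CREATE TABLE Users (
    user_id INTEGER PRIMARY KEY,
    state   TEXT REFERENCES States(state_code)  -- FK outside the PK
);
CREATE TABLE PhoneNumbers (
    user_id INTEGER NOT NULL REFERENCES Users(user_id),  -- FK inside the PK
    phone   TEXT    NOT NULL,
    PRIMARY KEY (user_id, phone)
);
""")

def primary_key_columns(table):
    # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return {row[1] for row in rows if row[5] > 0}

def foreign_key_columns(table):
    # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...)
    rows = conn.execute(f"PRAGMA foreign_key_list({table})").fetchall()
    return {row[3] for row in rows}

def is_identifying(table):
    """True if any foreign key column is also part of the primary key."""
    return bool(primary_key_columns(table) & foreign_key_columns(table))

phone_numbers_is_child = is_identifying("PhoneNumbers")  # child of Users
users_is_child = is_identifying("Users")                 # only references States
```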
I think belongs to would be a good name for the identifying relationship.
A "weak entity type" does not have its own key, just a "partial key", so each entity instance of this weak entity type has to belong to some other entity instance so it can be identified, and this is an "identifying relationship". For example, a landlord could have a database with apartments and rooms. A room can be called kitchen or bathroom, and while that name is unique within an apartment, there will be many rooms in the database with the name kitchen, so it is just a partial key. To uniquely identify a room in the database, you need to say that it is the kitchen in this particular apartment. In other words, the rooms belong to apartments.
I'm going to recommend the term "weak entity" from ER modeling.
Some modelers conceptualize the subject matter as being made up of entities and relationships among entities. This gives rise to Entity-Relationship Modeling (ER Modeling). An attribute can be tied to an entity or a relationship, and values stored in the database are instances of attributes.
If you do ER modeling, there is a kind of entity called a "weak entity". Part of the identity of a weak entity is the identity of a stronger entity, to which the weak one belongs.
An example might be an order in an order processing system. Orders are made up of line items, and each line item contains a product-id, a unit-price, and a quantity. But line items don't have an identifying number across all orders. Instead, a line item is identified by {item number, order number}. In other words, a line item can't exist unless it's part of exactly one order. Item number 1 is the first item in whatever order it belongs to, but you need both numbers to identify an item.
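A minimal sketch of the order and line item example, using Python's sqlite3 module. The table names, default values and order numbers are made up for illustration, and add_line_item is a hypothetical helper that reports whether the insert was accepted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE orders (order_number INTEGER PRIMARY KEY);
CREATE TABLE line_items (
    order_number INTEGER NOT NULL REFERENCES orders(order_number),
    item_number  INTEGER NOT NULL,  -- partial key, unique only within an order
    product_id   TEXT    NOT NULL,
    unit_price   NUMERIC NOT NULL,
    quantity     INTEGER NOT NULL,
    PRIMARY KEY (order_number, item_number)
);
INSERT INTO orders VALUES (100), (101);
""")

def add_line_item(order_number, item_number, product_id="P-1", unit_price=5, quantity=1):
    """Insert a line item; return False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO line_items VALUES (?, ?, ?, ?, ?)",
            (order_number, item_number, product_id, unit_price, quantity),
        )
        return True
    except sqlite3.IntegrityError:
        return False

first_in_100 = add_line_item(100, 1)  # item 1 of order 100
first_in_101 = add_line_item(101, 1)  # item 1 of order 101, a different item
duplicate = add_line_item(100, 1)     # rejected: (100, 1) already exists
orphan = add_line_item(999, 1)        # rejected: there is no order 999
```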
It's easy to turn an ER model into a relational model. It's also easy for people who are experts in the data but know nothing about databases to get used to an ER model of the data they understand.
There are other modelers who argue vehemently against the need for ER modeling. I'm not one of them.
Nothing, absolutely nothing in the kind of modeling where one encounters things such as "relationships" (ER, I presume) is "technical", "precise" or "unambiguous". Nor can it be.
A) ER modeling is always and by necessity informal, because it can never be sufficient to capture/express the entire definition of a database.
B) There are so many different ER dialects out there that it is just impossible for all of them to use exactly the same terms with exactly the same meaning. Recently, I even discovered that a UK university that teaches ER modeling uses the term "entity subtype" for the very same thing that I have always called "entity supertype", and vice versa!
One could use connection.
You have Connection between two tables, where the IDs are the same.
That type of thing.
how about
Association
Link
Correlation

Resources