Choosing entities for a database


A database holds three kinds of people: members, volunteers and requesters.
Most of the volunteers are members, and half of the requesters are members.
A volunteer has some attributes that a member doesn't have.
If a requester is not a member, only basic information can be stored for them, and they may become a member later.
Anyone can be a requester or a volunteer, so a user can be a requester, a volunteer and a member at the same time. A user who has paid the membership fee is a member; once he makes a request, he is also a requester, and he can choose to become a volunteer. A member who does nothing else is just a member.
How should I choose the entities?
Should I make them three entities, or put them in one entity with volunteer and requester as two attributes?
Thanks

What about having a users table that contains all generic data, and then tables for the "roles", that contain role-specific data and that can be linked to the users:
user:
- id
- name
- email
- member_id
- volunteer_id
- requester_id
member
- id
- data
volunteer
- id
- data
requester
- id
- data
Then, if you are representing the rows with an object-oriented abstraction, your User objects can have this method:
// C-style pseudocode
boolean isVolunteer() {
    // A non-null volunteer_id means the user is linked to a volunteer row.
    return self.volunteer_id != null;
}

Create 4 tables:
persons (person_id PK, first_name, last_name)
members (person_id PK/FK)
volunteers (person_id PK/FK)
requesters (person_id PK/FK)
Add common attributes to the persons table, and role-specific attributes to the relevant table.
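The four-table layout above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the `skills` column, the sample rows and the `roles_of` helper are hypothetical and not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE persons (
    person_id  INTEGER PRIMARY KEY,
    first_name TEXT NOT NULL,
    last_name  TEXT NOT NULL
);
-- Each role table reuses person_id as both primary key and foreign key,
-- so a person can hold each role at most once.
CREATE TABLE members (
    person_id INTEGER PRIMARY KEY REFERENCES persons(person_id)
);
CREATE TABLE volunteers (
    person_id INTEGER PRIMARY KEY REFERENCES persons(person_id),
    skills    TEXT  -- hypothetical volunteer-only attribute
);
CREATE TABLE requesters (
    person_id INTEGER PRIMARY KEY REFERENCES persons(person_id)
);
""")

# Hypothetical sample data: Ann holds all three roles, Bob is only a requester.
conn.executemany("INSERT INTO persons VALUES (?, ?, ?)",
                 [(1, "Ann", "Lee"), (2, "Bob", "Ray")])
conn.execute("INSERT INTO members VALUES (1)")
conn.execute("INSERT INTO volunteers VALUES (1, 'driving')")
conn.executemany("INSERT INTO requesters VALUES (?)", [(1,), (2,)])


def roles_of(person_id):
    """Return the names of the role tables that contain this person."""
    roles = []
    for table in ("members", "volunteers", "requesters"):
        row = conn.execute(
            f"SELECT 1 FROM {table} WHERE person_id = ?", (person_id,)
        ).fetchone()
        if row:
            roles.append(table)
    return roles
```

A requester who later becomes a member only needs one extra row in members; nothing in persons changes.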

I see you have several shared attributes among your entities: some occur on more than one entity, some occur exclusively on a single one. It's not exactly a beautiful solution, but it seems to fit your problem: it's called Single Table Inheritance in JPA jargon. Although the article focuses on Java, the same approach can be achieved with other technologies.
JPA Single Table Inheritance Example

Related

Relational Database: When do we need to add more entities?

We had a discussion today about the W3 lecture case study and how many entities we need for each situation, and I am confused about the following:
Case 1) An employee is assigned to be a member of a team. A team with more than 5 members will have a team leader. The members of the team elect the team leader. List the entity(s) which you can identify in the above statement. In this case, if we don't create 2 entities for the above requirement, we need to add two more attributes to each employee, which can lead to anomalies later. Therefore, we need 2 entities, as below:
EMPLOYEE (PK is employeeId) (0-M)----------------(0-1) TEAM (PK teamId&employeeId) -> 2 entities
Case 2) "The company also introduced a mentoring program, whereby a new employee will be paired with someone who has been in the company longer." How many entity/ies do you need to model the mentoring program?
The answer from the lecturer is 1. With that, we have to add 2 more attributes to each Employee, mentorRole (Mentor or Mentee) and pairNo (to distinguish between different pairs and to know who mentors whom), don't we?
My question is why can't we create a new Entity named MENTORING which will be similar to TEAM in Q1? And why we can only do that if this is a many-many relationship?
EMPLOYEE (PK is employeeId) (0-M)----------------(0-1) MENTORING (PK is pairNo&employeeId) -> 2 entities
Thank you in advance

First of all, about terminology: I use entity to mean an individual person, thing or event. You and I are two distinct entities, but since we're both members of StackOverflow, we're part of the same entity set. Entity sets are contrasted with value sets in the ER model, while the relational model has no such distinction.
While you're right about the number of entity sets in case 1, there are some issues with your implementation. TEAM's PK shouldn't be (teamId, employeeId); it should be only teamId. The EMPLOYEE table should have a teamId foreign key (not part of the PK) to indicate team membership. The employeeId column in the TEAM table could be used to represent the team leader and is dependent on the teamId (since each team can have only one leader at most).
With only one entity set, we would probably represent team membership and leadership as:
EMPLOYEE(employeeId PK, team, leader)
where team is some team name or number which has to be the same for team members, and leader is a true/false column to indicate whether the employee in that row is the leader of his/her team. A problem with this model is that we can't ensure that a team has only one leader.
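For contrast, the two-entity-set design described earlier in this answer can be sketched with Python's sqlite3 module. The column name `leaderId` (standing in for the employeeId column on TEAM) and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE team (
    teamId   INTEGER PRIMARY KEY,
    -- One column per team row, so a team has at most one leader.
    -- NULL means the team has no leader (for example, 5 members or fewer).
    leaderId INTEGER REFERENCES employee(employeeId)
);
CREATE TABLE employee (
    employeeId INTEGER PRIMARY KEY,
    teamId     INTEGER REFERENCES team(teamId)  -- membership, not part of the PK
);
""")

conn.execute("INSERT INTO team (teamId) VALUES (10)")
conn.executemany("INSERT INTO employee VALUES (?, ?)",
                 [(1, 10), (2, 10), (3, 10)])

# Electing a second leader replaces the first instead of adding another one.
conn.execute("UPDATE team SET leaderId = 1 WHERE teamId = 10")
conn.execute("UPDATE team SET leaderId = 2 WHERE teamId = 10")

leaders = conn.execute(
    "SELECT leaderId FROM team WHERE teamId = 10").fetchall()
member_count = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE teamId = 10").fetchone()[0]
```

Note that this sketch does not check that the leader is actually a member of the team, nor the rule about more than 5 members; those would need extra constraints or application code.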
In case 2, there are again some issues with the implementation. I don't see the need to identify pairs apart from the employees involved, and having a mentorRole (mentor or mentee) indicates that the association will be recorded for both mentor and mentee. This is redundant and creates an opportunity for inconsistency. If the goal was to represent a one-to-one relationship, there are better ways. I suggest a separate table MENTORING(menteeEmployeeId PK, mentorEmployeeId UQ) (or possibly a unique but nullable mentorEmployeeId in the EMPLOYEE table, depending on how your DBMS handles nulls in unique indexes).
The difference between the two cases is that teams can have any number of members and one leader, which is most effectively implemented by identifying teams separately from employees, whereas mentorship is a simpler association that is sufficiently identified by either of the two people involved (provided you consistently use the same role as identifier). You could create a separate entity set for mentoring, with relationships to the employees involved - it might look like my MENTORING table but with an additional surrogate key as PK, but there's no need for the extra identifier.
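A minimal sketch of the suggested MENTORING table, again using Python's sqlite3 module. The `try_pair` helper and the sample employees are hypothetical; the point is only that the PK and the unique constraint together keep the pairing one-to-one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE employee (employeeId INTEGER PRIMARY KEY);
CREATE TABLE mentoring (
    -- PK: each mentee appears once, so a mentee has one mentor.
    menteeEmployeeId INTEGER PRIMARY KEY REFERENCES employee(employeeId),
    -- UNIQUE: each mentor appears once, so a mentor has one mentee.
    mentorEmployeeId INTEGER NOT NULL UNIQUE REFERENCES employee(employeeId)
);
""")
conn.executemany("INSERT INTO employee VALUES (?)", [(1,), (2,), (3,)])
conn.execute("INSERT INTO mentoring VALUES (2, 1)")  # employee 1 mentors employee 2


def try_pair(mentee, mentor):
    """Try to record a pairing; return False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO mentoring VALUES (?, ?)", (mentee, mentor))
        return True
    except sqlite3.IntegrityError:
        return False
```

With this in place, `try_pair(3, 1)` is rejected because employee 1 already mentors someone, and `try_pair(2, 3)` is rejected because employee 2 already has a mentor. No pairNo or mentorRole column is needed.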
And why we can only do that if this is a many-many relationship?
What do you mean? Your examples don't contain a many-to-many relationship and we don't create additional entity sets for many-to-many relationships. If you're thinking of so-called "bridge" tables, you've got some concepts mixed up. Entity sets aren't tables. An entity set is a set of values, a table represents a relation over one or more sets of values. In Chen's original method, all relationships were represented in separate tables. It's just that we've gotten used to denormalizing simple one-to-one and one-to-many relationships into the same tables as entity attributes, but we can't do the same for many-to-many binary relationships or ternary and higher relationships in general.

Have I resolved my database relationships correctly?

I'm sorry if this is the wrong place for this question. I volunteer for a charity group that has to store sensitive data. As we are a new type of format, there are no systems that fit our needs or our budget. Someone else started building the database, but I wasn't sure he was resolving the relationships correctly, so I presented him with an alternate ER model. We haven't heard back from him since, so I am left to build it by myself.
As we have to store sensitive data, I'm reluctant to put my database design on here in its entirety, so if there is a way I can privately discuss this with someone, that would be my preference, as I would love to get someone else to check it in full to make sure it's ALL good. For now, can someone confirm whether I have resolved the relationships correctly, or whether the original design was better?
The database description is: There are different types of members -
Client, Staff, Professional (Offsite), Supplier, Family, General. There are different types of Staff members: Managers, Volunteer, Professional (Onsite), Admin, Committee, Lecturer. A member can be one or many types eg: Client/Volunteer/Family, Supplier/Volunteer, Manager/Lecturer/Volunteer/Committee/Family.
The original guy resolved this by creating a separate table for each member type, each table storing a name and address, e.g.:
Client - ClientName, ClientAddress
Professional - ProfessionalName, ProfessionalAddress
Employee - EmployeeName, EmployeeAddress
Family - FamilyName, FamilyAddress
My only problem with this is that I would ideally like one person to have one MemberID with their name and address, but with the original design each person would have a different ID for each type of person that they were, all storing name, address, phone number, email etc.
I thought that creating a Member table and having a Member Type table with a joining Member Type List table would be a better design. This is how I have resolved the issue:
Member Tables
Have I done this correctly or should I continue with the original design?
Thanks
Update:
Staff Model

It makes sense to store all member related data within one table.
Also for programming, I cannot imagine any use case that would support having different tables for each member type.
That being said, I advise you to look up the concept of "user roles", since this seems very similar.
You have different users (members) and they can have different roles (member type). Based on your roles you might want to show different data / allow different actions / send specific mails (or whatever else you can imagine).
So generally your approach looks good. The only thing I would point out is that right now you don't store who is a "Staff" member, for example. If you just have one flat list of type names, you lose the structure between them.
Depending on your use cases you can e.g. make another column in MemberType table "isStaff". Or, if you need to be more flexible and there are likely more different member types in the future, you can make another table (e.g.) MemberTypeParent and set a foreign key on your MemberType table to that table to make the connection.
It all depends on what you want to do with the data in the future.
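The second option (a MemberTypeParent table) could look roughly like the sketch below, written with Python's sqlite3 module. All column names and sample rows are assumptions, since the original diagrams are not available here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Member (MemberID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE MemberTypeParent (ParentID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE MemberType (
    MemberTypeID INTEGER PRIMARY KEY,
    Name         TEXT NOT NULL,
    -- NULL means a top-level type such as Client; otherwise the parent group.
    ParentID     INTEGER REFERENCES MemberTypeParent(ParentID)
);
-- Joining table: one row per (member, type) combination.
CREATE TABLE MemberTypeList (
    MemberID     INTEGER REFERENCES Member(MemberID),
    MemberTypeID INTEGER REFERENCES MemberType(MemberTypeID),
    PRIMARY KEY (MemberID, MemberTypeID)
);
INSERT INTO Member VALUES (1, 'Alex'), (2, 'Sam');
INSERT INTO MemberTypeParent VALUES (1, 'Staff');
INSERT INTO MemberType VALUES
    (1, 'Client', NULL), (2, 'Volunteer', 1), (3, 'Lecturer', 1);
INSERT INTO MemberTypeList VALUES (1, 1), (1, 2), (1, 3), (2, 1);
""")

# Everyone who holds at least one type that belongs to the Staff parent.
staff = conn.execute("""
    SELECT DISTINCT m.Name
    FROM Member m
    JOIN MemberTypeList l ON l.MemberID = m.MemberID
    JOIN MemberType t ON t.MemberTypeID = l.MemberTypeID
    JOIN MemberTypeParent p ON p.ParentID = t.ParentID
    WHERE p.Name = 'Staff'
    ORDER BY m.Name
""").fetchall()
```

Each person keeps a single MemberID with one name and address, however many types they hold.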

what is the role of categorization in database?

Please let me know what categorization is in a database and what its role is.
I got the following from a site, but I do not understand it:
Categorisation is a process of modelling of a single subtype (or subclass) with a relationship that involves more than one distinct supertype (or superclass). Till now all the relationships that have been discussed, are a single distinct supertype. However, there could be need for modelling a single supertype/subtype relationship with more than one supertype, where the supertypes represent different entity set.
thanks!
https://books.google.co.uk/books?id=9m382yDgxRsC&pg=PA287&lpg=PA287&dq=7.4.+Categorisation+Categorisation+is+a+process+of+modelling+of+a+single+subtype+(or+subclass)+with+a+relationship+that+involves+more+than+one+distinct+supertype+(or+superclass).&source=bl&ots=7JFawUEg3d&sig=peeXz5QajJFdkFHw0TzlvQFwix8&hl=ko&sa=X&ved=0ahUKEwi1u8iHwIHSAhWMVhQKHfr_AkkQ6AEIIDAA#v=onepage&q=7.4.%20Categorisation%20Categorisation%20is%20a%20process%20of%20modelling%20of%20a%20single%20subtype%20(or%20subclass)%20with%20a%20relationship%20that%20involves%20more%20than%20one%20distinct%20supertype%20(or%20superclass).&f=false

As I understand it, categorization is formally the process of creating a relation (category) that contains tuples that are a subset of the union of the tuples of the superclasses. The tuples for that subset are chosen based on a certain characteristic. Consider an example:
Let's say we have a Suppliers(id, name, address, email, bank_acct, paypal, ... etc.) relation and a Customers(ssn, name, faname, email, address, paypal, ... etc.) relation. We could then create another relation featuring only those parties (both suppliers and customers) who have paypal accounts: Paypal_account_holders(id, name, address, paypal_acct, email ... etc.), where Paypal_account_holders.id is a surrogate primary key for Paypal_account_holders and a foreign key to both Suppliers.paypal and Customers.paypal.
Motivation and advantages:
- Universal interface for applications;
- Security: restricted access to tables when you allow users to access only some part of the information;
- Simplified queries;
- Enforcing some business rules for a certain category;
- etc.
Again, that's how I understand it.
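One way to picture "a subset of the union of the superclasses" is a view over both tables. Note the swap: the answer describes a separate relation with a surrogate key, whereas this sketch uses a plain SQL view, with hypothetical sample data, built through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Suppliers (id INTEGER PRIMARY KEY, name TEXT, paypal TEXT);
CREATE TABLE Customers (ssn TEXT PRIMARY KEY, name TEXT, paypal TEXT);
INSERT INTO Suppliers VALUES (1, 'Acme', 'acme@pp'), (2, 'Bolt', NULL);
INSERT INTO Customers VALUES ('111', 'Cara', 'cara@pp'), ('222', 'Dan', NULL);

-- The category: rows from either superclass that share one characteristic
-- (having a paypal account). The 'source' column is an added assumption.
CREATE VIEW Paypal_account_holders AS
    SELECT 'supplier' AS source, name, paypal
    FROM Suppliers WHERE paypal IS NOT NULL
    UNION ALL
    SELECT 'customer' AS source, name, paypal
    FROM Customers WHERE paypal IS NOT NULL;
""")

holders = conn.execute(
    "SELECT source, name FROM Paypal_account_holders ORDER BY name"
).fetchall()
```

Applications can then query Paypal_account_holders without knowing whether a row came from Suppliers or Customers, which is the "universal interface" motivation listed above.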

Is it a good idea to create a db with a generic table entity that can be decorated with a role and metadatas?

I've been thinking about creating a database that, instead of having a table per object I want to represent, would have a series of generic tables that would allow me to represent anything I want and even modify (that's actually my main interest) the data associated with any kind of object I represent.
As an example, let's say I'm creating a web application that would let people make appointments with hairdressers. What I would usually do is having the following tables in my database :
clients
hairdressers: FK: id of the company the hairdresser works for
companies
appointments: FK: id of the client and the hairdresser for that appointment
But what happens if we deal with scientific hairdressers who want to associate more data with an appointment (e.g. quantity of shampoo used, grams of hair cut, number of scissor strokes, ...)?
I was thinking instead of that, I could use the following tables:
entity: represents anything I want. PK(entity_id)
group: is an entity (when I create a group, I first create an entity whose id is then referred to by the FK of the group). PK(group_id), FK(entity_id)
entity_group: each group can contain multiple entity (thus also other groups): PK(entity_id, group_id).
role: e.g. Administrator, Client, HairDresser, Company. PK(role_id)
entity_role: each entity can have multiple roles: PK(entity_id, role_id)
metadata: contains the name and type of the metadata as well as the associated role and a flag that describes whether it's mandatory or not. PK(metadata_id), FK(metadata_type_id, role_id)
metadata_type: contains information about available metadata types. PK(metadata_type_id)
metadata_value: PK(metadata_value_id), FK(metadata_id)
metadata_*: different tables for the different types, e.g. char, text, integer, double, datetime, date. PK(metadata_*_id), FK(metadata_value_id). These contain the actual value of a metadata associated with an entity.
entity_metadata: contains data associated with an entity, e.g. name of a client, address of a company, ... PK(entity_id, metadata_value_id). Using the type of the metadata, it's possible to select the actual value of a metadata for this entity in the corresponding table.
This would allow me to have a completely flexible data structure but has a few drawbacks:
Selecting the metadata associated with an entity returns multiple rows that I have to process in my code to build the representation of the entity.
Selecting the metadata of multiple entities requires repeating the same process for each entity.
Selecting metadata also requires one select for each of the metadata_* tables that I have.
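The first two drawbacks can be seen in a heavily reduced sketch using Python's sqlite3 module. As a stated simplification, it stores every value as text in a single entity_metadata table instead of one metadata_* table per type, and the `load_entities` helper and sample rows are hypothetical.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entity (entity_id INTEGER PRIMARY KEY);
CREATE TABLE metadata (metadata_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Simplification: values live here as text, not in per-type tables.
CREATE TABLE entity_metadata (
    entity_id   INTEGER REFERENCES entity(entity_id),
    metadata_id INTEGER REFERENCES metadata(metadata_id),
    value       TEXT,
    PRIMARY KEY (entity_id, metadata_id)
);
INSERT INTO entity VALUES (1), (2);
INSERT INTO metadata VALUES (1, 'name'), (2, 'shampoo_ml');
INSERT INTO entity_metadata VALUES (1, 1, 'Ann'), (1, 2, '30'), (2, 1, 'Bob');
""")


def load_entities():
    """Fold the one-row-per-attribute result into one dict per entity."""
    entities = defaultdict(dict)
    rows = conn.execute("""
        SELECT em.entity_id, m.name, em.value
        FROM entity_metadata em
        JOIN metadata m ON m.metadata_id = em.metadata_id
        ORDER BY em.entity_id
    """)
    for entity_id, name, value in rows:
        entities[entity_id][name] = value
    return dict(entities)
```

Even in this reduced form, every value comes back as a string ('30' rather than 30), which hints at why the full design needs one table per type and one select per table.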
On the other hand, it has some advantages. For example, instead of having a client table with a lot of fields that will almost never be filled, I just use the exact number of rows that I need.
Is this a good idea at all?
I hope that I've expressed clearly what I'm trying to achieve. I guess that I'm not the first one who wonders how to achieve that but I was not able to find the right keywords to find an answer to that question :/
Thanks!

Database Schema Recommendation

I am having a brain freeze on a data problem that I need to model. I will do my best to outline the tables and relationships:
users (basic user information name/etc)
users.id
hospitals (basic information about hospital name/etc)
hospitals.id
pages
pages.id
user_id (page can be affiliated with a user)
hospital_id (page can be affiliated with a hospital)
Here is where the new data begins, and I am having an issue
groups (name of a group of pages)
groups.id
groups_pages (linking table)
group_id
page_id
Now here is the tricky part: a group can be 'owned' by either a user or a hospital, but its pages aren't necessarily affiliated with that user/hospital. In addition, there is another type of entity (company) that can 'own' the group.
When displaying the group, I will need to know what type of owner (user / hospital / company) the group has, and be able to get the correct affiliated data (name, address, etc).
I'm drawing a blank on how to link a group to its respective owner, knowing that the owner can be of different types.

Party is a generic term for person or organization.
Keep all common fields (phone no, address..) in the Party table.
Person and Hospital should have only specific fields for the sub-type.
If Company has a different set of columns from Hospital, simply add it as another subtype.
If Hospital and Company have the same columns, rename Hospital to the more generic Organization.
PartyType is the discriminator {P,H}
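A rough sketch of this Party layout with Python's sqlite3 module. The subtype columns (`BirthDate`, `BedCount`), the `owner_id` column on groups and the sample rows are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Party (
    PartyID   INTEGER PRIMARY KEY,
    PartyType TEXT NOT NULL CHECK (PartyType IN ('P', 'H')),  -- discriminator
    Name      TEXT NOT NULL,
    Address   TEXT
);
-- Subtype tables hold only the fields specific to that subtype.
CREATE TABLE Person (
    PartyID   INTEGER PRIMARY KEY REFERENCES Party(PartyID),
    BirthDate TEXT      -- hypothetical person-only column
);
CREATE TABLE Hospital (
    PartyID  INTEGER PRIMARY KEY REFERENCES Party(PartyID),
    BedCount INTEGER    -- hypothetical hospital-only column
);
-- A group points at Party, so the owner can be any subtype.
CREATE TABLE "groups" (
    id       INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    owner_id INTEGER NOT NULL REFERENCES Party(PartyID)
);
INSERT INTO Party VALUES (1, 'P', 'Ann', NULL), (2, 'H', 'St Mary', '1 Main St');
INSERT INTO Person VALUES (1, '1980-01-01');
INSERT INTO Hospital VALUES (2, 120);
INSERT INTO "groups" VALUES (1, 'Cardiology pages', 2);
""")

# One join gives the group's owner type and the common owner data.
row = conn.execute("""
    SELECT g.name, p.PartyType, p.Name
    FROM "groups" g
    JOIN Party p ON p.PartyID = g.owner_id
    WHERE g.id = 1
""").fetchone()
```

Adding Company as an owner would mean one more subtype table and one more allowed PartyType value; the groups table stays unchanged.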

You'd have to use some form of discriminator, such as an "owner_type" column. You could then use an enum, a varchar, or just an int to represent what type of owner the row refers to.

Here is a good tutorial on how to model inheritance in a database while maintaining a reasonable normal form and referential integrity.
Condensed version for you: Create another table, owners, and let it keep a minimal set of attributes (what users and hospitals have in common, maybe a full name, address, and of course an id). Users and hospitals will have their respective id columns that will simultaneously be their primary keys and also foreign keys referencing owners.id. Give users the attributes that hospitals don't have and vice versa. Now each hospital is represented by two easily joined rows, one from owners and one from hospitals.
This allows you to reference owners.id from groups.owner_id.
(There is also a simpler alternative where you create just one table for users and hospitals and put NULLs to all columns that do not apply to a particular row, but that quickly gets unwieldy.)

HospitalGroups(HospitalID, GroupID)
UserGroups(UserID, GroupID)
CompanyGroups(CompanyID, GroupID)
Groups(GroupID,....)
GroupPages(GroupID, PageID)
Pages(PageID, ...)
Would be the classic way.
The discriminator idea mentioned by @Robert would also work, but you lose referential integrity, so you need more code instead of more tables.
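The referential-integrity point can be demonstrated with a small sketch in Python's sqlite3 module. The GroupOwners table (standing in for the discriminator variant), the `accepts` helper and the ids used are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Users (UserID INTEGER PRIMARY KEY);
CREATE TABLE "Groups" (GroupID INTEGER PRIMARY KEY);
-- Link-table variant: both columns are real foreign keys.
CREATE TABLE UserGroups (
    UserID  INTEGER NOT NULL REFERENCES Users(UserID),
    GroupID INTEGER NOT NULL REFERENCES "Groups"(GroupID),
    PRIMARY KEY (UserID, GroupID)
);
-- Discriminator variant: owner_id cannot be declared as a foreign key,
-- because the table it points to depends on owner_type.
CREATE TABLE GroupOwners (
    GroupID    INTEGER PRIMARY KEY REFERENCES "Groups"(GroupID),
    owner_type TEXT NOT NULL,
    owner_id   INTEGER NOT NULL
);
INSERT INTO Users VALUES (1);
INSERT INTO "Groups" VALUES (100);
""")


def accepts(sql, params):
    """Return True if the database accepts the statement, False if rejected."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


# User 999 does not exist in either case.
link_ok = accepts("INSERT INTO UserGroups VALUES (?, ?)", (999, 100))
disc_ok = accepts("INSERT INTO GroupOwners VALUES (?, ?, ?)", (100, "user", 999))
```

The link table rejects the dangling owner, while the discriminator table stores it silently, so the check would have to live in application code.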
