What is the role of categorization in a database?

Could you tell me what categorization is in a database and what role it plays?
I found this on a site, and I don't understand it:
Categorisation is a process of modelling a single subtype (or subclass) with a relationship that involves more than one distinct supertype (or superclass). Up to now, all the relationships that have been discussed involve a single distinct supertype. However, there could be a need to model a single supertype/subtype relationship with more than one supertype, where the supertypes represent different entity sets.
Thanks!
https://books.google.co.uk/books?id=9m382yDgxRsC&pg=PA287&lpg=PA287&dq=7.4.+Categorisation+Categorisation+is+a+process+of+modelling+of+a+single+subtype+(or+subclass)+with+a+relationship+that+involves+more+than+one+distinct+supertype+(or+superclass).&source=bl&ots=7JFawUEg3d&sig=peeXz5QajJFdkFHw0TzlvQFwix8&hl=ko&sa=X&ved=0ahUKEwi1u8iHwIHSAhWMVhQKHfr_AkkQ6AEIIDAA#v=onepage&q=7.4.%20Categorisation%20Categorisation%20is%20a%20process%20of%20modelling%20of%20a%20single%20subtype%20(or%20subclass)%20with%20a%20relationship%20that%20involves%20more%20than%20one%20distinct%20supertype%20(or%20superclass).&f=false

As I understand it, categorization is formally the process of creating a relation (a category) whose tuples are a subset of the union of the tuples of the superclasses. The tuples in that subset are chosen based on some characteristic. Consider an example:
Let's say we have a Suppliers(id, name, address, email, bank_acct, paypal, ... etc.) relation and a Customers(ssn, name, faname, email, address, paypal, ... etc.) relation. We could then create another relation featuring only those parties (both suppliers and customers) who have PayPal accounts: Paypal_account_holders(id, name, address, paypal_acct, email ... etc.), where Paypal_account_holders.id is a surrogate primary key, and Suppliers.paypal and Customers.paypal are foreign keys referencing it.
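A minimal sketch of this mapping in Python's standard sqlite3 module, assuming the usual relational mapping for a category (union type): the category table gets a surrogate key, and each superclass carries a nullable foreign key to it. Column lists are trimmed and the sample rows are invented for illustration. Nothing here stops a supplier and a customer from pointing at the same category row; that would need an extra check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is enabled
conn.executescript("""
    -- The category: one row per party (supplier or customer) with a PayPal account
    CREATE TABLE Paypal_account_holders (
        id          INTEGER PRIMARY KEY,  -- surrogate key for the category
        paypal_acct TEXT NOT NULL UNIQUE
    );
    -- Superclasses: paypal stays NULL for parties outside the category
    CREATE TABLE Suppliers (
        id     INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        paypal INTEGER UNIQUE REFERENCES Paypal_account_holders(id)
    );
    CREATE TABLE Customers (
        ssn    TEXT PRIMARY KEY,
        name   TEXT NOT NULL,
        paypal INTEGER UNIQUE REFERENCES Paypal_account_holders(id)
    );
""")

conn.execute("INSERT INTO Paypal_account_holders VALUES (1, 'acme@pay'), (2, 'bob@pay')")
conn.execute("INSERT INTO Suppliers VALUES (10, 'Acme', 1), (11, 'NoPal Ltd', NULL)")
conn.execute("INSERT INTO Customers VALUES ('123-45-6789', 'Bob', 2)")

# Members of the category, whichever superclass they come from
holders = conn.execute("""
    SELECT p.id, s.name FROM Paypal_account_holders p JOIN Suppliers s ON s.paypal = p.id
    UNION ALL
    SELECT p.id, c.name FROM Paypal_account_holders p JOIN Customers c ON c.paypal = p.id
    ORDER BY 1
""").fetchall()
print(holders)  # [(1, 'Acme'), (2, 'Bob')]
```

The supplier without a PayPal account (NoPal Ltd) simply has a NULL paypal column and never appears in the category.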
Motivation and advantages:
Universal interface for applications;
Security: you can restrict access to tables and let users see only part of the information;
Simplified queries;
Enforcing some business rules for certain category;
etc.
Again, that's how I understand it.

Related

Relational Database: When do we need to add more entities?

Today we had a discussion about the W3 lecture case study, on how many entities we need in each situation, and I'm confused about the following:
Case 1) An employee is assigned to be a member of a team. A team with more than 5 members will have a team leader. The members of the team elect the team leader. List the entities you can identify in the above statement. In this case, if we don't create 2 entities for the above requirement, we need to add two more attributes to each employee, which can lead to anomalies later. Therefore, we need 2 entities, as below:
EMPLOYEE (PK is employeeId) (0-M)----------------(0-1) TEAM (PK teamId&employeeId) -> 2 entities
Case 2) "The company also introduced a mentoring program, whereby a new employee will be paired with someone who has been in the company longer." How many entities do you need to model the mentoring program?
The lecturer's answer is 1. In that case, we have to add 2 more attributes to each employee, mentorRole (Mentor or Mentee) and pairNo (to distinguish between different pairs and to know who mentors whom), don't we?
My question is: why can't we create a new entity named MENTORING, similar to TEAM in case 1? And why can we only do that if this is a many-to-many relationship?
EMPLOYEE (PK is employeeId) (0-M)----------------(0-1) MENTORING (PK is pairNo&employeeId) -> 2 entities
Thank you in advance

First of all, about terminology: I use entity to mean an individual person, thing or event. You and I are two distinct entities, but since we're both members of StackOverflow, we're part of the same entity set. Entity sets are contrasted with value sets in the ER model, while the relational model has no such distinction.
While you're right about the number of entity sets, there are some issues with your implementation. TEAM's PK shouldn't be (teamId, employeeId); it should be only teamId. The EMPLOYEE table should have a teamId foreign key (not part of the PK) to indicate team membership. The employeeId column in the TEAM table could then represent the team leader, and it depends on teamId alone (since each team can have at most one leader).
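A minimal sketch of the corrected Case 1 design, using Python's sqlite3 module (the sample employees are hypothetical). TEAM's primary key is teamId alone, EMPLOYEE carries teamId as an ordinary foreign key, and TEAM.employeeId holds the leader. Because there is only one leader column per team, a second leader simply cannot be stored. This sketch does not enforce that the leader belongs to the team, or the more-than-5-members rule; those would need extra constraints or application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE EMPLOYEE (
        employeeId INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        teamId     INTEGER REFERENCES TEAM(teamId)          -- membership, not part of the PK
    );
    CREATE TABLE TEAM (
        teamId     INTEGER PRIMARY KEY,
        employeeId INTEGER REFERENCES EMPLOYEE(employeeId)  -- the elected leader, if any
    );
""")

conn.execute("INSERT INTO TEAM (teamId, employeeId) VALUES (1, NULL)")  # no leader yet
for emp_id, name in enumerate(["Ann", "Ben", "Cai", "Dee", "Eli", "Fay"], start=1):
    conn.execute("INSERT INTO EMPLOYEE VALUES (?, ?, 1)", (emp_id, name))

# The members elect Dee: a single UPDATE of the team's one leader column
conn.execute("UPDATE TEAM SET employeeId = 4 WHERE teamId = 1")

row = conn.execute("""
    SELECT t.teamId, l.name, COUNT(m.employeeId)
    FROM TEAM t
    JOIN EMPLOYEE l ON l.employeeId = t.employeeId  -- leader
    JOIN EMPLOYEE m ON m.teamId = t.teamId          -- members
    GROUP BY t.teamId, l.name
""").fetchone()
print(row)  # (1, 'Dee', 6)
```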
With only one entity set, we would probably represent team membership and leadership as:
EMPLOYEE(employeeId PK, team, leader)
where team is some team name or number which has to be the same for team members, and leader is a true/false column to indicate whether the employee in that row is the leader of his/her team. A problem with this model is that we can't ensure that a team has only one leader.
Again, there are some issues with the implementation. I don't see the need to identify pairs apart from the employees involved, and having a mentorRole (mentor or mentee) indicates that the association will be recorded for both mentor and mentee. This is redundant and creates an opportunity for inconsistency. If the goal is to represent a one-to-one relationship, there are better ways. I suggest a separate table MENTORING(menteeEmployeeId PK, mentorEmployeeId UQ) (or possibly a unique but nullable mentorEmployeeId in the EMPLOYEE table, depending on how your DBMS handles nulls in unique indexes).
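A sketch of the suggested MENTORING table in SQLite, via Python's sqlite3 module. The primary key on menteeEmployeeId gives each mentee at most one mentor, and the UNIQUE constraint gives each mentor at most one mentee, which makes the association one-to-one. The CHECK against self-mentoring, the `accepts` helper and the sample names are additions for this illustration, not part of the answer.

```python
import sqlite3

def accepts(conn, sql):
    """True if the statement succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE EMPLOYEE (employeeId INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE MENTORING (
        menteeEmployeeId INTEGER PRIMARY KEY REFERENCES EMPLOYEE(employeeId),
        mentorEmployeeId INTEGER NOT NULL UNIQUE REFERENCES EMPLOYEE(employeeId),
        CHECK (menteeEmployeeId <> mentorEmployeeId)  -- illustration-only extra rule
    );
    INSERT INTO EMPLOYEE VALUES (1, 'Senior Sam'), (2, 'New Nia'), (3, 'New Ned');
""")

paired = accepts(conn, "INSERT INTO MENTORING VALUES (2, 1)")         # Sam mentors Nia
second_mentee = accepts(conn, "INSERT INTO MENTORING VALUES (3, 1)")  # Sam again: UNIQUE rejects
second_mentor = accepts(conn, "INSERT INTO MENTORING VALUES (2, 3)")  # Nia again: PK rejects
print(paired, second_mentee, second_mentor)  # True False False
```

On the nullable-column alternative: SQLite treats NULLs as distinct in a UNIQUE constraint, so many employees without a mentor would not conflict there; other DBMSs may behave differently, which is the caveat raised above.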
The difference between the two cases is that teams can have any number of members and one leader, which is most effectively implemented by identifying teams separately from employees, whereas mentorship is a simpler association that is sufficiently identified by either of the two people involved (provided you consistently use the same role as identifier). You could create a separate entity set for mentoring, with relationships to the employees involved - it might look like my MENTORING table but with an additional surrogate key as PK, but there's no need for the extra identifier.
And why can we only do that if this is a many-to-many relationship?
What do you mean? Your examples don't contain a many-to-many relationship, and we don't create additional entity sets for many-to-many relationships. If you're thinking of so-called "bridge" tables, you've got some concepts mixed up. Entity sets aren't tables: an entity set is a set of values, while a table represents a relation over one or more sets of values. In Chen's original method, all relationships were represented in separate tables. It's just that we've gotten used to denormalizing simple one-to-one and one-to-many relationships into the same tables as entity attributes, but we can't do the same for many-to-many binary relationships, or for ternary and higher relationships in general.
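To make the last point concrete, a small sqlite3 sketch with hypothetical department, project and works_on tables. The one-to-many relationship (an employee belongs to one department) is folded into the employee table as a foreign key column, while the many-to-many relationship (employees work on many projects, projects have many employees) needs its own table whose key is the pair of foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- One-to-many: denormalized into the entity's own table as a plain FK column
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER REFERENCES department(dept_id)
    );
    CREATE TABLE project (proj_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Many-to-many: needs its own relationship table, one row per pair
    CREATE TABLE works_on (
        emp_id  INTEGER REFERENCES employee(emp_id),
        proj_id INTEGER REFERENCES project(proj_id),
        PRIMARY KEY (emp_id, proj_id)
    );
    INSERT INTO department VALUES (1, 'R&D');
    INSERT INTO employee VALUES (1, 'Ann', 1), (2, 'Ben', 1);
    INSERT INTO project VALUES (10, 'Apollo'), (20, 'Zephyr');
    INSERT INTO works_on VALUES (1, 10), (1, 20), (2, 10);
""")

per_project = conn.execute("""
    SELECT p.name, COUNT(*) FROM project p JOIN works_on w ON w.proj_id = p.proj_id
    GROUP BY p.proj_id ORDER BY p.name
""").fetchall()
print(per_project)  # [('Apollo', 2), ('Zephyr', 1)]
```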

Can a relationship link three or more entities?

I have an entity employees with six attributes: employee_number (unique key), first_name, last_name, address, phone_number, and hire_date. However, there are two types of employees: "Service Technicians" and "Sales Associates".
Each distinct type of employee has "job specific" attributes. Service technicians have model_expertise and pager_number attributes, and Sales associates have commission and salary attributes.
I'm not sure how to represent this in an ER diagram. I have an employees entity with the attributes listed. Is it possible to have a relationship from employees to both technicians and associates? For a relationship like is_type, can a relationship link one entity to two entities like this?
If not, how else can I model it?

You've got a classic subtype/supertype relationship. The original ER notation had no specific symbols for this situation, though one could represent subtypes as weak entities without a weak key. A number of extensions to the ER model were developed to address this. One example is the enhanced ER (EER) notation, in which the supertype is connected to its subtypes through a circle:
The d in the circle indicates disjoint subtypes, meaning an Employee can be either a Technician or an Associate, but not both. The other option is o for overlapping.
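One common way (among several) to enforce the disjoint constraint in tables, sketched with Python's sqlite3 module: store a discriminator (employee_type) in employees, and make each subtype table reference the pair (employee_number, employee_type) with its own type fixed by a CHECK. An employee recorded as a technician then cannot also get an associates row. The employee_type column, the `accepts` helper and the sample row are assumptions for this sketch, and it does not force every employee to have a subtype row.

```python
import sqlite3

def accepts(conn, sql):
    """True if the statement succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE employees (
        employee_number INTEGER PRIMARY KEY,
        first_name TEXT, last_name TEXT, address TEXT, phone_number TEXT, hire_date TEXT,
        employee_type TEXT NOT NULL CHECK (employee_type IN ('T', 'A')),  -- discriminator
        UNIQUE (employee_number, employee_type)  -- target for the composite FKs below
    );
    CREATE TABLE technicians (
        employee_number INTEGER PRIMARY KEY,
        employee_type TEXT NOT NULL DEFAULT 'T' CHECK (employee_type = 'T'),
        model_expertise TEXT, pager_number TEXT,
        FOREIGN KEY (employee_number, employee_type)
            REFERENCES employees (employee_number, employee_type)
    );
    CREATE TABLE associates (
        employee_number INTEGER PRIMARY KEY,
        employee_type TEXT NOT NULL DEFAULT 'A' CHECK (employee_type = 'A'),
        commission REAL, salary REAL,
        FOREIGN KEY (employee_number, employee_type)
            REFERENCES employees (employee_number, employee_type)
    );
    INSERT INTO employees (employee_number, last_name, employee_type) VALUES (1, 'Tech', 'T');
""")

as_tech = accepts(conn, "INSERT INTO technicians (employee_number, pager_number) VALUES (1, '555-0100')")
as_assoc = accepts(conn, "INSERT INTO associates (employee_number, salary) VALUES (1, 50000)")
print(as_tech, as_assoc)  # True False
```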
However, don't confuse your supertype/subtype relationship with a 3-way relationship. It's better viewed as 2 binary relationships, with disjointness as a mutual constraint between them. True 3-way (and higher) relationships (e.g. a many-to-many-to-many association among suppliers, parts and regions) are certainly possible in ER diagrams; they are one of the features that distinguish the ER model from the older network data model.
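A sketch of the suppliers/parts/regions example as a single relationship table in SQLite (names and rows are hypothetical). The key spans all three foreign keys, so each row records one supplier supplying one part in one region.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE part     (part_id     INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE region   (region_id   INTEGER PRIMARY KEY, name TEXT);
    -- One row per (supplier, part, region) fact; the key spans all three roles
    CREATE TABLE supplies (
        supplier_id INTEGER REFERENCES supplier(supplier_id),
        part_id     INTEGER REFERENCES part(part_id),
        region_id   INTEGER REFERENCES region(region_id),
        PRIMARY KEY (supplier_id, part_id, region_id)
    );
    INSERT INTO supplier VALUES (1, 'Acme'), (2, 'Bolt Co');
    INSERT INTO part     VALUES (1, 'bolt'), (2, 'nut');
    INSERT INTO region   VALUES (1, 'North'), (2, 'South');
    INSERT INTO supplies VALUES (1, 1, 1), (1, 2, 2), (2, 1, 2);
""")

# Who supplies bolts in the South? The question needs all three roles at once.
rows = conn.execute("""
    SELECT s.name FROM supplies x
    JOIN supplier s ON s.supplier_id = x.supplier_id
    JOIN part p     ON p.part_id = x.part_id
    JOIN region r   ON r.region_id = x.region_id
    WHERE p.name = 'bolt' AND r.name = 'South'
""").fetchall()
print(rows)  # [('Bolt Co',)]
```

With only three binary tables (supplier-part, supplier-region, part-region) built from the same data, the same question would also return Acme, because Acme supplies bolts, Acme supplies something in the South, and bolts are supplied in the South, even though Acme does not supply bolts there. That information loss is why the ternary relationship needs its own table.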

Choosing entities for a database

There are three kinds of people in the database: members, volunteers and requesters.
Most of the volunteers are members, and half of the requesters are members.
Volunteers have some attributes that members don't have.
If a requester is not a member, only basic information can be stored in the database, and they may become a member later.
Anyone can be a requester and a volunteer, so, yes, a user can be a requester, a volunteer and a member at the same time. A user who has paid the membership fee is a member; once they make a request, they are a requester; and they can choose to be a volunteer. A member who has done nothing else is just a member.
How should I choose entities?
Should I make them three entities or put them in one entity, and set volunteer and requester as two attributes?
Thanks

What about having a users table that contains all generic data, and then tables for the "roles", that contain role-specific data and that can be linked to the users:
user:
- id
- name
- email
- member_id
- volunteer_id
- requester_id
member
- id
- data
volunteer
- id
- data
requester
- id
- data
Then, if you are representing the rows with an object-oriented abstraction, your User objects can have this method:
// C-style pseudocode; assumes a missing role link is stored as NULL
boolean isVolunteer() {
    return self.volunteer_id != NULL;
}

Create 4 tables:
persons (person_id PK, first_name, last_name)
members (person_id PK/FK)
volunteers (person_id PK/FK)
requesters (person_id PK/FK)
Add common attributes to the persons table, and role-specific attributes to the relevant table.
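A minimal sqlite3 sketch of the four-table layout. The role-specific columns fee_paid_on and skills are hypothetical placeholders, and the `roles` helper is only for illustration. Using person_id as both primary key and foreign key in each role table means a person can hold any combination of roles, but at most one row per role.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE persons (person_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    -- Each role table reuses person_id as PK and FK: at most one row per person per role
    CREATE TABLE members (
        person_id   INTEGER PRIMARY KEY REFERENCES persons(person_id),
        fee_paid_on TEXT                      -- hypothetical member-only column
    );
    CREATE TABLE volunteers (
        person_id INTEGER PRIMARY KEY REFERENCES persons(person_id),
        skills    TEXT                        -- hypothetical volunteer-only column
    );
    CREATE TABLE requesters (person_id INTEGER PRIMARY KEY REFERENCES persons(person_id));

    INSERT INTO persons VALUES (1, 'Ada', 'Lee'), (2, 'Bo', 'Kim');
    INSERT INTO members    VALUES (1, '2024-01-15');
    INSERT INTO volunteers VALUES (1, 'driving');
    INSERT INTO requesters VALUES (1), (2);   -- Bo is a requester but not a member
""")

def roles(person_id: int) -> list[str]:
    """Names of the role tables that contain this person."""
    found = []
    for table in ("members", "volunteers", "requesters"):  # fixed names, safe to interpolate
        if conn.execute(f"SELECT 1 FROM {table} WHERE person_id = ?", (person_id,)).fetchone():
            found.append(table)
    return found

print(roles(1), roles(2))  # ['members', 'volunteers', 'requesters'] ['requesters']
```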

I see you have several shared attributes among your entities: some occur on more than one, some occur exclusively on one. It's not exactly a beautiful solution, but it seems to fit your problem; it's called Single Table Inheritance in JPA jargon. Although the article focuses on Java, the same approach can be used with other technologies.
JPA Single Table Inheritance Example
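A rough sqlite3 sketch of the single-table idea applied to this question. JPA's single-table strategy normally uses one discriminator column holding one type per row, but since a person here can be a member, requester and volunteer at once, this sketch assumes three boolean flag columns instead. The volunteer-only column availability is hypothetical; the table-level CHECK keeps it NULL unless the row is flagged as a volunteer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (
        id           INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        is_member    INTEGER NOT NULL DEFAULT 0 CHECK (is_member IN (0, 1)),
        is_requester INTEGER NOT NULL DEFAULT 0 CHECK (is_requester IN (0, 1)),
        is_volunteer INTEGER NOT NULL DEFAULT 0 CHECK (is_volunteer IN (0, 1)),
        availability TEXT,                                  -- volunteer-only column
        CHECK (availability IS NULL OR is_volunteer = 1)    -- NULL for non-volunteers
    );
    INSERT INTO people (id, name, is_member, is_volunteer, availability)
        VALUES (1, 'Ada', 1, 1, 'weekends');
    INSERT INTO people (id, name, is_requester) VALUES (2, 'Bo', 1);
""")

volunteers = conn.execute("SELECT name FROM people WHERE is_volunteer = 1").fetchall()
print(volunteers)  # [('Ada',)]

try:
    conn.execute("INSERT INTO people (id, name, availability) VALUES (3, 'Cy', 'mornings')")
    rejected = False
except sqlite3.IntegrityError:  # CHECK failed: Cy is not flagged as a volunteer
    rejected = True
print(rejected)  # True
```

The cost of this layout is the one the answer hints at: every role-specific column is nullable and needs a CHECK like this to stay consistent.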

Database Schema Recommendation

I'm having a brain freeze on a data problem that I need to model. I will do my best to outline the tables and relationships:
users (basic user information name/etc)
users.id
hospitals (basic information about hospital name/etc)
hospitals.id
pages
pages.id
user_id (page can be affiliated with a user)
hospital_id (page can be affiliated with a hospital)
Here is where the new data begins, and where I'm having an issue:
groups (name of a group of pages)
groups.id
groups_pages (linking table)
group_id
page_id
Now here is the tricky part: a group can be 'owned' by either a user or a hospital, but the pages in it aren't necessarily affiliated with that user/hospital. In addition, there is another type of entity (company) that can 'own' the group.
When displaying the group, I will need to know of what type (user / hospital / company) the group is and be able to get the correct affiliated data (name, address, etc)
I'm drawing a blank on how to link each group to its owner, given that the owner can be of different types.

Party is a generic term for a person or an organization.
Keep all common fields (phone number, address, ...) in the Party table.
Person and Hospital should hold only the fields specific to each subtype.
If Company has a different set of columns from Hospital, simply add it as another subtype.
If Hospital and Company have the same columns, rename Hospital to the more generic Organization.
PartyType is the discriminator {P,H}
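A sketch of this Party design in SQLite, driven from Python's sqlite3 module. The group table is named page_groups rather than groups (GROUPS is a keyword in newer SQLite and MySQL versions), and the column names and sample rows are assumptions. Because every owner is a party, the group table needs only one ordinary foreign key, and the discriminator says which subtype applies. A Company subtype would be one more table of the same shape.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE party (
        party_id   INTEGER PRIMARY KEY,
        party_type TEXT NOT NULL CHECK (party_type IN ('P', 'H')),  -- discriminator
        phone      TEXT,
        address    TEXT
    );
    CREATE TABLE person   (party_id INTEGER PRIMARY KEY REFERENCES party(party_id),
                           first_name TEXT, last_name TEXT);
    CREATE TABLE hospital (party_id INTEGER PRIMARY KEY REFERENCES party(party_id),
                           hospital_name TEXT);
    -- One ordinary FK covers every kind of owner
    CREATE TABLE page_groups (
        group_id       INTEGER PRIMARY KEY,
        name           TEXT NOT NULL,
        owner_party_id INTEGER NOT NULL REFERENCES party(party_id)
    );
    INSERT INTO party VALUES (1, 'P', '555-0101', '1 Elm St'), (2, 'H', '555-0199', '9 Main St');
    INSERT INTO person      VALUES (1, 'Ann', 'Diaz');
    INSERT INTO hospital    VALUES (2, 'St. Mary');
    INSERT INTO page_groups VALUES (100, 'Cardio pages', 2), (101, 'My favourites', 1);
""")

# Owner type, display name and shared address for each group
owners = conn.execute("""
    SELECT g.name, p.party_type,
           COALESCE(h.hospital_name, pe.first_name || ' ' || pe.last_name) AS owner_name,
           p.address
    FROM page_groups g
    JOIN party p         ON p.party_id = g.owner_party_id
    LEFT JOIN person pe  ON pe.party_id = p.party_id
    LEFT JOIN hospital h ON h.party_id = p.party_id
    ORDER BY g.group_id
""").fetchall()
print(owners)
```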

You'd have to use some form of discriminator, such as an "owner_type" column. You could then use an enum, a varchar, or just an int to represent what type of owner each row refers to.
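A sketch of the owner_type approach in SQLite (table contents are hypothetical, and page_groups stands in for the question's groups table). owner_id cannot be declared as a foreign key, since the table it points to depends on owner_type, so the lookup has to dispatch in application code. Nothing stops an owner_id from pointing at a row that does not exist, as group 102 shows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users     (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE hospitals (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
    CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
    CREATE TABLE page_groups (
        id         INTEGER PRIMARY KEY,
        owner_type TEXT NOT NULL CHECK (owner_type IN ('user', 'hospital', 'company')),
        owner_id   INTEGER NOT NULL  -- no FK possible: the target table depends on owner_type
    );
    INSERT INTO users     VALUES (1, 'Ann');
    INSERT INTO hospitals VALUES (1, 'St. Mary', '9 Main St');
    INSERT INTO page_groups VALUES (100, 'hospital', 1), (101, 'user', 1), (102, 'company', 42);
""")

OWNER_TABLES = {"user": "users", "hospital": "hospitals", "company": "companies"}

def group_owner(group_id: int):
    """Return (owner_type, owner_name), or (owner_type, None) for a dangling owner_id."""
    owner_type, owner_id = conn.execute(
        "SELECT owner_type, owner_id FROM page_groups WHERE id = ?", (group_id,)).fetchone()
    table = OWNER_TABLES[owner_type]  # whitelisted table name, safe to interpolate
    row = conn.execute(f"SELECT name FROM {table} WHERE id = ?", (owner_id,)).fetchone()
    return owner_type, row[0] if row else None

print(group_owner(100), group_owner(101), group_owner(102))
# ('hospital', 'St. Mary') ('user', 'Ann') ('company', None)
```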

Here is a good tutorial on how to model inheritance in a database while maintaining a reasonable normal form and referential integrity.
Condensed version for you: create another table, owners, and let it keep a minimal set of attributes (what users and hospitals have in common, maybe a full name, address, and of course an id). Users and hospitals will have their respective id columns that will simultaneously be their primary keys and foreign keys referencing owners.id. Give users the attributes that hospitals don't have, and vice versa. Now each hospital is represented by two easily joined rows, one from owners and one from hospitals.
This allows you to reference owners.id from groups.owner_id.
(There is also a simpler alternative where you create just one table for users and hospitals and put NULLs to all columns that do not apply to a particular row, but that quickly gets unwieldy.)

HospitalGroups(HospitalID, GroupID)
UserGroups(UserID, GroupID)
CompanyGroups(CompanyID, GroupID)
Groups(GroupID,....)
GroupPages(GroupID, PageID)
Pages(PageID, ...)
Would be the classic way.
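A sqlite3 sketch of this link-table layout (sample rows are hypothetical; "Groups" is quoted because GROUPS is a keyword in newer SQL dialects). Making GroupID the primary key of each link table is an assumption that a group has one owner. Every link is a real foreign key, but the schema still allows a group to appear in two link tables, or in none, so that rule needs a trigger or application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Users     (UserID     INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Hospitals (HospitalID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Companies (CompanyID  INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE "Groups"  (GroupID    INTEGER PRIMARY KEY, Name TEXT);
    -- One link table per owner type; GroupID as PK allows one owner row per table
    CREATE TABLE UserGroups     (UserID     INTEGER NOT NULL REFERENCES Users(UserID),
                                 GroupID    INTEGER PRIMARY KEY REFERENCES "Groups"(GroupID));
    CREATE TABLE HospitalGroups (HospitalID INTEGER NOT NULL REFERENCES Hospitals(HospitalID),
                                 GroupID    INTEGER PRIMARY KEY REFERENCES "Groups"(GroupID));
    CREATE TABLE CompanyGroups  (CompanyID  INTEGER NOT NULL REFERENCES Companies(CompanyID),
                                 GroupID    INTEGER PRIMARY KEY REFERENCES "Groups"(GroupID));
    INSERT INTO Users     VALUES (1, 'Ann');
    INSERT INTO Hospitals VALUES (1, 'St. Mary');
    INSERT INTO "Groups"  VALUES (100, 'Cardio pages'), (101, 'My favourites');
    INSERT INTO HospitalGroups VALUES (1, 100);
    INSERT INTO UserGroups     VALUES (1, 101);
""")

def group_owner(group_id: int) -> list[tuple[str, str]]:
    """(owner type, owner name) rows; zero or several rows would signal bad data."""
    return conn.execute("""
        SELECT 'user', u.Name FROM UserGroups x JOIN Users u ON u.UserID = x.UserID
        WHERE x.GroupID = :g
        UNION ALL
        SELECT 'hospital', h.Name FROM HospitalGroups x
        JOIN Hospitals h ON h.HospitalID = x.HospitalID WHERE x.GroupID = :g
        UNION ALL
        SELECT 'company', c.Name FROM CompanyGroups x
        JOIN Companies c ON c.CompanyID = x.CompanyID WHERE x.GroupID = :g
    """, {"g": group_id}).fetchall()

print(group_owner(100), group_owner(101))  # [('hospital', 'St. Mary')] [('user', 'Ann')]
```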
The discriminator idea mentioned by @Robert would also work, but you lose referential integrity, so you need more code instead of more tables.

Shared Entity in one to many database relationship

I'm working on the design of a database. I have manufacturers and distributors in separate tables containing practically the same information, with a few exceptions. Both groups have one-to-many contacts that need to be connected to them. I created a contact table to hold contact information, just one!
Do I need a second contact table? I'm trying to make this as DRY as possible. How would that look? Thank you in advance

Maybe a case for the party-role pattern? Manufacturer and Distributor are roles played by Parties. Contacts apply to Parties, not the role(s) they play. So you'd have:
a table named Party
a table named ContactMethod (or similar)
a 1:M relationship from Party to ContactMethod
which removes the need for two Contact tables. How you model the roles side will depend on wider requirements. The canonical model would have:
a single supertype named Role
a M:M relationship from Party to Role
a subtype of Role for each specific role (Distributor and Manufacturer in your case).
(Note: as an aside, this also allows a Party to play both manufacturer and distributor roles - which may or may not be relevant).
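A simplified sqlite3 sketch of the party-role idea (names and rows are hypothetical). There is one ContactMethod table attached to Party, and the M:M from Party to Role is a PartyRole table. For brevity the Role subtypes are collapsed into a role_type lookup, which corresponds to the "table for entire hierarchy" pattern; role-specific tables would be added for the other two patterns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Party (party_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- The single contact table: contacts belong to a party, whatever roles it plays
    CREATE TABLE ContactMethod (
        contact_id INTEGER PRIMARY KEY,
        party_id   INTEGER NOT NULL REFERENCES Party(party_id),
        kind       TEXT NOT NULL,          -- e.g. 'email', 'phone'
        value      TEXT NOT NULL
    );
    -- Role subtypes collapsed into a lookup of role types
    CREATE TABLE Role (role_type TEXT PRIMARY KEY);
    -- M:M between Party and Role: a party may play several roles
    CREATE TABLE PartyRole (
        party_id  INTEGER NOT NULL REFERENCES Party(party_id),
        role_type TEXT    NOT NULL REFERENCES Role(role_type),
        PRIMARY KEY (party_id, role_type)
    );
    INSERT INTO Party VALUES (1, 'Acme Mfg'), (2, 'Fast Dist');
    INSERT INTO Role VALUES ('manufacturer'), ('distributor');
    INSERT INTO PartyRole VALUES (1, 'manufacturer'), (1, 'distributor'), (2, 'distributor');
    INSERT INTO ContactMethod VALUES
        (1, 1, 'email', 'sales@acme.example'),
        (2, 2, 'phone', '555-0123'),
        (3, 1, 'phone', '555-0100');
""")

# Contacts of every distributor, read from the one shared contact table
distributor_contacts = conn.execute("""
    SELECT p.name, c.kind, c.value
    FROM PartyRole r
    JOIN Party p         ON p.party_id = r.party_id
    JOIN ContactMethod c ON c.party_id = p.party_id
    WHERE r.role_type = 'distributor'
    ORDER BY c.contact_id
""").fetchall()
print(distributor_contacts)
```

Acme Mfg plays both roles here, yet its contacts are stored once and show up under either role.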
There are 3 'standard' patterns for implementing a subtype hierarchy in relational tables:
table for entire hierarchy
table per leaf subtype
table per type
(1) would apply if you don't have any role-specific relationships. (However I suspect that's unlikely; there's probably information related to Distributors that doesn't apply to Manufacturers and vice-versa).
(2) means multiple relationships from Party (i.e. one to each role subtype).
(3) avoids both of the above, but means an extra join when navigating from Party to its role(s).
As I said, the choice depends on wider requirements.
Hope this helps.
