ER diagram relationship for user admin - database

I am designing a database management project for gym management. There are two users: the clerk, who can add, remove and edit all trainers, centers and members, and the member, who can only see and edit certain attributes related to himself. Member, center and trainer are three entities in the ER diagram, so the question is: should I introduce an entity for the clerk, and if so, should it have a relationship with any of the three entities described above?

I wouldn't split up the two entities based only on the fact that they have different permissions in your system.
I recommend you focus on the concepts behind the entities:
First, if all attributes are equal, I would start by considering building one entity out of the two. If you end up with multiple columns that are mostly null, merging the two entities was probably a mistake.
In addition, you should check whether there is a common name that you can give your merged entity. For example, if you have the two entities Manager and Employee and you want to merge them, I would maybe just call the result User and check whether the properties still make sense in that context.
Last but not least, you should think about how the entities are used later in development. If you need two joins instead of one once you split up your entities, that could be an argument for merging them. On the other hand, if your 'clerk' entity is later extended by a few columns, a merged table might end up with null columns again.
I think a general answer is not suitable since the Domain is unclear. Just collect arguments for and against merging the entities and compare those.
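A minimal sketch of the merged-entity idea, using Python's built-in sqlite3 module. The table name app_user, the role values and the member-only center_id column are assumptions made for illustration and are not part of the question. It shows one way to check how many rows leave a role-specific column empty, which is the warning sign described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE app_user (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    role      TEXT NOT NULL CHECK (role IN ('clerk', 'member')),
    center_id INTEGER  -- hypothetical column that only makes sense for members
);
""")
conn.executemany(
    "INSERT INTO app_user (name, role, center_id) VALUES (?, ?, ?)",
    [("Alice", "clerk", None), ("Bob", "member", 1), ("Cara", "member", 2)],
)

# Count, per role, how many rows leave the member-only column NULL.
null_counts = dict(conn.execute(
    "SELECT role, SUM(center_id IS NULL) FROM app_user GROUP BY role"
).fetchall())
print(null_counts)
```

If most columns turn out to be NULL for one of the roles, that is an argument for keeping the entities separate.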


Relational Database: Reusing the same table in a different interpretation

Problem description
I am currently working on a project which requires a relational database for storage.
After thinking about the data and its relations for a while I ran into a quite repetitive problem:
I encountered a common data schema for entity A which contains some fields e.g. name, description, value. This entity is connected with entity B in multiple n-1 relations. So entity B has n entities A in relation rel1 and n entities A in relation rel2.
Now I am trying to break down this datamodel into a schema for a relational database (e.g. Postgres, MySQL).
After some research, I have not really found "the best" solution for this particular problem.
Some similar questions I have found so far:
Stackoverflow
DBA Stackexchange
My ideas
So I have thought about possible solutions which I am going to present here:
1. Duplicate table
The relationship from entity B to entity A has a certain meaning to it. So it is possible to create multiple tables (1 per relationship). This would solve all immediate problems but essentially duplicate the tables which means that changes now have to be reflected to multiple tables (e.g. a new column).
2. Introduce a type column
Instead of multiple relationships, I could just say "Entity B is connected with n entity A". Additionally, I would add a type column that then tells me to which relation entity A belongs. I am not exactly sure how this is represented with common ORMs like Spring-Hibernate and if this introduces additional problems that I am currently unaware of.
3. Abstract the common attributes of entity A
Another option is to create an ADetails entity, which bundles all attributes of entity A.
Then I would create two entities that represent each relationship and which are connected to the ADetails entity in a 1-to-1 relationship. This would solve the interpretation problem of the foreign key but might be too much overhead.
My Question
In the context of a medium-large-sized project, are any of these solutions viable?
Are there certain Cons that rule out one particular approach?
Are there other (better) options I haven't thought about?
I appreciate any help on this matter.
Edit 1 - PPR (Person-Party-Role)
Thanks for the suggestion from AntC. PPR Description
I think the described situation matches my problem.
Let's break it down:
Entity B is an event. There exists only one event for the given participants to make this easier. So the relationship from event to participant is 1-n.
Entity A can be described as Groups, People, Organization but given my situation they all have the same attributes. Hence, splitting them up into separate tables felt like the wrong idea.
To explain the situation with the class diagram:
An Event (Entity B) has a collection of n Groups (Entity A), n People (Entity A) and n Organizations (Entity A).
If I understand correctly the suggestion is the following:
In my case the relationship between Event and Participant is 1-n
The RefRoles table represents the ParticipantType column that describes to which relationship the Participant belongs (is it a customer or part of the service for the event, for example)
Because all my Groups, People and Organizations have the same attributes the only table required at this point is the Participant table
If there are individual attributes in the future I would introduce a new table (e.g. People) that references the Participant in a 1-1 relationship.
If there are multiple tables going to be added, the foreign key of the multiple 1-1 relationship is mutually exclusive (so there can only be one Group/Person/Organization for a participant)
Solution suggested by AntC and Christian Beikov
Splitting up the tables does make sense while keeping the common attributes in one table.
At the moment there are no individual attributes but the type column is not required anymore because the foreign keys can be used to see which relationship the entity belongs to.
I have created a small example for this:
There exist 3 types (previously type column) of people for an event: Staff, VIP, Visitor
The common attributes are mapped in a 1-1-relationship to the person table.
To make it simple: Each Person (Staff, VIP, Visitor) can only participate in one event. (Would be n-m-relationship in a more advanced example)
The database schema would be the following:
This approach is better than the type column in my opinion.
It also avoids having to interpret the entity based on its type in the application later on. It is also possible to resolve a type column in an ORM (see this question), but this approach avoids the struggle if the ORM you are using does not support resolving it.
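The Staff/VIP/Visitor example could be sketched roughly as follows, using Python's sqlite3 module. All column names and the sample rows are assumptions, since the original schema image is not available. Note that this sketch does not enforce that a person appears in only one of the three tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE event  (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- One table per relationship. UNIQUE on person_id keeps the link to person 1-1;
-- event_id models "each person participates in one event".
CREATE TABLE staff (
    id        INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL UNIQUE REFERENCES person(id),
    event_id  INTEGER NOT NULL REFERENCES event(id)
);
CREATE TABLE vip (
    id        INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL UNIQUE REFERENCES person(id),
    event_id  INTEGER NOT NULL REFERENCES event(id)
);
CREATE TABLE visitor (
    id        INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL UNIQUE REFERENCES person(id),
    event_id  INTEGER NOT NULL REFERENCES event(id)
);

INSERT INTO event  VALUES (1, 'Launch party');
INSERT INTO person VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Cy');
INSERT INTO staff   (person_id, event_id) VALUES (1, 1);
INSERT INTO vip     (person_id, event_id) VALUES (2, 1);
INSERT INTO visitor (person_id, event_id) VALUES (3, 1);
""")

# The table a row lives in replaces the former type column.
vip_names = [name for (name,) in conn.execute(
    "SELECT p.name FROM vip v JOIN person p ON p.id = v.person_id "
    "WHERE v.event_id = 1"
)]
print(vip_names)
```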
IMO since you already use dedicated terms for these objects, they probably will diverge and splitting up a table afterwards is quite some work, also on the code side, so I would suggest you map dedicated entities/tables from the beginning.

Performance in database design

I have to implement a testing platform. My database needs the following tables: Students, Teachers, Admins, Personnel and others. I would like to know if it's more efficient to have the FirstName and LastName in each of these tables, or to have another table, Persons, and link each of the other tables to it with PersonID.
Personally, I like it this way, although trickier to implement, because I think it's cleaner, especially if you look at it from the object-oriented point of view. Would this add an unnecessary overhead to the database?
Don't know if it helps to mention I would like to use SQL Server and ADO.NET Entity Framework.
As you've explicitly mentioned OO and that you're using Entity Framework, perhaps it's worth approaching the problem from how the framework is intended to work, rather than just building a database structure and then trying to model it?
Entity Framework Code First Inheritance : Table Per Hierarchy and Table Per Type is a nice introduction to the various strategies that you could pick from.
As for the note on adding unnecessary overhead to the database - I wouldn't worry about it just yet. EF is generally about getting a product built more rapidly and as it has to cope with a more general case, doesn't always produce the most efficient SQL. If the performance is a problem after your application is built, working and correct you can revisit and fix up the most inefficient stuff then.
If there is a person overlap between the mentioned tables, then yes, you should separate them out into a Persons table.
If you are only tracking what role each Person has (i.e. Student vs. Teacher etc) then you might consider just having the following three tables: Persons, Roles, and a bridge table PersonRoles.
On the other hand, if each role has its own unique fields, then you should carry on as you are and leave each of these tables separate with a foreign key of PersonID.
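The three-table variant (Persons, Roles and the bridge table PersonRoles) might look like the sketch below, written with Python's sqlite3 module rather than SQL Server. The column names beyond PersonID, FirstName and LastName, and the sample data, are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Persons (
    PersonID  INTEGER PRIMARY KEY,
    FirstName TEXT NOT NULL,
    LastName  TEXT NOT NULL
);
CREATE TABLE Roles (
    RoleID INTEGER PRIMARY KEY,
    Name   TEXT NOT NULL UNIQUE
);
-- Bridge table: one row per (person, role) pair, so a person can hold
-- several roles while the name is stored only once.
CREATE TABLE PersonRoles (
    PersonID INTEGER NOT NULL REFERENCES Persons(PersonID),
    RoleID   INTEGER NOT NULL REFERENCES Roles(RoleID),
    PRIMARY KEY (PersonID, RoleID)
);
INSERT INTO Persons VALUES (1, 'Ada', 'Lovelace'), (2, 'Alan', 'Turing');
INSERT INTO Roles VALUES (1, 'Student'), (2, 'Teacher');
INSERT INTO PersonRoles VALUES (1, 1), (1, 2), (2, 2);
""")

# All roles held by person 1.
roles_of_ada = [name for (name,) in conn.execute("""
    SELECT r.Name
    FROM PersonRoles pr
    JOIN Roles r ON r.RoleID = pr.RoleID
    WHERE pr.PersonID = 1
    ORDER BY r.Name
""")]
print(roles_of_ada)
```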
If the attributes (i.e. First Name, Last Name, Gender etc) of these entities (i.e. Students, Teachers, Admins and Personnel) are exactly the same, then you could just make a single table for all the entities with a PersonType or Role attribute added to distinguish each person's role. However, if the entities have a lot of different attributes, then it would be better to create separate tables, otherwise you will have normalization problems.
Yes, that is a very bad way of structuring a DB. The DB structure should be designed according to the normal forms.
Please check the normalization forms.
You should avoid duplicate data as much as possible, otherwise the queries will become slower.
The main problem arises when you try to get data that is associated with more than one or two tables.

When I should use one to one relationship?

Sorry for that noob question but is there any real needs to use one-to-one relationship with tables in your database? You can implement all necessary fields inside one table. Even if data becomes very large you can enumerate column names that you need in SELECT statement instead of using SELECT *. When do you really need this separation?
1 to 0..1
The "1 to 0..1" between super and sub-classes is used as a part of "all classes in separate tables" strategy for implementing inheritance.
A "1 to 0..1" can be represented in a single table with the "0..1" portion covered by NULL-able fields. However, if the relationship is mostly "1 to 0" with only a few "1 to 1" rows, splitting off the "0..1" portion into a separate table might provide some storage (and cache performance) benefits. Some databases are thriftier at storing NULLs than others, so the "cut-off point" where this strategy becomes viable can vary considerably.
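As a rough illustration of splitting off the "0..1" portion, here is a sketch using Python's sqlite3 module. The item and item_extra tables are hypothetical. The optional table reuses the parent's key as its own primary key, so each parent row can have at most one matching row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- The optional "0..1" part: its primary key is also a foreign key to item,
-- so there is at most one item_extra row per item.
CREATE TABLE item_extra (
    item_id INTEGER PRIMARY KEY REFERENCES item(id),
    notes   TEXT NOT NULL
);
INSERT INTO item VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO item_extra VALUES (2, 'rarely present data');
""")

# A LEFT JOIN reassembles the single-table view; missing parts come back as NULL.
rows = conn.execute("""
    SELECT i.id, e.notes
    FROM item i
    LEFT JOIN item_extra e ON e.item_id = i.id
    ORDER BY i.id
""").fetchall()
print(rows)
```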
1 to 1
The real "1 to 1" vertically partitions the data, which may have implications for caching. Databases typically implement caches at the page level, not at the level of individual fields, so even if you select only a few fields from a row, typically the whole page that row belongs to will be cached. If a row is very wide and the selected fields relatively narrow, you'll end up caching a lot of information you don't actually need. In a situation like that, it may be useful to vertically partition the data, so only the narrower, more frequently used portion of the rows gets cached, so more of them can fit into the cache, making the cache effectively "larger".
Another use of vertical partitioning is to change the locking behavior: databases typically cannot lock at the level of individual fields, only whole rows. By splitting the row, you are allowing a lock to take place on only one of its halves.
Triggers are also typically table-specific. While you can theoretically have just one table and have the trigger ignore the "wrong half" of the row, some databases may impose additional limits on what a trigger can and cannot do that could make this impractical. For example, Oracle doesn't let you modify the mutating table - by having separate tables, only one of them may be mutating so you can still modify the other one from your trigger.
Separate tables may allow more granular security.
These considerations are irrelevant in most cases, so in most cases you should consider merging the "1 to 1" tables into a single table.
See also: Why use a 1-to-1 relationship in database design?
My 2 cents.
I work in a place where we all develop in a large application, and everything is a module. For example, we have a users table, and we have a module that adds facebook details for a user, another module that adds twitter details to a user. We could decide to unplug one of those modules and remove all its functionality from our application. In this case, every module adds their own table with 1:1 relationships to the global users table, like this:
create table users ( id int primary key, ...);
create table users_fbdata ( id int primary key, ..., constraint users foreign key ...);
create table users_twdata ( id int primary key, ..., constraint users foreign key ...);
If you place two one-to-one tables in one, it's likely you'll have semantic issues. For example, if every device has one remote controller, it doesn't sound quite right to place the device and the remote controller, each with their bunch of characteristics, in one table. You might even have to spend time figuring out whether a certain attribute belongs to the device or the remote controller.
There might be cases, when half of your columns will stay empty for a long while, or will not ever be filled in. For example, a car could have one trailer with a bunch of characteristics, or might have none. So you'll have lots of unused attributes.
If your table has 20 attributes, and only 4 of them are used occasionally, it makes sense to break the table into 2 tables for performance reasons.
In such cases it isn't good to have everything in one table. Besides, it isn't easy to deal with a table that has 45 columns!
If data in one table is related to, but does not 'belong' to the entity described by the other, then that's a candidate to keep it separate.
This could provide advantages in future, if the separate data needs to be related to some other entity, also.
The most sensible time to use this would be if there were two separate concepts that would only ever relate in this way. For example, a Car can only have one current Driver, and the Driver can only drive one car at a time - so the relationship between the concepts of Car and Driver would be 1 to 1. I accept that this is contrived example to demonstrate the point.
Another reason is that you want to specialize a concept in different ways. If you have a Person table and want to add the concept of different types of Person, such as Employee, Customer, Shareholder - each one of these would need different sets of data. The data that is similar between them would be on the Person table, the specialist information would be on the specific tables for Customer, Shareholder, Employee.
Some database engines struggle to efficiently add a new column to a very large table (many rows) and I have seen extension-tables used to contain the new column, rather than the new column being added to the original table. This is one of the more suspect uses of additional tables.
You may also decide to divide the data for a single concept between two different tables for performance or readability issues, but this is a reasonably special case if you are starting from scratch - these issues will show themselves later.
First, I think it is a question of modelling and of defining what constitutes a separate entity. Suppose you have customers with one and only one address. Of course you could implement everything in a single customer table, but if in the future you allow a customer to have 2 or more addresses, then you will need to refactor that (not a problem, but make it a conscious decision).
I can also think of an interesting case not mentioned in other answers where splitting the table could be useful:
Imagine, again, you have customers with a single address each, but this time it is optional to have an address. Of course you could implement that as a bunch of NULL-able columns such as ZIP,state,street. But suppose that given that you do have an address the state is not optional, but the ZIP is. How to model that in a single table? You could use a constraint on the customer table, but it is much easier to divide in another table and make the foreign_key NULLable. That way your model is much more explicit in saying that the entity address is optional, and that ZIP is an optional attribute of that entity.
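The optional-address case could be sketched like this with Python's sqlite3 module. The table and column names and the helper try_insert_address are assumptions for illustration. The address itself is optional because customer.address_id is NULL-able, while state is mandatory and ZIP is optional once an address exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE address (
    id     INTEGER PRIMARY KEY,
    street TEXT,
    state  TEXT NOT NULL,  -- required whenever an address exists
    zip    TEXT            -- optional attribute of the address
);
CREATE TABLE customer (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    -- NULL means "no address"; UNIQUE keeps the relationship 1-to-1.
    address_id INTEGER UNIQUE REFERENCES address(id)
);
""")

def try_insert_address(state, zip_code):
    """Hypothetical helper: return the new address id, or None if rejected."""
    try:
        cur = conn.execute(
            "INSERT INTO address (street, state, zip) VALUES (?, ?, ?)",
            ("1 Main St", state, zip_code),
        )
        return cur.lastrowid
    except sqlite3.IntegrityError:
        return None

addr_ok = try_insert_address("CA", None)      # accepted: ZIP may be missing
addr_bad = try_insert_address(None, "90210")  # rejected: state is NOT NULL

conn.execute("INSERT INTO customer (name, address_id) VALUES (?, ?)", ("Ann", addr_ok))
conn.execute("INSERT INTO customer (name, address_id) VALUES (?, ?)", ("Ben", None))
customer_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
print(addr_ok, addr_bad, customer_count)
```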
Not very often.
You may find some benefit if you need to implement some security, so that some users can see some of the columns (table1) but not others (table2).
Of course some databases (Oracle) allow you to do this kind of security in the same table, but some others may not.
You are referring to database normalization. One example that I can think of in an application that I maintain is Items. The application allows the user to sell many different types of items (i.e. InventoryItems, NonInventoryItems, ServiceItems, etc...). While I could store all of the fields required by every item in one Items table, it is much easier to maintain to have a base Item table that contains fields common to all items and then separate tables for each item type (i.e. Inventory, NonInventory, etc..) which contain fields specific to only that item type. Then, the item table would have a foreign key to the specific item type that it represents. The relationship between the specific item tables and the base item table would be one-to-one.
Below, is an article on normalization.
http://support.microsoft.com/kb/283878
As with all design questions the answer is "it depends."
There are a few considerations:
how large will the table get (both in terms of fields and rows)? It can be inconvenient to house your users' name, password with other less commonly used data both from a maintenance and programming perspective
fields in the combined table which have constraints could become cumbersome to manage over time. for example, if a trigger needs to fire for a specific field, that's going to happen for every update to the table regardless of whether that field was affected.
how certain are you that the relationship will be 1:1? As this question points out, things can get complicated quickly.
Another use case can be the following: you might import data from some source and update it daily, e.g. information about books. Then, you add data yourself about some books. Then it makes sense to put the imported data in another table than your own data.
I normally encounter two general kinds of 1:1 relationship in practice:
IS-A relationships, also known as supertype/subtype relationships. This is when one kind of entity is actually a type of another entity (EntityA IS A EntityB). Examples:
Person entity, with separate entities for Accountant, Engineer, Salesperson, within the same company.
Item entity, with separate entities for Widget, RawMaterial, FinishedGood, etc.
Car entity, with separate entities for Truck, Sedan, etc.
In all these situations, the supertype entity (e.g. Person, Item or Car) would have the attributes common to all subtypes, and the subtype entities would have attributes unique to each subtype. The primary key of the subtype would be the same as that of the supertype.
"Boss" relationships. This is when a person is the unique boss or manager or supervisor of an organizational unit (department, company, etc.). When there is only one boss allowed for an organizational unit, then there is a 1:1 relationship between the person entity that represents the boss and the organizational unit entity.
The main time to use a one-to-one relationship is when inheritance is involved.
Below, a person can be a staff and/or a customer. The staff and customer inherit the person attributes. The advantage being if a person is a staff AND a customer their details are stored only once, in the generic person table. The child tables have details specific to staff and customers.
In my time programming I have encountered this in only one situation: when there is both a 1-to-many and a 1-to-1 relationship between the same 2 entities ("Entity A" and "Entity B").
When "Entity A" has multiple "Entity B" and "Entity B" has only 1 "Entity A"
and
"Entity A" has only 1 current "Entity B" and "Entity B" has only 1 "Entity A".
For example, a Car can only have one current Driver, and the Driver can only drive one car at a time - so the relationship between the concepts of Car and Driver would be 1 to 1. - I borrowed this example from #Steve Fenton's answer
Where a Driver can drive multiple Cars, just not at the same time. So the Car and Driver entities are 1-to-many or many-to-many. But if we need to know who the current driver is, then we also need the 1-to-1 relation.
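One way to sketch the Car/Driver case with both relationships, using Python's sqlite3 module. The table and column names are assumptions. The car_driver table holds the many-to-many relation, and a UNIQUE current_driver_id column on car holds the 1-to-1 "current driver" relation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE driver (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE car (
    id    INTEGER PRIMARY KEY,
    plate TEXT NOT NULL,
    -- 1-to-1 "current driver": one driver per car, and UNIQUE means a
    -- driver can be the current driver of at most one car.
    current_driver_id INTEGER UNIQUE REFERENCES driver(id)
);
-- Many-to-many: which drivers drive which cars over time.
CREATE TABLE car_driver (
    car_id    INTEGER NOT NULL REFERENCES car(id),
    driver_id INTEGER NOT NULL REFERENCES driver(id),
    PRIMARY KEY (car_id, driver_id)
);
INSERT INTO driver VALUES (1, 'Dana');
INSERT INTO car VALUES (1, 'AAA-111', 1), (2, 'BBB-222', NULL);
INSERT INTO car_driver VALUES (1, 1), (2, 1);
""")

# Dana may drive both cars, but cannot be the current driver of both at once.
try:
    conn.execute("UPDATE car SET current_driver_id = 1 WHERE id = 2")
    double_booked = True
except sqlite3.IntegrityError:
    double_booked = False

cars_for_dana = conn.execute(
    "SELECT COUNT(*) FROM car_driver WHERE driver_id = 1"
).fetchone()[0]
print(double_booked, cars_for_dana)
```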
Another use case might be if the maximum number of columns in the database table is exceeded. Then you could join another table using OneToOne

Database design rules to follow for a programmer

We are working on a mapping application that uses Google Maps API to display points on a map. All points are currently fetched from a MySQL database (holding some 5M + records). Currently all entities are stored in separate tables with attributes representing individual properties.
This presents following problems:
Every time there's a new property we have to make changes in the database, application code and the front-end. This is all fine but some properties have to be added for all entities so that's when it becomes a nightmare to go through 50+ different tables and add new properties.
There's no way to find all entities which share any given property e.g. no way to find all schools/colleges or universities that have a geography dept (without querying schools,uni's and colleges separately).
Removing a property is equally painful.
No standards for defining properties in individual tables. Same property can exist with different name or data type in another table.
No way to link or group points based on their properties (somehow related to point 2).
We are thinking to redesign the whole database but without DBA's help and lack of professional DB design experience we are really struggling.
Another problem we're facing with the new design is that there are lot of shared attributes/properties between entities.
For example:
An entity called "university" has 100+ attributes. Other entities (e.g. hospitals, banks, etc.) share quite a few attributes with universities, for example ATMs, parking, cafeteria, etc.
We don't really want to have properties in a separate table [and then link them back to entities with foreign keys] as it will require us to add/remove them manually. Also, generalizing properties will result in groups containing 50+ attributes. Not all records (i.e. entities) require those properties.
So with keeping that in mind here's what we are thinking about the new design:
Have separate tables for each entity containing some basic info e.g. id,name,etc etc.
Have 2 tables attribute type and attribute to store properties information.
Link each entity (or a table if you like) to attribute using a many-to-many relation.
Store addresses in different table called addresses link entities via foreign keys.
We think this will allow us to be more flexible when adding, removing or querying on attributes.
This design, however, will result in an increased number of joins when fetching data, e.g. to display all "attributes" for a given university we might have a query with 20+ joins to fetch all related attributes in a single row.
We desperately need to know some opinions or possible flaws in this design approach.
Thanks for your time.
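A hypothetical sketch of the design proposed in the question, using Python's sqlite3 module instead of MySQL. The table and column names and the sample data are assumptions. It also shows that the attributes of one entity can be fetched as one row per attribute with two joins. The 20+ joins only arise when every attribute has to appear as its own column of a single row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE attribute_type (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE attribute (
    id      INTEGER PRIMARY KEY,
    type_id INTEGER NOT NULL REFERENCES attribute_type(id),
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE university (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Many-to-many link between one entity table and the shared attributes.
CREATE TABLE university_attribute (
    university_id INTEGER NOT NULL REFERENCES university(id),
    attribute_id  INTEGER NOT NULL REFERENCES attribute(id),
    PRIMARY KEY (university_id, attribute_id)
);
INSERT INTO attribute_type VALUES (1, 'facility');
INSERT INTO attribute VALUES (1, 1, 'parking'), (2, 1, 'cafeteria'), (3, 1, 'atm');
INSERT INTO university VALUES (1, 'Example University');
INSERT INTO university_attribute VALUES (1, 1), (1, 2);
""")

# One row per attribute of university 1, grouped by attribute type.
attrs = conn.execute("""
    SELECT t.name, a.name
    FROM university_attribute ua
    JOIN attribute a      ON a.id = ua.attribute_id
    JOIN attribute_type t ON t.id = a.type_id
    WHERE ua.university_id = 1
    ORDER BY a.name
""").fetchall()
print(attrs)
```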
In trying to generalize your question without more specific examples, it's hard to truly critique your approach. If you'd like some more in depth analysis, try whipping up an ER diagram.
If your data model is changing so much that you're constantly adding/removing properties and many of these properties overlap, you might be better off using EAV.
Otherwise, if you want to maintain a relational approach but are finding a lot of overlap with properties, you can analyze the entities and look for abstractions that link to them.
Ex) My Db has Puppies, Kittens, and Walruses all with a hasFur and furColor attribute. Remove those attributes from the 3 tables and create a FurryAnimal table that links to each of those 3.
Of course, the simplest answer is to not touch the data model. Instead, create Views on the underlying tables that you can use to address (5), (4) and (2)
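A minimal sketch of the view idea, using Python's sqlite3 module. The three tables, their deliberately inconsistent column names, and the view name institutions are all assumptions. The view gives the shared property one name and type, so all schools, colleges and universities with a geography department can be found in one query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Existing tables, left untouched; the same property is named differently in each.
CREATE TABLE schools      (id INTEGER PRIMARY KEY, name  TEXT, has_geography_dept INTEGER);
CREATE TABLE colleges     (id INTEGER PRIMARY KEY, name  TEXT, geography INTEGER);
CREATE TABLE universities (id INTEGER PRIMARY KEY, title TEXT, geo_dept INTEGER);

INSERT INTO schools      VALUES (1, 'North High', 1);
INSERT INTO colleges     VALUES (1, 'City College', 0);
INSERT INTO universities VALUES (1, 'State University', 1);

-- The view's column names come from the first SELECT.
CREATE VIEW institutions AS
    SELECT 'school' AS kind, id, name, has_geography_dept FROM schools
    UNION ALL
    SELECT 'college', id, name, geography FROM colleges
    UNION ALL
    SELECT 'university', id, title, geo_dept FROM universities;
""")

with_geography = conn.execute("""
    SELECT kind, name
    FROM institutions
    WHERE has_geography_dept = 1
    ORDER BY kind, name
""").fetchall()
print(with_geography)
```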
1 cannot be an issue. There is one place where your objects are defined. Everything else is generated/derived from that. Just refactor your code until this is the case.
2 is solved by having a metamodel, where you describe which properties are where. This is probably needed for 1 too.
You might want to totally avoid the problem by programming this in Smalltalk with Seaside on a Gemstone object oriented database. Then you can just have objects with collections and don't need so many joins.

Google Appengine: Is This a Good set of Entity Groups?

I am trying to wrap my head around Entity Groups in Google AppEngine. I understand them in general, but since it sounds like you can not change the relationships once the object is created AND I have a big data migration to do, I want to try to get it right the first time.
I am making an Art site where members can sign up as a regular Member or as one of a handful of non-polymorphic Entity "types" (Artist, Venue, Organization, ArtistRepresentative, etc). Artists, for example, can have Artwork, which can in turn have other Relationships (Gallery, Media, etc). All these things are connected via References and I understand that you don't need Entity Groups to merely do References. However, some of the References NEED to exist, which is why I am looking at Entity Groups.
From the docs:
"A good rule of thumb for entity groups is that they should be about the size of a single user's worth of data or smaller."
That said, I have a couple hopefully yes/no questions.
Question 0: I gather you don't need Entity Groups just to do transactions. However, since Entity Groups are stored in the same region of Big Table, this helps cut down on consistency issues and race conditions. Is this a fair look at Entity Groups and Transactions together?
Question 1: When a child Entity is saved, do any parent objects get implicitly accessed/saved? i.e. If I set up an Entity Group with path Member/Artist/Artwork, if I save an Artwork object, do the Member and Artist objects get updated/accessed? I would think not, but I am just making sure.
Question 2: If the answer to Question 1 is yes, does the accessing/updating only travel up the path and not affect other children. i.e. If I update Artwork, no other Artwork child of Member is updated.
Question 3: Assuming it is very important that the Member and its associated account type entity exist when a user signs up and that only the user will be updating its Member and associated account type Entity, does it make sense to put these in Entity Groups together?
i.e. Member/Artist, Member/Organization, Member/Venue.
Similarly, assuming only the user will be able to update the Artwork entities, does it make sense to include those as well? Note: Media/Gallery/etc which are references to Artwork may be related to lots of Artwork, not just those owned by the user (i.e. many to many relations).
It makes sense to have all the user's bits in an entity group if it works the way I suspect (i.e. Q1/Q2 are "no"), since they will all be in the same region of BigTable. However, adding the Artwork to the entity group seems like it might violate the "keep it small" principle and, honestly, Artwork may not need to be in Transactions aside from saving bandwidth/retries when users are uploading artwork images.
Any thoughts? Am I approaching Entity Groups wrong?
0: You do need entity groups for transactions among multiple entities
1: Modifying/accessing children does not modify/access a parent
2: N/A
3: Sounds reasonable. My feeling is, entity groups should not be used unless you need transactions among them.
It is not necessary to have the Artwork as a child for permission purposes. But if you need transactional modification of them (including e.g. creation and deletion), it might be better. For example: if you delete an account, you delete the user entity, but before you delete the child you get DeadlineExceeded or the server crashes. Now you have an orphaned Artwork. If you have more than 1,000 Artworks for an Artist, you must delete in batches.
Good luck!
