I have 3 entities in my system. Let's call them entities A, B and C. Entity A becomes entity B, and in turn entity B becomes entity C. I have to keep copies of all 3 entities.
My dilemma is how to represent them in a relational database. All 3 entities have similar fields; however, some fields apply to some entities and not to others.
What is the recommended approach in this case? Should I use one table with a 'Type' field? The problem with this approach is that some fields will be empty for some entities. Or should I create separate tables?
Any recommendation on best practice would be greatly appreciated.
This sounds like the manager -> personnel kind of hierarchical design. You could have a separate table for each entity, but if they share the same base type, then whenever you want a list of all people you have to UNION all the tables. If you use a single table with a hierarchy column instead, you have to join the table with itself. Consider both options when choosing a solution. Both are valid, but each has its own difficulties.
Problem description
I am currently working on a project which requires a relational database for storage.
After thinking about the data and its relations for a while, I ran into a recurring problem:
I encountered a common data schema for entity A, which contains some fields, e.g. name, description and value. This entity is connected with entity B in multiple n-1 relations, so entity B has n A entities in relation rel1 and n A entities in relation rel2.
Now I am trying to break this data model down into a schema for a relational database (e.g. Postgres, MySQL).
After some research, I have not really found "the best" solution for this particular problem.
Some similar questions I have found so far:
Stackoverflow
DBA Stackexchange
My ideas
So I have thought about possible solutions which I am going to present here:
1. Duplicate table
The relationship from entity B to entity A has a certain meaning to it. So it is possible to create multiple tables (1 per relationship). This would solve all immediate problems but essentially duplicate the tables, which means that changes now have to be reflected in multiple tables (e.g. a new column).
2. Introduce a type column
Instead of multiple relationships, I could just say "Entity B is connected with n entity A". Additionally, I would add a type column that then tells me to which relation entity A belongs. I am not exactly sure how this is represented with common ORMs like Spring-Hibernate and if this introduces additional problems that I am currently unaware of.
3. Abstract the common attributes of entity A
Another option is to create an ADetails entity, which bundles all attributes of entity A.
Then I would create two entities that represent each relationship and which are connected to the ADetails entity in a 1-to-1 relationship. This would solve the interpretation problem of the foreign key but might be too much overhead.
My Question
In the context of a medium-large-sized project, are any of these solutions viable?
Are there certain cons that rule out one particular approach?
Are there other (better) options I haven't thought about?
I appreciate any help on this matter.
Edit 1 - PPR (Person-Party-Role)
Thanks for the suggestion from AntC. PPR Description
I think the described situation matches my problem.
Let's break it down:
Entity B is an event. To make this easier, there is only one event for the given participants, so the relationship from event to participant is 1-n.
Entity A can be described as Groups, People, Organization but given my situation they all have the same attributes. Hence, splitting them up into separate tables felt like the wrong idea.
To explain the situation with the class diagram:
An Event (Entity B) has a collection of n Groups (Entity A), n People (Entity A) and n Organizations (Entity A).
If I understand correctly, the suggestion is the following:
In my case the relationship between Event and Participant is 1-n
The RefRoles table represents the ParticipantType column that describes which relationship the Participant belongs to (for example, whether it is a customer or part of the service for the event)
Because all my Groups, People and Organizations have the same attributes the only table required at this point is the Participant table
If there are individual attributes in the future I would introduce a new table (e.g. People) that references the Participant in a 1-1 relationship.
If multiple tables are added, the foreign keys of the multiple 1-1 relationships are mutually exclusive (so there can only be one Group/Person/Organization for a participant)
Solution suggested by AntC and Christian Beikov
Splitting up the tables does make sense while keeping the common attributes in one table.
At the moment there are no individual attributes, but the type column is no longer required because the foreign keys show which relationship an entity belongs to.
I have created a small example for this:
There are 3 types of people for an event (previously the type column): Staff, VIP, Visitor
The common attributes are mapped in a 1-1 relationship to the Person table.
To make it simple: each person (Staff, VIP, Visitor) can only participate in one event. (This would be an n-m relationship in a more advanced example.)
The database schema would be the following:
This approach is better than the type column in my opinion.
It also avoids having to interpret the entity based on its type in the application later on. It is also possible to resolve a type column in an ORM (see this question), but this approach avoids the struggle if the ORM you are using does not support it.
IMO, since you already use dedicated terms for these objects, they will probably diverge, and splitting up a table afterwards is quite a lot of work, also on the code side. So I would suggest mapping dedicated entities/tables from the beginning.
I'm new to databases and trying to understand why a junction or association table is needed when creating a many-to-many relationship.
Most of what I'm finding on Stackoverflow and elsewhere describe it in either highly technical relational theory terms or it's just described as 'that's the way it's done' without qualifying why.
Are there any relational database designs out there that support having a many-to-many relationship without the use of an association table? Why is it not possible to have, for example, a column on one table that holds the relationships to another, and vice versa?
For example, a Course table that holds a list of courses and a Student table that holds a bunch of student info — each course can have many students and each student can take many classes.
Why is it not possible to have a column on each row in either table (possibly in csv format) that contains the relationships to the others in a list or something similar?
In a relational database, no column holds more than a single value in each row. Therefore, you would never store data in a "CSV format" -- or any other multiple value system -- in a single column in a relational database. Making repeated columns that hold instances of the same item (Course1, Course2, Course3, etc) is also not allowed. This is the very first rule of relational database design and is referred to as First Normal Form.
There are very good reasons for the existence of these rules (it is enormously easier to verify, constrain and query the data), but whether or not you believe in the benefits, the rules are nonetheless part of the definition of relational databases.
I do not know the answer to your question, but I can answer a similar question: Why do we use a junction table for many-to-many relationships in databases?
First, if the student table keeps track of which courses the student is in and the course table keeps track of which students are in it, then we have duplication. This can lead to problems: what if a student knows it is in a course, but the course doesn't know that it has that student? Every time you make a course change you would have to make sure to change it in both tables. Inevitably this will not happen every time, and the data will become inconsistent.
Second, where would we store this information? A list is not a possible type for a field in a database. So do we put a course column in the student table? No, because that would only allow each student to take one course, a many-to-one relationship from students to courses. Do we put a student column in the courses table? No, because then we have one student in each course.
What does work is having a new table that has one student and one course per row. This tells us that a student is in a class without duplicating any data.
"Junction tables" come from ER/ORM presentations/methods/products that don't really understand the relational model.
In the relational model (and in original ER information modeling) application relationships are represented by relations/tables. Each table holds tuples of values that are in that relationship to each other, ie that are so related, ie that satisfy that relationship, ie that participate in the relationship.
A relationship is expressed independently of any particular situation as a predicate, a fill-in-the-(named-)blanks statement. Rows that fill in the named blanks to give a true statement from the predicate in a particular situation go in the table. We pick sufficient predicates (hence base tables) to describe every situation. Both many-to-1 and many-to-many application relationships get tables.
The reason why you don't see a lot of many-to-many relationships along with columns about the participants rather than about their participation in the relationship is that such tables are better split into ones about the participants and one for the relationship. Eg columns in a many-to-many table that are about participants 1. can't say anything about entities that don't participate and 2. say the same thing about an entity every time it participates. Information modeling techniques that focus on identifying independent entity types first then relationships between them tend to lead to designs with few such problems. The reason why you don't see many-to-many relationships in two tables is that that is redundant and susceptible to the error of the tables disagreeing. The problem with collection-valued columns (sequences/lists/arrays) is that you cannot generically query about their parts using usual query notation and implementation because the DBMS doesn't see the parts organized into a table.
See this recent answer or this one.
I have to implement a testing platform. My database needs the following tables: Students, Teachers, Admins, Personnel and others. I would like to know if it's more efficient to have the FirstName and LastName in each of these tables, or to have another table, Persons, and each of the other table to be linked to this one with PersonID.
Personally, I like it this way, although it is trickier to implement, because I think it's cleaner, especially from the object-oriented point of view. Would this add unnecessary overhead to the database?
I don't know if it helps, but I would like to use SQL Server and ADO.NET Entity Framework.
As you've explicitly mentioned OO and that you're using Entity Framework, perhaps it's worth approaching the problem from how the framework is intended to work, rather than just building a database structure and then trying to model it?
Entity Framework Code First Inheritance : Table Per Hierarchy and Table Per Type is a nice introduction to the various strategies that you could pick from.
As for the note on adding unnecessary overhead to the database - I wouldn't worry about it just yet. EF is generally about getting a product built more rapidly and as it has to cope with a more general case, doesn't always produce the most efficient SQL. If the performance is a problem after your application is built, working and correct you can revisit and fix up the most inefficient stuff then.
If there is a person overlap between the mentioned tables, then yes, you should separate them out into a Persons table.
If you are only tracking what role each Person has (i.e. Student vs. Teacher etc) then you might consider just having the following three tables: Persons, Roles, and a bridge table PersonRoles.
On the other hand, if each role has its own unique fields, then you should carry on as you are and leave each of these tables separate with a foreign key of PersonID.
If the attributes (i.e. First Name, Last Name, Gender etc.) of these entities (i.e. Students, Teachers, Admins and Personnel) are exactly the same, then you could just make a single table for all the entities with a PersonType or Role attribute added to distinguish each person's role. However, if the entities have a lot of different attributes, then it would be better to create separate tables; otherwise you will have normalization problems.
Yes, repeating FirstName and LastName in each of these tables is a very bad way of structuring a DB. The DB structure should be designed based on normalization.
Please check the normal forms.
You should avoid duplicate data as much as possible, otherwise the queries will become slower.
And the main problem comes when you are trying to get data that is associated with more than one or two tables.
I'm trying to wrap my head around how I can represent a many-to-many relationship inside of AppEngine's Datastore in the Go Programming Language. I'm more used to traditional relational databases.
I have two types of entities in my system. Let's call them A and B. Every A entity is related to some number of B entities. Similarly, every B entity is related to some other number of A entities. I'd like to be able to efficiently query for all B entities given an A entity, and for all A entities given a B entity.
In the Python SDK, there seems to be a way to mark fields in an entity as ReferenceProperty fields, which reference some other entity. However, I can't find something similar in Go's App Engine SDK. Go seems to just use basic structs to represent entities.
What's the best practice for dealing with this?
A python ReferenceProperty essentially stores a key to another entity. It's similar to using a Key field in Go.
There are at least two ways to solve your problem: a cheap way to store a limited number of references, and an expensive way for larger data sets.
fmt.Println.MKO provided the answer for the cheap way, except that the query is simpler than what he suggests; it should actually be:
SELECT * FROM B where AIds = 'A1'
This method is limited by the number of index entries per entity, as well as by the entity size, so the AIds or BIds lists will limit the number of related entities to 20000 or fewer.
If you have an insane amount of data, you would want a mapping entity to represent the M2M relationship between a given A & B entity. It would simply contain a key to an A and a key to a B. You would then query for map entities, and then fetch the corresponding A or B entities you need. This would be much more expensive, but breaks past the entity size limit.
Based on how you wish to query, you could do the following:
In your struct A, add a field:
BIds []int64
In your struct B, add a field:
AIds []int64
Now, any time you add a relation between A and B, you just need to add the corresponding ids to these two fields.
When you need to query for all B which are related to this A1, you do your query like this:
SELECT * FROM B where AIds = 'A1'
For all A which are related to this B1, you do it similarly:
SELECT * FROM A where BIds = 'B1'
Update:
Altered queries on suggestion from dragonx.
Consider we have a database that has a table, which is a record of a sale. You sell both products and services, so you also have a product and service table.
Each sale can either be a product or a service, which leaves the options for designing the database to be something like the following:
Add columns for each type, i.e. add Service_id and Product_id to Invoice_Row, both of which are nullable. If they're both null, it's an ad-hoc charge not relating to anything, but if one of them is set, the row relates to that type.
Add a weird string/id-based system, for instance Type_table and Type_id. These would be a string/varchar and an integer respectively: the former would contain, for example, 'Service', and the latter the id within the Service table. This is obviously loose coupling and horrible, but it is a way of solving the problem as long as you only access the DB from code.
Abstract out the concept of "something that is chargeable" with a new table that Product and Service both specialize, and on the Invoice_Row table link to something like ChargeableEntity_id. However, the ChargeableEntity table here would essentially be redundant, as it too would need some way to link to an abstract "backend" table, which brings us all the way back around to the same problem.
Which way would you choose, or what are the other alternatives to solving this problem?
What you are essentially asking is how to achieve polymorphism in a relational database. There are many approaches to this problem (as you yourself demonstrate). One solution is to use "table per class" inheritance. In this setup, there will be a parent table (akin to your "chargeable item") that contains a unique identifier and the fields that are common to both products and services. There will be two child tables, products and services: each will contain the unique identifier for that entity and the fields specific to it.
One benefit to this approach over others is you don't end up with one table with many nullable columns that essentially becomes a dumping ground to describe anything ("schema-less").
One downside is as your inheritance hierarchy grows, the number of joins needed to grab all the data for an entity also grows.
I believe it depends on use case(s).
You could put the common columns in one table and put product- and service-specific columns in their own tables. The catch here is that you need to join them.
Alternatively, you maintain two separate tables, one for Product and another for Service, and use application logic to determine which table to insert into. Getting all sales then essentially means a UNION of all products and all services.
I would go for approach 2 personally to avoid joins and inserting into two tables whenever a sale is made.