Generic relation for database - database

I have to design a generic entity that can refer to various other entities.
In my example, that would be a commentary entity inside a web application. You could post commentaries on users, classifieds, articles, varieties (botanical ones), and so on.
So that entity would be made like this:
As a matter of fact, the design (kind of) pattern would be this one:
What are the pros and cons of this kind of pattern?
What I see is:
Pros
It decreases the number of entities if the concept is the same (commentaries for example);
You can therefore easily manipulate heterogeneous objects;
You can aggregate these objects easily (e.g. this user's last commentaries in the whole site, presented easily in a same thread);
Cons
It invites abuse (overuse it and both your database and your source code become ugly);
The database cannot enforce the relationship, so that check must be done in the application code.
What is the performance impact?
Conclusion
Is this kind of pattern suitable for a relational database? If not, what should we do instead?
Thank you in advance.

One more con:
This scheme relies on a mapping between values and names for the "entities" referred to by those values. Think of all the fun you'll have resolving issues that in the TEST system, the ORDER entity has number 734 but in production, it has number 256. You can use the entity names themselves as the values of your entity_id stuff, but you will never be able to avoid hardcoding values for them in your programs (or, say, in view definitions) anyway. Thereby defeating whatever advantage it was you thought you could win.
This kind of scheme is a disease mostly suffered by OO programmers. They see structures that are largely similar and they have this instinctive reflex: "I must find a way to reuse the existing thing for this", forgetting that database design is not program design.
EDIT
(if it wasn't clear, this means my answer to your question "Is this kind of pattern suitable for a relational database?" is a principled "NO".)

This is the classic Polymorphic Association anti-pattern. There are a number of possible solutions:
1) Exclusive Arcs e.g. for the Commentary entity
Id
User_Id
Classified_Id
Article_Id
Variety_Id
Where User_Id, Classified_Id, Article_Id and Variety_Id are nullable and exactly one must be not null.
2) Reverse the Relationship e.g. remove the Target_Entity and Target_Entity_Id from the Commentary entity and create four new entities
User_Commentary (Commentary_Id, User_Id)
Classified_Commentary (Commentary_Id, Classified_Id)
Article_Commentary (Commentary_Id, Article_Id)
Variety_Commentary (Commentary_Id, Variety_Id)
Where Commentary_Id is unique and relates to the Id in Commentary.
3) Create a super-type entity for User, Classified, Article and Variety and have the Commentary entity reference the unique attribute of this new entity.
You would need to decide which of these approaches you feel is most appropriate in your specific situation.
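As one illustration of option 1, here is a minimal sketch of Exclusive Arcs using Python's built-in sqlite3 module. The table names (site_user, classified, article, variety), the body column and the try_insert helper are assumptions made for this example, not part of the original design. A CHECK constraint counts the non-null target columns, so the database itself rejects a commentary that points at zero or several targets, and ordinary foreign keys reject dangling references.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE site_user  (id INTEGER PRIMARY KEY);
CREATE TABLE classified (id INTEGER PRIMARY KEY);
CREATE TABLE article    (id INTEGER PRIMARY KEY);
CREATE TABLE variety    (id INTEGER PRIMARY KEY);

CREATE TABLE commentary (
    id            INTEGER PRIMARY KEY,
    body          TEXT NOT NULL,
    user_id       INTEGER REFERENCES site_user(id),
    classified_id INTEGER REFERENCES classified(id),
    article_id    INTEGER REFERENCES article(id),
    variety_id    INTEGER REFERENCES variety(id),
    -- Each comparison evaluates to 0 or 1, so the sum counts the non-null targets.
    CHECK (
        (user_id IS NOT NULL) + (classified_id IS NOT NULL)
      + (article_id IS NOT NULL) + (variety_id IS NOT NULL) = 1
    )
);

INSERT INTO site_user (id) VALUES (1);
INSERT INTO article   (id) VALUES (10);
""")


def try_insert(**targets):
    """Insert a commentary; return True if accepted, False if a constraint rejects it."""
    columns = ", ".join(["body", *targets])
    marks = ", ".join("?" * (len(targets) + 1))
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                f"INSERT INTO commentary ({columns}) VALUES ({marks})",
                ("hello", *targets.values()),
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert(user_id=1))                 # True: exactly one target
print(try_insert(user_id=1, article_id=10))  # False: two targets
print(try_insert())                          # False: no target
print(try_insert(article_id=999))            # False: no such article
```

The price of this approach is that every new commentable entity needs a new nullable column and an updated CHECK constraint.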

Related

Relational Database: Reusing the same table in a different interpretation

Problem description
I am currently working on a project which requires a relational database for storage.
After thinking about the data and its relations for a while I ran into a quite repetitive problem:
I encountered a common data schema for entity A which contains some fields e.g. name, description, value. This entity is connected with entity B in multiple n-1 relations. So entity B has n entities A in relation rel1 and n entities A in relation rel2.
Now I am trying to break down this datamodel into a schema for a relational database (e.g. Postgres, MySQL).
After some research, I have not really found "the best" solution for this particular problem.
Some similar questions I have found so far:
Stackoverflow
DBA Stackexchange
My ideas
So I have thought about possible solutions which I am going to present here:
1. Duplicate table
The relationship from entity B to entity A has a certain meaning to it. So it is possible to create multiple tables (1 per relationship). This would solve all immediate problems but essentially duplicate the tables which means that changes now have to be reflected to multiple tables (e.g. a new column).
2. Introduce a type column
Instead of multiple relationships, I could just say "Entity B is connected with n entity A". Additionally, I would add a type column that then tells me to which relation entity A belongs. I am not exactly sure how this is represented with common ORMs like Spring-Hibernate and if this introduces additional problems that I am currently unaware of.
3. Abstract the common attributes of entity A
Another option is to create an ADetails entity, which bundles all attributes of entity A.
Then I would create two entities that represent each relationship and which are connected to the ADetails entity in a 1-to-1 relationship. This would solve the interpretation problem of the foreign key but might be too much overhead.
My Question
In the context of a medium-large-sized project, are any of these solutions viable?
Are there certain Cons that rule out one particular approach?
Are there other (better) options I haven't thought about?
I appreciate any help on this matter.
Edit 1 - PPR (Person-Party-Role)
Thanks for the suggestion from AntC. PPR Description
I think the described situation matches my problem.
Let's break it down:
Entity B is an event. There exists only one event for the given participants to make this easier. So the relationship from event to participant is 1-n.
Entity A can be described as Groups, People, Organization but given my situation they all have the same attributes. Hence, splitting them up into separate tables felt like the wrong idea.
To explain the situation with the class diagram:
An Event (Entity B) has a collection of n Groups (Entity A), n People (Entity A) and n Organizations (Entity A).
If I understand correctly the suggestion is the following:
In my case the relationship between Event and Participant is 1-n
The RefRoles table represents the ParticipantType column that describes to which relationship the Participant belongs (is it a customer or part of the service for the event, for example)
Because all my Groups, People and Organizations have the same attributes the only table required at this point is the Participant table
If there are individual attributes in the future I would introduce a new table (e.g. People) that references the Participant in a 1-1 relationship.
If there are multiple tables going to be added, the foreign key of the multiple 1-1 relationship is mutually exclusive (so there can only be one Group/Person/Organization for a participant)
Solution suggested by AntC and Christian Beikov
Splitting up the tables does make sense while keeping the common attributes in one table.
At the moment there are no individual attributes but the type column is not required anymore because the foreign keys can be used to see which relationship the entity belongs to.
I have created a small example for this:
There exist 3 types (previously type column) of people for an event: Staff, VIP, Visitor
The common attributes are mapped in a 1-1-relationship to the person table.
To make it simple: Each Person (Staff, VIP, Visitor) can only participate in one event. (Would be n-m-relationship in a more advanced example)
The database schema would be the following:
This approach is better than the type column in my opinion.
It also avoids having to interpret the entity based on its type in the application later on. It is also possible to resolve a type column in an ORM (see this question), but this approach avoids the struggle if the ORM you are using does not support resolving it.
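A minimal sketch of the layout described above, using Python's built-in sqlite3 module. This is not the author's actual schema: the table names (event, person, staff, vip, visitor), the columns, the sample rows and the participants helper are all assumptions for illustration. Each role table uses person_id as both primary key and foreign key, which gives the 1-1 link to person and limits a person to one event per role.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE event  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL, description TEXT);

-- One table per relationship; the table a row lives in replaces the type column.
CREATE TABLE staff   (person_id INTEGER PRIMARY KEY REFERENCES person(id),
                      event_id  INTEGER NOT NULL REFERENCES event(id));
CREATE TABLE vip     (person_id INTEGER PRIMARY KEY REFERENCES person(id),
                      event_id  INTEGER NOT NULL REFERENCES event(id));
CREATE TABLE visitor (person_id INTEGER PRIMARY KEY REFERENCES person(id),
                      event_id  INTEGER NOT NULL REFERENCES event(id));

INSERT INTO event  (id, name) VALUES (1, 'Conference'), (2, 'Workshop');
INSERT INTO person (id, name) VALUES (1, 'Ada'), (2, 'Bob'), (3, 'Cy');
INSERT INTO staff   (person_id, event_id) VALUES (1, 1);
INSERT INTO vip     (person_id, event_id) VALUES (2, 1);
INSERT INTO visitor (person_id, event_id) VALUES (3, 1);
""")

ROLE_TABLES = ("staff", "vip", "visitor")  # fixed list, never user input


def participants(event_id):
    """Return (name, role) pairs for one event, grouped by role table."""
    rows = []
    for table in ROLE_TABLES:
        rows += conn.execute(
            f"SELECT p.name, '{table}' FROM {table} AS r "
            "JOIN person AS p ON p.id = r.person_id "
            "WHERE r.event_id = ? ORDER BY p.name",
            (event_id,),
        ).fetchall()
    return rows


print(participants(1))  # [('Ada', 'staff'), ('Bob', 'vip'), ('Cy', 'visitor')]
```

Note that nothing in this sketch stops the same person from appearing in two role tables at once; if the roles must be mutually exclusive, that rule needs an additional constraint or an application check.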
IMO since you already use dedicated terms for these objects, they probably will diverge and splitting up a table afterwards is quite some work, also on the code side, so I would suggest you map dedicated entities/tables from the beginning.

Does the min/max notation relationship match what I am trying to get?

Is this the right way to represent the relationship that is described in text on the picture? This is in min/max notation.
http://s7.postimg.org/holux2uwb/image.jpg
There is a huge lack of context here, so I'll take a blind shot at an answer.
When modeling data, an order is usually seen as an event. I do not know exactly what a "Bugel Card" is, but if it is the name of an identity such as a noun, and it has properties/attributes that must be stored, as I suspect it is the Customer, then we have two entities that have a relationship: the Customer entity and the Bugel Card entity. The resulting connection/relationship/link forms the Order event.
If in an Order a Customer ALWAYS uses AT LEAST 1 "Bugel Card", and not more than that, then we have a cardinality (in min/max notation) of (1,1) between the Customer and Bugel Card entities, on both sides. For (1,1) relationships, it is at the data modeler's discretion which side holds the foreign key (once you decompose the Conceptual Model). It is generally recommended to put the foreign key on the side where the relationship may become "many" in the future.
If you can give a little more context, I can give you a more accurate answer. And remember:
Do not model data without a full context. When you go to an Entity Relationship Diagram starting from the Conceptual Model, you need a context, and one that is very well described. Without a full context, there is no diagram, and as a result, there is no database schema (or much less a system to use and manage).
Other than that, it is not possible to model entities without properties / attributes. Without them, an entity is nothing, because in its decomposition there will be no column to create, and so there will be no data to persist. Even if you leave defining the attributes for later in your modeling process, you can end up confusing yourself and/or forgetting something. This is prone to errors.
To be honest, there is no standard way of modeling data. What I have said so far are just data modeling tips. It is up to you what you want to do and how you want to do it.
If you have any questions or need anything else, please comment and I will help.

Supertype/subtype db design with subtype cross-link

This is probably a simple problem for an experienced database developer, but I'm struggling... I have trouble translating a certain ER diagram to a DB model, any help is appreciated.
I have a setup similar to slide 17 of this presentation:
http://www.cbe.wwu.edu/misclasses/mis421s04/presentations/supersubtype.ppt
Slide 17 shows an ER diagram with an Employee supertype having an Employee Type attribute and as subtypes the Employee Types themselves (Hourly, Salaried and Consultant), which is very similar to my design situation.
In my case, suppose Salaried Employees are the only ones that can be bosses of other employees and I wanted to somehow indicate if a certain Salaried employee is the boss of the Hourly and/or Salaried Employee and/or Consultant (either, none or both), how could that be designed in a database model, also considering these are one-to-many relationships?
I can put a PK-FK relationship between them, which would result in all tables having two FKeys (like Consultant having FK_Employee and FK_SalariedEmployee) and SalariedEmployee referencing itself, but I keep thinking that might not be the wisest solution... although I'm not sure why (integrity issues?).
Is this an acceptable solution or is there a better one?
Thanks in advance for any help!
Your case looks like an instance of the design pattern known as “Generalization Specialization” (Gen-Spec for short). The gen-spec pattern is familiar to object oriented programmers. It’s covered in tutorials when teaching about inheritance and subclasses.
The design of SQL tables that implement the gen-spec pattern can be a little tricky. Database design tutorials often gloss over this topic. But it comes up again and again in practice.
If you search the web on “generalization specialization relational modeling” you’ll find several useful articles that teach you how to do this. You’ll also be pointed to several times this topic has come up before in this forum.
The articles generally show you how to design a single table to capture all the generalized data and one specialized table for each subclass that will contain all the data specific to that subclass. The interesting part involves the primary key for the subclass tables. You won’t use the autonumber feature of the DBMS to populate the sub class primary key. Instead, you’ll program the application to propagate the primary key value obtained for the generalized table to the appropriate subclass table.
This creates a two way association between the generalized data and the specialized data. A simple view for each specialized subclass will collect generalized and specialized data together. It’s easy once you get the hang of it, and it performs fairly well.
In your specific case, declaring the "boss of" FK to reference the PK in the Salaried Employees table will be enough to do the trick. This will produce the two way association you want, and also prevent employees who are not salaried from being referenced as bosses.
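The following is a minimal sketch of that layout with Python's built-in sqlite3 module, under stated assumptions: the table and column names (employee, salaried_employee, hourly_employee, annual_salary, hourly_rate, boss_id), the view name and the add_employee helper are hypothetical and chosen only for illustration. The helper propagates the key generated for the generalized table into the subclass table, and boss_id references the salaried table so that only salaried employees can be bosses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE employee (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    -- Only rows that exist in salaried_employee can be referenced as a boss.
    boss_id INTEGER REFERENCES salaried_employee(employee_id)
);
CREATE TABLE salaried_employee (
    employee_id   INTEGER PRIMARY KEY REFERENCES employee(id),
    annual_salary INTEGER NOT NULL
);
CREATE TABLE hourly_employee (
    employee_id INTEGER PRIMARY KEY REFERENCES employee(id),
    hourly_rate INTEGER NOT NULL
);
-- A view that collects generalized and specialized data for one subclass.
CREATE VIEW salaried_employee_details AS
    SELECT e.id, e.name, e.boss_id, s.annual_salary
    FROM employee AS e JOIN salaried_employee AS s ON s.employee_id = e.id;
""")

PAY_COLUMN = {"salaried_employee": "annual_salary", "hourly_employee": "hourly_rate"}


def add_employee(name, subtype, amount, boss_id=None):
    """Insert the generalized row, then reuse its key for the subclass row."""
    column = PAY_COLUMN[subtype]  # restricts subtype to the known tables
    with conn:  # both inserts commit together or roll back together
        emp_id = conn.execute(
            "INSERT INTO employee (name, boss_id) VALUES (?, ?)", (name, boss_id)
        ).lastrowid
        conn.execute(
            f"INSERT INTO {subtype} (employee_id, {column}) VALUES (?, ?)",
            (emp_id, amount),
        )
    return emp_id


alice = add_employee("Alice", "salaried_employee", 90000)
bob = add_employee("Bob", "hourly_employee", 30, boss_id=alice)

try:
    add_employee("Carol", "hourly_employee", 25, boss_id=bob)  # Bob is not salaried
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```

A consultant table would follow the same shape as hourly_employee. The sketch does not stop one employee from being entered in two subclass tables; that rule would need a further constraint.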

What is the best way to realize this database

I have to build a system with different kinds of users, and I am thinking of implementing it this way:
A user table with only id, email and password.
Two different tables related to the user table in a 1-to-1 relation. Each table defines the attributes specific to one kind of user.
Is this the best way to implement it? Should I use the InnoDB storage engine?
If I implement it this way, how can I handle the tables in the Zend Framework?
I can't answer the second part of your question, but the pattern you describe is called super- and subtype in data modelling. Whether this is the right choice can't be answered without knowing more about the differences between these user types and how they will be used in the application. There are different approaches when converting logical super/subtypes into physical tables.
Here are some relevant links:
http://www.sqlmag.com/article/data-modeling/implementing-supertypes-and-subtypes
and the next one about pitfalls and (mis)use of subtyping
http://www.ocgworld.com/doc/OCG_Subtyping_Techniques.pdf
In general I am, from a pragmatic point of view, very reluctant to follow your choice and most often opt to create one table containing all columns. In most cases there are a number of places where the application needs to show all users in some sort of listing with specific columns for specific types (and empty if not applicable for that type). Splitting quickly leads to non-straightforward queries and all sorts of extra code to deal with the different tables, so being 'conceptually correct' is just not worth it.
Two reasons for me to still split the subtypes into different tables are if the subtypes are so truly different that it makes no logical sense to have them in one table, and if the number of rows is so enormous that the overhead of the 'unneeded' columns when putting it all in one table actually starts to matter.
On the PHP side you can use Doctrine 2 ORM. It's easy to integrate with ZF, and you could easily implement this table structure as inheritance in your Doctrine mapping.

What is a "database entity" and what types of DBMS items are considered entities? [closed]

Is it things like tables? Or would it also include things like constraints, stored procedures, packages, etc.?
I've looked around the internet, but finding elementary answers to elementary questions is sometimes a little difficult.
That's quite a general question!
Basically, all types that the database system itself offers, like NUMERIC, VARCHAR etc., or that the programming language of choice offers (int, string etc.) would be considered "atomic" data(base) types.
Anything that you - based on your program's or business' requirements - build from that, business objects and so forth, are entities.
Tables, constraints and so forth are database-internal objects needed to store and retrieve data, but those are generally not considered "entities". The data stored in your tables, when retrieved and converted into an object, is then an entity.
Marc
In the entity relationship world an entity is something that may exist independently and so there is often a one-to-one relationship between entities and database tables. However, this mapping is an implementation decision: For example, an ER diagram may contain three entities: Triangle, Square and Circle and these could potentially be modelled as a single table: Shape.
Also note that some database tables may represent relationships between entities.
This seems helpful: http://en.wikipedia.org/wiki/Entity-relationship_model
In a database an entity is a table. The table represents whatever real world concept you are trying to model (person, transaction, event).
Constraints can represent relationships between entities. These would be foreign keys. They also enforce rules such as: first_name cannot be blank (null), a transaction must have 1 or more items, an event must have a date time.
Stored Procedures / Packages / Triggers could handle more complex relationships and/or they can handle business rules, just depends on what it's doing.
It kind of depends on how you think about it and how you model your problem domain. Most of the time when you hear about entities, they are database tables (one or many) mapped onto object classes. So it's not really an entity until it's been queried for and turned into a class instance.
But again, it depends on your modeling methodology, and there are multiple :-)
This thread is demonstrating one reason why it is difficult to find "elementary answers to elementary questions". Certain words have been used by different programming paradigms to mean different things (try asking a bunch of OO programmers what the difference between a Class and an Object is sometime).
Here's my take on it.
I first came across Entity as a modelling term in SSADM (ask your dad). In that context an Entity is used to model a logical clump of data during the requirements gathering / analysis phase. The relationships between entities were modelled using Entity Relationship diagrams, and the profile of an Entity was modelled using Entity Life Histories. ELH diagrams were very useful in COBOL systems but utterly horrible in relational databases. ERDs on the other hand continue to be useful to this day.
During the design and implementation phases the Entities get resolved into database tables, objects or records in a COBOL input file. In the course of that process a logical entity may get split across multiple tables, or several entities may get squidged into a single table, or there may be a one-to-one mapping. Sometimes an entity is resolved away entirely or lingers on as a view or a stored procedure.
My answer is obviously a little late, but here it is as defined in a database certification text book:
Entity: A uniquely identifiable element about which data is stored in a database.
and to clear up entity and table confusion,
Entity is not a table. Tables can be called "tables" or "relations"; the words are synonymous.
We'd need to know some context. One thing people sometimes do when analysing data in preparation for designing a database is to create an Entity Relationship Diagram, where you consider what data items you are managing and their relationships.
I wonder if that's the context you mean?
If so perhaps a read of this article would get you started?
Entities are "things of significance" to the users/business/enterprise/problem domain.
Update:
See this article in my blog in which I try to cover the subject in more detail:
What is entity-relationship model?
An entity is a term from the entity-relationship model.
A relational model (your database schema) is one of the ways to implement the ER model.
Relational tables represent relations between simple types like integers and strings, which, in their turn, can represent everything: entities, attributes, relationships.
You cannot tell which of these a table represents from the relational structure alone; you need to see the ER model.
For table persons,
id  name  surname
1   John  Smith
id, name and surname are entities in the real world and may or may not represent entities in the underlying ER model.
The fact that a record exists in the table means that these entities are in the following relation: "person 1 has name John and has surname Smith".
In the example above, the entity is defined by id (from the model's point of view).
If a person changes his name from John to Jack, the person remains the same (again, from the model's point of view), but gets related to another name.
In the example above, name and surname can be treated as attributes (as opposed to entities), but again, you need to see the ER model which this schema implements to tell which they are.
In some ER-to-relational model mappings, an entity should be defined in a table referenceable with a FOREIGN KEY to be considered an entity (which should constrain its domain).
However, this constraint can exist but not be represented in a database (due to technological limitations or something else).
Like, we cannot keep a list of all possible names, but the name of ##$^# is most probably a non-name, hence, it does not belong to the domain of names.
Therefore, an attribute is an entity which can participate in a relationship but cannot be contained in a domain-defining table.
For instance, the table prices:
good_id price
defines relationships between the set of goods (which is defined by the table goods) and the set of real numbers (which cannot be contained in a table since it's not even countable).
Still each price (like $2.00) is a real-world entity just as well.

Resources