I am trying to create a relational schema using this image.
But I don't know where to start. Can someone suggest some references to help get me started?
Start by specifying the types of facts that you want to record, in the form of predicate sentences with placeholders for values. For example:
The country with code <COUNTRY_CODE> is named <COUNTRY_NAME>
The language with code <LANGUAGE_CODE> is an official language of <COUNTRY_CODE>
<COUNTRY_CODE> has a <SUBDIVISION_TYPE> called <SUBDIVISION_NAME>
<COUNTRY_CODE> has a city called <CITY_NAME> in <SUBDIVISION_NAME>
Next, identify the domains of each placeholder as well as subset relationships between domains (these are your IS-A relationships and will eventually be enforced via foreign key constraints). Identify functional and multivalued dependencies, and normalize where required. If you kept your fact types simple you won't need much of the latter.
For more information, look into fact-based (i.e. relational) modeling disciplines like Object-Role Modeling.
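As a rough illustration of the steps above, here is a minimal sketch in Python using the standard-library sqlite3 module. It turns each of the four example fact types into one table. The table and column names, the choice of keys (for instance, that a subdivision name is unique within a country) and the sample rows are assumptions made for the example; the fact types themselves do not dictate them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked

conn.executescript("""
-- The country with code <COUNTRY_CODE> is named <COUNTRY_NAME>
CREATE TABLE country (
    country_code TEXT PRIMARY KEY,
    country_name TEXT NOT NULL
);
-- The language with code <LANGUAGE_CODE> is an official language of <COUNTRY_CODE>
CREATE TABLE official_language (
    language_code TEXT NOT NULL,
    country_code  TEXT NOT NULL REFERENCES country (country_code),
    PRIMARY KEY (language_code, country_code)
);
-- <COUNTRY_CODE> has a <SUBDIVISION_TYPE> called <SUBDIVISION_NAME>
CREATE TABLE subdivision (
    country_code     TEXT NOT NULL REFERENCES country (country_code),
    subdivision_name TEXT NOT NULL,
    subdivision_type TEXT NOT NULL,
    PRIMARY KEY (country_code, subdivision_name)  -- assumed key
);
-- <COUNTRY_CODE> has a city called <CITY_NAME> in <SUBDIVISION_NAME>
CREATE TABLE city (
    country_code     TEXT NOT NULL,
    subdivision_name TEXT NOT NULL,
    city_name        TEXT NOT NULL,
    PRIMARY KEY (country_code, subdivision_name, city_name),
    FOREIGN KEY (country_code, subdivision_name)
        REFERENCES subdivision (country_code, subdivision_name)
);
""")


def is_rejected(sql, params=()):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql, params)
    except sqlite3.IntegrityError:
        return True
    return False


conn.execute("INSERT INTO country VALUES ('NL', 'Netherlands')")
conn.execute("INSERT INTO subdivision VALUES ('NL', 'Utrecht', 'province')")
conn.execute("INSERT INTO city VALUES ('NL', 'Utrecht', 'Amersfoort')")

# A city in a subdivision that was never recorded is refused by the foreign key.
orphan_city_rejected = is_rejected(
    "INSERT INTO city VALUES ('NL', 'Atlantis', 'Nowhere')"
)
```

The foreign keys are where the subset relationships between domains end up: every country code used in subdivision or city must already appear in country.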
Problem description
I am currently working on a project which requires a relational database for storage.
After thinking about the data and its relations for a while I ran into a quite repetitive problem:
I encountered a common data schema for entity A which contains some fields e.g. name, description, value. This entity is connected with entity B in multiple n-1 relations. So entity B has n entities A in relation rel1 and n entities A in relation rel2.
Now I am trying to break down this datamodel into a schema for a relational database (e.g. Postgres, MySQL).
After some research, I have not really found "the best" solution for this particular problem.
Some similar questions I have found so far:
Stackoverflow
DBA Stackexchange
My ideas
So I have thought about possible solutions which I am going to present here:
1. Duplicate table
The relationship from entity B to entity A has a certain meaning to it. So it is possible to create multiple tables (1 per relationship). This would solve all immediate problems but essentially duplicate the tables which means that changes now have to be reflected to multiple tables (e.g. a new column).
2. Introduce a type column
Instead of multiple relationships, I could just say "Entity B is connected with n entity A". Additionally, I would add a type column that then tells me to which relation entity A belongs. I am not exactly sure how this is represented with common ORMs like Spring-Hibernate and if this introduces additional problems that I am currently unaware of.
3. Abstract the common attributes of entity A
Another option is to create an ADetails entity, which bundles all attributes of entity A.
Then I would create two entities that represent each relationship and which are connected to the ADetails entity in a 1-to-1 relationship. This would solve the interpretation problem of the foreign key but might be too much overhead.
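For comparison, idea 2 (the type column) might look like the following sketch, written in Python with the standard-library sqlite3 module. The names entity_a, entity_b and rel_type and the values 'rel1' and 'rel2' are placeholders taken from the description above; the CHECK constraint is an assumption about how one might keep the type values under control.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE entity_b (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE entity_a (
    id          INTEGER PRIMARY KEY,
    entity_b_id INTEGER NOT NULL REFERENCES entity_b (id),
    -- The type column says which relation of B this row belongs to.
    rel_type    TEXT NOT NULL CHECK (rel_type IN ('rel1', 'rel2')),
    name        TEXT NOT NULL,
    description TEXT,
    value       REAL
);
""")

conn.execute("INSERT INTO entity_b (id, name) VALUES (1, 'b1')")
conn.executemany(
    "INSERT INTO entity_a (entity_b_id, rel_type, name) VALUES (?, ?, ?)",
    [(1, "rel1", "a1"), (1, "rel1", "a2"), (1, "rel2", "a3")],
)


def members(b_id, rel_type):
    """Names of the A rows attached to one B through one relation."""
    rows = conn.execute(
        "SELECT name FROM entity_a"
        " WHERE entity_b_id = ? AND rel_type = ? ORDER BY name",
        (b_id, rel_type),
    )
    return [row[0] for row in rows]


def unknown_type_rejected():
    """True if the CHECK constraint refuses a type that is not listed."""
    try:
        conn.execute(
            "INSERT INTO entity_a (entity_b_id, rel_type, name)"
            " VALUES (1, 'rel3', 'a4')"
        )
    except sqlite3.IntegrityError:
        return True
    return False
```

Every query that wants only one of the two relations has to filter on rel_type, which is the main price of this variant.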
My Question
In the context of a medium-large-sized project, are any of these solutions viable?
Are there certain Cons that rule out one particular approach?
Are there other (better) options I haven't thought about?
I appreciate any help on this matter.
Edit 1 - PPR (Person-Party-Role)
Thanks for the suggestion from AntC. PPR Description
I think the described situation matches my problem.
Let's break it down:
Entity B is an event. There exists only one event for the given participants to make this easier. So the relationship from event to participant is 1-n.
Entity A can be described as Groups, People, Organization but given my situation they all have the same attributes. Hence, splitting them up into separate tables felt like the wrong idea.
To explain the situation with the class diagram:
An Event (Entity B) has a collection of n Groups (Entity A), n People (Entity A) and n Organizations (Entity A).
If I understand correctly the suggestion is the following:
In my case the relationship between Event and Participant is 1-n
The RefRoles table represents the ParticipantType column that describes to which relationship the Participant belongs (for example, is it a customer or part of the service for the event)
Because all my Groups, People and Organizations have the same attributes the only table required at this point is the Participant table
If there are individual attributes in the future I would introduce a new table (e.g. People) that references the Participant in a 1-1 relationship.
If multiple such tables are added, the foreign keys of these 1-1 relationships are mutually exclusive (so there can only be one Group/Person/Organization for a participant)
Solution suggested by AntC and Christian Beikov
Splitting up the tables does make sense while keeping the common attributes in one table.
At the moment there are no individual attributes but the type column is not required anymore because the foreign keys can be used to see which relationship the entity belongs to.
I have created a small example for this:
There exist 3 types (previously type column) of people for an event: Staff, VIP, Visitor
The common attributes are mapped in a 1-1-relationship to the person table.
To make it simple: Each Person (Staff, VIP, Visitor) can only participate in one event. (Would be n-m-relationship in a more advanced example)
The database schema would be the following:
This approach is better than the type column in my opinion.
It also avoids having to interpret the entity based on its type in the application later on. It is also possible to resolve a type column in an ORM (see this question), but this approach avoids the struggle if the ORM you are using does not support resolving it.
IMO since you already use dedicated terms for these objects, they probably will diverge and splitting up a table afterwards is quite some work, also on the code side, so I would suggest you map dedicated entities/tables from the beginning.
I'm new to data modelling and have started following tutorials to learn more.
I am trying to create a model for a hypothetical scenario and am struggling to validate what I have created to see if it is what would be considered a correct data model.
Essentially, all I'm trying to do is store data correctly in a normalised form. In my scenario there are 3 types of people; they share some attributes, and each person has one set of contact details.
Does the below data model look feasible?
The relationship between person and one of defendant, magistrate, or staff-member is a case of the class/subclass pattern. There are two common ways of modeling this pattern in relational tables.
One way is called "Class Table Inheritance". You can find out more by visiting this tag: class-table-inheritance or by searching the web for Martin Fowler's treatment of the same subject. Your design resembles this design.
Another way is called "Single Table Inheritance", which you can research the same way: single-table-inheritance. It's simpler, and works OK in some cases. You deal with fewer joins, but with more NULLs.
Many people who go for class table inheritance also apply a technique called "Shared Primary Key". shared-primary-key. Using this technique, Defendant, Magistrate, and Staff_Member would each use a copy of person_id as the primary key. This primary key also functions as a foreign key. Shared primary key enforces the one-to-one nature of the IS-A relationships that exist in this case.
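A minimal sketch of class table inheritance with a shared primary key, in Python with the standard-library sqlite3 module, could look like this. Only two of the three subclasses are shown, and the columns full_name, case_number and court_name are made up for the example; the asker's actual attributes are not known here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked

conn.executescript("""
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    full_name TEXT NOT NULL
);
-- The subclass key is a copy of person_id: primary key and foreign key at once.
CREATE TABLE defendant (
    person_id   INTEGER PRIMARY KEY REFERENCES person (person_id),
    case_number TEXT NOT NULL
);
CREATE TABLE magistrate (
    person_id  INTEGER PRIMARY KEY REFERENCES person (person_id),
    court_name TEXT NOT NULL
);
""")


def is_rejected(sql, params=()):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql, params)
    except sqlite3.IntegrityError:
        return True
    return False


conn.execute("INSERT INTO person VALUES (1, 'Alex Example')")
conn.execute("INSERT INTO defendant VALUES (1, 'C-001')")

# Primary key: a person can have at most one defendant row.
duplicate_rejected = is_rejected("INSERT INTO defendant VALUES (1, 'C-002')")
# Foreign key: a defendant row cannot exist without its person row.
orphan_rejected = is_rejected("INSERT INTO defendant VALUES (99, 'C-003')")
```

Note that nothing in this sketch stops the same person_id from appearing in both defendant and magistrate; whether that should be allowed is a separate modeling decision.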
If you want to go further in data modeling, you might want to learn ER modeling as a distinct data model from the relational model. What you've done here is essentially to use ER diagramming to diagram a relational model. There's nothing wrong with that, but it obscures a whole new field of study, generally called conceptual data modeling.
If you generate an ER model at the conceptual level, you don't attempt to implement it in terms of tables. There is a diagramming convention in ER that goes under the name "generalization/specialization" that allows you to depict a class/subclass situation, while remaining silent on how it's going to be implemented.
Conceptual data models have an area of usefulness, in addition to relational data modeling. What makes conceptual data models useful is precisely the fact that they present the information requirements without stating how those requirements are going to be met.
Once you are proficient at creating conceptual data models, it's not hard to convert one of them to a relational model.
This may be more than you bargained for, but since you are taking on learning modeling, I thought I'd survey some of the field for you.
I have to design a generic entity that can refer to various other entities.
In my example, that would be a commentary entity inside a web application. You could post commentaries on users, classifieds, articles, varieties (botanical ones), and so on.
So that entity would be made like this:
As a matter of fact, the design (kind of) pattern would be this one:
What are the pros and cons of this kind of pattern?
What I see is:
Pros
It decreases the number of entities if the concept is the same (commentaries for example);
You can therefore easily manipulate heterogeneous objects;
You can aggregate these objects easily (e.g. this user's last commentaries in the whole site, presented easily in a same thread);
Cons
It invites abuse: overuse it and both your database and your source code turn ugly;
The database cannot enforce the reference, so integrity must be checked in the application code.
What is the performance impact?
Conclusion
Is this kind of pattern suitable for a relational database? If not, what should we do instead?
Thank you in advance.
One more con:
This scheme relies on a mapping between values and names for the "entities" referred to by those values. Think of all the fun you'll have resolving issues that in the TEST system, the ORDER entity has number 734 but in production, it has number 256. You can use the entity names themselves as the values of your entity_id stuff, but you will never be able to avoid hardcoding values for them in your programs (or, say, in view definitions) anyway. Thereby defeating whatever advantage it was you thought you could win.
This kind of scheme is a disease mostly suffered by OO programmers. They see structures that are largely similar and they have this instinctive reflex: "I must find a way to reuse the existing thing for this". They forget that database design is not program design.
EDIT
(if it wasn't clear, this means my answer to your question "Is this kind of pattern suitable for a relational database?" is a principled "NO".)
This is the classic Polymorphic Association anti-pattern. There are a number of possible solutions:
1) Exclusive Arcs e.g. for the Commentary entity
Id
User_Id
Classified_Id
Article_Id
Variety_Id
Where User_Id, Classified_Id, Article_Id and Variety_Id are nullable and exactly one must be not null.
2) Reverse the Relationship e.g. remove the Target_Entity and Target_Entity_Id from the Commentary entity and create four new entities
User_Commentary: Commentary_Id, User_Id
Classified_Commentary: Commentary_Id, Classified_Id
Article_Commentary: Commentary_Id, Article_Id
Variety_Commentary: Commentary_Id, Variety_Id
Where Commentary_Id is unique and relates to the Id in Commentary.
3) Create a super-type entity for User, Classified, Article and Variety and have the Commentary entity reference the unique attribute of this new entity.
You would need to decide which of these approaches you feel is most appropriate in your specific situation.
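To make options 1 and 2 above more concrete, here is a minimal sketch in Python with the standard-library sqlite3 module. The table name commentary_arc, the body column and the bare stand-in target tables are assumptions for the example, and only two of the four link tables are shown. The CHECK relies on SQLite evaluating comparisons to 0 or 1 so that they can be summed; other DBMSs may need a different spelling of the same rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Stand-ins for the four target entities; real tables would carry more columns.
for table in ("app_user", "classified", "article", "variety"):
    conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY)")
    conn.execute(f"INSERT INTO {table} (id) VALUES (1)")

conn.executescript("""
-- Option 1: exclusive arcs. One nullable FK per target, exactly one filled in.
CREATE TABLE commentary_arc (
    id            INTEGER PRIMARY KEY,
    body          TEXT NOT NULL,
    user_id       INTEGER REFERENCES app_user (id),
    classified_id INTEGER REFERENCES classified (id),
    article_id    INTEGER REFERENCES article (id),
    variety_id    INTEGER REFERENCES variety (id),
    CHECK ((user_id IS NOT NULL) + (classified_id IS NOT NULL)
         + (article_id IS NOT NULL) + (variety_id IS NOT NULL) = 1)
);

-- Option 2: reversed relationship. One link table per target.
CREATE TABLE commentary (
    id   INTEGER PRIMARY KEY,
    body TEXT NOT NULL
);
CREATE TABLE user_commentary (
    commentary_id INTEGER PRIMARY KEY REFERENCES commentary (id),
    user_id       INTEGER NOT NULL REFERENCES app_user (id)
);
CREATE TABLE article_commentary (
    commentary_id INTEGER PRIMARY KEY REFERENCES commentary (id),
    article_id    INTEGER NOT NULL REFERENCES article (id)
);
""")


def is_rejected(sql, params=()):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql, params)
    except sqlite3.IntegrityError:
        return True
    return False


# Option 1: exactly one target per commentary.
conn.execute("INSERT INTO commentary_arc (body, user_id) VALUES ('hi', 1)")
two_targets_rejected = is_rejected(
    "INSERT INTO commentary_arc (body, user_id, article_id) VALUES ('hi', 1, 1)"
)
no_target_rejected = is_rejected(
    "INSERT INTO commentary_arc (body) VALUES ('hi')"
)

# Option 2: a commentary can be attached only once within a link table.
conn.execute("INSERT INTO commentary (id, body) VALUES (1, 'hello')")
conn.execute("INSERT INTO user_commentary VALUES (1, 1)")
second_link_rejected = is_rejected("INSERT INTO user_commentary VALUES (1, 1)")
```

One caveat for option 2: the unique Commentary_Id is checked per link table, so by itself it does not stop the same commentary from also appearing in a different link table.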
This is probably a simple problem for an experienced database developer, but I'm struggling... I have trouble translating a certain ER diagram to a DB model, any help is appreciated.
I have a setup similar to slide 17 of this presentation:
http://www.cbe.wwu.edu/misclasses/mis421s04/presentations/supersubtype.ppt
Slide 17 shows an ER diagram with an Employee supertype having an Employee Type attribute and as subtypes the Employee Types themselves (Hourly, Salaried and Consultant), which is very similar to my design situation.
In my case, suppose Salaried Employees are the only ones that can be bosses of other employees and I wanted to somehow indicate if a certain Salaried employee is the boss of the Hourly and/or Salaried Employee and/or Consultant (either, none or both), how could that be designed in a database model, also considering these are one-to-many relationships?
I can put a PK-FK relationship between them, which would result in all tables having two FKs (e.g. Consultant having FK_Employee and FK_SalariedEmployee) and SalariedEmployee referencing itself, but I keep thinking that might not be the wisest solution... although I'm not sure why (integrity issues?).
Is this an acceptable solution, or is there a better one?
Thanks in advance for any help!
Your case looks like an instance of the design pattern known as “Generalization Specialization” (Gen-Spec for short). The gen-spec pattern is familiar to object oriented programmers. It’s covered in tutorials when teaching about inheritance and subclasses.
The design of SQL tables that implement the gen-spec pattern can be a little tricky. Database design tutorials often gloss over this topic. But it comes up again and again in practice.
If you search the web on “generalization specialization relational modeling” you’ll find several useful articles that teach you how to do this. You’ll also be pointed to several times this topic has come up before in this forum.
The articles generally show you how to design a single table to capture all the generalized data and one specialized table for each subclass that will contain all the data specific to that subclass. The interesting part involves the primary key for the subclass tables. You won’t use the autonumber feature of the DBMS to populate the sub class primary key. Instead, you’ll program the application to propagate the primary key value obtained for the generalized table to the appropriate subclass table.
This creates a two way association between the generalized data and the specialized data. A simple view for each specialized subclass will collect generalized and specialized data together. It’s easy once you get the hang of it, and it performs fairly well.
In your specific case, declaring the "boss of" FK to reference the PK in the Salaried Employees table will be enough to do the trick. This will produce the two way association you want, and also prevent employees who are not salaried from being referenced as bosses.
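A minimal sketch of this arrangement, in Python with the standard-library sqlite3 module, might look as follows. Only the salaried subclass is spelled out, and the column names (boss_id, annual_salary), the view name and the helper functions are assumptions for the example. The "boss of" column is placed on the generalized employee table here so that any kind of employee can have a boss; that placement is one possible choice among several.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,   -- generated once, in the general table
    name        TEXT NOT NULL,
    -- "boss of": may only point at a row in salaried_employee
    boss_id     INTEGER REFERENCES salaried_employee (employee_id)
);
CREATE TABLE salaried_employee (
    employee_id   INTEGER PRIMARY KEY REFERENCES employee (employee_id),
    annual_salary INTEGER NOT NULL
);
-- One view per subclass collects generalized and specialized data.
CREATE VIEW salaried_employee_v AS
    SELECT e.employee_id, e.name, e.boss_id, s.annual_salary
    FROM employee AS e
    JOIN salaried_employee AS s ON s.employee_id = e.employee_id;
""")


def add_employee(name, boss_id=None):
    """Insert the generalized row and return its generated key."""
    cur = conn.execute(
        "INSERT INTO employee (name, boss_id) VALUES (?, ?)", (name, boss_id)
    )
    return cur.lastrowid


def add_salaried(name, annual_salary, boss_id=None):
    """Insert a salaried employee, propagating the key to the subclass table."""
    employee_id = add_employee(name, boss_id)
    conn.execute(
        "INSERT INTO salaried_employee VALUES (?, ?)",
        (employee_id, annual_salary),
    )
    return employee_id


boss = add_salaried("Sam", 90000)
hourly = add_employee("Hana", boss_id=boss)  # no subclass row in this sketch

try:
    add_employee("Chris", boss_id=hourly)  # Hana is not salaried
    non_salaried_boss_rejected = False
except sqlite3.IntegrityError:
    non_salaried_boss_rejected = True
```

The application, not the DBMS, copies the generated employee_id into the subclass table, which is the key propagation described above.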
There are a couple of questions around asking about the difference between identifying and non-identifying relationships in a relational database.
My question is: can you think of a simpler term for this jargon? I understand that technical terms have to be specific and unambiguous, but having an 'alternative name' might help students relate more easily to the concept behind it.
We actually want to use a more lay-friendly term in our own database modeling tool, so that first-time users without much computer science background can learn faster.
cheers!
I often see child table or dependent table used as a lay term. You could use either of those terms for a table with an identifying relationship.
Then say a referencing table is a table with a non-identifying relationship.
For example, PhoneNumbers is a child of Users, because a phone number has an identifying relationship with its user (i.e. the primary key of PhoneNumbers includes a foreign key to the primary key of Users).
Whereas the Users table has a state column that is a foreign key to the States table, making it a non-identifying relationship. So you could say Users references States, but is not a child of it per se.
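The difference shows up directly in the table definitions. Below is a minimal sketch in Python with the standard-library sqlite3 module; the column names and the is_identifying helper are assumptions for illustration. The helper simply asks SQLite for a table's key columns and foreign key columns and checks whether they overlap.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE states (
    state_code TEXT PRIMARY KEY
);
-- Non-identifying: the FK to states is an ordinary column, not part of the key.
CREATE TABLE users (
    user_id    INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    state_code TEXT REFERENCES states (state_code)
);
-- Identifying: the FK to users is part of the primary key.
CREATE TABLE phone_numbers (
    user_id      INTEGER NOT NULL REFERENCES users (user_id),
    phone_number TEXT NOT NULL,
    PRIMARY KEY (user_id, phone_number)
);
""")


def primary_key_columns(table):
    """List a table's primary key columns, in key order."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Column 5 is the 1-based position in the primary key (0 if not in it).
    keyed = sorted((row[5], row[1]) for row in rows if row[5] > 0)
    return [name for _, name in keyed]


def foreign_key_columns(table):
    """Set of columns in this table that reference another table."""
    rows = conn.execute(f"PRAGMA foreign_key_list({table})").fetchall()
    return {row[3] for row in rows}  # column 3 is the referencing column


def is_identifying(table):
    """True if some foreign key column is part of the primary key."""
    return bool(foreign_key_columns(table) & set(primary_key_columns(table)))
```

With these definitions, is_identifying("phone_numbers") is true (a child table) and is_identifying("users") is false (a referencing table).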
I think belongs to would be a good name for the identifying relationship.
A "weak entity type" does not have its own key, just a "partial key", so each entity instance of this weak entity type has to belong to some other entity instance so it can be identified, and this is an "identifying relationship". For example, a landlord could have a database with apartments and rooms. A room can be called kitchen or bathroom, and while that name is unique within an apartment, there will be many rooms in the database with the name kitchen, so it is just a partial key. To uniquely identify a room in the database, you need to say that it is the kitchen in this particular apartment. In other words, the rooms belong to apartments.
I'm going to recommend the term "weak entity" from ER modeling.
Some modelers conceptualize the subject matter as being made up of entities and relationships among entities. This gives rise to Entity-Relationship Modeling (ER Modeling). An attribute can be tied to an entity or a relationship, and values stored in the database are instances of attributes.
If you do ER modeling, there is a kind of entity called a "weak entity". Part of the identity of a weak entity is the identity of a stronger entity, to which the weak one belongs.
An example might be an order in an order processing system. Orders are made up of line items, and each line item contains a product-id, a unit-price, and a quantity. But line items don't have an identifying number across all orders. Instead, a line item is identified by {item number, order number}. In other words, a line item can't exist unless it's part of exactly one order. Item number 1 is the first item in whatever order it belongs to, but you need both numbers to identify an item.
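The line item example could be sketched like this, in Python with the standard-library sqlite3 module. Table and column names are assumptions, prices are stored as integer cents for simplicity, and the add_line_item helper is hypothetical; its "highest number plus one" numbering is only adequate for a single writer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE orders (
    order_number INTEGER PRIMARY KEY
);
CREATE TABLE line_item (
    order_number INTEGER NOT NULL REFERENCES orders (order_number),
    item_number  INTEGER NOT NULL,   -- partial key: unique only within an order
    product_id   TEXT NOT NULL,
    unit_price   INTEGER NOT NULL,   -- in cents (an assumption)
    quantity     INTEGER NOT NULL,
    PRIMARY KEY (order_number, item_number)
);
""")


def add_line_item(order_number, product_id, unit_price, quantity):
    """Append an item to an order, numbering it within that order."""
    (next_number,) = conn.execute(
        "SELECT COALESCE(MAX(item_number), 0) + 1"
        " FROM line_item WHERE order_number = ?",
        (order_number,),
    ).fetchone()
    conn.execute(
        "INSERT INTO line_item VALUES (?, ?, ?, ?, ?)",
        (order_number, next_number, product_id, unit_price, quantity),
    )
    return next_number


conn.executemany("INSERT INTO orders VALUES (?)", [(100,), (101,)])
first = add_line_item(100, "widget", 250, 2)
second = add_line_item(100, "gadget", 900, 1)
other = add_line_item(101, "widget", 250, 5)
```

Item numbers restart in each order, so both numbers are needed to identify a line item, and the foreign key refuses a line item whose order does not exist.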
It's easy to turn an ER model into a relational model. It's also easy for people who are experts in the data but know nothing about databases to get used to an ER model of the data they understand.
There are other modelers who argue vehemently against the need for ER modeling. I'm not one of them.
Nothing, absolutely nothing in the kind of modeling where one encounters things such as "relationships" (ER, I presume) is "technical", "precise" or "unambiguous". Nor can it be.
A) ER modeling is always and by necessity informal, because it can never be sufficient to capture/express the entire definition of a database.
B) There are so many different ER dialects out there that it is just impossible for all of them to use exactly the same terms with exactly the same meaning. Recently, I even discovered that a UK university that teaches ER modeling uses the term "entity subtype" for the very same thing that I have always called "entity supertype", and vice versa!
One could use connection.
You have a connection between two tables where the IDs are the same.
That type of thing.
how about
Association
Link
Correlation