Proper name for an intermediate table between others two intermediate - database

I have 4 entities: Event, Message, Flow and Document.
Event table stores a limited (seeded) number of records. Message has many events and each event can be related to many messages. The name event_message was given for the intermediate table.
As you can see, the convention for intermediate tables are: {tablename}_{tablename}.
Flow table stores a limited (seeded) number of records. Message has many flows and each flow can be related to many messages. The name flow_message was given for the intermediate table.
A document is created on each relation between Flow and Message (each record on flow_message).
The issue starts here:
Each event on a message has different documents by flow. It means: for each new record on intermediate table flow_message, each record on intermediate event_message has a new document related.
To solve this, I created an intermediate table between event_message and flow_message named: event_message_flow_message.
Is this correct (in some conventional way)? Is this modeling correct?
How to proper model and naming the intermediate table derivative by two others intermediate tables?

I also wish there was some convention. Since I do not know any official convention, I invented mine. The important thing is to respect the convention you choose.
So I would change the event_message_flow_message to rel_eventmessage_flowmessage.
But for me your convention is pretty nice.

It's hard to make a recommendation because your model seems a bit odd to me. You have 1:1 relationships between both DOCUMENT and FLOW_MESSAGE and DOCUMENT and EVENT_MESSAGE_FLOW_MESSAGE. It's hard to reconcile this in my mind with the many to one relationships to EVENT_MESSAGE_FLOW_MESSAGE. If you're relationships to DOCUMENT are really 1:1 (mandatory), then why keep documents in a separate table?
To address your question about table naming: I would argue that the {table}_{table} convention for naming intersection tables is not a best practice but rather a fallback for cases where you can't think of a better name.
The best practice is for names of tables to reflect the business name of the thing which is recorded / described by the data in the table. It's not always possible to do this, especially for intersection tables. Intersection tables represent many-to-many relationships, and relationships are often difficult to describe with a noun.
In your case, I don't think that your convention is actually making things especially easy to understand. I'd probably try to simplify with something like MESSAGE_DOCUMENT or even just DOCUMENT - since these seem to be 1:1 related in any case.

Related

Parent child design to easily identify child type

In our database design we have a couple of tables that describe different objects but which are of the same basic type. As describing the actual tables and what each column is doing would take a long time I'm going to try to simplify it by using a similar structured example based on a job database.
So say we have following tables:
These tables have no connections between each other but share identical columns. So the first step was to unify the identical columns and introduce a unique personId:
Now we have the "header" columns in person that are then linked to the more specific job tables using a 1 to 1 relation using the personId PK as the FK. In our use case a person can only ever have one job so the personId is also unique across the Taxi driver, Programmer and Construction worker tables.
While this structure works we now have the use case where in our application we get the personId and want to get the data of the respective job table. This gets us to the problem that we can't immediately know what kind of job the person with this personId is doing.
A few options we came up with to solve this issue:
Deal with it in the backend
This means just leaving the architecture as it is and look for the right table in the backend code. This could mean looking through every table present and/or construct a semi-complicated join select in which we have to sift through all columns to find the ones which are filled.
All in all: Possible but means a lot of unecessary selects. We also would like to keep such database oriented logic in the actual database.
Using a Type Field
This means adding a field column in the Person table filled for example with numbers to determine the correct child table like:
So you could add a 0 in Type if it's a taxi driver, a 1 if it's a programmer and so on...
While this greatly reduced the amount of backend logic we then have to make sure that the numbers we use in the Type field are known in the backend and don't ever change.
Use separate IDs for each table
That means every job gets its own ID (has to be nullable) in Person like:
Now it's easy to find out which job each person has due to the others having an empty ID.
So my question is: Which one of these designs is the best practice? Am i missing an obvious solution here?
Bill Karwin made a good explanation on a problem similar to this one. https://stackoverflow.com/a/695860/7451039
We've now decided to go with the second option because it seem to come with the least drawbacks as described by the other commenters and posters. As there was no actual answer portraying the second option as a solution i will try to summarize our reasoning:
Against Option 1:
There is no way to distinguish the type from looking at the parent table. As a result the backend would have to include all logic which includes scanning all tables for the that contains the id. While you can compress most of the logic into a single big Join select it would still be a lot more logic as opposed to the other options.
Against Option 3:
As #yuri-g said this one is technically not possible as the separate IDs could not setup as primary keys. They would have to be nullable and as a result can't be indexed, essentially rendering the parent table useless as one of the reasons for it was to have a unique personID across the tables.
Against a single table containing all columns:
For smaller use cases as the one i described in the question this might me viable but we are talking about a bunch of tables with each having roughly 2-6 columns. This would make this option turn into a column-mess really quickly.
Against a flat design with a key-value table:
Our properties have completly different data types, different constraints and foreign key relations. All of this would not be possible/difficult in this design.
Against custom database objects containt the child specific properties:
While this option that #Matthew McPeak suggested might be a viable option for a lot of people our database design never really used objects so introducing them to the mix would likely cause confusion more than it would help us.
In favor of the second option:
This option is easy to use in our table oriented database structure, makes it easy to distinguish the proper child table and does not need a lot of reworking to introduce. Especially since we already have something similar to a Type table that we can easily use for this purpose.
Third option, as you describe it, is impossible: no RDBMS (at least, of I personally know about) would allow you to use NULLs in PK (even composite).
Second is realistic.
And yes, first would take up to N queries to poll relatives in order to determine the actual type (where N is the number of types).
Although you won't escape with one query in second case either: there would always be two of them, because you cant JOIN unless you know what exactly you should be joining.
So basically there are flaws in your design, and you should consider other options there.
Like, denormalization: line non-shared attributes into the parent table anyway, then fields become nulls for non-correpondent types.
Or flexible, flat list of attribute-value pairs related through primary key (yes, schema enforcement is a trade-off).
Or switch to column-oriented DB: that's a case for it.

Data Dictionary and Oracle Advice needed

I have four tables called (to make the post easier to understand) a,b,c and d.
I have created a data dictionary for them all, with all attributes and other information needed. I am confused on whether or not I have to include my many-to-many tables (e.g. a_b) in the data dictionary too?
Also, do I need to have the many-to-many tables in the Oracle database too?
You want to build a data dictionary, presumably explaining the usage of your tables and columns in your applications. This is a good idea.
It is better to include all tables, including intersection (many-to-many) tables. Firstly, because it is useful to understand why we have M:N relationships rather than the more-normal 1:M relationships. Secondly, because including all tables instils a discipline which means the data dictionary is more likely to be maintained (it is just another forma of documentation, and people tend not to keep documentation up-to-date).
As for whether you need intersection tables, well, yes. yYou need a physical data model which is accurate. If you have relationships between tables which cannot be reduced to one-to-many then you have to implement them as many-to-many.
The alternative is to enforce the relationship entirely in application code and that always leads to data corruption. Ignore anybody who says otherwise. The arguments of history are against them.

What is the best way to realize this database

I have to realize a system with different kind of users and I think to realize it in this way:
A user table with only id, email and password.
Two different tables correlated to the user table in a 1-to-1 relation. Each table define specific attributes of each kind of user.
Is this the best way to realize it? I should use the InnoDB storage engine?
If I realize it in this way, how can I handle the tables in the Zend Framework?
I can't answer the second part of your question but the pattern you describe is called super and subtype in datamodelling. If this is the right choice can't be answered without knowing more about the differences between these user types and how they will be used in the application. There are different approaches when converting logical super/subtypes into physical tables.
Here are some relevant links:
http://www.sqlmag.com/article/data-modeling/implementing-supertypes-and-subtypes
and the next one about pitfalls and (mis)use of subtyping
http://www.ocgworld.com/doc/OCG_Subtyping_Techniques.pdf
In general I am, from a pragmatic point of view, very reluctant to follow your choice and most often opt to create one table containing all columns. In most cases there are a number of places where the application needs show all users in some sort of listing with specific columns for specific types (and empty if not applicable for that type). It quickly leads to non-straigtforward queries and all sort of extra code to deal with the different tables that it's just not worth being 'conceptually correct'.
Two reasons for me to still split the subtypes into different tables are if the subtypes are so truly different that it makes no logical sense to have them in one table and if the number of rows is so enormous that the overhead of the 'unneeded' columns when putting it all in one table actually starts to matter
On php side you can use Doctrine 2 ORM. It's easy to integrate with zf, and you could easily implement this table structure as inheritance in your doctrine mapping.

Is it a bad idea to make a generic link table?

Imagine a meta database with a high degree of normalization. It would blow up this input field if I would attempt to describe it here. But imagine, every relationship through the entire database, through all tables, go through one single table called link. It has got these fields: master_class_id, master_attr_id, master_obj_id, class_id2, obj_id2. This table can easily represent all kinds of relationships: 1:1, 1:n, m:n, self:self.
I see the problem that this table is going to get HUUUUGE. Is that bad practice?
That is wrong on two accounts:
It'll be a tremendous bottleneck for all your queries and it'll kill any chance of throughput.
It reeks of bad design: you should be able to describe things more concisely and closer to reality. If this is really the best way to store the data you can consider partitioning or even another paradigm instead of the relational
In a word, yes, this is a bad idea
Without going into too many details, I would offer the following:
for a meta database, the link table should be split by (high level) entity : that is, you should have a separate link table for each entity
another link table is required for the between-entities links
Normally the high-level entities are fairly easy to identify, like Customer.
It is usually bad practice but not because the table is huge. The problem is that you are mixing unrelated data in one table.
The reason to keep the links in separate tables, is because you won't need to use them together.
It is a common mistake that is also done with data itself: You should not mix two sets of data in one table only because the fields are similar if the data itself is unrelated.
Relational databases don't actually fit for this model.
It's possible to implement it but it will be quite slow. The main drawback is that you won't be able to index the links efficiently.
However, this design can be useful in two cases:
This only stores the metadata: declared relationships between the entities. The actual data are stored in the plain relational tables, so this links are only used to show the structure but not in the actual queries.
This stores some structures which are complex but contain few data, so that the ease of development overweights the performance drawbacks.
This design can be seen in several ORMs (one of which I even developed).
I don't see the purpose of this type of table anyway. If you have table A that is one-to-many to table B then A is going to still have a PK and B will still have a PK. A would normally contain a FK to B.
So in the Master_Table you will have to store A PK, B FK which is just a duplicate of what is already there. The only thing you will 'lose' is the FK in table A but you just migrated it into a giant table that is hard to deal with by the database, the dba, and anyone coding using the db.
Those table appear in Access most frequently and show up on the DailyWTF because they are insanely hard to read and understand.
Oh! And a main problem is that to make the table ubiquitous you will have to make generic columns which will probably end up destroying data integrity.

Person name structure in separate database table

I am wondering when and when not to pull a data structure into a separate database table when it appears in several tables.
I have pulled the 12 attribute address structure into a separate table because I have a couple of different entities containing a single address in this format.
But how about my 3 attribute person name structure (given, middle, surname)?
Should this be put into its own table referenced with a foreign key for all the entities containing a name... e.g. the company table has a contact person name, the citizen table has a person name etc.
Are these best left as attributes in the main tables or should they be extracted?
I would usually keep the address on the Person table, unless there was an unusual need for absolutely uniform addresses on each entity, or if an entity could have an arbitrary number of addresses, or if addresses need to be shared between entities, or if it was a large enterprise product where I know I have to invest in infrastructure all over the place or I will end up gutting everything down the road.
Having your addresses in a seperate table is interesting because it's flexible, but in the context of a small project lacking a special need like the ones mentioned above, it's probably a slight waste. Always be aware of the balance between complexity and flexibility. Flexibility is important, but be discriminating... It's easy to invest way too much there!
In concrete terms, the times that I experimented with (for instance) one-to-one relationships for things like addresses, I ended up refactoring them back into the table because it introduced a bunch of headaches including more complex queries, dealing with situations where the address does not exist, etc. More entities also increases your cognitive load -- it makes the project harder to think about. In my case, it was an unecessary cost because there was no concrete need and, in truth, not even a gain in flexibility.
So, based on my experiences, I would "try" to keep the addresses in the same table, and I would definitely keep the names on them - again, unless there was a special need.
So to paraphrase Einstein, make it as simple as possible and no simpler. But in the short term, experiment. It's the best way to learn these lessons.
It's about not repeating information, so you don't want to store the same information in two places when one will do.
Another useful rule of thumb is one entity per table. If you find that one table contains, say, "person" AND "order" then you probably should split those into two tables.
And (putting myself at risk of repeating information...) you might find it helpful to review some database design basics, there are plenty of related questions here on stackoverflow.
Start with these...
What is normalisation?
What is important to keep in mind when designing a database
How many fields is 'too many'?
More tables or more columns?
Creating a person entity across your data model will give you this present and future advantages -
The same person occurring as a contact, or individual in different contexts. Saves redundancy.
Info can be maintained and kept current with far-less effort.
Easier to search for a person and identify them - i.e. is it the same John Smith?
You can expand the information - i.e. maintain addresses for this person far more easily.
Programming will be more consistent and debugging will be easier as well.
Moves you closer to a 'self-documenting' system.
As a counterpoint to the other (entirely valid) replies: within your application's current structure, how likely will it be for a given individual (not just name, the actual "person" -- multiple people could be "John Smith") to appear in more than one table? The less likely this is to happen, the less likely you are to get benefits from normalization.
Another way to think of it is entities. Outside of labels (names), is their any overlap between "customer" entity and an "employee" entity?
Extract them. Your aim should be to have no repeating data in your database.
Read about Normalization
It really depends on the problem you are trying to solve. In general it is probably a good idea to have some sort of 'person' table which holds details of people. However, there are occasions where that is potentially a very bad idea.
One example would be if you are holding details of prescriptions written out to people by a doctor. In some countries it is a legal requirment that the prescription details are held with the name in which they were prescribed NOT the name the person is going under currently. For instance a woman might be prescribed a drug as miss X, but then she gets married and becomes Mrs Y. If you had a person table that was linked to the prescriptions table you would now have the wrong details and would possibly face legal consequences. In that case you would need to probably copy the relevant details of the person into the prescription table, even though this would be duplicating data.
So again - it depends on the problem you are trying to solve. Don't just blindly follow what people consider to be best practices. Understand your data and any issues surrounding it, then try to follow best practices that fit.
Depends on what you're using the database for.
If you want fast queries on your tables you should de-normalize your tables. Having to run multiple JOIN's will take longer and make your queries more complex.
On the other hand if your intention is to have a flexible storage database which is not meant to be hit with a ton of fast-response queries, then normalizing the tables by splitting them out into multiple xref'ed tables will provide more flexibility in your design and reduce the need for submitting duplicated data.
Since de-normalization is "optimization", I would suggest you normalize the tables first, index them properly and see if you're getting any bottlenecks on your queries. If so, flatten the affected tables where needed.
You should really consider your whole database structure and do a ER diagram (entity relationship diagram) first. OF COURSE there should be another table called "Person" where the concept of a person is stored...

Resources