Is it a bad idea to make a generic link table? - database

Imagine a meta database with a high degree of normalization. It would blow up this input field if I attempted to describe it here. But imagine that every relationship in the entire database, across all tables, goes through one single table called link. It has these fields: master_class_id, master_attr_id, master_obj_id, class_id2, obj_id2. This table can easily represent all kinds of relationships: 1:1, 1:n, m:n, self:self.
I see the problem that this table is going to get HUUUUGE. Is that bad practice?

That is wrong on two counts:
It'll be a tremendous bottleneck for all your queries and it'll kill any chance of throughput.
It reeks of bad design: you should be able to describe things more concisely and closer to reality. If this is really the best way to store the data, you can consider partitioning or even a paradigm other than the relational one.

In a word, yes, this is a bad idea.
Without going into too many details, I would offer the following:
for a meta database, the link table should be split by (high-level) entity: that is, you should have a separate link table for each entity
another link table is required for the between-entities links
Normally the high-level entities are fairly easy to identify, like Customer.

It is usually bad practice but not because the table is huge. The problem is that you are mixing unrelated data in one table.
The reason to keep the links in separate tables is that you won't need to use them together.
It is a common mistake that is also made with the data itself: you should not mix two sets of data in one table just because the fields are similar if the data itself is unrelated.

Relational databases aren't actually a good fit for this model.
It's possible to implement it but it will be quite slow. The main drawback is that you won't be able to index the links efficiently.
However, this design can be useful in two cases:
This only stores the metadata: declared relationships between the entities. The actual data are stored in plain relational tables, so these links are only used to show the structure, not in the actual queries.
This stores some structures which are complex but contain little data, so that the ease of development outweighs the performance drawbacks.
This design can be seen in several ORMs (one of which I even developed).

I don't see the purpose of this type of table anyway. If you have table A that is one-to-many to table B then A is going to still have a PK and B will still have a PK. A would normally contain a FK to B.
So in the Master_Table you will have to store A PK, B FK which is just a duplicate of what is already there. The only thing you will 'lose' is the FK in table A but you just migrated it into a giant table that is hard to deal with by the database, the dba, and anyone coding using the db.
Those tables appear most frequently in Access and show up on the DailyWTF because they are insanely hard to read and understand.
Oh! And a main problem is that to make the table ubiquitous you will have to make generic columns which will probably end up destroying data integrity.
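To make the data-integrity point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The customer and invoice tables and the class/attribute id values are made up for illustration; only the link columns come from the question. It compares a declared foreign key with the generic link table when both are asked to point at a row that does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

conn.executescript("""
    -- Hypothetical example tables, not part of the original question.
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- Conventional design: the child row carries a declared foreign key.
    CREATE TABLE invoice (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id)
    );

    -- Generic design from the question: the ids are plain integers, because
    -- a column cannot be declared as a foreign key into "whichever table
    -- the class id names".
    CREATE TABLE link (
        master_class_id INTEGER NOT NULL,
        master_attr_id  INTEGER NOT NULL,
        master_obj_id   INTEGER NOT NULL,
        class_id2       INTEGER NOT NULL,
        obj_id2         INTEGER NOT NULL
    );
""")
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Acme')")


def try_insert(sql, params):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


# Customer 999 does not exist in either attempt.
fk_accepted = try_insert(
    "INSERT INTO invoice (id, customer_id) VALUES (?, ?)", (10, 999))
link_accepted = try_insert(
    "INSERT INTO link VALUES (?, ?, ?, ?, ?)", (2, 1, 10, 1, 999))

print("invoice with FK accepted dangling reference:", fk_accepted)
print("generic link accepted dangling reference:", link_accepted)
```

The declared foreign key rejects the dangling reference, while the link table stores it without complaint, so any consistency checking would have to live in application code.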

Related

Proper name for an intermediate table between two other intermediate tables

I have 4 entities: Event, Message, Flow and Document.
Event table stores a limited (seeded) number of records. Message has many events and each event can be related to many messages. The name event_message was given for the intermediate table.
As you can see, the convention for intermediate tables is: {tablename}_{tablename}.
Flow table stores a limited (seeded) number of records. Message has many flows and each flow can be related to many messages. The name flow_message was given for the intermediate table.
A document is created on each relation between Flow and Message (each record on flow_message).
The issue starts here:
Each event on a message has different documents by flow. It means: for each new record on intermediate table flow_message, each record on intermediate event_message has a new document related.
To solve this, I created an intermediate table between event_message and flow_message named: event_message_flow_message.
Is this correct (in some conventional way)? Is this modeling correct?
How do I properly model and name an intermediate table derived from two other intermediate tables?
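For readers who want to see the structure described in the question, here is one possible sketch using Python's sqlite3 module. The column names and sample rows are assumptions, and the Document table is left out for brevity. It is not necessarily how the asker built it: in this version message_id is stored once and takes part in both composite foreign keys, so a row can only pair an event and a flow that belong to the same message.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

conn.executescript("""
    CREATE TABLE event   (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE flow    (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE message (id INTEGER PRIMARY KEY, body TEXT);

    CREATE TABLE event_message (
        event_id   INTEGER NOT NULL REFERENCES event(id),
        message_id INTEGER NOT NULL REFERENCES message(id),
        PRIMARY KEY (event_id, message_id)
    );
    CREATE TABLE flow_message (
        flow_id    INTEGER NOT NULL REFERENCES flow(id),
        message_id INTEGER NOT NULL REFERENCES message(id),
        PRIMARY KEY (flow_id, message_id)
    );

    -- Both composite foreign keys share the single message_id column.
    CREATE TABLE event_message_flow_message (
        event_id   INTEGER NOT NULL,
        flow_id    INTEGER NOT NULL,
        message_id INTEGER NOT NULL,
        PRIMARY KEY (event_id, flow_id, message_id),
        FOREIGN KEY (event_id, message_id)
            REFERENCES event_message(event_id, message_id),
        FOREIGN KEY (flow_id, message_id)
            REFERENCES flow_message(flow_id, message_id)
    );

    -- Made-up sample data.
    INSERT INTO event VALUES (1, 'received');
    INSERT INTO flow VALUES (1, 'inbound');
    INSERT INTO message VALUES (1, 'first'), (2, 'second');
    INSERT INTO event_message VALUES (1, 1);
    INSERT INTO flow_message VALUES (1, 1), (1, 2);
""")


def accepted(event_id, flow_id, message_id):
    """Try to add a row to the derived table; report whether it was accepted."""
    try:
        conn.execute(
            "INSERT INTO event_message_flow_message VALUES (?, ?, ?)",
            (event_id, flow_id, message_id))
        return True
    except sqlite3.IntegrityError:
        return False


same_message = accepted(1, 1, 1)   # event 1 and flow 1 are both on message 1
other_message = accepted(1, 1, 2)  # event 1 was never attached to message 2

print(same_message, other_message)
```

Whatever name the table ends up with, sharing the message_id column this way keeps the database from pairing an event_message row with a flow_message row of a different message.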
I also wish there was some convention. Since I do not know any official convention, I invented mine. The important thing is to respect the convention you choose.
So I would change the event_message_flow_message to rel_eventmessage_flowmessage.
But for me your convention is pretty nice.
It's hard to make a recommendation because your model seems a bit odd to me. You have 1:1 relationships between both DOCUMENT and FLOW_MESSAGE and DOCUMENT and EVENT_MESSAGE_FLOW_MESSAGE. It's hard to reconcile this in my mind with the many-to-one relationships to EVENT_MESSAGE_FLOW_MESSAGE. If your relationships to DOCUMENT are really 1:1 (mandatory), then why keep documents in a separate table?
To address your question about table naming: I would argue that the {table}_{table} convention for naming intersection tables is not a best practice but rather a fallback for cases where you can't think of a better name.
The best practice is for names of tables to reflect the business name of the thing which is recorded / described by the data in the table. It's not always possible to do this, especially for intersection tables. Intersection tables represent many-to-many relationships, and relationships are often difficult to describe with a noun.
In your case, I don't think that your convention is actually making things especially easy to understand. I'd probably try to simplify with something like MESSAGE_DOCUMENT or even just DOCUMENT - since these seem to be 1:1 related in any case.

Designing two DB tables with same id? Good practices?

I have to take an online course on DB design once again since I got a really lazy teacher that I thought had taught us everything and I continue to find out he didn't.
I'm designing a small DB in which two particular tables brought up this question.
I have a table called "Athlete" which stores Athlete info and a second table called "EntryInfo" which stores a guy's objectives, if he was a referral by another athlete.
There is no way an athlete could have more than one of these entry infos, so I thought idAthlete would apply to both "Athlete" and "EntryInfo" but I don't know if this is correct or not. Now I have these questions:
1) In trying to keep "Athlete" table as clean as possible I didn't include this "EntryInfo" in the "Athlete" table from the beginning but it COULD be in the same table. Is this the best way to handle it? Regarding good practices in DB design should they be in 1 or 2 tables?
2) If it's better to keep it in two separate tables, can I have idAthlete as PK in the Athlete table (identity, incremental) and also have it as the PK in EntryInfo, where it is only a FK? Or would it be better practice to have an incremental identity PK idEntryInfo on the EntryInfo table with a FK idAthlete?
I know this is such a basic question and I know I should take a course on DB design and normalisation (and I will do).
When you have two tables with the same key it's called vertical partitioning and it's a valid design for various reasons.
However, I don't see any such reasons in your explanation. I only see your statement about keeping the "Athlete" table as clean as possible, which has a pretty general meaning. If you're going to put different groups of fields into different tables, you can categorise them any number of ways.
If you had a zillion records and you had performance issues it might be worth considering.
It will be simpler for you if you keep it in one table; then you don't have to fiddle about synchronising keys between the tables.
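If you do go with two tables, the shared-key layout could look roughly like the sketch below, written with Python's sqlite3 module. The name, objectives and referredBy columns are guesses based on the question, not a known schema. Because idAthlete is both the primary key of EntryInfo and a foreign key to Athlete, the database itself guarantees at most one EntryInfo row per athlete.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

conn.executescript("""
    CREATE TABLE Athlete (
        idAthlete INTEGER PRIMARY KEY AUTOINCREMENT,
        name      TEXT NOT NULL
    );

    -- Same key in both tables: PK here, and FK back to Athlete.
    -- objectives and referredBy are assumed column names.
    CREATE TABLE EntryInfo (
        idAthlete  INTEGER PRIMARY KEY REFERENCES Athlete(idAthlete),
        objectives TEXT,
        referredBy INTEGER REFERENCES Athlete(idAthlete)
    );
""")

ana_id = conn.execute("INSERT INTO Athlete (name) VALUES ('Ana')").lastrowid
conn.execute(
    "INSERT INTO EntryInfo (idAthlete, objectives) VALUES (?, 'lose weight')",
    (ana_id,))

# A second EntryInfo row for the same athlete violates the primary key.
try:
    conn.execute(
        "INSERT INTO EntryInfo (idAthlete, objectives) VALUES (?, 'run 10k')",
        (ana_id,))
    second_entry_accepted = True
except sqlite3.IntegrityError:
    second_entry_accepted = False

# Reading the two halves back together needs a join on the shared key.
rows = conn.execute("""
    SELECT a.name, e.objectives
    FROM Athlete a
    LEFT JOIN EntryInfo e ON e.idAthlete = a.idAthlete
""").fetchall()

print(second_entry_accepted, rows)
```

The join at the end is the extra cost mentioned above: every read that needs both halves has to stitch them back together, which a single table would avoid.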

Data Dictionary and Oracle Advice needed

I have four tables called (to make the post easier to understand) a,b,c and d.
I have created a data dictionary for them all, with all attributes and other information needed. I am confused on whether or not I have to include my many-to-many tables (e.g. a_b) in the data dictionary too?
Also, do I need to have the many-to-many tables in the Oracle database too?
You want to build a data dictionary, presumably explaining the usage of your tables and columns in your applications. This is a good idea.
It is better to include all tables, including intersection (many-to-many) tables. Firstly, because it is useful to understand why we have M:N relationships rather than the more-normal 1:M relationships. Secondly, because including all tables instils a discipline which means the data dictionary is more likely to be maintained (it is just another form of documentation, and people tend not to keep documentation up-to-date).
As for whether you need intersection tables, well, yes. You need a physical data model which is accurate. If you have relationships between tables which cannot be reduced to one-to-many then you have to implement them as many-to-many.
The alternative is to enforce the relationship entirely in application code and that always leads to data corruption. Ignore anybody who says otherwise. The arguments of history are against them.
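As a rough illustration of what the physical intersection table buys you, here is a sketch of a and a_b plus b. It uses Python's sqlite3 module rather than Oracle so that it runs anywhere, and the column names and sample rows are invented. The composite primary key and the two foreign keys let the database reject orphaned and duplicate pairs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, label TEXT);

    -- Intersection table: one row per (a, b) pair.
    CREATE TABLE a_b (
        a_id INTEGER NOT NULL REFERENCES a(id),
        b_id INTEGER NOT NULL REFERENCES b(id),
        PRIMARY KEY (a_id, b_id)
    );

    -- Made-up sample data.
    INSERT INTO a VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO b VALUES (1, 'b1'), (2, 'b2');
    INSERT INTO a_b VALUES (1, 1), (1, 2), (2, 1);
""")


def link(a_id, b_id):
    """Try to relate one a row to one b row; report whether it was accepted."""
    try:
        conn.execute("INSERT INTO a_b VALUES (?, ?)", (a_id, b_id))
        return True
    except sqlite3.IntegrityError:
        return False


orphan_accepted = link(1, 99)     # there is no b row with id 99
duplicate_accepted = link(1, 1)   # this pair is already recorded

pairs = conn.execute("""
    SELECT a.label, b.label
    FROM a
    JOIN a_b ON a_b.a_id = a.id
    JOIN b   ON b.id = a_b.b_id
    ORDER BY a.label, b.label
""").fetchall()

print(orphan_accepted, duplicate_accepted, pairs)
```

Both bad inserts are refused by the database, which is exactly the protection you give up when the relationship exists only in application code.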

Are there any standards/best-practices for managing small non-transactional lookup tables?

I have an ERP application with about 50 small lookup tables containing non-transactional data. Examples are ItemTypes, SalesOrderStatuses etc. There are so many different types and categories and statuses and with every new module new lookup tables are being added. I have a service to provide List objects out of these tables. These tables usually contain only two columns, (Id and Description). They have only a couple of rows, 8 - 10 rows at max.
I am thinking about putting all of them in one table with ID, Description and LookupTypeID. With this one table I will be able to get rid of 50 tables. Is it good idea? Bad Idea? Very bad idea?
Are there any standards/best-practices for managing small lookup tables?
Among some professionals, the single common lookup table is a design error you should avoid. At the very least, it will slow down performance. The reason is that you will have to have a compound primary key for the common table, and lookups via a compound key will take longer than lookups via a simple key.
According to Anith Sen, this is the first of five design errors you should avoid. See this article: Five Simple Design Errors
Merging lookup tables is a bad idea if you care about integrity of your data (and you should!):
It would allow "client" tables to reference the data they were not meant to reference. E.g. the DBMS will not protect you from referencing SalesOrderStatuses where only ItemTypes should be allowed - they are now in the same table and you cannot (easily) separate the corresponding FKs.
It would force all lookup data to share the same columns and types.
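The first point can be demonstrated with a small sketch in Python's sqlite3 module. The Lookup, ItemMerged and Item tables, their columns and the sample values are assumptions made up for the example. An item is pointed at "Shipped", which is a sales order status, once through the merged lookup table and once through a dedicated ItemTypes table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

conn.executescript("""
    -- Merged design: every lookup value lives in one table.
    CREATE TABLE Lookup (
        LookupTypeID INTEGER NOT NULL,  -- assumed: 1 = ItemTypes, 2 = SalesOrderStatuses
        ID           INTEGER NOT NULL,
        Description  TEXT NOT NULL,
        PRIMARY KEY (LookupTypeID, ID)
    );
    CREATE TABLE ItemMerged (
        ID               INTEGER PRIMARY KEY,
        TypeLookupTypeID INTEGER NOT NULL,
        TypeID           INTEGER NOT NULL,
        FOREIGN KEY (TypeLookupTypeID, TypeID)
            REFERENCES Lookup(LookupTypeID, ID)
    );

    -- Separate design: one small table per kind of lookup.
    CREATE TABLE ItemTypes (Id INTEGER PRIMARY KEY, Description TEXT NOT NULL);
    CREATE TABLE SalesOrderStatuses (Id INTEGER PRIMARY KEY, Description TEXT NOT NULL);
    CREATE TABLE Item (
        Id         INTEGER PRIMARY KEY,
        ItemTypeId INTEGER NOT NULL REFERENCES ItemTypes(Id)
    );

    INSERT INTO Lookup VALUES (1, 1, 'Raw material'), (2, 7, 'Shipped');
    INSERT INTO ItemTypes VALUES (1, 'Raw material');
    INSERT INTO SalesOrderStatuses VALUES (7, 'Shipped');
""")


def accepted(sql):
    """Run an INSERT and report whether the database accepted it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# Both statements try to give an item the type "Shipped".
merged_accepted = accepted("INSERT INTO ItemMerged VALUES (1, 2, 7)")
separate_accepted = accepted("INSERT INTO Item VALUES (1, 7)")

print("merged lookup accepted a status as an item type:", merged_accepted)
print("separate lookup accepted a status as an item type:", separate_accepted)
```

The merged table accepts the nonsensical reference because (2, 7) is a valid Lookup row. An extra CHECK on the type column could close the gap, which is the "not easily" part: every referencing table then carries and constrains an additional column.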
Unless you have performance problems due to excessive JOINs, I recommend you stay with your current design.
If you do, then you could consider using natural instead of surrogate keys in the lookup tables. This way, the natural keys get "propagated" through foreign keys to the "client" tables, resulting in less need for JOINing, at the price of increased storage space. For example, instead of having ItemTypes {Id PK, Description AK}, only have ItemTypes {Description PK}, and you no longer have to JOIN with ItemTypes just to get the Description - it was automatically propagated down the FK.
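Here is a small sketch of the natural-key variant in Python's sqlite3 module. The Item table, its columns and the sample rows are invented, and the ON UPDATE CASCADE clause is my own addition to show one way renames could be handled. The description is read straight from the client table with no JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

conn.executescript("""
    -- The description itself is the key: ItemTypes {Description PK}.
    CREATE TABLE ItemTypes (Description TEXT PRIMARY KEY);

    -- Hypothetical client table; the FK column already holds the text.
    CREATE TABLE Item (
        Id       INTEGER PRIMARY KEY,
        Name     TEXT NOT NULL,
        ItemType TEXT NOT NULL
            REFERENCES ItemTypes(Description) ON UPDATE CASCADE
    );

    INSERT INTO ItemTypes VALUES ('Raw material'), ('Finished good');
    INSERT INTO Item VALUES (1, 'Steel sheet', 'Raw material');
""")

# No JOIN needed to show the item type.
rows = conn.execute("SELECT Name, ItemType FROM Item").fetchall()

# Renaming a lookup value is propagated to the client rows by the cascade.
conn.execute(
    "UPDATE ItemTypes SET Description = 'Raw materials' "
    "WHERE Description = 'Raw material'")
renamed = conn.execute("SELECT ItemType FROM Item WHERE Id = 1").fetchone()[0]

# Values outside the lookup table are still rejected.
try:
    conn.execute("INSERT INTO Item VALUES (2, 'Mystery part', 'Unknown')")
    unknown_accepted = True
except sqlite3.IntegrityError:
    unknown_accepted = False

print(rows, renamed, unknown_accepted)
```

The trade-off is the one stated above: each client row stores the full text instead of a small integer, and a rename has to touch every referencing row.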
You can store them in a text search (ie nosql) database like Lucene. They are ridiculously fast.
I have implemented this to great effect. Note though that there is some initial setup to overcome, but not much. Lucene queries on ids are a snap to write.
The "one big lookup table" approach has the problem of allowing for silly values -- for example "color: yellow" for trucks in the inventory when you only have cars with "color: yellow". One Big Lookup Table: Just Say No.
Off-hand, I would go with the natural keys for the lookup tables unless you would have cases like "the 2012 model CX300R was red but the 2010-2011 models CX300R were blue (and model ID also denotes color)".
Traditionally, if you ask a DBA they will say you should have separate tables. If you ask a programmer they will say using the single table is easier. (It makes building an Edit Status webpage very easy: you just make one webpage and pass it a different LookupTypeID instead of building lots of similar pages.)
However, now with ORMs, the SQL and code to access different status tables is not really any extra effort.
I have used both methods and both work fine. I must admit using a single status table is easiest. I have done this for small apps and also enterprise apps and have noticed no performance impact.
Finally, the other field I normally like to add on these generic status tables is an OrderBy field so you can sort the statuses in your UI by something other than the description if needed.
Sounds like a good idea to me. You can have the ID and LookupTypeID as a multi-attribute primary key. You just need to know what all of the different LookupTypeIDs represent and you should be good as gold.
EDIT: As for the standards/best-practices, I honestly don't have an answer for you. I've only had one semester of SQL/database design so I haven't been all too exposed to the matter.

Person name structure in separate database table

I am wondering when and when not to pull a data structure into a separate database table when it appears in several tables.
I have pulled the 12 attribute address structure into a separate table because I have a couple of different entities containing a single address in this format.
But how about my 3 attribute person name structure (given, middle, surname)?
Should this be put into its own table referenced with a foreign key for all the entities containing a name... e.g. the company table has a contact person name, the citizen table has a person name etc.
Are these best left as attributes in the main tables or should they be extracted?
I would usually keep the address on the Person table, unless there was an unusual need for absolutely uniform addresses on each entity, or if an entity could have an arbitrary number of addresses, or if addresses need to be shared between entities, or if it was a large enterprise product where I know I have to invest in infrastructure all over the place or I will end up gutting everything down the road.
Having your addresses in a separate table is interesting because it's flexible, but in the context of a small project lacking a special need like the ones mentioned above, it's probably a slight waste. Always be aware of the balance between complexity and flexibility. Flexibility is important, but be discriminating... It's easy to invest way too much there!
In concrete terms, the times that I experimented with (for instance) one-to-one relationships for things like addresses, I ended up refactoring them back into the table because it introduced a bunch of headaches including more complex queries, dealing with situations where the address does not exist, etc. More entities also increase your cognitive load -- they make the project harder to think about. In my case, it was an unnecessary cost because there was no concrete need and, in truth, not even a gain in flexibility.
So, based on my experiences, I would "try" to keep the addresses in the same table, and I would definitely keep the names on them - again, unless there was a special need.
So to paraphrase Einstein, make it as simple as possible and no simpler. But in the short term, experiment. It's the best way to learn these lessons.
It's about not repeating information, so you don't want to store the same information in two places when one will do.
Another useful rule of thumb is one entity per table. If you find that one table contains, say, "person" AND "order" then you probably should split those into two tables.
And (putting myself at risk of repeating information...) you might find it helpful to review some database design basics, there are plenty of related questions here on stackoverflow.
Start with these...
What is normalisation?
What is important to keep in mind when designing a database
How many fields is 'too many'?
More tables or more columns?
Creating a person entity across your data model will give you these present and future advantages:
The same person occurring as a contact, or individual in different contexts. Saves redundancy.
Info can be maintained and kept current with far-less effort.
Easier to search for a person and identify them - i.e. is it the same John Smith?
You can expand the information - i.e. maintain addresses for this person far more easily.
Programming will be more consistent and debugging will be easier as well.
Moves you closer to a 'self-documenting' system.
As a counterpoint to the other (entirely valid) replies: within your application's current structure, how likely will it be for a given individual (not just name, the actual "person" -- multiple people could be "John Smith") to appear in more than one table? The less likely this is to happen, the less likely you are to get benefits from normalization.
Another way to think of it is in terms of entities. Outside of labels (names), is there any overlap between a "customer" entity and an "employee" entity?
Extract them. Your aim should be to have no repeating data in your database.
Read about Normalization
It really depends on the problem you are trying to solve. In general it is probably a good idea to have some sort of 'person' table which holds details of people. However, there are occasions where that is potentially a very bad idea.
One example would be if you are holding details of prescriptions written out to people by a doctor. In some countries it is a legal requirement that the prescription details are held with the name in which they were prescribed, NOT the name the person is going under currently. For instance, a woman might be prescribed a drug as Miss X, but then she gets married and becomes Mrs Y. If you had a person table that was linked to the prescriptions table you would now have the wrong details and would possibly face legal consequences. In that case you would probably need to copy the relevant details of the person into the prescription table, even though this would be duplicating data.
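A minimal sketch of that "copy at the time of writing" idea, using Python's sqlite3 module. The table layout, column names and drug name are invented for illustration. The prescription row keeps its own copy of the name, so a later change to the person row does not rewrite history.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

conn.executescript("""
    -- Hypothetical schema for illustration only.
    CREATE TABLE person (
        id      INTEGER PRIMARY KEY,
        title   TEXT,
        surname TEXT NOT NULL
    );
    CREATE TABLE prescription (
        id                 INTEGER PRIMARY KEY,
        person_id          INTEGER NOT NULL REFERENCES person(id),
        drug               TEXT NOT NULL,
        -- Deliberate duplication: the name as it was when prescribed.
        prescribed_title   TEXT,
        prescribed_surname TEXT NOT NULL
    );
""")

conn.execute("INSERT INTO person VALUES (1, 'Miss', 'X')")

# Copy the current name into the prescription when it is written.
conn.execute("""
    INSERT INTO prescription (person_id, drug, prescribed_title, prescribed_surname)
    SELECT id, 'drug A', title, surname FROM person WHERE id = 1
""")

# Later the person changes her name.
conn.execute("UPDATE person SET title = 'Mrs', surname = 'Y' WHERE id = 1")

current = conn.execute(
    "SELECT title, surname FROM person WHERE id = 1").fetchone()
on_record = conn.execute(
    "SELECT prescribed_title, prescribed_surname "
    "FROM prescription WHERE person_id = 1").fetchone()

print("current name:", current)
print("name on prescription:", on_record)
```

The link to person is still there for lookups, but the legally relevant name lives on the prescription row itself.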
So again - it depends on the problem you are trying to solve. Don't just blindly follow what people consider to be best practices. Understand your data and any issues surrounding it, then try to follow best practices that fit.
Depends on what you're using the database for.
If you want fast queries on your tables you should de-normalize your tables. Having to run multiple JOINs will take longer and make your queries more complex.
On the other hand if your intention is to have a flexible storage database which is not meant to be hit with a ton of fast-response queries, then normalizing the tables by splitting them out into multiple xref'ed tables will provide more flexibility in your design and reduce the need for submitting duplicated data.
Since de-normalization is "optimization", I would suggest you normalize the tables first, index them properly and see if you're getting any bottlenecks on your queries. If so, flatten the affected tables where needed.
You should really consider your whole database structure and draw an ER diagram (entity relationship diagram) first. OF COURSE there should be another table called "Person" where the concept of a person is stored...
