What is the best way to realize this database - database

I have to realize a system with different kind of users and I think to realize it in this way:
A user table with only id, email and password.
Two different tables correlated to the user table in a 1-to-1 relation. Each table define specific attributes of each kind of user.
Is this the best way to realize it? I should use the InnoDB storage engine?
If I realize it in this way, how can I handle the tables in the Zend Framework?

I can't answer the second part of your question but the pattern you describe is called super and subtype in datamodelling. If this is the right choice can't be answered without knowing more about the differences between these user types and how they will be used in the application. There are different approaches when converting logical super/subtypes into physical tables.
Here are some relevant links:
http://www.sqlmag.com/article/data-modeling/implementing-supertypes-and-subtypes
and the next one about pitfalls and (mis)use of subtyping
http://www.ocgworld.com/doc/OCG_Subtyping_Techniques.pdf
In general I am, from a pragmatic point of view, very reluctant to follow your choice and most often opt to create one table containing all columns. In most cases there are a number of places where the application needs show all users in some sort of listing with specific columns for specific types (and empty if not applicable for that type). It quickly leads to non-straigtforward queries and all sort of extra code to deal with the different tables that it's just not worth being 'conceptually correct'.
Two reasons for me to still split the subtypes into different tables are if the subtypes are so truly different that it makes no logical sense to have them in one table and if the number of rows is so enormous that the overhead of the 'unneeded' columns when putting it all in one table actually starts to matter

On php side you can use Doctrine 2 ORM. It's easy to integrate with zf, and you could easily implement this table structure as inheritance in your doctrine mapping.

Related

Proper name for an intermediate table between others two intermediate

I have 4 entities: Event, Message, Flow and Document.
Event table stores a limited (seeded) number of records. Message has many events and each event can be related to many messages. The name event_message was given for the intermediate table.
As you can see, the convention for intermediate tables are: {tablename}_{tablename}.
Flow table stores a limited (seeded) number of records. Message has many flows and each flow can be related to many messages. The name flow_message was given for the intermediate table.
A document is created on each relation between Flow and Message (each record on flow_message).
The issue starts here:
Each event on a message has different documents by flow. It means: for each new record on intermediate table flow_message, each record on intermediate event_message has a new document related.
To solve this, I created an intermediate table between event_message and flow_message named: event_message_flow_message.
Is this correct (in some conventional way)? Is this modeling correct?
How to proper model and naming the intermediate table derivative by two others intermediate tables?
I also wish there was some convention. Since I do not know any official convention, I invented mine. The important thing is to respect the convention you choose.
So I would change the event_message_flow_message to rel_eventmessage_flowmessage.
But for me your convention is pretty nice.
It's hard to make a recommendation because your model seems a bit odd to me. You have 1:1 relationships between both DOCUMENT and FLOW_MESSAGE and DOCUMENT and EVENT_MESSAGE_FLOW_MESSAGE. It's hard to reconcile this in my mind with the many to one relationships to EVENT_MESSAGE_FLOW_MESSAGE. If you're relationships to DOCUMENT are really 1:1 (mandatory), then why keep documents in a separate table?
To address your question about table naming: I would argue that the {table}_{table} convention for naming intersection tables is not a best practice but rather a fallback for cases where you can't think of a better name.
The best practice is for names of tables to reflect the business name of the thing which is recorded / described by the data in the table. It's not always possible to do this, especially for intersection tables. Intersection tables represent many-to-many relationships, and relationships are often difficult to describe with a noun.
In your case, I don't think that your convention is actually making things especially easy to understand. I'd probably try to simplify with something like MESSAGE_DOCUMENT or even just DOCUMENT - since these seem to be 1:1 related in any case.

Performance in database design

I have to implement a testing platform. My database needs the following tables: Students, Teachers, Admins, Personnel and others. I would like to know if it's more efficient to have the FirstName and LastName in each of these tables, or to have another table, Persons, and each of the other table to be linked to this one with PersonID.
Personally, I like it this way, although trickier to implement, because I think it's cleaner, especially if you look at it from the object-oriented point of view. Would this add an unnecessary overhead to the database?
Don't know if it helps to mention I would like to use SQL Server and ADO.NET Entity Framework.
As you've explicitly mentioned OO and that you're using EntityFramework, perhaps its worth approaching the problem instead from how the framework is intended to work - rather than just building a database structure and then trying to model it?
Entity Framework Code First Inheritance : Table Per Hierarchy and Table Per Type is a nice introduction to the various strategies that you could pick from.
As for the note on adding unnecessary overhead to the database - I wouldn't worry about it just yet. EF is generally about getting a product built more rapidly and as it has to cope with a more general case, doesn't always produce the most efficient SQL. If the performance is a problem after your application is built, working and correct you can revisit and fix up the most inefficient stuff then.
If there is a person overlap between the mentioned tables, then yes, you should separate them out into a Persons table.
If you are only tracking what role each Person has (i.e. Student vs. Teacher etc) then you might consider just having the following three tables: Persons, Roles, and a bridge table PersonRoles.
On the other hand, if each role has it's own unique fields, then you should carry on as you are and leave each of these tables separate with a foreign key of PersonID.
If the attributes (i.e. First Name, Last Name, Gender etc) of these entities (i.e. Students, Teachers, Admins and Personnel) are exactly the same then you could just make a single table for all the entities with PersonType or Role attribute added to distinguish each person's role. However, if the entities has a lot of different attributes then it would be better that you create separate tables otherwise you will have normalization problem.
Yes that is a very bad way of structuring a DB. The DB structure should be designed based on the Normalizations.
Please check the normalization forms.
U should avoid the duplicate data as much as possible, else the queries will become slower.
And the main problem is when u r trying to get data that is associated with more than one or two tables.

Onetomany with parent database design

Below is a database design which represents my problem(it is not my actual database design). For each city I need to know which restaurants, bars and hotels are available. I think the two designs speak for itself, but:
First design: create one-to-many relations between city and restaurants, bars and hotels.
Second design: only create an one-to-many relation between city and place.
Which design would be best practice? The second design has less relations, but would I be able to get all the restaurants, bars and hotels for a city and there own data (property_x/y/z)?
Update: this question is going wrong, maybe my fault for not being clear.
the restaurant/bar/hotel classes are subclasses of "place" (in both
designs).
the restaurant/bar/hotel classes must have the parent "place"
the restaurant/bar/hotel classes have there own specific data (property_X/Y/X)
Good design first
Your data, and the readability/understandability of your SQL and ERD, are the most important factors to consider. For the purpose of readability:
Put city_id into place. Why: Places are in cities. A hotel is not a place that just happens to be in a city by virtue of being a hotel.
Other design points to consider are how this structure will be extended in the future. Let's compare adding a new subtype:
In design one, you need to add a new table, relationship to 'place' and a relationship to city
In design two, you simply add a new table and relationship to 'place'.
I'd again go with the second design.
Performance second
Now, I'm guessing, but the reason for putting city_id in the subtype is probably that you anticipate that it's more efficient or faster in some specific use cases and this may be a very good reason to ignore readability/understandability. However, until you measure performance on the actual hardware you'll deploy on, you don't know:
Which design is faster
Whether the difference in performance would actually degrade the overall system
Whether other optimization approaches (tuning SQL or database parameters) is actually a better way to handle it.
I would argue that design one is an attempt to physically model the database on an ERD, which is a bad practice.
Premature optimization is the root of a lot of evil in SW Engineering.
Subtype approaches
There are two solutions to implementing subtypes on an ERD:
A common-properties table, and one table per subtype, (this is your second model)
A single table with additional columns for subtype properties.
In the single-table approach, you would have:
A subtype column, TYPE INT NOT NULL. This specifies whether the row is a restaurant, bar or hotel
Extra columns property_X, property_Y and property_Z on place.
Here is a quick table of pros and cons:
Disadvantages of a single-table approach:
The extension columns (X, Y, Z) cannot be NOT NULL on a single table approach. You can implement row-level constraints, but you lose the simplicity and visibility of a simple NOT NULL
The single table is very wide and sparse, especially as you add additional subtypes. You may hit the max. number of columns on some databases. This can make this design quite wasteful.
To query a list of a specific subtype, you have to filter using a WHERE TYPE = ? clause, whereas the table-per-subtype is a much more natural `FROM HOTEL INNER JOIN PLACE ON HOTEL.PLACE_ID = PLACE.ID"
IMHO, mapping into classes in an object-oriented languages is harder and less obvious. Consider avoiding if this DB is going to be mapped by Hibernate, Entity Beans or similar
Advantages of a single-table approach:
By consolidating into a single table, there are no joins, so queries and CRUD operations are more efficient (but is this small difference going to cause problems?)
Queries for different types are parameterized (WHERE TYPE = ?) and therefore more controllable in code rather than in the SQL itself (FROM PLACE INNER JOIN HOTEL ON PLACE.ID = HOTEL.PLACE_ID).
There is no best design, you have to pick based on the type of SQL and CRUD operations you are doing most frequently, and possibly on performance (but see above for a general warning).
Advice
All things being equal, I would advise the default option is your second design. But, if you have an overriding concern such as those I listed above, do choose another implementation. But don't optimize prematurely.
Both of them and none of them at all.
If I need to choose one, I would keep the second one, because of the number of foreign keys and indexes needed to be created after. But, a better approach would be: create a table with all kinds of places (bars, restaurants, and so on) and assign to each row a column with a value of the type of the place (apply a COMPRESS clause with the types expected at the column). It would improve both performance and readability of the structure, plus being more easier to maintain. Hope this helps. :-)
you do not show alternate columns in any of the sub-place tables. i think you should not split type data into table names like 'bar','restaurant', etc - these should be types inside the place table.
i think further you should have an address table - one column of which is city. then each place has an address and you can easily group by city when needed. (or state or zip code or country etc)
I think the best option is the second one. In the first design, there is a possibility of data errors as one place can be assigned to a particular restaurant (or any other type) in one city (e.g. A) and at the same time can be assigned to another restaurant in a different city (e.g. B). In the second design, a place is always bound to a particular city.
First:
Both designs can get you all the appropriate data.
Second:
If all extending classes are going to implement the location (which sound obvious for your implementation) then it would be a better practice to include it as part of the parent object. This would suggest option 2.
Thingy:
The thing is that even-tough you can find out the type of each particular PLACE, it is easier to just know that a type (CHILD) is always a place (PARENT). You can think of that while you visualize the result-set of option 2. With that in mind, I recommend the first approach.
NOTE:
First one doesn't have more relations, it just splits them.
If bar, restaurant and hotel have different sets of attributes, then they are different entities and should be represented by 3 different tables. But why do you need the place table? My advice it to ditch it and have 3 tables for your 3 entities and that's that.
In code, collecting common attributes into a parent class is more organised and efficient than repeating them in each child class - of course. But as spathirana comments above, database design is not like OOP. Sure, you'll save on the repetition of column names by sticking common attributes of places into a "place" table. But it will also add complication:
- you have to join on that table whenever you want to reference a bar, restaurant or hotel
- you have to insert into two tables whenever you want to add a new bar, restaurant or hotel
- you have to update two tables when ... etc.
Having 3 tables without the place table is ALSO, PROBABLY, the most performance-optimal design. But that's not where I'm coming from. I'm thinking of clean, simple database design where a single entity means a single row in a single table. There are no "is-a" relationships in a relational DB. Foreign key relationships are "has-a". OK, there are exceptions I'm sure, but your case is not exceptional.

Is it a bad idea to make a generic link table?

Imagine a meta database with a high degree of normalization. It would blow up this input field if I would attempt to describe it here. But imagine, every relationship through the entire database, through all tables, go through one single table called link. It has got these fields: master_class_id, master_attr_id, master_obj_id, class_id2, obj_id2. This table can easily represent all kinds of relationships: 1:1, 1:n, m:n, self:self.
I see the problem that this table is going to get HUUUUGE. Is that bad practice?
That is wrong on two accounts:
It'll be a tremendous bottleneck for all your queries and it'll kill any chance of throughput.
It reeks of bad design: you should be able to describe things more concisely and closer to reality. If this is really the best way to store the data you can consider partitioning or even another paradigm instead of the relational
In a word, yes, this is a bad idea
Without going into too many details, I would offer the following:
for a meta database, the link table should be split by (high level) entity : that is, you should have a separate link table for each entity
another link table is required for the between-entities links
Normally the high-level entities are fairly easy to identify, like Customer.
It is usually bad practice but not because the table is huge. The problem is that you are mixing unrelated data in one table.
The reason to keep the links in separate tables, is because you won't need to use them together.
It is a common mistake that is also done with data itself: You should not mix two sets of data in one table only because the fields are similar if the data itself is unrelated.
Relational databases don't actually fit for this model.
It's possible to implement it but it will be quite slow. The main drawback is that you won't be able to index the links efficiently.
However, this design can be useful in two cases:
This only stores the metadata: declared relationships between the entities. The actual data are stored in the plain relational tables, so this links are only used to show the structure but not in the actual queries.
This stores some structures which are complex but contain few data, so that the ease of development overweights the performance drawbacks.
This design can be seen in several ORMs (one of which I even developed).
I don't see the purpose of this type of table anyway. If you have table A that is one-to-many to table B then A is going to still have a PK and B will still have a PK. A would normally contain a FK to B.
So in the Master_Table you will have to store A PK, B FK which is just a duplicate of what is already there. The only thing you will 'lose' is the FK in table A but you just migrated it into a giant table that is hard to deal with by the database, the dba, and anyone coding using the db.
Those table appear in Access most frequently and show up on the DailyWTF because they are insanely hard to read and understand.
Oh! And a main problem is that to make the table ubiquitous you will have to make generic columns which will probably end up destroying data integrity.

Person name structure in separate database table

I am wondering when and when not to pull a data structure into a separate database table when it appears in several tables.
I have pulled the 12 attribute address structure into a separate table because I have a couple of different entities containing a single address in this format.
But how about my 3 attribute person name structure (given, middle, surname)?
Should this be put into its own table referenced with a foreign key for all the entities containing a name... e.g. the company table has a contact person name, the citizen table has a person name etc.
Are these best left as attributes in the main tables or should they be extracted?
I would usually keep the address on the Person table, unless there was an unusual need for absolutely uniform addresses on each entity, or if an entity could have an arbitrary number of addresses, or if addresses need to be shared between entities, or if it was a large enterprise product where I know I have to invest in infrastructure all over the place or I will end up gutting everything down the road.
Having your addresses in a seperate table is interesting because it's flexible, but in the context of a small project lacking a special need like the ones mentioned above, it's probably a slight waste. Always be aware of the balance between complexity and flexibility. Flexibility is important, but be discriminating... It's easy to invest way too much there!
In concrete terms, the times that I experimented with (for instance) one-to-one relationships for things like addresses, I ended up refactoring them back into the table because it introduced a bunch of headaches including more complex queries, dealing with situations where the address does not exist, etc. More entities also increases your cognitive load -- it makes the project harder to think about. In my case, it was an unecessary cost because there was no concrete need and, in truth, not even a gain in flexibility.
So, based on my experiences, I would "try" to keep the addresses in the same table, and I would definitely keep the names on them - again, unless there was a special need.
So to paraphrase Einstein, make it as simple as possible and no simpler. But in the short term, experiment. It's the best way to learn these lessons.
It's about not repeating information, so you don't want to store the same information in two places when one will do.
Another useful rule of thumb is one entity per table. If you find that one table contains, say, "person" AND "order" then you probably should split those into two tables.
And (putting myself at risk of repeating information...) you might find it helpful to review some database design basics, there are plenty of related questions here on stackoverflow.
Start with these...
What is normalisation?
What is important to keep in mind when designing a database
How many fields is 'too many'?
More tables or more columns?
Creating a person entity across your data model will give you this present and future advantages -
The same person occurring as a contact, or individual in different contexts. Saves redundancy.
Info can be maintained and kept current with far-less effort.
Easier to search for a person and identify them - i.e. is it the same John Smith?
You can expand the information - i.e. maintain addresses for this person far more easily.
Programming will be more consistent and debugging will be easier as well.
Moves you closer to a 'self-documenting' system.
As a counterpoint to the other (entirely valid) replies: within your application's current structure, how likely will it be for a given individual (not just name, the actual "person" -- multiple people could be "John Smith") to appear in more than one table? The less likely this is to happen, the less likely you are to get benefits from normalization.
Another way to think of it is entities. Outside of labels (names), is their any overlap between "customer" entity and an "employee" entity?
Extract them. Your aim should be to have no repeating data in your database.
Read about Normalization
It really depends on the problem you are trying to solve. In general it is probably a good idea to have some sort of 'person' table which holds details of people. However, there are occasions where that is potentially a very bad idea.
One example would be if you are holding details of prescriptions written out to people by a doctor. In some countries it is a legal requirment that the prescription details are held with the name in which they were prescribed NOT the name the person is going under currently. For instance a woman might be prescribed a drug as miss X, but then she gets married and becomes Mrs Y. If you had a person table that was linked to the prescriptions table you would now have the wrong details and would possibly face legal consequences. In that case you would need to probably copy the relevant details of the person into the prescription table, even though this would be duplicating data.
So again - it depends on the problem you are trying to solve. Don't just blindly follow what people consider to be best practices. Understand your data and any issues surrounding it, then try to follow best practices that fit.
Depends on what you're using the database for.
If you want fast queries on your tables you should de-normalize your tables. Having to run multiple JOIN's will take longer and make your queries more complex.
On the other hand if your intention is to have a flexible storage database which is not meant to be hit with a ton of fast-response queries, then normalizing the tables by splitting them out into multiple xref'ed tables will provide more flexibility in your design and reduce the need for submitting duplicated data.
Since de-normalization is "optimization", I would suggest you normalize the tables first, index them properly and see if you're getting any bottlenecks on your queries. If so, flatten the affected tables where needed.
You should really consider your whole database structure and do a ER diagram (entity relationship diagram) first. OF COURSE there should be another table called "Person" where the concept of a person is stored...

Resources