How to support variable number of one-to-many relationships in database - database

I am trying to design a Person database. The requirement is that a Person can have a varying number (one or more) of children, cars, jobs, and homes.
So, currently, the way I have designed this is:
Person {
CharField name
DateField dob
CharField city
...
# Some standard base person data
}
Since I want to support variable number of associations, I create separate tables with one-to-many relationships. For example, I have
Home {
ForeignKey Person
CharField home_address
...
}
Job {
ForeignKey Person
CharField company_name
CharField office_address
...
}
And so on for other fields.
This works fine because I can have as many or as few entries per person.
The downside is that for each Person, I do lookups on 5-6 tables. I am going to need more fields, so the number of lookups will increase.
Is there a paradigm to efficiently design this kind of scenario?
If it is of interest, I use Django with PostgreSQL.
Edit:
The server is mostly making REST API responses off the database. The browser client needs the entire data for one Person at one go (to reduce multiple requests over network). So I will have to do the multiple joins together.
Actually, for my Person table, I really do not need anything relational. Other tables in my DB are heavily relational. The reason I am thinking about this now is that I suspect the large number of joins will result in slower performance, and changing the design later will be difficult.
I also came across JSONField for PostgreSQL, and I was wondering whether I should use it to store the "hanging-off" data so that the REST calls do not result in a multitude of JOINs. Since this is a design-level decision, I am thinking about it now because I am not sure that changing it later will be feasible.
Thanks a lot for your inputs.

Your design is correct. The number of tables is a reflection of the complexity (or not) of the application.
The "paradigm to efficiently design this kind of scenario" is the relational model and you are designing in terms of tables because you are working within that paradigm.
Your notions about "the downside" and "lookups" and "efficiency" presume implementation aspects without justification. The DBMS takes your declarations and updates and answers your queries and hides how. Implementation issues do arise, but far from the level of experience and knowledge suggested by your question.
Just make a straightforward design.
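As a rough illustration of such a straightforward design, here is a minimal sketch using Python's built-in sqlite3 module instead of Django and PostgreSQL. The sample rows and the load_person helper are made up for this example; only the table and column names follow the question. Each child table carries an indexed foreign key, so collecting everything for one Person is one small lookup per table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT);
CREATE TABLE home (
    id INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES person(id),
    home_address TEXT NOT NULL
);
CREATE TABLE job (
    id INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES person(id),
    company_name TEXT NOT NULL
);
-- Index the foreign keys so each per-person lookup is an index search.
CREATE INDEX home_person_idx ON home(person_id);
CREATE INDEX job_person_idx ON job(person_id);
""")

# Hypothetical sample data.
conn.execute("INSERT INTO person (id, name, city) VALUES (1, 'Alice', 'Pune')")
conn.executemany(
    "INSERT INTO home (person_id, home_address) VALUES (?, ?)",
    [(1, "12 Oak St"), (1, "7 Lake Rd")],
)
conn.execute("INSERT INTO job (person_id, company_name) VALUES (1, 'Acme')")


def load_person(conn, person_id):
    """Assemble one person plus all related rows into a single dict."""
    row = conn.execute(
        "SELECT name, city FROM person WHERE id = ?", (person_id,)
    ).fetchone()
    if row is None:
        return None
    homes = [r[0] for r in conn.execute(
        "SELECT home_address FROM home WHERE person_id = ? ORDER BY id",
        (person_id,),
    )]
    jobs = [r[0] for r in conn.execute(
        "SELECT company_name FROM job WHERE person_id = ? ORDER BY id",
        (person_id,),
    )]
    return {"name": row[0], "city": row[1], "homes": homes, "jobs": jobs}


person = load_person(conn, 1)
print(person)
```

The resulting dict can be serialized as one REST response, so the client still gets the whole Person in a single request.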

Related

Database design, multiple M-M tables or just one?

Today I was designing a database for a potential personal project of mine. Since I couldn't decide which option would be better, I asked my databases teacher; unfortunately, he couldn't tell me which of the two options is better or why.
I designed the database for a dummy data generator. Since I want to generate multilingual data, I thought of these tables (this is a simplification of the actual tables).
(first and last)names: id, name
streets: id, name
languages: id, name
Each names.name and streets.name originates from a language; sometimes a name can have multiple origins (e.g. Nick is both a Dutch and an English name).
Each language has multiple names and streets.
These two rules result in a Many-to-Many relationship. At the moment I've got only two such tables, but I know I will end up with between 10 and 20 tables of this kind.
The regular way one would do this is just make 10 to 20 Many-to-Many relationship tables.
Another idea I came up with was just one Many-to-Many table with a third column which specifies which table the id relates to.
At the moment I've got the design on my other PC so I will update it with my ideas visualized after dinner (2 hours or so).
Which idea is better and why?
To make the project idea a bit clearer:
It is always a hassle to create good and enough realistic looking working data for projects. This application will generate this data for you and return the needed SQL so you only have to run the queries.
The user comes to the site to get the data. He states his tablename, his columnnames and then he can link the columnnames to types of data, think of:
* Firstname
* Lastname
* Email address (which will be randomly generated from the name of the person)
* Address details (street, house number, zip code, place, country)
* A lot more
Then, after linking columns with the types the user can set the number of rows he wants to make. The application will then choose a country at random and generate realistic looking data according to the country they live in.
That's actually an excellent question. This sort of thing leads to a genuine problem in database design and there is a real tradeoff. I don't know what rdbms you are using but....
Basically you have four choices, all of them with serious downsides:
1. One M-M table with check constraints that only one fkey can be filled in besides language and one column per potential table. Ick....
2. One M-M table per relationship. This makes things quite hard to manage over time especially if you need to change something from an int to a bigint at some point.
3. One M-M table with a polymorphic relationship. You lose a lot of referential integrity checks when you do this and to make it safe, have fun coding (and testing!) triggers.
4. Look carefully at the advanced features in your rdbms for a solution. For example in postgresql this can be solved with table inheritance. The downside is that you lose portability and end up in advanced territory.
Unfortunately there is no single definite answer. You need to consider the tradeoffs carefully and decide what makes sense for your project. If I was just working with one RDBMS, I would do the last one. But if not, I would probably do one table per relationship and focus on tooling to manage the problems that come up. But the former preference is about my level of knowledge and confidence, and the latter is a bit more of a personal opinion.
So I hope this helps you look at the tradeoffs and select what is right for you.
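To make the referential-integrity tradeoff between options 2 and 3 concrete, here is a minimal sketch using Python's built-in sqlite3 module. The names name_languages, item_languages, item_table and item_id are hypothetical; the question does not name its junction tables. The dedicated junction table rejects a reference to a missing row, while the polymorphic table accepts it because item_id cannot be declared as a foreign key:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE languages (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE names (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- Option 2: one junction table per relationship; both columns are real FKs.
CREATE TABLE name_languages (
    name_id INTEGER NOT NULL REFERENCES names(id),
    language_id INTEGER NOT NULL REFERENCES languages(id),
    PRIMARY KEY (name_id, language_id)
);

-- Option 3: one polymorphic table; item_id points at "some table" named in
-- item_table, so the database cannot check that the target row exists.
CREATE TABLE item_languages (
    item_table TEXT NOT NULL,
    item_id INTEGER NOT NULL,
    language_id INTEGER NOT NULL REFERENCES languages(id),
    PRIMARY KEY (item_table, item_id, language_id)
);
""")

conn.execute("INSERT INTO languages VALUES (1, 'Dutch'), (2, 'English')")
conn.execute("INSERT INTO names VALUES (1, 'Nick')")
# Nick is both a Dutch and an English name.
conn.executemany("INSERT INTO name_languages VALUES (?, ?)", [(1, 1), (1, 2)])

# A dangling reference is rejected by the dedicated junction table...
try:
    conn.execute("INSERT INTO name_languages VALUES (999, 1)")
    junction_rejected = False
except sqlite3.IntegrityError:
    junction_rejected = True

# ...but silently accepted by the polymorphic table.
conn.execute("INSERT INTO item_languages VALUES ('names', 999, 1)")
polymorphic_accepted = conn.execute(
    "SELECT COUNT(*) FROM item_languages WHERE item_id = 999"
).fetchone()[0] == 1

print(junction_rejected, polymorphic_accepted)
```

Closing that gap in the polymorphic variant is what the triggers mentioned in option 3 would be for.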

Performance in database design

I have to implement a testing platform. My database needs the following tables: Students, Teachers, Admins, Personnel and others. I would like to know whether it's more efficient to have FirstName and LastName in each of these tables, or to have another table, Persons, with each of the other tables linked to it by PersonID.
Personally, I like it this way, although trickier to implement, because I think it's cleaner, especially if you look at it from the object-oriented point of view. Would this add an unnecessary overhead to the database?
Don't know if it helps to mention I would like to use SQL Server and ADO.NET Entity Framework.
As you've explicitly mentioned OO and that you're using Entity Framework, perhaps it's worth approaching the problem from how the framework is intended to work, rather than just building a database structure and then trying to model it?
Entity Framework Code First Inheritance : Table Per Hierarchy and Table Per Type is a nice introduction to the various strategies that you could pick from.
As for the note on adding unnecessary overhead to the database - I wouldn't worry about it just yet. EF is generally about getting a product built more rapidly and as it has to cope with a more general case, doesn't always produce the most efficient SQL. If the performance is a problem after your application is built, working and correct you can revisit and fix up the most inefficient stuff then.
If there is a person overlap between the mentioned tables, then yes, you should separate them out into a Persons table.
If you are only tracking what role each Person has (i.e. Student vs. Teacher etc) then you might consider just having the following three tables: Persons, Roles, and a bridge table PersonRoles.
On the other hand, if each role has its own unique fields, then you should carry on as you are and leave each of these tables separate with a foreign key of PersonID.
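The three-table variant (Persons, Roles and a PersonRoles bridge table) could look roughly like this. It is a sketch using Python's built-in sqlite3 module rather than SQL Server and Entity Framework, and the sample people and column names are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE persons (
    id INTEGER PRIMARY KEY,
    first_name TEXT NOT NULL,
    last_name TEXT NOT NULL
);
CREATE TABLE roles (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
-- Bridge table: one row per (person, role) pair.
CREATE TABLE person_roles (
    person_id INTEGER NOT NULL REFERENCES persons(id),
    role_id INTEGER NOT NULL REFERENCES roles(id),
    PRIMARY KEY (person_id, role_id)
);
""")

conn.executemany(
    "INSERT INTO persons VALUES (?, ?, ?)",
    [(1, "Jane", "Doe"), (2, "John", "Roe")],
)
conn.executemany(
    "INSERT INTO roles VALUES (?, ?)",
    [(1, "Student"), (2, "Teacher"), (3, "Admin")],
)
# Jane is both a student and a teacher; her name is stored only once.
conn.executemany(
    "INSERT INTO person_roles VALUES (?, ?)", [(1, 1), (1, 2), (2, 3)]
)

roles_by_person = conn.execute("""
    SELECT p.first_name, r.name
    FROM persons p
    JOIN person_roles pr ON pr.person_id = p.id
    JOIN roles r ON r.id = pr.role_id
    ORDER BY p.id, r.name
""").fetchall()
print(roles_by_person)
```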
If the attributes (i.e. First Name, Last Name, Gender etc.) of these entities (i.e. Students, Teachers, Admins and Personnel) are exactly the same, then you could make a single table for all the entities, with a PersonType or Role attribute added to distinguish each person's role. However, if the entities have a lot of different attributes, it would be better to create separate tables; otherwise you will have normalization problems.
Yes, that is a very bad way of structuring a DB. The DB structure should be designed according to the normalization rules.
Please check the normal forms.
You should avoid duplicate data as much as possible, or else the queries will become slower.
And the main problem arises when you are trying to get data that is associated with more than one or two tables.

Relational database design: standard row values in one table vs. separate tables

Note: I've seen a few related questions about similar issues; however, none of them fully answers my question.
I have exam data for schools. There are around 500 schools, and around 12 subject exams in my dataset (each school has data for each exam). Each exam has 6 attributes (columns). After the initial data is loaded to the database, no modifications are expected. With respect to SELECT queries, I imagine that separate exam data is used as often as queries over a number of exams. However, the database would be used by a website visualizing the data, thus those SELECT queries might have to be run rather often. With that in mind, I can think of three ways of organizing that data, with each way producing (apparently) BCNF tables.
First schema:
school
exam1_attr1
exam1_attr2
...
exam12_attr6
This schema feels wrong, though I do not have strong arguments against it. As I said, my data would not change, thus having exams carved into attribute names is not that much of an issue. However, such a setup would pose some aggregation difficulties over the entire dataset (i.e., resulting queries would possibly be unnecessarily complicated).
Second schema:
school
examID
attr1
attr2
...
attr6
While this schema looks attractive, I find it hard to convince myself that it is a good idea to represent exams as values rather than columns or separate tables. That is, the set of exams is known, finite and final, and each exam has exactly the same properties, which sounds like a prime candidate for a separate table. On the other hand, under such an arrangement, both aggregation and single-exam queries are very clean and straightforward.
Third schema would be identical for 12 separate exam tables:
school
attr1
attr2
...
attr6
Conceptually, I would feel that this schema represents my data best: each exam is logically separated into its own table. However, any queries requiring aggregate data over all exams would then include 12 tables, and that makes me feel rather uneasy.
Thus, my question: which database design would be best in my case? While I am looking for an answer, I am also very interested in reasons for choosing one schema over the other. Specifically, I wonder:
how efficiency of running queries changes with each database design,
how important in real life is the ease of writing queries (given that the data would be primarily used by a website - I would seldom write queries over the data after the website has been finished),
which design is better if potential future changes to the data of the website are taken into account,
whether your answer would be different if the number of schools was not 500, but 50,000.
In short, I am interested in any opinions that would help me understand why one design is better than the other. Any database design theories are welcome as well. Thanks!
In an operational relational database, the speed of changes is more important than speed of selects. In a data warehouse, the speed of selects is more important than the speed of changes.
You have a data warehouse.
Operational relational databases are normalized.
Data warehouses use some variation of a star schema.
Your second schema is a good schema for the reason you stated. Both aggregation and single-exam queries are very clean and straight-forward. However, you should put the school information in a separate school table, and reference the school table ID (primary key field, auto-increment integer) as a foreign key in the exam table. This allows you to scale from 500 to 50,000 schools more easily.
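A minimal sketch of that recommendation, using Python's built-in sqlite3 module: the second schema with the school data moved into its own table and referenced by a foreign key. The exam_result table name, the school names and the numbers are hypothetical, and only two of the six attributes are shown:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE school (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL
);
-- Second schema: one row per (school, exam), exams are values not columns.
CREATE TABLE exam_result (
    school_id INTEGER NOT NULL REFERENCES school(id),
    exam_id INTEGER NOT NULL,
    attr1 REAL,
    attr2 REAL,
    PRIMARY KEY (school_id, exam_id)
);
""")

conn.executemany(
    "INSERT INTO school (id, name) VALUES (?, ?)",
    [(1, "North High"), (2, "South High")],
)
conn.executemany(
    "INSERT INTO exam_result VALUES (?, ?, ?, ?)",
    [(1, 1, 70.0, 10.0), (1, 2, 80.0, 12.0),
     (2, 1, 60.0, 9.0), (2, 2, 80.0, 11.0)],
)

# Single-exam query: one WHERE clause, no per-exam column names.
exam1 = conn.execute("""
    SELECT s.name, e.attr1
    FROM exam_result e JOIN school s ON s.id = e.school_id
    WHERE e.exam_id = 1
    ORDER BY s.id
""").fetchall()

# Aggregation over all exams: a plain GROUP BY, no 12-table union.
averages = conn.execute("""
    SELECT s.name, AVG(e.attr1)
    FROM exam_result e JOIN school s ON s.id = e.school_id
    GROUP BY s.id, s.name
    ORDER BY s.id
""").fetchall()

print(exam1)
print(averages)
```

Neither query changes if more exams or schools are loaded later.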

What is the best way to realize this database

I have to realize a system with different kinds of users, and I am thinking of realizing it in this way:
A user table with only id, email and password.
Two different tables correlated to the user table in a 1-to-1 relation. Each table defines the specific attributes of one kind of user.
Is this the best way to realize it? Should I use the InnoDB storage engine?
If I realize it in this way, how can I handle the tables in the Zend Framework?
I can't answer the second part of your question, but the pattern you describe is called supertype and subtype in data modelling. Whether this is the right choice can't be answered without knowing more about the differences between these user types and how they will be used in the application. There are different approaches to converting logical super/subtypes into physical tables.
Here are some relevant links:
http://www.sqlmag.com/article/data-modeling/implementing-supertypes-and-subtypes
and the next one about pitfalls and (mis)use of subtyping
http://www.ocgworld.com/doc/OCG_Subtyping_Techniques.pdf
In general I am, from a pragmatic point of view, very reluctant to follow your choice and most often opt to create one table containing all columns. In most cases there are a number of places where the application needs to show all users in some sort of listing with specific columns for specific types (and empty if not applicable for that type). It quickly leads to non-straightforward queries and all sorts of extra code to deal with the different tables, so that it's just not worth being 'conceptually correct'.
Two reasons for me to still split the subtypes into different tables are if the subtypes are so truly different that it makes no logical sense to have them in one table, and if the number of rows is so enormous that the overhead of the 'unneeded' columns when putting it all in one table actually starts to matter.
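To show what the split layout costs in a listing, here is a minimal sketch using Python's built-in sqlite3 module instead of MySQL. The subtype tables customer_profile and staff_profile and their columns are hypothetical, since the question does not say what the two kinds of users are. Every listing over all users needs one LEFT JOIN per subtype, with NULL in the columns that do not apply:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Supertype: what every user has.
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE,
    password_hash TEXT NOT NULL
);
-- Subtypes: the primary key doubles as the foreign key, giving 1-to-1.
CREATE TABLE customer_profile (
    user_id INTEGER PRIMARY KEY REFERENCES users(id),
    shipping_address TEXT
);
CREATE TABLE staff_profile (
    user_id INTEGER PRIMARY KEY REFERENCES users(id),
    department TEXT
);
""")

conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "a@example.com", "hash-a"), (2, "b@example.com", "hash-b")],
)
conn.execute("INSERT INTO customer_profile VALUES (1, '1 Main St')")
conn.execute("INSERT INTO staff_profile VALUES (2, 'Support')")

# A listing of all users needs one LEFT JOIN per subtype table.
listing = conn.execute("""
    SELECT u.email, c.shipping_address, s.department
    FROM users u
    LEFT JOIN customer_profile c ON c.user_id = u.id
    LEFT JOIN staff_profile s ON s.user_id = u.id
    ORDER BY u.id
""").fetchall()
print(listing)
```

With a single wide table the same listing would be a plain SELECT, which is the pragmatic argument made above.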
On the PHP side you can use Doctrine 2 ORM. It's easy to integrate with ZF, and you could easily implement this table structure as inheritance in your Doctrine mapping.

Person name structure in separate database table

I am wondering when and when not to pull a data structure into a separate database table when it appears in several tables.
I have pulled the 12 attribute address structure into a separate table because I have a couple of different entities containing a single address in this format.
But how about my 3 attribute person name structure (given, middle, surname)?
Should this be put into its own table referenced with a foreign key for all the entities containing a name... e.g. the company table has a contact person name, the citizen table has a person name etc.
Are these best left as attributes in the main tables or should they be extracted?
I would usually keep the address on the Person table, unless there was an unusual need for absolutely uniform addresses on each entity, or if an entity could have an arbitrary number of addresses, or if addresses need to be shared between entities, or if it was a large enterprise product where I know I have to invest in infrastructure all over the place or I will end up gutting everything down the road.
Having your addresses in a separate table is interesting because it's flexible, but in the context of a small project lacking a special need like the ones mentioned above, it's probably a slight waste. Always be aware of the balance between complexity and flexibility. Flexibility is important, but be discriminating... It's easy to invest way too much there!
In concrete terms, the times that I experimented with (for instance) one-to-one relationships for things like addresses, I ended up refactoring them back into the table because it introduced a bunch of headaches, including more complex queries, dealing with situations where the address does not exist, etc. More entities also increase your cognitive load: they make the project harder to think about. In my case, it was an unnecessary cost because there was no concrete need and, in truth, not even a gain in flexibility.
So, based on my experiences, I would "try" to keep the addresses in the same table, and I would definitely keep the names on them - again, unless there was a special need.
So to paraphrase Einstein, make it as simple as possible and no simpler. But in the short term, experiment. It's the best way to learn these lessons.
It's about not repeating information, so you don't want to store the same information in two places when one will do.
Another useful rule of thumb is one entity per table. If you find that one table contains, say, "person" AND "order" then you probably should split those into two tables.
And (putting myself at risk of repeating information...) you might find it helpful to review some database design basics, there are plenty of related questions here on stackoverflow.
Start with these...
What is normalisation?
What is important to keep in mind when designing a database
How many fields is 'too many'?
More tables or more columns?
Creating a person entity across your data model will give you these present and future advantages:
The same person occurring as a contact, or individual in different contexts. Saves redundancy.
Info can be maintained and kept current with far-less effort.
Easier to search for a person and identify them - i.e. is it the same John Smith?
You can expand the information - i.e. maintain addresses for this person far more easily.
Programming will be more consistent and debugging will be easier as well.
Moves you closer to a 'self-documenting' system.
As a counterpoint to the other (entirely valid) replies: within your application's current structure, how likely will it be for a given individual (not just name, the actual "person" -- multiple people could be "John Smith") to appear in more than one table? The less likely this is to happen, the less likely you are to get benefits from normalization.
Another way to think of it is in terms of entities. Outside of labels (names), is there any overlap between a "customer" entity and an "employee" entity?
Extract them. Your aim should be to have no repeating data in your database.
Read about Normalization
It really depends on the problem you are trying to solve. In general it is probably a good idea to have some sort of 'person' table which holds details of people. However, there are occasions where that is potentially a very bad idea.
One example would be if you are holding details of prescriptions written out to people by a doctor. In some countries it is a legal requirement that the prescription details are held with the name under which they were prescribed, NOT the name the person is going by currently. For instance, a woman might be prescribed a drug as Miss X, but then she gets married and becomes Mrs Y. If you had a person table that was linked to the prescriptions table, you would now have the wrong details and would possibly face legal consequences. In that case you would probably need to copy the relevant details of the person into the prescription table, even though this would be duplicating data.
So again - it depends on the problem you are trying to solve. Don't just blindly follow what people consider to be best practices. Understand your data and any issues surrounding it, then try to follow best practices that fit.
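The prescription case can be sketched as follows, using Python's built-in sqlite3 module. The table layout, the prescribed_name column, the drug name and the write_prescription helper are all assumptions made for illustration. The name is copied into the prescription row when it is written, so a later change to the person row does not alter the historical record:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE prescription (
    id INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES person(id),
    prescribed_name TEXT NOT NULL,  -- deliberate copy, frozen at write time
    drug TEXT NOT NULL
);
""")


def write_prescription(conn, person_id, drug):
    """Store the prescription together with the person's name as of now."""
    (name,) = conn.execute(
        "SELECT name FROM person WHERE id = ?", (person_id,)
    ).fetchone()
    conn.execute(
        "INSERT INTO prescription (person_id, prescribed_name, drug) "
        "VALUES (?, ?, ?)",
        (person_id, name, drug),
    )


conn.execute("INSERT INTO person VALUES (1, 'Miss X')")
write_prescription(conn, 1, "drug-a")

# The person later changes her name; the prescription keeps the old one.
conn.execute("UPDATE person SET name = 'Mrs Y' WHERE id = 1")

current_name, prescribed_name = conn.execute("""
    SELECT p.name, rx.prescribed_name
    FROM prescription rx JOIN person p ON p.id = rx.person_id
""").fetchone()
print(current_name, prescribed_name)
```

Here the duplication is intentional: the two columns answer different questions (who the person is now, and what name was on the prescription).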
Depends on what you're using the database for.
If you want fast queries on your tables you should de-normalize your tables. Having to run multiple JOINs will take longer and make your queries more complex.
On the other hand if your intention is to have a flexible storage database which is not meant to be hit with a ton of fast-response queries, then normalizing the tables by splitting them out into multiple xref'ed tables will provide more flexibility in your design and reduce the need for submitting duplicated data.
Since de-normalization is "optimization", I would suggest you normalize the tables first, index them properly and see if you're getting any bottlenecks on your queries. If so, flatten the affected tables where needed.
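One way to "index them properly and see" is to look at the query plan before and after adding an index. The sketch below uses Python's built-in sqlite3 module and its EXPLAIN QUERY PLAN statement. The person/home tables and the index name home_person_idx are hypothetical, and the exact wording of the plan text differs between SQLite versions and between database systems:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE home (
    id INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES person(id),
    home_address TEXT NOT NULL
);
""")

QUERY = "SELECT home_address FROM home WHERE person_id = ?"


def plan_details(conn):
    """Return the human-readable detail column of the query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (1,)).fetchall()
    return [row[-1] for row in rows]


before = plan_details(conn)  # expected to be a full scan of home
conn.execute("CREATE INDEX home_person_idx ON home(person_id)")
after = plan_details(conn)   # expected to search via home_person_idx

print(before)
print(after)
```

Only if the plan is already using the indexes and the query is still a measured bottleneck does flattening the affected tables become worth considering.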
You should really consider your whole database structure and draw an ER diagram (entity relationship diagram) first. OF COURSE there should be another table called "Person" where the concept of a person is stored...
