What I have: A Postgres database with TypeORM.
What is the model: A main entity Event with many inherited children (10+). Each child will have attributes very different from the others. Each Event belongs to a User, left out of the example as it is not important for this particular case. For a simplified example, I've attached a mock diagram.
How do I use it: The model will be queried for the list of all aggregated events, in chronological order, and then filtered by its "type" in order to be displayed (Parameter, Symptom, ...). This means that I want a list with a subset of the most recent 20 Events (as an example). For each of those 20 I will fetch its individual data from the matching table (or embed it in the first place).
Some of the events will happen with a high frequency while others with a much lower frequency.
The question is, what would be the best approach to model this?
I came up with:
Single table, Event contains everything
Cons: it will contain a lot of NULLs
Pros: everything can be aggregated through that table, no joins or views needed
Multiple tables, one for each type
Cons: I will need to have lots of joins and views to aggregate the data
Pros: Each table and entry is meaningful of its data and type
Multiple tables plus a main Event which points to the type's row
Cons: Almost same as above but with an easier way to get certain aggregation per User
Pros: Same as above
It looks like you have a straightforward tree structure rooted at the User entity. Something like this:
You then just need a table for each entity:
Users
Events
The Children (Parameter, Symptom, etc.)
The Events have a user field, that contains the ID of their parent User. Similarly, the Children have an event field that contains the ID of their parent Event.
Such that:
"Child" {
event -> id of parent Event
}
Event {
user -> id of parent User
}
In TS, you can create Entities for each type of Child (Symptom, Meal, etc.) so they remain strictly typed. Having a separate table for each allows you to optimize their DB representations however you want (indexing specific fields, normalizing, etc.). Separate tables also allow you to avoid having a lot of NULLs for uncommon values.
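To make the "latest 20 events, then per-type details" flow concrete, here is a minimal sketch of that layout. It uses Python's built-in sqlite3 as a stand-in for Postgres, and every table, column, and function name (events.type, occurred_at, latest_events, symptom_details) is an assumption for illustration, not something from the question.

```python
import sqlite3

# SQLite stands in for Postgres; all names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE events (
    id          INTEGER PRIMARY KEY,
    user_id     INTEGER NOT NULL REFERENCES users(id),
    type        TEXT NOT NULL,   -- 'symptom', 'meal', ...
    occurred_at TEXT NOT NULL    -- ISO-8601 timestamp, sorts chronologically
);
-- One child table per type, keyed by the parent Event's id.
CREATE TABLE symptoms (
    event_id    INTEGER PRIMARY KEY REFERENCES events(id),
    description TEXT NOT NULL
);
CREATE TABLE meals (
    event_id INTEGER PRIMARY KEY REFERENCES events(id),
    calories INTEGER NOT NULL
);
CREATE INDEX events_user_time ON events (user_id, occurred_at DESC);
""")

conn.execute("INSERT INTO users (id, name) VALUES (1, 'alice')")
conn.executemany(
    "INSERT INTO events (id, user_id, type, occurred_at) VALUES (?, ?, ?, ?)",
    [
        (1, 1, "symptom", "2024-01-01T08:00:00"),
        (2, 1, "meal", "2024-01-01T12:30:00"),
        (3, 1, "symptom", "2024-01-02T09:15:00"),
    ],
)
conn.execute("INSERT INTO symptoms VALUES (1, 'headache')")
conn.execute("INSERT INTO meals VALUES (2, 650)")
conn.execute("INSERT INTO symptoms VALUES (3, 'nausea')")


def latest_events(user_id, limit=20):
    """Return (id, type, occurred_at) for the user's newest events."""
    return conn.execute(
        "SELECT id, type, occurred_at FROM events "
        "WHERE user_id = ? ORDER BY occurred_at DESC LIMIT ?",
        (user_id, limit),
    ).fetchall()


def symptom_details(event_ids):
    """Fetch the child rows for the given event ids from the symptoms table."""
    placeholders = ",".join("?" * len(event_ids))
    return conn.execute(
        f"SELECT event_id, description FROM symptoms "
        f"WHERE event_id IN ({placeholders}) ORDER BY event_id",
        list(event_ids),
    ).fetchall()


page = latest_events(1)
symptom_ids = [event_id for event_id, kind, _ in page if kind == "symptom"]
print(page)
print(symptom_details(symptom_ids))
```

The chronological list needs only the events table, and each child table is touched once per page rather than once per event. Storing a type column on events is a design choice of this sketch; without it you would have to probe every child table to learn what an event is.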
Consider this simple setup.
In this model, the following restrictions apply:
A person is either a parent or a child.
Only one parent per child
A parent has a relationship to a public utility instance
A child has a transitive relationship to the utility because of the parent.
Now, the question is: should every child have the "City utility id" property set in the database?
Advantage:
You avoid nullability. It is said that databases will build less effective indexes if the field is nullable because every person that is a child will have the same value (null) on this property.
Disadvantage:
Less clean, more bookkeeping on CUD operations. The field does not convey data that isn't already represented in the database.
EF supports both Table Per Hierarchy (TPH) and Table Per Concrete Type (TPT) so as far as the schema goes you have options. While both a Parent and a Child "are a" person, they each have individual characteristics. One of which is that only Parents have a City Utility assigned, the others are that Children can only be associated to a Parent. (A parent cannot reference another parent as a child, or a child cannot reference another child as a parent.) Handling all of these scenarios within a single table is a TPH structure which relies on implicit rules to be enforced by the application code, and results in a lot of null-able references and fields for data that applies to one or the other.
Wherever possible I recommend using a TPT structure and making the relationships more explicit. This has the benefit of not relying solely on application code to ensure that relationships and "optionality vs. required" are enforced at a DB level.
This would have something like:
[Person]
PersonId [PK]
// other common fields that apply to ALL types of Person.
[Parent]
PersonId [PK, FK]
CityUtilityId [FK]
// other parent-specific fields.
[Child]
PersonId [PK, FK]
ParentPersonId [FK] (To Parent, not Person)
// other child-specific fields.
This way if a parent or child has required or optional fields, they can be put in their respective tables with the respective NULL-ability. The alternative is that the field would always be NULL-able and it's up to the application to ensure the required nature for one or the other is enforced. The DB would be free to get into a completely invalid state by mistake at any point without complaint.
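As a rough illustration of the database doing the enforcement, here is a small sketch of the three tables above. It uses Python's sqlite3 rather than EF and a real server, and the snake_case names and the try_insert helper are assumptions made for the example.

```python
import sqlite3

# SQLite stands in for the real database; names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE city_utility (city_utility_id INTEGER PRIMARY KEY);
CREATE TABLE parent (
    person_id       INTEGER PRIMARY KEY REFERENCES person(person_id),
    city_utility_id INTEGER NOT NULL REFERENCES city_utility(city_utility_id)
);
CREATE TABLE child (
    person_id        INTEGER PRIMARY KEY REFERENCES person(person_id),
    -- Points at parent, not person, so a child can never have a child as parent.
    parent_person_id INTEGER NOT NULL REFERENCES parent(person_id)
);
INSERT INTO city_utility VALUES (1);
INSERT INTO person VALUES (1, 'Pat'), (2, 'Kim'), (3, 'Lee');
INSERT INTO parent VALUES (1, 1);
INSERT INTO child VALUES (2, 1);
""")


def try_insert(sql, params):
    """Run one insert; return False if the database rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


# Person 2 is a child, not a parent, so this violates the foreign key.
child_of_child = try_insert("INSERT INTO child VALUES (?, ?)", (3, 2))
# A parent must have a utility: NOT NULL rejects this row.
parent_without_utility = try_insert("INSERT INTO parent VALUES (?, ?)", (3, None))
print(child_of_child, parent_without_utility)
```

Both invalid inserts are refused by the schema itself, with no application-side check. Note that this sketch does not by itself stop the same person_id from appearing in both parent and child; that rule would need an extra constraint or a discriminator column.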
There is still a lot of attraction out in the development community to minimize the number of tables which stems largely from the days when drive space was expensive and schema cost $$ so combining similar data into a single table might have made sense. Relationally though it still had significant drawbacks. With modern databases I'd argue it's always better to only combine what is effectively identical and use TPT for inheritance, or use composition.
An example of Composition would be something like an Order which has a status. That status might be Delivered and there might be details we want to record against an order when it is delivered (signatures, etc.). These could be NULL-able fields on the Order table, but they only apply to Delivered orders and would be NULL in all other cases. Instead, you could have a table like OrderDeliveryDetails with a 1-to-1 relationship to Order, whose row is created when an order is delivered (and deleted or made inactive if the order changes from Delivered to another status for any reason).
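A minimal sketch of that composition, again with sqlite3 as a stand-in. The column names (signature, delivered_at) and the mark_delivered helper are assumptions for illustration only.

```python
import sqlite3

# SQLite stands in for the real database; names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    status   TEXT NOT NULL DEFAULT 'Pending'
);
-- 1-to-1: the primary key is also the foreign key to orders.
CREATE TABLE order_delivery_details (
    order_id     INTEGER PRIMARY KEY REFERENCES orders(order_id),
    signature    TEXT NOT NULL,
    delivered_at TEXT NOT NULL
);
INSERT INTO orders (order_id) VALUES (1), (2);
""")


def mark_delivered(order_id, signature, delivered_at):
    """Flip the status and create the details row in one transaction."""
    with conn:
        conn.execute(
            "UPDATE orders SET status = 'Delivered' WHERE order_id = ?",
            (order_id,),
        )
        conn.execute(
            "INSERT INTO order_delivery_details VALUES (?, ?, ?)",
            (order_id, signature, delivered_at),
        )


mark_delivered(1, "J. Doe", "2024-03-01T10:00:00")

rows = conn.execute("""
    SELECT o.order_id, o.status, d.signature
    FROM orders o
    LEFT JOIN order_delivery_details d ON d.order_id = o.order_id
    ORDER BY o.order_id
""").fetchall()
print(rows)
```

The delivery columns can be NOT NULL because the row only exists for delivered orders; the NULL for order 2 comes from the LEFT JOIN, not from a nullable column.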
I am looking for advice on the best way to go about modeling my database
Let's say I have three entities: meetings, habits, and tasks. Each has its own unique schema; however, I would like all 3 to have several things in common.
They should all contain calendar information, such as a start_date, end_date, recurrence_pattern, etc...
There are a few ways I could go about this:
Add these fields to each of the entities
Create an Event entity and have a foreign_key field on each of the other entities, pointing to the related Event
Create an Event entity and have 3 foreign_key fields on the Event (one for each of the other entities). At any given time only 1 of those fields would have a value and the other 2 would be null
Create an Event entity with 2 fields related_type and related_id. the related_type value, for any given row, would be one of "meetings", "habits", or "tasks" and the related_id would be the actual id of that entity type.
I will have separate api endpoints that access meetings, habits, and tasks.
I will need to return the event data along with them.
I will also have an endpoint to return all events.
I will need to return the related entity data along with each event.
Option 4 seems to be the most flexible but eliminates working with foreign keys.
I'm not sure whether that is a problem or hinders performance.
I say it's flexible because if I add a new entity, let's call it "games", the event schema will already be able to handle it.
When creating a new game, I would create a new event, and set the related_type to "games".
I'm thinking the events endpoint can join on the related_type and would also require little to no updating.
Additionally, this seems better than option 3 in the case that I add many new entities that have event data.
For each of these entities a new column would be added to the event.
Options 1 and 2 could work fine, however I cannot just query for all events, I would have to query for each of the other entities.
Are there any best practices around this scenario? Any other approaches?
In the end performance is more important than flexibility. I would rather update code than sacrifice performance.
I am using django and maybe someone has some tips around this, however, I am really looking for best practices around the database itself and not the api implementation.
I would keep it simple and choose option 1. Splitting up data in more tables than necessary for proper normalization won't be a benefit.
Perhaps you will like the idea of using PostgreSQL table inheritance. You could have an empty table event, and your three tables inherit from that table. That way, they automatically have all columns from event, but they are still three independent tables.
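For option 1, the "query all events" concern can be handled with a view that unions the calendar columns of the three tables. The sketch below uses sqlite3, which has no table inheritance, so the UNION ALL view is only a stand-in for what selecting from a PostgreSQL parent table would give you. The view name all_events and the non-calendar columns (location, streak, done) are assumptions.

```python
import sqlite3

# SQLite stand-in; the per-entity columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE meetings (
    id INTEGER PRIMARY KEY,
    start_date TEXT NOT NULL, end_date TEXT NOT NULL,
    location TEXT
);
CREATE TABLE habits (
    id INTEGER PRIMARY KEY,
    start_date TEXT NOT NULL, end_date TEXT NOT NULL,
    streak INTEGER
);
CREATE TABLE tasks (
    id INTEGER PRIMARY KEY,
    start_date TEXT NOT NULL, end_date TEXT NOT NULL,
    done INTEGER
);
-- One place to read every entity's calendar data.
CREATE VIEW all_events AS
    SELECT 'meetings' AS related_type, id AS related_id, start_date, end_date
    FROM meetings
    UNION ALL
    SELECT 'habits', id, start_date, end_date FROM habits
    UNION ALL
    SELECT 'tasks', id, start_date, end_date FROM tasks;

INSERT INTO meetings VALUES (1, '2024-05-02', '2024-05-02', 'Room 4');
INSERT INTO habits   VALUES (1, '2024-05-01', '2024-05-31', 3);
INSERT INTO tasks    VALUES (1, '2024-05-03', '2024-05-04', 0);
""")

all_events = conn.execute(
    "SELECT related_type, related_id, start_date FROM all_events "
    "ORDER BY start_date"
).fetchall()
print(all_events)
```

Adding a new entity such as "games" means one more table and one more branch in the view, which is the code-side cost the question says is acceptable.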
I am making a research repository where there are different types of research items such as conferences, publications, patents, keynotes etc. The data will be collected from the relevant sources, processed, and then inserted in a batch from an Excel sheet. The main operation will be querying the data according to the logged-in user: researcher-related information for an individual, department/unit-related information (mainly summing up the rows) for the chairperson, and so on.
Now when I approach this, I see two options:
Make two tables, one for the research item type and the other for the actual item
Make individual tables for all type of objects
The problem with the 1st structure is that I will have a huge main table with empty/null columns. But it will allow me to easily add another research item type in future, since I can simply add the new type in the "type" table and then add the actual data in the common table.
However, the second approach allows me to query only the relevant table to get the information, hence no empty/null values. The drawback is that I cannot add a new research item type within the existing structure; I need to add a new table for each new item type.
If I may ask, which of the two strategies would you recommend and why?
The 1st one entails a large single table, and the second one entails multiple database queries.
If it helps, I am using MS SQL server.
The problem you're facing is the resolution of a hierarchy in an ER model.
You have a parent entity, or generalization (RESEARCH_ITEM) that can be instantiated in different ways (your child entities, like PUBLICATION, PATENT, and so on).
To implement this hierarchy in the physical layer (i.e. creating the tables) you have to consider which properties this hierarchy has. In particular, you have to ask yourself:
Overlap constraint: can an instance of the parent entity belong to more than one child entity?
Covering constraint: do the child entities cover all the possible instances of the parent entity?
Combining these two criteria, we have four possible cases:
Total disjoint: the child entities cover all the possible instances without overlap;
Partial disjoint: the child entities don't cover all the possible instances and there's no overlap;
Total overlapping: the child entities cover all the possible instances with potential overlap;
Partial overlapping: the child entities don't cover all the possible instances and there's possible overlap.
The resolution of the hierarchy depends upon the scenario. If your hierarchy is a total-disjoint one, the best thing to do would be to eliminate the parent entity and to incorporate its attributes in the child entities (faster queries, cleaner tables).
On the other hand, if there is overlapping, this solution is not optimal, because you'd have duplication of data (the same row in two child tables). In this case you could opt for the incorporation of the children in the parent, with possible NULL fields for child-specific attributes.
Moreover, in order to design the best implementation, you'd have to consider how the data are accessed (Is there a child that I know will be queried against very often? In this case a separate table would be good).
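The four cases can be made concrete with a small helper that inspects sample data. This is only a sketch: the two constraints are really design rules about what the model allows, so a sample can show that overlap or gaps exist but cannot prove they never will. The function name classify_hierarchy and the sample ids are assumptions.

```python
def classify_hierarchy(parent_ids, children):
    """Classify a hierarchy from sample data.

    parent_ids: ids of all instances of the parent entity.
    children:   mapping of child entity name -> set of parent ids it contains.
    Returns one of 'total disjoint', 'partial disjoint',
    'total overlapping', 'partial overlapping'.
    """
    seen = set()
    overlapping = False
    for ids in children.values():
        if seen & ids:          # an id already claimed by another child entity
            overlapping = True
        seen |= ids
    total = set(parent_ids) <= seen  # every parent instance is in some child
    covering = "total" if total else "partial"
    overlap = "overlapping" if overlapping else "disjoint"
    return f"{covering} {overlap}"


research_items = {1, 2, 3}
print(classify_hierarchy(research_items, {"publication": {1, 2}, "patent": {3}}))
print(classify_hierarchy(research_items, {"publication": {1, 2}, "patent": {2}}))
```

In the first call every item belongs to exactly one child entity, which is the case where dropping the parent table is suggested above. In the second, item 2 is in both children and item 3 is in none, so the parent table has to stay.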
I am designing a database model for an application, and I have a table Post whose rows each belong to a category. Category will logically be another table.
But several categories belong to a super category (or domain, or area), and my question is this:
Should I create another table for super categories or domains, or model this hierarchy within the Category table, with a key that points to the parent?
I hope I was clear about the problem.
PS. I know that both solutions can solve this problem, but are there any benefits to using the first over the second, or vice versa?
Thanks
It depends: if nearly each category has a parent, you could add a parent serial as a column. Then your category table will look like
+--+----+------+
|ID|Name|Parent|
+--+----+------+
The problem with this representation is that, as long as the hierarchy is not cyclic, some categories will have no parent, so their Parent column is NULL. Furthermore, a category can only have one parent.
Therefore I would suggest using a category_hierarchy table. An additional table:
+-----+------+
|Child|Parent|
+-----+------+
The disadvantage of this approach is that nearly each category will be repeated. And therefore if nearly all categories have parents, the redundancy will approximately scale with that number. If relations however are quite sparse, one saves space. Furthermore using an intelligent join will prevent the second representation from taking long execution times. You can for instance define a view to handle such requests.
Furthermore there are situations where the second approach can improve speed. For instance if you don't need the hierarchy all the time (for instance when mapping serials to the category-name), lookups in the category table can be faster, simply because the table is more compact and thus more parts of the table will be cached.
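Here is a minimal sketch of the second representation, with a recursive query that walks from a category up to its ancestors. It uses Python's sqlite3; the sample categories and the ancestors helper are assumptions, and the query assumes the hierarchy is acyclic (with UNION ALL a cycle would recurse forever).

```python
import sqlite3

# SQLite stand-in; sample data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Only categories that actually have a parent get a row here,
-- so there is no NULL parent column.
CREATE TABLE category_hierarchy (
    child  INTEGER NOT NULL REFERENCES category(id),
    parent INTEGER NOT NULL REFERENCES category(id),
    PRIMARY KEY (child, parent)
);
INSERT INTO category VALUES
    (1, 'Science'), (2, 'Physics'), (3, 'Optics'), (4, 'Cooking');
INSERT INTO category_hierarchy VALUES (2, 1), (3, 2);
""")


def ancestors(category_id):
    """Return ancestor names, nearest first. Assumes no cycles."""
    rows = conn.execute(
        """
        WITH RECURSIVE up(id, depth) AS (
            SELECT parent, 1 FROM category_hierarchy WHERE child = ?
            UNION ALL
            SELECT h.parent, up.depth + 1
            FROM category_hierarchy h JOIN up ON h.child = up.id
        )
        SELECT c.name FROM up JOIN category c ON c.id = up.id
        ORDER BY up.depth
        """,
        (category_id,),
    ).fetchall()
    return [name for (name,) in rows]


print(ancestors(3))
print(ancestors(4))
```

Plain id-to-name lookups never touch category_hierarchy, which is the compactness benefit described above. The composite primary key also allows several parents per category, which the single Parent column cannot express.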
I'm working on an app in which I will be displaying events (sporting events, concerts, etc, etc). I'm trying to come up with a model where I can single out teams playing for sporting events, and bands/artists playing in a concert.
My initial stab at it is to have an events table, a team table, and a band/artist table. But I can't figure out the optimal way of handling the two teams or the many bands/artists performing at an event.
Is it ok to have multiple fields NULL if they do not apply to the record? I always thought it's best to limit the amount of NULL fields in a record.
Lots of ways to attack this.
You'll certainly have an events table.
Whether you split sports teams and bands into separate tables will depend on what information you're maintaining about each and how you feel about the NULLS. My understanding is that there is virtually no performance hit by having nulls in your records (I'm sure it varies by database) -- the real downside is potentially an unwieldy table if you have a lot of sports- and music-related fields all mixed in together.
If you do create separate sports and concert tables, you'll need to consider how you'll handle the "etc, etc" from your question. Into which table will magicians and sword-swallowers go?
As for connecting multiple teams/acts with an event, I'd be inclined to create an intermediate table to perform that linking. Each record would reference an event id and a performer id. You could have one record for an event (one performer) or two records for a sporting event (two teams) or n records for a many-act concert.
Then to pull up the show bill, you'll look up your event, and then use the intermediate table to find all the acts appearing at that event, and use those ids to get the name and details for each act.
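The intermediate table and the show-bill lookup could look roughly like this. It is a sketch using sqlite3, and it assumes teams and acts share one performers table with a kind column, which is only one of the options discussed above. All table names, sample rows, and the show_bill helper are hypothetical.

```python
import sqlite3

# SQLite stand-in; names and sample data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE events (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE performers (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    kind TEXT NOT NULL          -- 'team', 'band', 'artist', ...
);
-- The intermediate table: one row per (event, performer) pairing.
CREATE TABLE event_performers (
    event_id     INTEGER NOT NULL REFERENCES events(id),
    performer_id INTEGER NOT NULL REFERENCES performers(id),
    PRIMARY KEY (event_id, performer_id)
);
INSERT INTO events VALUES (1, 'Cup Final'), (2, 'Summer Festival');
INSERT INTO performers VALUES
    (1, 'Lions', 'team'), (2, 'Tigers', 'team'),
    (3, 'The Echoes', 'band'), (4, 'DJ Nova', 'artist'),
    (5, 'Brass Trio', 'band');
INSERT INTO event_performers VALUES (1, 1), (1, 2), (2, 3), (2, 4), (2, 5);
""")


def show_bill(event_id):
    """Return the names of everyone appearing at the event, alphabetically."""
    rows = conn.execute(
        """
        SELECT p.name
        FROM event_performers ep
        JOIN performers p ON p.id = ep.performer_id
        WHERE ep.event_id = ?
        ORDER BY p.name
        """,
        (event_id,),
    ).fetchall()
    return [name for (name,) in rows]


print(show_bill(1))
print(show_bill(2))
```

The same two-table join serves a two-team match and an n-act concert, and no event row carries unused team or band columns.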