I'm working on an app in which I will be displaying events (sporting events, concerts, etc.). I'm trying to come up with a model where I can single out the teams playing in a sporting event and the bands/artists playing in a concert.
My initial stab at it is to have an events table, a team table, and a band/artist table. But I can't figure out the optimal way of handling the two teams or the many bands/artists performing at an event.
Is it ok to have multiple fields NULL if they do not apply to the record? I always thought it's best to limit the amount of NULL fields in a record.
Lots of ways to attack this.
You'll certainly have an events table.
Whether you split sports teams and bands into separate tables will depend on what information you're maintaining about each and how you feel about the NULLS. My understanding is that there is virtually no performance hit by having nulls in your records (I'm sure it varies by database) -- the real downside is potentially an unwieldy table if you have a lot of sports- and music-related fields all mixed in together.
If you do create separate sports and concert tables, you'll need to consider how you'll handle the "etc, etc" from your question. Into which table will magicians and sword-swallowers go?
As for connecting multiple teams/acts with an event, I'd be inclined to create an intermediate table to perform that linking. Each record would reference an event id and a performer id. You could have one record for an event (one performer) or two records for a sporting event (two teams) or n records for a many-act concert.
Then to pull up the show bill, you'll look up your event, and then use the intermediate table to find all the acts appearing at that event, and use those ids to get the name and details for each act.
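A minimal sketch of that linking table, using Python's built-in sqlite3 module. The table names (event, performer, event_performer), the columns, and the sample data are all assumptions made up for illustration; they are not from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE event (id INTEGER PRIMARY KEY, name TEXT NOT NULL, kind TEXT NOT NULL);
CREATE TABLE performer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Intermediate (linking) table: one row per performer appearing at an event.
CREATE TABLE event_performer (
    event_id INTEGER NOT NULL REFERENCES event(id),
    performer_id INTEGER NOT NULL REFERENCES performer(id),
    PRIMARY KEY (event_id, performer_id)
);
""")

conn.executemany("INSERT INTO event VALUES (?, ?, ?)",
                 [(1, "Cup Final", "sport"), (2, "Summer Festival", "concert")])
conn.executemany("INSERT INTO performer VALUES (?, ?)",
                 [(1, "Lions"), (2, "Tigers"), (3, "The Sparks"),
                  (4, "DJ Nova"), (5, "Blue Notes")])
# Two rows for the sporting event (two teams), three for the concert.
conn.executemany("INSERT INTO event_performer VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 3), (2, 4), (2, 5)])


def show_bill(conn, event_id):
    """Return the names of all performers linked to one event."""
    rows = conn.execute(
        """SELECT p.name
           FROM event_performer ep
           JOIN performer p ON p.id = ep.performer_id
           WHERE ep.event_id = ?
           ORDER BY p.name""",
        (event_id,),
    ).fetchall()
    return [name for (name,) in rows]


print(show_bill(conn, 1))
print(show_bill(conn, 2))
```

The same query shape works whether an event has one, two, or n performers, which is the point of the intermediate table.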
Related
I am looking for advice on the best way to go about modeling my database
Let's say I have three entities: meetings, habits, and tasks. Each has its own unique schema; however, I would like all three to have several things in common.
They should all contain calendar information, such as a start_date, end_date, recurrence_pattern, etc...
There are a few ways I could go about this:
1. Add these fields to each of the entities
2. Create an Event entity and have a foreign_key field on each of the other entities, pointing to the related Event
3. Create an Event entity and have 3 foreign_key fields on the Event (one for each of the other entities). At any given time only 1 of those fields would have a value and the other 2 would be null
4. Create an Event entity with 2 fields, related_type and related_id. The related_type value, for any given row, would be one of "meetings", "habits", or "tasks" and the related_id would be the actual id of that entity type.
I will have separate api endpoints that access meetings, habits, and tasks.
I will need to return the event data along with them.
I will also have an endpoint to return all events.
I will need to return the related entity data along with each event.
Option 4 seems to be the most flexible but eliminates working with foreign keys.
I'm not sure if that is a problem or hinders performance.
I say it's flexible because if I add a new entity, let's call it "games", the event schema will already be able to handle it.
When creating a new game, I would create a new event, and set the related_type to "games".
I'm thinking the events endpoint can join on the related_type and would also require little to no updating.
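To make the related_type/related_id idea concrete, here is a rough sketch using Python's sqlite3 module. The column names other than related_type, related_id, start_date and end_date, and all of the sample rows, are assumptions for illustration. It shows one way the "all events" query could join on related_type, and also that the database accepts a related_id that points at nothing, since no foreign key can be declared on it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE meetings (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE habits   (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE tasks    (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
-- related_id carries no FOREIGN KEY: its target table depends on
-- related_type, which a plain foreign key cannot express.
CREATE TABLE events (
    id INTEGER PRIMARY KEY,
    start_date TEXT NOT NULL,
    end_date TEXT NOT NULL,
    related_type TEXT NOT NULL,
    related_id INTEGER NOT NULL
);
CREATE INDEX events_related ON events (related_type, related_id);

INSERT INTO meetings VALUES (1, 'Sprint review');
INSERT INTO habits   VALUES (1, 'Morning run');
INSERT INTO tasks    VALUES (1, 'File taxes');
INSERT INTO events VALUES (1, '2024-01-01', '2024-01-01', 'meetings', 1);
INSERT INTO events VALUES (2, '2024-01-02', '2024-01-02', 'habits', 1);
INSERT INTO events VALUES (3, '2024-01-03', '2024-01-03', 'tasks', 1);
-- Dangling reference: there is no task 99, yet the insert succeeds.
INSERT INTO events VALUES (4, '2024-01-04', '2024-01-04', 'tasks', 99);
""")

# One LEFT JOIN per entity type, each guarded by related_type.
all_events = conn.execute("""
    SELECT e.id, e.related_type, COALESCE(m.title, h.title, t.title) AS title
    FROM events e
    LEFT JOIN meetings m ON e.related_type = 'meetings' AND m.id = e.related_id
    LEFT JOIN habits   h ON e.related_type = 'habits'   AND h.id = e.related_id
    LEFT JOIN tasks    t ON e.related_type = 'tasks'    AND t.id = e.related_id
    ORDER BY e.id
""").fetchall()

for row in all_events:
    print(row)
```

In this sketch the events table itself would not change when "games" is added, but the query above would still need one more LEFT JOIN, and event 4 comes back with no title because nothing stops the orphaned row from being stored.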
Additionally, this seems better than option 3 in the case that I add many new entities that have event data.
For each of these entities a new column would be added to the event.
Options 1 and 2 could work fine, however I cannot just query for all events, I would have to query for each of the other entities.
Are there any best practices around this scenario? Any other approaches?
In the end performance is more important than flexibility. I would rather update code than sacrifice performance.
I am using Django and maybe someone has some tips around this; however, I am really looking for best practices around the database itself and not the API implementation.
I would keep it simple and choose option 1. Splitting up data in more tables than necessary for proper normalization won't be a benefit.
Perhaps you will like the idea of using PostgreSQL table inheritance. You could have an empty table event, and your three tables inherit from that table. That way, they automatically have all columns from event, but they are still three independent tables.
So I've done extensive searching on this and I can't seem to find a good solution that actually applies to my situation.
I have a list of projects in a table, then a list of people. I want to assign multiple people to one project. Seems pretty common. Obviously, I can't make multiple columns on my projects table for each person, as the people will change fairly frequently.
I need to display this information very quickly in a continuous list of projects (the ultimate way would be a multiple-select combobox as a listbox is too tall, but they don't exist outside of the dreaded lookup fields)
I can think of two ways:
- Store multiple employee IDs delimited by commas in one field in my projects table (I know this goes against good database design). Would require some code to store and retrieve the data.
- Have a separate table for employees assigned to projects (ID, ProjectID, EmployeeID). One to many relationship between projects table and this new table. One to many relationship between employees table and this new table. If a project has 3 employees assigned, it would store 3 records in this table. It seems a bit odd joining both tables in this way, and it would also require code to store and retrieve the data into a control like the one mentioned above.
Does anyone know if there is a better way (including displaying in an easy control) or how you usually tackle this problem?
The usual way to tackle this problem would be with a Junction Table. This is what you describe where you have a separate table maybe called EmployeeProject which has an EmployeeProjectID(PK), EmployeeID(FK) and ProjectID(FK).
In this way you model a Many-to-Many relationship where each project can have many employees involved and each employee can be involved in many projects. It's not actually all that difficult to do the SQL etc. required to pull the information back together again for display.
I would definitely stay away from storing comma-delimited values as this becomes significantly more complicated when you want to display or manipulate the data.
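As a rough illustration of the junction table, here is a sketch in Python with SQLite rather than Access (whose SQL dialect and controls differ). The Project and Employee tables, their Name columns, and the sample rows are assumptions; only EmployeeProject and its three columns come from the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Project  (ProjectID  INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Employee (EmployeeID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
-- Junction table: one row per (employee, project) assignment.
CREATE TABLE EmployeeProject (
    EmployeeProjectID INTEGER PRIMARY KEY,
    EmployeeID INTEGER NOT NULL REFERENCES Employee(EmployeeID),
    ProjectID  INTEGER NOT NULL REFERENCES Project(ProjectID),
    UNIQUE (EmployeeID, ProjectID)
);
INSERT INTO Project  VALUES (1, 'Website'), (2, 'Payroll'), (3, 'Archive');
INSERT INTO Employee VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO EmployeeProject (EmployeeID, ProjectID)
VALUES (1, 1), (2, 1), (3, 1), (1, 2);
""")

# One query returns every project with its assigned employees;
# LEFT JOIN keeps projects that have nobody assigned yet.
rows = conn.execute("""
    SELECT p.Name, e.Name
    FROM Project p
    LEFT JOIN EmployeeProject ep ON ep.ProjectID = p.ProjectID
    LEFT JOIN Employee e ON e.EmployeeID = ep.EmployeeID
    ORDER BY p.Name, e.Name
""").fetchall()

# Group the flat rows into one list of names per project for display.
staff = {}
for project, employee in rows:
    staff.setdefault(project, [])
    if employee is not None:
        staff[project].append(employee)

for project, names in staff.items():
    print(project, "->", ", ".join(names) or "(nobody)")
```

Reassigning people is then a matter of inserting or deleting rows in EmployeeProject, with no string parsing involved.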
There's a good guide here: http://en.tekstenuitleg.net/articles/software/create-a-many-to-many-relationship-in-access but if you google "many to many junction table" or similar, there are thousands of pages/articles about implementation.
I believe the following is a pretty common use case, yet even after thinking about it for a couple of hours and discussing it with a friend I found no satisfactory solution.
Basic problem: How do you store and efficiently query objects/entities with connections to many different relations?
The objects
Imagine you have a system that keeps track of a group of cars, their positions and their drivers (each is an entity in your DB/system). From monitoring the activity of the cars you are generating events such as speeding violations, collisions between two cars and fuel fillups. Now each of these events is a little different, modeled as objects they might have the following attributes:
Speeding violation
- speed (integer)
- car (reference)
- driver (reference)
Collision
- car1 (reference)
- car2 (reference)
- driver1 (reference)
- driver2 (reference)
- position (reference)
- date & time
- state (fixed or new)
Fuel fillup
- car (reference)
- amount (float)
- position (reference)
Additionally they all share some attributes such as a creation date and an owning company. It is also possible that new events will be generated in the future and these should be simple to add to whatever storage system/model gets decided on.
Query demands
Queries are roughly ordered by importance (most important first). The system should be able to efficiently
query all notifications (with their attributes) for a given company or time frame
query all notifications belonging to a certain car or driver
query all notifications of a certain type (e.g. all fillup notifications)
The question
How do you store the above described objects in a database (not necessarily relational although the referenced entities are in a relational DB) such that the queries described can be performed efficiently?
The definition of efficiency here can be pretty flexible; what is important to me is that situations where, e.g., all of the dependencies have to be queried individually are avoided.
Potential solutions
Here are some of the ideas I came up with:
2 tables model: A first table event holds the common general information of the events such as an id, company, event_type and creation date. A second table event_objects then holds all the different attachments and contains the columns id, event_id, object_id and object_type.
Good:
Most queries can be answered efficiently
Very easy to scale for additional events
Very easy to add new attributes for an event
Bad:
When the objects for a specific event have to be retrieved they each have to be fetched with an individual query
If the DB is relational this goes against good practice/designed use (essentially use the DB as a key-value store)
1 table per event: Simply create one table for each event type with a column for each attribute
Good:
Events of the same type can be queried very efficiently
Querying all events of a company/car etc. is only linear in the number of event types (as opposed to the number of related attributes times the number of fetched events for the 2 tables model)
Fits more nicely with the relational model
Bad:
Harder to query all events of a company/for a time frame (requires #types queries)
Harder to add a new attribute to an existing event type
Conclusion
Based on the listed advantages and disadvantages I am tempted to go with the 1 table per event solution, but it still doesn't seem particularly elegant to me. I am sure I am not the first one to bump into this problem and would love to hear how others have tackled similar issues.
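One possible way to soften the "requires #types queries" drawback of the 1 table per event model is a view that combines the shared columns of every event table with UNION ALL. This is a sketch of that idea rather than something proposed above; it uses Python's sqlite3 module, covers only two of the three event types for brevity, and every table, column, and view name is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE speeding_violation (
    id INTEGER PRIMARY KEY, company_id INTEGER NOT NULL, created_at TEXT NOT NULL,
    car_id INTEGER NOT NULL, driver_id INTEGER NOT NULL, speed INTEGER NOT NULL
);
CREATE TABLE fuel_fillup (
    id INTEGER PRIMARY KEY, company_id INTEGER NOT NULL, created_at TEXT NOT NULL,
    car_id INTEGER NOT NULL, position_id INTEGER NOT NULL, amount REAL NOT NULL
);
-- The view exposes only the columns every event type shares.
CREATE VIEW all_events AS
    SELECT 'speeding_violation' AS event_type, id, company_id, created_at, car_id
    FROM speeding_violation
    UNION ALL
    SELECT 'fuel_fillup', id, company_id, created_at, car_id
    FROM fuel_fillup;

INSERT INTO speeding_violation VALUES (1, 10, '2024-03-01T08:00', 1, 1, 142);
INSERT INTO speeding_violation VALUES (2, 20, '2024-03-02T09:00', 2, 2, 130);
INSERT INTO fuel_fillup VALUES (1, 10, '2024-03-03T12:00', 1, 5, 41.5);
INSERT INTO fuel_fillup VALUES (2, 10, '2024-02-01T12:00', 1, 5, 30.0);
""")


def events_for_company(conn, company_id, since):
    """All events of one company created at or after `since`, in one query."""
    return conn.execute(
        """SELECT event_type, id, created_at
           FROM all_events
           WHERE company_id = ? AND created_at >= ?
           ORDER BY created_at""",
        (company_id, since),
    ).fetchall()


print(events_for_company(conn, 10, "2024-03-01"))
```

The view only answers questions about the shared columns; the type-specific attributes (speed, amount, and so on) still live in, and are read from, the per-type tables. Adding a new event type means adding one table and one more branch to the view.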
Here's a simple version of the website I'm designing: Users can belong to one or more groups. As many groups as they want. When they log in they are presented with the groups they belong to. Ideally, in my Users table I'd like an array or something that is unbounded to which I can keep on adding the IDs of the groups that user joins.
Additionally, although I realize this isn't necessary, I might want a column in my Group table which has an indefinite amount of user IDs which belong in that group. (side question: would that be more efficient than getting all the users of the group by querying the user table for users belonging to a certain group ID?)
Does my question make sense? Mainly I want to be able to fill a column up with an indefinite list of IDs... The only way I can think of is making it like some super long varchar and having the list JSON encoded in there or something, but ewww
Please and thanks
Oh and it's a MySQL database (my website is in PHP), but after 2 years of PHP development I've recently decided PHP sucks and I hate it and ASP.NET web applications are the only way for me, so I guess I'll be implementing this on whatever kind of database I'll need for that.
Your intuition is correct; you don't want to have one column of unbounded length just to hold the user's groups. Instead, create a table such as user_group_membership with the columns:
user_id
group_id
A single user_id could have multiple rows, each with the same user_id but a different group_id. You would represent membership in multiple groups by adding multiple rows to this table.
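A small sketch of that table, written with Python's sqlite3 module instead of MySQL to keep it self-contained. The app_user and app_group tables, their columns, and the sample data are assumptions for illustration; only user_group_membership with user_id and group_id comes from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE app_user  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE app_group (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- One row per membership; the composite key prevents duplicate joins.
CREATE TABLE user_group_membership (
    user_id  INTEGER NOT NULL REFERENCES app_user(id),
    group_id INTEGER NOT NULL REFERENCES app_group(id),
    PRIMARY KEY (user_id, group_id)
);
INSERT INTO app_user  VALUES (1, 'alice'), (2, 'bob');
INSERT INTO app_group VALUES (1, 'chess'), (2, 'hiking'), (3, 'cooking');
INSERT INTO user_group_membership VALUES (1, 1), (1, 2), (2, 2);
""")


def groups_of(conn, user_id):
    """Names of the groups a user belongs to (shown after login)."""
    rows = conn.execute(
        """SELECT g.name
           FROM user_group_membership m
           JOIN app_group g ON g.id = m.group_id
           WHERE m.user_id = ?
           ORDER BY g.name""",
        (user_id,),
    ).fetchall()
    return [name for (name,) in rows]


def users_in(conn, group_id):
    """Names of the users in a group, read from the same table."""
    rows = conn.execute(
        """SELECT u.name
           FROM user_group_membership m
           JOIN app_user u ON u.id = m.user_id
           WHERE m.group_id = ?
           ORDER BY u.name""",
        (group_id,),
    ).fetchall()
    return [name for (name,) in rows]


print(groups_of(conn, 1))
print(users_in(conn, 2))
```

Because both directions are answered from the one membership table, there is no need for a second unbounded column on the group side.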
What you have here is a many-to-many relationship. A "many-to-many" relationship is represented by a third, joining table that contains both primary keys of the related entities. You might also hear this called a bridge table, a junction table, or an associative entity.
You have the following relationships:
A User belongs to many Groups
A Group can have many Users
In database design, this might be represented as three tables: User, Group, and a joining table UserGroup that holds one user ID and one group ID per row.
This way, a UserGroup represents any combination of a User and a Group without the problem of having "infinite columns."
If you store an indefinite amount of data in one field, your design does not conform to First Normal Form (1NF). 1NF is the first step in a design process called data normalization. Data normalization is a major aspect of database design. Normalized design is usually good design, although there are some situations where a different design might be better adapted.
If your data is not in 1NF, you will end up doing sequential scans for some queries where a normalized database would be accessed via a quick lookup. For a table with a billion rows, this could mean a delay of an hour rather than a few seconds. With the data in 1NF, each item can be indexed and reached through a direct lookup.
As other responders have indicated, such a design will involve more than one table, to be joined at retrieval time. Joining takes some time, but it's tiny compared to the time wasted in sequential scans, if the data volume is large.
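The scan-versus-lookup difference can be observed with a query planner. The sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module; the two tables (users_csv with a delimited group_ids column, and membership) are hypothetical stand-ins for the two designs, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Design A (not in 1NF): group ids packed into one delimited text column.
CREATE TABLE users_csv (id INTEGER PRIMARY KEY, group_ids TEXT NOT NULL);
-- Design B (normalized): one row per membership, indexed by group.
CREATE TABLE membership (
    user_id  INTEGER NOT NULL,
    group_id INTEGER NOT NULL,
    PRIMARY KEY (user_id, group_id)
);
CREATE INDEX membership_by_group ON membership (group_id, user_id);
""")


def plan(conn, sql):
    """Return the planner's description of how it would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


# Finding members of group 2 in design A needs pattern matching on every row.
csv_plan = plan(
    conn, "SELECT id FROM users_csv WHERE ',' || group_ids || ',' LIKE '%,2,%'"
)
# In design B the same question is an equality test on an indexed column.
normalized_plan = plan(conn, "SELECT user_id FROM membership WHERE group_id = 2")

print(csv_plan)
print(normalized_plan)
```

On the SQLite builds this was written against, the first plan reports a full SCAN of users_csv and the second a SEARCH using the index; other database engines expose the same distinction through their own EXPLAIN output.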
Consider we have a database that has a table, which is a record of a sale. You sell both products and services, so you also have a product and service table.
Each sale can either be a product or a service, which leaves the options for designing the database to be something like the following:
Add columns for each type, i.e. add Service_id and Product_id to Invoice_Row, both of which are nullable. If they're both null, it's an ad-hoc charge not relating to anything, but if one of them is set then the row relates to that type.
Add a string/id based reference, for instance Type_table and Type_id. These would be a string/varchar and an integer respectively; the former would contain, for example, 'Service', and the latter the id within the Service table. This is obviously loose coupling and horrible, but it is a way of solving the problem so long as you're only accessing the DB from code.
Abstract out the concept of "something that is chargeable" with a new table, of which Product and Service are now specializations, and have the Invoice_Row table link to something like ChargeableEntity_id. However, the ChargeableEntity table here would essentially be redundant, as it too would need some way to link to an abstract "backend" table, which brings us all the way back around to the same problem.
Which way would you choose, or what are the other alternatives to solving this problem?
What you are essentially asking is how to achieve polymorphism in a relational database. There are many approaches (as you yourself demonstrate) to this problem. One solution is to use "table per class" inheritance. In this setup, there will be a parent table (akin to your "chargeable item") that contains a unique identifier and the fields that are common to both products and services. There will be two child tables, products and services. Each will contain the unique identifier for that entity and the fields specific to it.
One benefit to this approach over others is you don't end up with one table with many nullable columns that essentially becomes a dumping ground to describe anything ("schema-less").
One downside is as your inheritance hierarchy grows, the number of joins needed to grab all the data for an entity also grows.
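A minimal sketch of that parent/child layout, using Python's sqlite3 module. All table and column names (chargeable_item, product, service, invoice_row, unit_price_cents, and so on) and the sample rows are assumptions chosen for illustration. Each child table reuses the parent's id as its own primary key, so an invoice row only ever points at the parent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Parent: identifier plus the fields common to products and services.
CREATE TABLE chargeable_item (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    unit_price_cents INTEGER NOT NULL
);
-- Children: same id as the parent row, plus type-specific fields.
CREATE TABLE product (
    id INTEGER PRIMARY KEY REFERENCES chargeable_item(id),
    stock_count INTEGER NOT NULL
);
CREATE TABLE service (
    id INTEGER PRIMARY KEY REFERENCES chargeable_item(id),
    duration_minutes INTEGER NOT NULL
);
CREATE TABLE invoice_row (
    id INTEGER PRIMARY KEY,
    chargeable_item_id INTEGER NOT NULL REFERENCES chargeable_item(id),
    quantity INTEGER NOT NULL
);

INSERT INTO chargeable_item VALUES (1, 'Widget', 500), (2, 'Consulting', 12000);
INSERT INTO product VALUES (1, 40);
INSERT INTO service VALUES (2, 60);
INSERT INTO invoice_row VALUES (1, 1, 3), (2, 2, 1);
""")

# Pricing needs only the parent; the child joins are there to tell
# which kind of item each row refers to.
invoice = conn.execute("""
    SELECT r.id,
           c.name,
           CASE WHEN p.id IS NOT NULL THEN 'product'
                WHEN s.id IS NOT NULL THEN 'service' END AS kind,
           r.quantity * c.unit_price_cents AS total_cents
    FROM invoice_row r
    JOIN chargeable_item c ON c.id = r.chargeable_item_id
    LEFT JOIN product p ON p.id = c.id
    LEFT JOIN service s ON s.id = c.id
    ORDER BY r.id
""").fetchall()

for row in invoice:
    print(row)
```

The two LEFT JOINs illustrate the downside mentioned above: each additional child type adds another join to any query that needs the type-specific details.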
I believe it depends on use case(s).
You could put the common columns in one table and put the product- and service-specific columns in their own tables. Here the deal is that you need to join stuff.
Otherwise, you maintain two separate tables, one for Product and another for Service, and use application logic to determine which table to insert into. Getting all sales will then essentially mean a union of the product rows and the service rows.
I would go for approach 2 personally to avoid joins and inserting into two tables whenever a sale is made.