I'm thinking through a database design and was wondering if anyone could chime in. I have some structured data that I will occasionally be filtering against some loosely structured data. I'm thinking a lot about performance, so I'm trying to keep it as denormalized as possible. Do people have opinions about an indexed JSONB column versus a separate table? For example:
| (smurfs) id | name   | filters (GIN index)                 |
|-------------+--------+-------------------------------------|
| 1           | Papa   | { "color": "blue" }                 |
| 2           | Brainy | { "brain": "big", "color": "blue" } |
And I'd query against the indexed JSONB data.
Or:
| (smurfs) id | name   |
|-------------+--------|
| 1           | Papa   |
| 2           | Brainy |
| (filters) id | smurf_id | filter_type | filter_value |
|--------------+----------+-------------+--------------|
| 1            | 1        | color       | blue         |
| 2            | 2        | brain       | big          |
| 3            | 2        | color       | blue         |
and I'd JOIN the filters with the data for my query.
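To make the second layout concrete, here is a minimal sketch of the JOIN-based filter. It uses Python's built-in `sqlite3` module as a stand-in for the real database (an assumption: the question is about PostgreSQL, but this JOIN/GROUP BY SQL is portable), and the helper `smurfs_matching` and the index name `filters_type_value` are hypothetical. A smurf qualifies only when the number of distinct matching filter types equals the number of requested filters, so every criterion has to match.

```python
import sqlite3

# In-memory SQLite stands in for the real server (assumption: target is PostgreSQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE smurfs (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE filters (
    id           INTEGER PRIMARY KEY,
    smurf_id     INTEGER NOT NULL REFERENCES smurfs(id),
    filter_type  TEXT NOT NULL,
    filter_value TEXT NOT NULL
);
-- Hypothetical composite index so (type, value) lookups avoid a full scan.
CREATE INDEX filters_type_value ON filters (filter_type, filter_value, smurf_id);
INSERT INTO smurfs VALUES (1, 'Papa'), (2, 'Brainy');
INSERT INTO filters VALUES
    (1, 1, 'color', 'blue'),
    (2, 2, 'brain', 'big'),
    (3, 2, 'color', 'blue');
""")

def smurfs_matching(conn, criteria):
    """Return names of smurfs that match *all* (type, value) pairs in criteria."""
    if not criteria:
        raise ValueError("at least one filter is required")
    # One OR-ed condition per criterion; HAVING then requires all of them to match.
    conditions = " OR ".join(
        ["(f.filter_type = ? AND f.filter_value = ?)"] * len(criteria))
    params = [item for pair in criteria.items() for item in pair]
    sql = f"""
        SELECT s.name
        FROM smurfs AS s
        JOIN filters AS f ON f.smurf_id = s.id
        WHERE {conditions}
        GROUP BY s.id, s.name
        HAVING COUNT(DISTINCT f.filter_type) = ?
        ORDER BY s.id
    """
    return [row[0] for row in conn.execute(sql, params + [len(criteria)])]

print(smurfs_matching(conn, {"color": "blue"}))                  # ['Papa', 'Brainy']
print(smurfs_matching(conn, {"color": "blue", "brain": "big"}))  # ['Brainy']
```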
There's a lot of talk about misuse of JSON in relational databases. Does this fit into that category? Would one be preferable over the other from a good-design standpoint? Is one more performant? I'm trying to optimize for reads on a large table. It seems like in the second case I'd have two large tables instead of one.
Assume there is a gigantic organization with a crazy management structure. Each employee has one or more managers, and managers are employees themselves who have one or more managers above them.
employee table
| id  | name  | managers_id |
| --- | ----- | ----------- |
| 1   | Smith | 5,6         |
| 2   | Matt  | 1           |
| 3   | Bob   | 1,2         |
| 4   | Adam  | 1,3         |
| 5   | Suzi  | 6           |
| 6   | Emily | 23,25       |
| ... | ...   | ...         |
It is a one-way management chain with no loops: it goes A-B-C-D, A-a-b-C-D, etc., and there is no case such as A-B-C-D-A.
The query should return the management chains. Say C has two management chains above it:
A-B-C
A-a-b-C
C also has one chain below:
C-D
C's level along the chains does not matter.
In theory, there is no limitation on the number of levels, the chain can keep going indefinitely.
I was thinking about 'inheritance', but it is probably not the solution.
Any tips on how to design this Postgres database, please? Thank you.
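A common relational approach is to replace the comma-separated `managers_id` column with a junction table holding one (employee, manager) pair per row, then enumerate chains with a recursive CTE. A minimal sketch, assuming a hypothetical `employee_manager` table and using Python's `sqlite3` as a stand-in (PostgreSQL also supports `WITH RECURSIVE` and `||` string concatenation); the letters A, B, a, b, C, D are the example from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- Hypothetical junction table: one row per (employee, manager) pair.
CREATE TABLE employee_manager (
    employee_id INTEGER NOT NULL REFERENCES employee(id),
    manager_id  INTEGER NOT NULL REFERENCES employee(id),
    PRIMARY KEY (employee_id, manager_id)
);
INSERT INTO employee VALUES (1,'A'), (2,'B'), (3,'a'), (4,'b'), (5,'C'), (6,'D');
INSERT INTO employee_manager VALUES
    (2, 1),  -- B reports to A
    (3, 1),  -- a reports to A
    (4, 3),  -- b reports to a
    (5, 2),  -- C reports to B
    (5, 4),  -- C reports to b
    (6, 5);  -- D reports to C
""")

# Walk upward: each step prepends the next manager to the path string.
# The graph has no loops (per the question), so the recursion terminates.
CHAINS_ABOVE = """
WITH RECURSIVE up_chain(top_id, path) AS (
    SELECT e.id, e.name FROM employee AS e WHERE e.name = ?
    UNION ALL
    SELECT em.manager_id, m.name || '-' || up_chain.path
    FROM up_chain
    JOIN employee_manager AS em ON em.employee_id = up_chain.top_id
    JOIN employee AS m ON m.id = em.manager_id
)
-- Keep only complete chains: the top person has no manager.
SELECT path FROM up_chain
WHERE NOT EXISTS (SELECT 1 FROM employee_manager WHERE employee_id = up_chain.top_id)
ORDER BY path
"""

# Walk downward: append direct reports until someone has no reports.
CHAINS_BELOW = """
WITH RECURSIVE down_chain(bottom_id, path) AS (
    SELECT e.id, e.name FROM employee AS e WHERE e.name = ?
    UNION ALL
    SELECT em.employee_id, down_chain.path || '-' || r.name
    FROM down_chain
    JOIN employee_manager AS em ON em.manager_id = down_chain.bottom_id
    JOIN employee AS r ON r.id = em.employee_id
)
SELECT path FROM down_chain
WHERE NOT EXISTS (SELECT 1 FROM employee_manager WHERE manager_id = down_chain.bottom_id)
ORDER BY path
"""

above = [r[0] for r in conn.execute(CHAINS_ABOVE, ("C",))]
below = [r[0] for r in conn.execute(CHAINS_BELOW, ("C",))]
print(above)  # ['A-B-C', 'A-a-b-C']
print(below)  # ['C-D']
```

Because every path is enumerated separately, the number of chains can grow quickly when many people share managers; for very deep or dense hierarchies it may be worth limiting depth or returning edges instead of full paths.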
I have a question related to a kind of duplication I see in databases from time to time. To ask this question, I need to set the stage a bit:
Let's say I have a database of TV shows. Its primary table Content stores information at various levels of granularity (Show -> Season -> Episode), using a parent column to denote hierarchy:
+----+---------------------------+-------------+----------+
| ID | ContentName               | ContentType | ParentId |
+----+---------------------------+-------------+----------+
| 1  | Friends                   | Show        | [null]   |
| 2  | Season 1                  | Season      | 1        |
| 3  | The Pilot                 | Episode     | 2        |
| 4  | The One with the Sonogram | Episode     | 2        |
+----+---------------------------+-------------+----------+
Maybe this isn't ideal, but let's say it's good enough to work with and we're not looking to change it.
Now let's say we need to build a table that defines air dates. We can set these at any level, and they must apply down the hierarchy (e.g., if set at the Season level, it applies to all episodes within that season; if set at the Show level, it applies to all seasons and episodes).
So the original air dates might look like this:
+-------+-----------+------------+
| airId | ContentId | AirDate    |
+-------+-----------+------------+
| 71    | 3         | 1994-09-22 |
| 72    | 4         | 1994-09-29 |
+-------+-----------+------------+
Whereas the air date for a streaming service might look like:
+-------+-----------+------------+
| airId | ContentId | AirDate    |
+-------+-----------+------------+
| 91    | 1         | 2015-01-01 |
+-------+-----------+------------+
Cool. Everything's fine so far; we're adhering to 4NF (I think!) and we can proceed to our business logic.
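For comparison with the duplicated rows discussed next, the non-duplicating design can apply the hierarchy rule at read time: walk up `ParentId` from each item and take the air date from the nearest level that has one. A minimal sketch with Python's `sqlite3`, assuming only the streaming-service row (airId 91) is loaded; the query name `EFFECTIVE_AIR_DATES` is hypothetical, and `ROW_NUMBER()` needs SQLite 3.25 or later (PostgreSQL has it as well).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Content (
    ID          INTEGER PRIMARY KEY,
    ContentName TEXT NOT NULL,
    ContentType TEXT NOT NULL,
    ParentId    INTEGER REFERENCES Content(ID)
);
CREATE TABLE AirDate (
    airId     INTEGER PRIMARY KEY,
    ContentId INTEGER NOT NULL REFERENCES Content(ID),
    AirDate   TEXT NOT NULL
);
INSERT INTO Content VALUES
    (1, 'Friends', 'Show', NULL),
    (2, 'Season 1', 'Season', 1),
    (3, 'The Pilot', 'Episode', 2),
    (4, 'The One with the Sonogram', 'Episode', 2);
-- The streaming-service date, stored once at the Show level.
INSERT INTO AirDate VALUES (91, 1, '2015-01-01');
""")

# lineage: every (item, ancestor, depth) pair, depth 0 being the item itself.
# dated: ancestors that have an air date, ranked nearest-first per item.
EFFECTIVE_AIR_DATES = """
WITH RECURSIVE lineage(item_id, ancestor_id, depth) AS (
    SELECT ID, ID, 0 FROM Content
    UNION ALL
    SELECT l.item_id, c.ParentId, l.depth + 1
    FROM lineage AS l
    JOIN Content AS c ON c.ID = l.ancestor_id
    WHERE c.ParentId IS NOT NULL
),
dated AS (
    SELECT l.item_id, a.AirDate,
           ROW_NUMBER() OVER (PARTITION BY l.item_id ORDER BY l.depth) AS rn
    FROM lineage AS l
    JOIN AirDate AS a ON a.ContentId = l.ancestor_id
)
SELECT item_id, AirDate FROM dated WHERE rn = 1 ORDER BY item_id
"""

print(conn.execute(EFFECTIVE_AIR_DATES).fetchall())
# [(1, '2015-01-01'), (2, '2015-01-01'), (3, '2015-01-01'), (4, '2015-01-01')]
```

Every item inherits the Show-level date without any extra rows being written; a date set lower in the tree would win because it sits at a smaller depth.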
Now we get to my question. If we implement our business logic in a way that disregards the referential hierarchy and instead duplicates the air dates down the hierarchy, what is this anti-pattern called? For example, say I set an air date at the Show level as above, but the business logic finds all child elements and creates an entry for each one, resulting in:
+-------+-----------+------------+
| airId | ContentId | AirDate    |
+-------+-----------+------------+
| 91    | 1         | 2015-01-01 |
| 92    | 2         | 2015-01-01 |
| 93    | 3         | 2015-01-01 |
| 94    | 4         | 2015-01-01 |
+-------+-----------+------------+
There are some pretty clear problems with this, but please note that my question is not how to fix it. I just want to know: is there a specific term for it? I want to call it something like "disregarding data relationships" or "ignoring referential context". Maybe it's not strictly a database anti-pattern, since in my example an external actor is inserting the excess rows.
I am trying to design a model for the future database of our toys and certain measurements that have to be done post-production. I have trouble grasping how to model this. I have tried multiple approaches, but none of them seem optimal, and in the end I always lose the connectivity between entities.
What I need to achieve is some kind of meaningful relationship between the following:
A toy (with some trivial properties).
A series of toys (multiple toys can be related to one series and a toy can only belong to one series).
Measurement steps. There are currently 6 of these steps. Each step has its own input parameters, and these vary in type as well as in number (e.g., only 3 parameters for measurement step 1 and 10 parameters for measurement step 2).
With each series, a sequence of these measurement steps is defined. Duplicate steps are allowed (e.g., measurement step 1 > measurement step 4 > measurement step 1 is a valid sequence). The sequence, along with the parameters, must be stored somewhere for future reference.
Each toy goes through the sequence of measurements that is defined by its series. All of the results must be stored somewhere (for each individual toy).
If I split the measurement steps into their own tables, I can't reference them conditionally (as foreign keys) from some other table.
If I try to serialize part of the data, I lose the ability to make connections (at least via queries) between individual measurement steps, measurement results, etc.
I know people here generally dislike (or don't answer) these kinds of discussion-like questions, but I'd ask you to at least point out what good practice is for a system where the data is stored locally on a machine but still needs a database: should I move towards serialized data and only model relationships where that is easy, or keep trying to normalize as much as possible?
If the measurement steps share most of their attributes (or are of the same type, like what you called PARAMETERS), and I understood your definitions correctly, I would do something like this.
It could be a starting point.
+----------------------------+      +------------------------------+
| TOYS                       |      | TOY_SERIES                   |
+-----+----------------------+      +---------+--------------------+
| PK  | ID_TOY               |  +-->| PK      | ID_S               |---+
| FK1 | ID_S                 |--+   | ...     |                    |   |
| ... |                      |      +---------+--------------------+   |
+-----+----------------------+                                         |
                                                                       |
+------------------------------+                                       |
| BR_SER_MEAS                  |                                       |
+---------+--------------------+                                       |
| PK, FK1 | ID_S               |<--------------------------------------+
| PK, FK2 | ID_M               |---+
| PK      | ID_SEQ             |   |
+---------+--------------------+   |
                                   |
+------------------------------+   |
| MEASURE_STEPS                |   |
+---------+--------------------+   |
| PK      | ID_M               |<--+
+---------+--------------------+
|         | PARAM_01           |
|         | ...                |
|         | PARAM_10           |
+---------+--------------------+
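The diagram can be turned into DDL along these lines. This is a sketch under assumptions: SQLite via Python's `sqlite3` instead of the target database, only two of the ten `PARAM_xx` columns written out, made-up ids, and a hypothetical helper `sequence_for_toy`. It shows that the composite key on `BR_SER_MEAS` lets the question's sequence (step 1 > step 4 > step 1) be stored and read back in order for any toy of the series.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE TOY_SERIES (
    ID_S INTEGER PRIMARY KEY
);
CREATE TABLE TOYS (
    ID_TOY INTEGER PRIMARY KEY,
    ID_S   INTEGER NOT NULL REFERENCES TOY_SERIES(ID_S)  -- one series per toy
);
CREATE TABLE MEASURE_STEPS (
    ID_M     INTEGER PRIMARY KEY,
    PARAM_01 TEXT,
    PARAM_02 TEXT  -- ... up to PARAM_10 in the diagram; NULL when unused
);
-- Bridge table: ID_SEQ orders the steps, so a step may appear more than once.
CREATE TABLE BR_SER_MEAS (
    ID_S   INTEGER NOT NULL REFERENCES TOY_SERIES(ID_S),
    ID_M   INTEGER NOT NULL REFERENCES MEASURE_STEPS(ID_M),
    ID_SEQ INTEGER NOT NULL,
    PRIMARY KEY (ID_S, ID_M, ID_SEQ)
);
-- Made-up sample ids.
INSERT INTO TOY_SERIES VALUES (10);
INSERT INTO TOYS VALUES (100, 10), (101, 10);
INSERT INTO MEASURE_STEPS (ID_M) VALUES (1), (4);
-- The question's example sequence: step 1 > step 4 > step 1.
INSERT INTO BR_SER_MEAS VALUES (10, 1, 1), (10, 4, 2), (10, 1, 3);
""")

def sequence_for_toy(conn, toy_id):
    """Ordered measurement-step ids that the given toy must go through."""
    rows = conn.execute("""
        SELECT b.ID_M
        FROM TOYS AS t
        JOIN BR_SER_MEAS AS b ON b.ID_S = t.ID_S
        WHERE t.ID_TOY = ?
        ORDER BY b.ID_SEQ
    """, (toy_id,))
    return [r[0] for r in rows]

print(sequence_for_toy(conn, 100))  # [1, 4, 1]
```

With this key, two rows of one series could still share an `ID_SEQ`; adding `UNIQUE (ID_S, ID_SEQ)` to `BR_SER_MEAS` would rule that out.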
I'm trying to make a database that holds a table of objects, where each object is made up of objects from a second table. One table holds the possible sets, and the second holds the possible components. The table of sets has to include a field for each of its components, but each set has an unknown number of components. How do I make a table with fields (Component 1, Component 2, Component 3, ...) where each set determines how many of those fields it needs?
Is there a way to do this just using the Access interface or will I actually have to get into the code behind it?
I think it would also solve my problem if there were a way to make a field that acts like an ArrayList, so if anyone can think of how to do that, please let me know.
Assuming that a component can be part of more than one set, what you need here is a many-to-many relationship.
In a database you don't do this with an arbitrary number of columns, you use a junction table.
When you need a tabular representation, you use a Pivot / Crosstab query.
Your data model could look like this:
Sets
+--------+----------+
| Set_ID | Set_Name |
+--------+----------+
| 1      | foo      |
| 2      | bar      |
+--------+----------+
Components
+--------------+----------------+
| Component_ID | Component_Name |
+--------------+----------------+
| 1            | aaa            |
| 2            | bbb            |
| 3            | ccc            |
| 4            | ddd            |
+--------------+----------------+
Junction table
+----------+----------------+
| f_Set_ID | f_Component_ID |
+----------+----------------+
| 1        | 2              |
| 1        | 4              |
| 2        | 1              |
| 2        | 2              |
| 2        | 3              |
+----------+----------------+
(f_ as in Foreign Key)
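A minimal sketch of the junction table plus a crosstab-style view. Python's `sqlite3` stands in for Access here (an assumption). Since SQLite has no crosstab query like Access's `TRANSFORM ... PIVOT`, the pivot into `Component 1, Component 2, ...` columns is done in Python and only for display, while the stored data stays one row per (set, component) pair.

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sets (Set_ID INTEGER PRIMARY KEY, Set_Name TEXT NOT NULL);
CREATE TABLE Components (Component_ID INTEGER PRIMARY KEY, Component_Name TEXT NOT NULL);
CREATE TABLE Junction (
    f_Set_ID       INTEGER NOT NULL REFERENCES Sets(Set_ID),
    f_Component_ID INTEGER NOT NULL REFERENCES Components(Component_ID),
    PRIMARY KEY (f_Set_ID, f_Component_ID)  -- a component appears once per set
);
INSERT INTO Sets VALUES (1, 'foo'), (2, 'bar');
INSERT INTO Components VALUES (1, 'aaa'), (2, 'bbb'), (3, 'ccc'), (4, 'ddd');
INSERT INTO Junction VALUES (1, 2), (1, 4), (2, 1), (2, 2), (2, 3);
""")

# Plain join: one row per (set, component) pair, however many components a set has.
rows = conn.execute("""
    SELECT s.Set_Name, c.Component_Name
    FROM Sets AS s
    JOIN Junction AS j ON j.f_Set_ID = s.Set_ID
    JOIN Components AS c ON c.Component_ID = j.f_Component_ID
    ORDER BY s.Set_ID, c.Component_ID
""").fetchall()

# Pivot for display: rows are already ordered by set, so groupby works directly.
grouped = {name: [comp for _, comp in grp]
           for name, grp in groupby(rows, key=lambda r: r[0])}
width = max(len(comps) for comps in grouped.values())
header = ["Set_Name"] + [f"Component {i}" for i in range(1, width + 1)]
# Sets with fewer components are padded with None.
table = [[name] + comps + [None] * (width - len(comps))
         for name, comps in grouped.items()]

print(header)  # ['Set_Name', 'Component 1', 'Component 2', 'Component 3']
for line in table:
    print(line)
# ['foo', 'bbb', 'ddd', None]
# ['bar', 'aaa', 'bbb', 'ccc']
```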
I am working on an application and want to store data about drinks. I'm wondering whether it is better to save the ingredients as a single nvarchar column in the drinks table, or to create a separate ingredients table with a one-to-many relationship. The database will be read-only, and I want to be able to filter by ingredient. What is the best approach on Windows Phone (for performance)? If a new table is the better choice, should I use EntitySet and EntityRef? And would every ingredient get its own row in that table? Say I have 100 drinks, each with 4 ingredients on average: would I have 100 rows in the first table and roughly 400 rows in the second? Thanks for your help.
Actually, both proposed solutions are wrong. A drink can have many ingredients, and an ingredient can be used in many drinks, so this is a many-to-many relationship. The proper way to model it is the following (I'm adding sample data to make it easier to understand):
Ingredients (PK: Id)
+----+--------------------+
| Id | Name               |
+----+--------------------+
| 1  | Water              |
| 2  | Sugar              |
| 3  | Coffee             |
| 4  | Virgin Islands Tea |
| 5  | Ice                |
+----+--------------------+
Drinks (PK: Id)
+----+--------------+
| Id | Name         |
+----+--------------+
| 1  | Black Coffee |
| 2  | Tea          |
| 3  | Ice Tea      |
+----+--------------+
Drinks_Ingredients (PK: Drink_Id, Ingredient_Id)
+----------+---------------+------------+
| Drink_Id | Ingredient_Id | Proportion |
+----------+---------------+------------+
| 1        | 1             | 70         |
| 1        | 2             | 10         |
| 1        | 3             | 20         |
| 2        | 1             | 90         |
| 2        | 4             | 10         |
| 3        | 1             | 80         |
| 3        | 4             | 10         |
| 3        | 5             | 10         |
+----------+---------------+------------+
I'm adding the Proportion column to show how to store data that depends on the drink-ingredient pair. If you're worried about table size, the tables will be quite small: the only tables with the more complex data types (varchars) are Ingredients and Drinks, and they hold the minimum possible number of records, one per drink and one per ingredient.
If you still have doubts, keep studying the example; you'll get it :)
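A minimal sketch of this schema and of the "filter by ingredient" query, using Python's `sqlite3` as a stand-in for the phone's local database (an assumption; it does not use the LINQ to SQL `EntitySet`/`EntityRef` types from the question). The helper `drinks_with` and the index `di_by_ingredient` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Ingredients (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Drinks      (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Drinks_Ingredients (
    Drink_Id      INTEGER NOT NULL REFERENCES Drinks(Id),
    Ingredient_Id INTEGER NOT NULL REFERENCES Ingredients(Id),
    Proportion    INTEGER NOT NULL,  -- depends on the (drink, ingredient) pair
    PRIMARY KEY (Drink_Id, Ingredient_Id)
);
INSERT INTO Ingredients VALUES
    (1, 'Water'), (2, 'Sugar'), (3, 'Coffee'), (4, 'Virgin Islands Tea'), (5, 'Ice');
INSERT INTO Drinks VALUES (1, 'Black Coffee'), (2, 'Tea'), (3, 'Ice Tea');
INSERT INTO Drinks_Ingredients VALUES
    (1, 1, 70), (1, 2, 10), (1, 3, 20),
    (2, 1, 90), (2, 4, 10),
    (3, 1, 80), (3, 4, 10), (3, 5, 10);
-- Lets "which drinks use ingredient X?" use an index instead of a scan.
CREATE INDEX di_by_ingredient ON Drinks_Ingredients (Ingredient_Id, Drink_Id);
""")

def drinks_with(conn, ingredient_name):
    """(drink name, proportion) for every drink containing the ingredient."""
    return conn.execute("""
        SELECT d.Name, di.Proportion
        FROM Drinks AS d
        JOIN Drinks_Ingredients AS di ON di.Drink_Id = d.Id
        JOIN Ingredients AS i ON i.Id = di.Ingredient_Id
        WHERE i.Name = ?
        ORDER BY d.Id
    """, (ingredient_name,)).fetchall()

print(drinks_with(conn, "Virgin Islands Tea"))  # [('Tea', 10), ('Ice Tea', 10)]
```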
I would create a table for ingredients with a description and an id, and store the ids in the drink table, because that's the elegant way to do it. For 100 drinks, you won't see a performance difference.