Ingredients for a drink: new table or a string in a column? - database

I am designing an application and I want to store data about drinks. I am wondering which is better: saving the ingredients as an nvarchar column in the drinks table, or creating a new ingredients table with a one-to-many relationship. The database will be read-only, and I want to be able to filter by ingredients. Which is the best approach on Windows Phone (for performance)? If a new table is the better choice, I should use EntitySet and EntityRef, am I right? And would I have a new row in the table for every ingredient? Let's say I have 100 drinks and each drink has 4 ingredients on average: would I have 100 rows in the first table and about 400 rows in the second? Thanks for your help.

Actually, both proposed solutions are wrong. A drink can have many ingredients and an ingredient can be used in many drinks, so this is a many-to-many relationship. The proper way to model it is the following (I'm adding sample data to make it easier to follow):
Ingredients (PK: Id)
+----+--------------------+
| Id | Name               |
+----+--------------------+
| 1  | Water              |
| 2  | Sugar              |
| 3  | Coffee             |
| 4  | Virgin Islands Tea |
| 5  | Ice                |
+----+--------------------+
Drinks (PK: Id)
+----+--------------+
| Id | Name         |
+----+--------------+
| 1  | Black Coffee |
| 2  | Tea          |
| 3  | Ice Tea      |
+----+--------------+
Drinks_Ingredients (PK: Drink_Id, Ingredient_Id)
+----------+---------------+------------+
| Drink_Id | Ingredient_Id | Proportion |
+----------+---------------+------------+
| 1        | 1             | 70         |
| 1        | 2             | 10         |
| 1        | 3             | 20         |
| 2        | 1             | 90         |
| 2        | 4             | 10         |
| 3        | 1             | 80         |
| 3        | 4             | 10         |
| 3        | 5             | 10         |
+----------+---------------+------------+
I'm adding the Proportion column to show you how to store data that depends on the drink-ingredient pair. If you're worried about size, the tables will be quite small: the only tables with the more complex data types (varchars) are Ingredients and Drinks, and they hold the minimum number of records possible, one per drink and one per ingredient.
If you still have doubts, keep looking at the example; you'll get it :)
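A minimal sketch of the schema above, using Python's built-in sqlite3 module as a stand-in for the phone's database (an assumption; on Windows Phone this would go through LINQ to SQL instead). It loads the example data and filters drinks by ingredient, which was the main read requirement in the question. The helper name `drinks_with_all` is hypothetical.

```python
import sqlite3

# In-memory database holding the three-table many-to-many schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Ingredients (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Drinks (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Drinks_Ingredients (
    Drink_Id      INTEGER NOT NULL REFERENCES Drinks(Id),
    Ingredient_Id INTEGER NOT NULL REFERENCES Ingredients(Id),
    Proportion    INTEGER,
    PRIMARY KEY (Drink_Id, Ingredient_Id)
);
""")
conn.executemany("INSERT INTO Ingredients VALUES (?, ?)", [
    (1, "Water"), (2, "Sugar"), (3, "Coffee"),
    (4, "Virgin Islands Tea"), (5, "Ice"),
])
conn.executemany("INSERT INTO Drinks VALUES (?, ?)", [
    (1, "Black Coffee"), (2, "Tea"), (3, "Ice Tea"),
])
conn.executemany("INSERT INTO Drinks_Ingredients VALUES (?, ?, ?)", [
    (1, 1, 70), (1, 2, 10), (1, 3, 20),
    (2, 1, 90), (2, 4, 10),
    (3, 1, 80), (3, 4, 10), (3, 5, 10),
])

def drinks_with_all(ingredient_names):
    """Return the names of drinks that contain every listed ingredient."""
    placeholders = ", ".join("?" for _ in ingredient_names)
    sql = f"""
        SELECT d.Name
        FROM Drinks d
        JOIN Drinks_Ingredients di ON di.Drink_Id = d.Id
        JOIN Ingredients i ON i.Id = di.Ingredient_Id
        WHERE i.Name IN ({placeholders})
        GROUP BY d.Id, d.Name
        HAVING COUNT(DISTINCT i.Id) = ?
        ORDER BY d.Name
    """
    params = list(ingredient_names) + [len(set(ingredient_names))]
    return [row[0] for row in conn.execute(sql, params)]

print(drinks_with_all(["Ice"]))                          # ['Ice Tea']
print(drinks_with_all(["Water", "Virgin Islands Tea"]))  # ['Ice Tea', 'Tea']
```

The `HAVING COUNT(DISTINCT ...)` clause keeps only drinks that contain all the requested ingredients; dropping it would return drinks that contain any of them.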

I would create a table for ingredients with an id and a description, and store the ids in the drink table, because it's the elegant way to do it. With 100 drinks, you won't see any difference in performance.

Related

What is this data referencing anti-pattern called?

I have a question related to a kind of duplication I see in databases from time to time. To ask this question, I need to set the stage a bit:
Let's say I have a database of TV shows. Its primary table Content stores information at various levels of granularity (Show -> Season -> Episode), using a parent column to denote hierarchy:
+----+---------------------------+-------------+----------+
| ID | ContentName               | ContentType | ParentId |
+----+---------------------------+-------------+----------+
| 1  | Friends                   | Show        | [null]   |
| 2  | Season 1                  | Season      | 1        |
| 3  | The Pilot                 | Episode     | 2        |
| 4  | The One with the Sonogram | Episode     | 2        |
+----+---------------------------+-------------+----------+
Maybe this isn't ideal, but let's say it's good enough to work with and we're not looking to change it.
Now let's say we need to build a table that defines air dates. We can set these at any level, and they must apply down the hierarchy (e.g., if set at the Season level, it applies to all episodes within that season; if set at the Show level, it applies to all seasons and episodes).
So the original air dates might look like this:
+-------+-----------+------------+
| airId | ContentId | AirDate    |
+-------+-----------+------------+
| 71    | 3         | 1994-09-22 |
| 72    | 4         | 1994-09-29 |
+-------+-----------+------------+
Whereas the air date for a streaming service might look like:
+-------+-----------+------------+
| airId | ContentId | AirDate    |
+-------+-----------+------------+
| 91    | 1         | 2015-01-01 |
+-------+-----------+------------+
Cool. Everything's fine so far; we're adhering to 4NF (I think!) and we can proceed to our business logic.
Now we get to my question. If we implement our business logic in a way that disregards the referential hierarchy and instead duplicates the air dates down the hierarchy, what is this anti-pattern called? For example, let's say I set an air date at the Show level as above, but the business logic finds all child elements and creates an entry for each one, resulting in:
+-------+-----------+------------+
| airId | ContentId | AirDate    |
+-------+-----------+------------+
| 91    | 1         | 2015-01-01 |
| 92    | 2         | 2015-01-01 |
| 93    | 3         | 2015-01-01 |
| 94    | 4         | 2015-01-01 |
+-------+-----------+------------+
There are some pretty clear problems with this, but please note that my question is not how to fix it. I just want to know: is there a specific term for it? I would call it something like "disregarding data relationship" or "ignoring referential context". Maybe it's not strictly a database anti-pattern, since in my example an external actor is inserting the excess rows.
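To make the rule "a date set at one level applies to everything below it" concrete without copying any rows, here is a minimal Python sketch that resolves an item's effective air date at read time by walking up ParentId. It assumes that when several levels carry a date, the nearest one wins, which the question does not specify; the function and variable names are hypothetical.

```python
from typing import Dict, Optional

# Content hierarchy from the question: ID -> ParentId (None for the show).
PARENT_OF: Dict[int, Optional[int]] = {1: None, 2: 1, 3: 2, 4: 2}

# Air dates stored once, at whatever level they were set.
ORIGINAL_AIR_DATES = {3: "1994-09-22", 4: "1994-09-29"}
STREAMING_AIR_DATES = {1: "2015-01-01"}

def effective_air_date(content_id: int,
                       parent_of: Dict[int, Optional[int]],
                       air_dates: Dict[int, str]) -> Optional[str]:
    """Return the air date of content_id or of its nearest ancestor that has
    one (assumption: the most specific level wins); None if no level has one."""
    node: Optional[int] = content_id
    while node is not None:
        if node in air_dates:
            return air_dates[node]
        node = parent_of[node]
    return None

for cid in PARENT_OF:
    print(cid, effective_air_date(cid, PARENT_OF, STREAMING_AIR_DATES))
```

With the streaming dates, all four items resolve to 2015-01-01 from the single row stored at the Show level.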

Can a Database column have duplicate foreign keys?

I'm building an app for trading cards for a given game. This means a user can have multiple cards, including repeated cards. This is my approach, but I don't know if it's correct (or even possible):
Users
------------------------------
| id | name | cards_ids      |
------------------------------
| 20 | John | 31, 40, 50, 50 |
------------------------------
Cards
---------------------------------
| id | name      | type         |
---------------------------------
| 31 | Monster31 | Aqua Monster |
---------------------------------
| 50 | Monster50 | Rock Monster |
---------------------------------
| 40 | Monster40 | Air Monster  |
---------------------------------
As you can see, a user can have many cards, even if they are the same. Would this duplicate foreign keys approach work? I will do this using Postgres, if that's relevant.
You need to think in terms of third normal form when designing your database.
In this case, you want to store the number of cards as a property of the ownership:
Users
-------------
| id | name |
-------------
| 20 | John |
-------------
CardsOwned
----------------------------------
| user_id | card_type_id | count |
----------------------------------
| 20      | 31           | 1     |
| 20      | 40           | 1     |
| 20      | 50           | 2     |
----------------------------------
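A minimal sketch of the count-based design, using Python's sqlite3 module in place of Postgres (an assumption made so the example is self-contained). The hypothetical `add_card` helper creates the (user, card type) row if it is missing and then increments `count`; in Postgres the two statements could be collapsed into a single `INSERT ... ON CONFLICT ... DO UPDATE`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Cards (id INTEGER PRIMARY KEY, name TEXT NOT NULL, type TEXT);
-- One row per (user, card type); "count" is how many copies the user has.
CREATE TABLE CardsOwned (
    user_id      INTEGER NOT NULL REFERENCES Users(id),
    card_type_id INTEGER NOT NULL REFERENCES Cards(id),
    "count"      INTEGER NOT NULL CHECK ("count" >= 0),
    PRIMARY KEY (user_id, card_type_id)
);
""")
conn.execute("INSERT INTO Users VALUES (20, 'John')")
conn.executemany("INSERT INTO Cards VALUES (?, ?, ?)", [
    (31, "Monster31", "Aqua Monster"),
    (40, "Monster40", "Air Monster"),
    (50, "Monster50", "Rock Monster"),
])

def add_card(user_id, card_type_id):
    """Give a user one more copy of a card type (hypothetical helper)."""
    # Create the row with a count of 0 if it does not exist yet...
    conn.execute("INSERT OR IGNORE INTO CardsOwned VALUES (?, ?, 0)",
                 (user_id, card_type_id))
    # ...then increment it, so repeated cards raise the count instead of
    # repeating a foreign key.
    conn.execute('UPDATE CardsOwned SET "count" = "count" + 1 '
                 "WHERE user_id = ? AND card_type_id = ?",
                 (user_id, card_type_id))

# John's cards from the question: 31, 40, 50, 50.
for card in (31, 40, 50, 50):
    add_card(20, card)

rows = conn.execute('SELECT card_type_id, "count" FROM CardsOwned '
                    "WHERE user_id = 20 ORDER BY card_type_id").fetchall()
print(rows)  # [(31, 1), (40, 1), (50, 2)]
```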
Or, even better, each physical card should have its own id. Even when two cards show the same monster, they can have different attributes, such as "Near Mint" or "Mint".
Your card definitions should live in something like cards_type, where you define each card. The cards owned by someone are separate rows: even when two of them are the same card, they get different ids because they are two different physical cards.
--------------------------------------
| card_id | card_type_id | condition |
--------------------------------------
| 1       | 31           | Mint      |
| 2       | 40           | Near Mint |
| 3       | 50           | Used      |
| 4       | 50           | Mint      |
--------------------------------------
Then you need an ownership table to record who owns what:
CardsOwned
----------------------
| card_id | owner_id |
----------------------
| 1       | 20       |
| 2       | 20       |
| 3       | 20       |
| 4       | 20       |
----------------------
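A sketch of the per-card design, again with Python's sqlite3 standing in for Postgres. Copy counts are no longer stored; they are derived with `GROUP BY`, and each copy keeps its own condition. Making `card_id` the primary key of CardsOwned, so that each physical card has exactly one owner, is an assumption not stated in the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Card definitions (the question's Cards table).
CREATE TABLE cards_type (id INTEGER PRIMARY KEY, name TEXT NOT NULL, type TEXT);
-- One row per physical card: two copies of Monster50 get two card_ids.
CREATE TABLE cards (
    card_id      INTEGER PRIMARY KEY,
    card_type_id INTEGER NOT NULL REFERENCES cards_type(id),
    condition    TEXT
);
-- Ownership; card_id as the key means each physical card has one owner.
CREATE TABLE CardsOwned (
    card_id  INTEGER PRIMARY KEY REFERENCES cards(card_id),
    owner_id INTEGER NOT NULL
);
""")
conn.executemany("INSERT INTO cards_type VALUES (?, ?, ?)", [
    (31, "Monster31", "Aqua Monster"),
    (40, "Monster40", "Air Monster"),
    (50, "Monster50", "Rock Monster"),
])
conn.executemany("INSERT INTO cards VALUES (?, ?, ?)", [
    (1, 31, "Mint"), (2, 40, "Near Mint"), (3, 50, "Used"), (4, 50, "Mint"),
])
conn.executemany("INSERT INTO CardsOwned VALUES (?, ?)",
                 [(1, 20), (2, 20), (3, 20), (4, 20)])

# How many copies of each card type does owner 20 have?
copies = conn.execute("""
    SELECT c.card_type_id, COUNT(*)
    FROM CardsOwned o
    JOIN cards c ON c.card_id = o.card_id
    WHERE o.owner_id = 20
    GROUP BY c.card_type_id
    ORDER BY c.card_type_id
""").fetchall()
print(copies)  # [(31, 1), (40, 1), (50, 2)]

# The two Monster50 copies keep their own condition.
conditions = [row[0] for row in conn.execute(
    "SELECT condition FROM cards WHERE card_type_id = 50 ORDER BY card_id")]
print(conditions)  # ['Used', 'Mint']
```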

SSRS - filter on expression in report table

Let's say I have a report with the following table in the body:
ID | Name    | FirstName | GivenCity | RecordedCity | IsCorrectCity
1  | Gates   | Bill      | London    | New York     | No
2  | McCain  | John      | Brussels  | Brussels     | Yes
3  | Bullock | Lili      | London    | London       | Yes
4  | Bravo   | Johnny    | Paris     | Las Vegas    | No
The column IsCorrectCity contains an expression that compares GivenCity with RecordedCity and returns No if they differ or Yes if they are equal.
Is it possible to add a report filter on the IsCorrectCity column (and if so, how) so that users can select only the records with No or only those with Yes? I know this can be done with a parameter in the SQL query, but I would like to base the filter on the report expressions rather than adding more calculations to the query.
Here's a tutorial that explains how you can do it:
Filtering Data Without Changing Dataset [SSRS]

Making an Object-Dependent Number of Fields for a Table in MS Access

I'm trying to make a database that holds a table of objects, where each object is made up of objects from a second table. One table lists the possible sets, and the second lists the possible components. The table of sets has to include a field for each of its components, but each set has an unknown number of components. How do I make a table whose fields (Component 1, Component 2, Component 3, ...) vary with how many components each set needs?
Is there a way to do this just using the Access interface or will I actually have to get into the code behind it?
I think it would also solve my problem if a field could act like an ArrayList, so if anyone knows how to do that, please let me know.
Assuming that a component can be part of more than one set, what you need here is a many-to-many relationship.
In a database you don't do this with an arbitrary number of columns, you use a junction table.
When you need a tabular representation, you use a Pivot / Crosstab query.
Your data model could look like this:
Sets
+--------+----------+
| Set_ID | Set_Name |
+--------+----------+
| 1      | foo      |
| 2      | bar      |
+--------+----------+
Components
+--------------+----------------+
| Component_ID | Component_Name |
+--------------+----------------+
| 1            | aaa            |
| 2            | bbb            |
| 3            | ccc            |
| 4            | ddd            |
+--------------+----------------+
Junction table
+----------+----------------+
| f_Set_ID | f_Component_ID |
+----------+----------------+
| 1        | 2              |
| 1        | 4              |
| 2        | 1              |
| 2        | 2              |
| 2        | 3              |
+----------+----------------+
(f_ as in Foreign Key)
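A minimal sketch of this model with Python's sqlite3 module as a stand-in for Access (an assumption; the junction table's name `SetComponents` is hypothetical). The layout into Component 1..N columns is done in Python here to illustrate what a crosstab query produces; it is not Access's own crosstab syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sets (Set_ID INTEGER PRIMARY KEY, Set_Name TEXT NOT NULL);
CREATE TABLE Components (Component_ID INTEGER PRIMARY KEY,
                         Component_Name TEXT NOT NULL);
-- Junction table: one row per (set, component) pair.
CREATE TABLE SetComponents (
    f_Set_ID       INTEGER NOT NULL REFERENCES Sets(Set_ID),
    f_Component_ID INTEGER NOT NULL REFERENCES Components(Component_ID),
    PRIMARY KEY (f_Set_ID, f_Component_ID)
);
""")
conn.executemany("INSERT INTO Sets VALUES (?, ?)", [(1, "foo"), (2, "bar")])
conn.executemany("INSERT INTO Components VALUES (?, ?)",
                 [(1, "aaa"), (2, "bbb"), (3, "ccc"), (4, "ddd")])
conn.executemany("INSERT INTO SetComponents VALUES (?, ?)",
                 [(1, 2), (1, 4), (2, 1), (2, 2), (2, 3)])

rows = conn.execute("""
    SELECT s.Set_Name, c.Component_Name
    FROM Sets s
    JOIN SetComponents sc ON sc.f_Set_ID = s.Set_ID
    JOIN Components c ON c.Component_ID = sc.f_Component_ID
    ORDER BY s.Set_ID, c.Component_ID
""").fetchall()

# One list per set: the "ArrayList" view of the data.
components_by_set = {}
for set_name, component_name in rows:
    components_by_set.setdefault(set_name, []).append(component_name)

# Crosstab-style layout: Component 1..N columns, padded with None.
width = max(len(names) for names in components_by_set.values())
header = ["Set_Name"] + [f"Component {i}" for i in range(1, width + 1)]
pivot = [[name] + names + [None] * (width - len(names))
         for name, names in components_by_set.items()]

print(header)
for row in pivot:
    print(row)
```

Adding a component to a set is one more junction row; the number of columns is only decided when the data is displayed.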

What database technologies should I consider for building a scalable "running average" view?

We are working on an application where millions of users will be entering information at the same time. Suppose the application allows people to rate geographic regions on where they would like to live. Each participant is allowed to rate each region with a decimal value from 0 to 10. Each person belongs to one or more groups based on attributes such as gender, whether they consider themselves active, or whether they enjoy culture.
Every time a rating is made, we need a view that shows the average rating for each region/group. I'm aware that most DBs have an "average" function, but for our purposes we need to be able to use our own function, as we may use the geometric mean instead of the arithmetic mean.
Below are some tables that might be used. For brevity, I did not include the relationship table PeopleGroups, which maps each person to the groups they belong to.
Regions
+-----+------------+
| RID | NAME       |
+-----+------------+
| 1   | Florida    |
| 2   | California |
+-----+------------+
People
+-----+-------+
| PID | Name  |
+-----+-------+
| P1  | Alice |
| P2  | Bob   |
| P3  | Frank |
| P4  | Mary  |
+-----+-------+
Groups
+-----+----------+
| GID | Name     |
+-----+----------+
| G0  | Everyone |
| G1  | Women    |
| G2  | Men      |
| G3  | Active   |
| G4  | Culture  |
+-----+----------+
RegionScoresByPerson
+-----+-----+-------+
| RID | PID | Score |
+-----+-----+-------+
| 1   | P1  | 6     |
| 1   | P2  | 8     |
| 1   | P3  | 3     |
| 1   | P4  | 2     |
| 1   | P1  | 7     |
| 1   | P2  | 5     |
| 1   | P3  | 8     |
| 1   | P4  | 2     |
+-----+-----+-------+
Our current implementation uses a similar set of tables for storing ratings, but we don't calculate averages in real time. Whenever we need results (e.g., show me the average score for California among women), we have to pull all the information into memory and run the calculations manually.
I was wondering how I could leverage database technologies such as views, triggers, stored procedures, etc. to present a simple table that lets me get scores for people and groups, so we don't have to run the calculations manually.
I would like a table like the following, where everything is handled by the DB. Any insert, update or delete on the RegionScoresByPerson or Groups tables would automatically be reflected in this table. In case it is not apparent, the rows marked with * are calculated rows. Here I'm using a simple arithmetic average, but the design should allow for any type of function.
EID stands for entity ID (a person or group)
Besides deciding how to build such a view, I'm unsure what datatypes to use (and index) for People and Groups. I'd like the keys to be integers, but that would prevent me from creating the table below, because I couldn't distinguish between Person 1 and Group 1. Would IDs such as P1 and G1 be a performance hit? I'm obviously concerned about the design being scalable.
ScoreView
+-----+-----+-------+
| RID | EID | Score |
+-----+-----+-------+
| 1   | P1  | 6     |
| 1   | P2  | 8     |
| 1   | P3  | 3     |
| 1   | P4  | 2     |
| 1   | P1  | 7     |
| 1   | P2  | 5     |
| 1   | P3  | 8     |
| 1   | P4  | 2     |
| 1   | G0  | 4.75  | *
| 1   | G1  | 4     | *
| 1   | G2  | …     | *
| 1   | G3  | …     | *
+-----+-----+-------+
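One way to keep such averages inside the database is trigger-maintained running totals: each insert or delete adjusts a count, a sum and a sum of logarithms per (region, group), and a view turns those into an arithmetic and a geometric mean without rescanning the scores. The sketch below uses Python's sqlite3 module; GroupScoreStats, GroupScoreView, py_ln_pos and py_exp are hypothetical names. It uses integer IDs, which works because group aggregates live in their own table instead of sharing a column with person IDs. It loads only the first four region-1 ratings, which reproduces the 4.75 and 4 in ScoreView. An UPDATE trigger (remove the old score, add the new one) and triggers on PeopleGroups are left out, and this says nothing about throughput under millions of concurrent writers.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# Allow the app-defined functions below inside triggers and views; older
# SQLite versions silently ignore this unknown pragma.
conn.execute("PRAGMA trusted_schema = ON")
# ln(x) for positive scores (zero scores are counted separately), and exp.
conn.create_function("py_ln_pos", 1, lambda x: math.log(x) if x > 0 else 0.0)
conn.create_function("py_exp", 1, math.exp)

conn.executescript("""
CREATE TABLE PeopleGroups (PID INTEGER NOT NULL, GID INTEGER NOT NULL,
                           PRIMARY KEY (PID, GID));
CREATE TABLE RegionScoresByPerson (RID INTEGER NOT NULL, PID INTEGER NOT NULL,
                                   Score REAL NOT NULL);
-- Running totals per (region, group), kept current by the triggers below.
CREATE TABLE GroupScoreStats (
    RID INTEGER NOT NULL, GID INTEGER NOT NULL,
    n INTEGER NOT NULL DEFAULT 0, total REAL NOT NULL DEFAULT 0,
    log_total REAL NOT NULL DEFAULT 0, zeros INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (RID, GID)
);
CREATE TRIGGER score_insert AFTER INSERT ON RegionScoresByPerson
BEGIN
    INSERT OR IGNORE INTO GroupScoreStats (RID, GID)
        SELECT NEW.RID, GID FROM PeopleGroups WHERE PID = NEW.PID;
    UPDATE GroupScoreStats
       SET n = n + 1, total = total + NEW.Score,
           log_total = log_total + py_ln_pos(NEW.Score),
           zeros = zeros + (NEW.Score = 0)
     WHERE RID = NEW.RID
       AND GID IN (SELECT GID FROM PeopleGroups WHERE PID = NEW.PID);
END;
CREATE TRIGGER score_delete AFTER DELETE ON RegionScoresByPerson
BEGIN
    UPDATE GroupScoreStats
       SET n = n - 1, total = total - OLD.Score,
           log_total = log_total - py_ln_pos(OLD.Score),
           zeros = zeros - (OLD.Score = 0)
     WHERE RID = OLD.RID
       AND GID IN (SELECT GID FROM PeopleGroups WHERE PID = OLD.PID);
END;
-- Geometric mean is 0 when any score is 0; otherwise exp(mean of logs).
CREATE VIEW GroupScoreView AS
SELECT RID, GID, n, total / n AS arithmetic_mean,
       CASE WHEN zeros > 0 THEN 0.0 ELSE py_exp(log_total / n) END AS geometric_mean
  FROM GroupScoreStats WHERE n > 0;
""")

# People 1-4 = P1-P4; groups 0-2 = G0 (Everyone), G1 (Women), G2 (Men).
conn.executemany("INSERT INTO PeopleGroups VALUES (?, ?)", [
    (1, 0), (2, 0), (3, 0), (4, 0),  # everyone
    (1, 1), (4, 1),                  # Alice and Mary
    (2, 2), (3, 2),                  # Bob and Frank
])
conn.executemany("INSERT INTO RegionScoresByPerson VALUES (?, ?, ?)",
                 [(1, 1, 6), (1, 2, 8), (1, 3, 3), (1, 4, 2)])

def snapshot(rid):
    """Map GID -> (n, arithmetic mean, geometric mean) for one region."""
    return {gid: (n, mean, geo) for gid, n, mean, geo in conn.execute(
        "SELECT GID, n, arithmetic_mean, geometric_mean "
        "FROM GroupScoreView WHERE RID = ?", (rid,))}

before = snapshot(1)
print(before[0][1], before[1][1])  # 4.75 4.0

# Deleting a rating updates the totals through the trigger.
conn.execute("DELETE FROM RegionScoresByPerson WHERE RID = 1 AND PID = 3")
after = snapshot(1)
```

The trick only works for functions that can be rebuilt from running sums (count, sum, sum of logs cover both means discussed); an arbitrary custom function may not decompose this way and would need a recomputation instead.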
Apache Flume is the open source tool designed to solve this kind of problem. Also have a look at Google Cloud Dataflow.
https://flume.apache.org/
