I'm trying to create a friendship site. The issue I'm having is when a user joins a website they have to fill out a form. This form has many fixed drop down items the user must fill out. Here is an example of one of the drop downs.
Drop Down (Favorite Pets)
Items in Favorite Pets
1. Dog
2. Cat
3. Bird
4. Hamster
What is the best way to store this info in a database? Right now the profile table has a column for each fixed drop down. Is this the correct database design? See example:
User ID | Age | Country | Favorite Pet | Favorite Season
--------------------------------------------------------------
1 | 29 | United States | Bird | Summer
Is this the correct database design? Right now I have probably 30+ columns. Most of the columns are fixed because they are drop downs and the user has to pick one of the options.
What's the correct approach to this problem?
P.S. I also thought about creating a table for each drop down, but this would really complicate the queries and lead to lots of tables.
Another approach
Profile table
ID | username | age
-------------------
1 | jason | 27
profileDropDown table:
ID | userID | dropdownID
------------------------
1 | 1 | 2
2 | 1 | 7
Drop Down table:
ID | dropdown | option
---------------------
1 | pet | bird
2 | pet | cat
3 | pet | dog
4 | pet | hamster
5 | season | winter
6 | season | summer
7 | season | fall
8 | season | spring
"Best way to approach" or "correct way" will open up a lot of discussion here, which risks this question being closed. I would recommend creating a drop down table that has a column called "TYPE" or "NAME". You would then put a unique identifier of the drop down in that column to identify that set. Then have another column called "VALUE" that holds the drop down value.
For example:
ID | TYPE | VALUE
1 | PET | BIRD
2 | PET | DOG
3 | PET | FISH
4 | SEASON | FALL
5 | SEASON | WINTER
6 | SEASON | SPRING
7 | SEASON | SUMMER
Then to get your PET drop down, you just select all rows from this table where TYPE = 'PET'.
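As a rough illustration of that lookup, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in database. The table name dropdown_option and the helper options_for are made up for this example; only the ID / TYPE / VALUE layout comes from the answer above.

```python
import sqlite3

# Hypothetical schema: a single lookup table holding the options of every drop down.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dropdown_option ("
    " id INTEGER PRIMARY KEY,"
    " type TEXT NOT NULL,"
    " value TEXT NOT NULL,"
    " UNIQUE (type, value))"  # assumption: an option appears once per drop down
)
rows = [
    (1, "PET", "BIRD"), (2, "PET", "DOG"), (3, "PET", "FISH"),
    (4, "SEASON", "FALL"), (5, "SEASON", "WINTER"),
    (6, "SEASON", "SPRING"), (7, "SEASON", "SUMMER"),
]
conn.executemany("INSERT INTO dropdown_option VALUES (?, ?, ?)", rows)


def options_for(dropdown_type):
    """Return the (id, value) pairs that populate one drop down."""
    cur = conn.execute(
        "SELECT id, value FROM dropdown_option WHERE type = ? ORDER BY id",
        (dropdown_type,),
    )
    return cur.fetchall()


pet_options = options_for("PET")
print(pet_options)
```

The profile row would then store the chosen option's id rather than its text, so renaming an option only touches one row.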
Will the set of questions (drop downs) asked of every user ever change? Will you (or your successor) ever need to add or remove questions over time? If no, then a table for users with one column per question is fine, but if yes, it gets complex.
Database purists would require two tables for each question:
One table containing a list of all valid answers for that question
One table containing the many to many relation between user and answer to “this” question
If a new question is added, create new tables; if a question is removed, drop those tables (and, of course, adjust all your code. Ugh.) This would work, but it's hardly efficient.
If, as seems likely, all the questions and answer sets are similar, then a three-table model suggests itself:
A table with one row per question (QuestionId, QuestionText)
A table with one row for each answer for each Question (QuestionId, AnswerId, AnswerText)
A table with one row for each user-answered question (UserId, QuestionId, AnswerId)
Adding and removing questions is straightforward, as is identifying skipped or unanswered questions (such as, if you add a new question a month after going live).
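The three-table model above could be sketched like this, again with Python's sqlite3 as a stand-in. The snake_case table and column names, the sample rows, and the unanswered helper are assumptions for illustration; the primary key on (user_id, question_id) additionally assumes each user picks exactly one answer per question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE question (
    question_id   INTEGER PRIMARY KEY,
    question_text TEXT NOT NULL
);
CREATE TABLE answer (
    question_id INTEGER NOT NULL REFERENCES question(question_id),
    answer_id   INTEGER NOT NULL,
    answer_text TEXT NOT NULL,
    PRIMARY KEY (question_id, answer_id)
);
-- Assumption: one answer per user per question.
CREATE TABLE user_answer (
    user_id     INTEGER NOT NULL,
    question_id INTEGER NOT NULL,
    answer_id   INTEGER NOT NULL,
    PRIMARY KEY (user_id, question_id),
    FOREIGN KEY (question_id, answer_id)
        REFERENCES answer(question_id, answer_id)
);
INSERT INTO question VALUES (1, 'Favorite pet'), (2, 'Favorite season');
INSERT INTO answer VALUES
    (1, 1, 'Dog'), (1, 2, 'Cat'), (2, 1, 'Summer'), (2, 2, 'Winter');
INSERT INTO user_answer VALUES (1, 1, 2);  -- user 1 answered 'Cat'
""")


def unanswered(user_id):
    """List the questions a user has not answered yet."""
    cur = conn.execute(
        "SELECT q.question_text FROM question q"
        " LEFT JOIN user_answer ua"
        "   ON ua.question_id = q.question_id AND ua.user_id = ?"
        " WHERE ua.user_id IS NULL"
        " ORDER BY q.question_id",
        (user_id,),
    )
    return [row[0] for row in cur]


print(unanswered(1))
```

Adding a question later is just an insert into question and answer; every existing user then shows up as not having answered it.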
As with most everything, there’s a whole lot of “it depends” behind this, most of which depends on what you want your system to do.
I'm developing an application that uses a MySQL database, and for history purposes we decided to store the current state and the history in the same table. The reason is performance: on updates the application doesn't have the id of an entity, just a key pair, so it is easier to insert a new row.
The table looks like this:
+------+-------+-----------+------------------------------+
| id |user_id| type |content |
+------+-------+-----------+------------------------------+
| 1 |'1-2-3'| position | *creation |
| 2 |'1-2-3'| position | *something_changed |
| 3 |'1-2-3'| device | *creation |
| 4 |'1-2-4'| position | *creation |
| 5 |'1-2-4'| device | *creation |
| 6 |'1-2-4'| device | *something_changed |
+------+-------+-----------+------------------------------+
Every entity is described with the user_id and type "key" pair, when something is changed in the entity a new row is inserted. The current state of an entity is selected by the highest id row from the group, which is grouped by the user_id and type. Performance wise the updates should be super fast and the selects can be slower, because those are not used often.
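To make the "highest id per group" selection concrete, here is a small sketch. It uses Python's sqlite3 instead of MySQL purely so the example is self-contained; the table name entity_history and the index are assumptions, and the query itself is plain SQL that should carry over.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entity_history ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " user_id TEXT NOT NULL,"
    " type TEXT NOT NULL,"
    " content TEXT NOT NULL)"
)
# Assumed index: lets the database find the newest row of each key pair cheaply.
conn.execute("CREATE INDEX idx_entity ON entity_history (user_id, type, id)")

# Every change is an insert; nothing is ever updated in place.
events = [
    ("1-2-3", "position", "creation"),
    ("1-2-3", "position", "something_changed"),
    ("1-2-3", "device", "creation"),
    ("1-2-4", "position", "creation"),
    ("1-2-4", "device", "creation"),
    ("1-2-4", "device", "something_changed"),
]
conn.executemany(
    "INSERT INTO entity_history (user_id, type, content) VALUES (?, ?, ?)",
    events,
)

# Current state = the row with the highest id in each (user_id, type) group.
current = conn.execute(
    "SELECT h.id, h.user_id, h.type, h.content"
    " FROM entity_history h"
    " JOIN (SELECT MAX(id) AS max_id"
    "       FROM entity_history"
    "       GROUP BY user_id, type) latest"
    "   ON h.id = latest.max_id"
    " ORDER BY h.id"
).fetchall()

for row in current:
    print(row)
```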
I would like to look up best practices and other people's experiences with this method, but I don't know how to search for them. Can you help me? I'm interested in your experiences or opinions on this topic as well.
I know about Kafka and other streaming platforms, but that was sadly not an option for this.
So I am trying to wrap my head around the whole "normalization" thing. To understand it better, I have come up with this case of storing songs.
Suppose I have the following db:
Album Table:
album_name| genre
album_1| genre_1, genre_2
album_2| genre_1
album_3| genre_2
To normalize, I thought of the following approach
Album Table:
album_name| genre_id
album_1| 3
album_2| 1
album_3| 2
Genre Table:
genre_id| genre_1| genre_2
0| false| false
1| true| false
2| false| true
3| true| true
Thus, if a new genre pops up, all I need to do is create a new column in the genre table, and the corresponding new genre_id can be assigned. That will require filling in all possible combinations, but it only happens once for every new genre introduced.
Also, would what I came up with be considered "normalizing"? In the examples I have seen, I haven't come across tables whose columns were originally data.
The canonical way of doing this would be to use three tables:
Album
album_id | album_name (and maybe other columns)
1 | Rumours
2 | Thriller
3 | To the Moon and Back
Genre
genre_id | genre_name (also maybe other columns)
1 | rock
2 | pop
3 | alternative
AlbumGenre
album_id | genre_id
1 | 1
1 | 2
2 | 2
3 | 2
3 | 3
Normalization is all about avoiding the storage of repetitive data. If you scrutinize this design, you will see that information about albums and genres is stored only once, in each respective table. Then, the AlbumGenre table stores the relationships between albums and the various genres. This table is usually called a "bridge" table, because it links albums to their genres.
The problem with your proposed Genre table is that it repeats information about relationships even if those relationships don't exist. Furthermore, this approach won't scale well at all if you need to add more genres to the database.
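For what it's worth, the three-table layout can be tried out with a few lines of Python and sqlite3. This is only a sketch: the lowercase table names, the genres_of helper, and the extra 'jazz' genre are assumptions added to show that a new genre is a new row, not a new column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE album (
    album_id   INTEGER PRIMARY KEY,
    album_name TEXT NOT NULL
);
CREATE TABLE genre (
    genre_id   INTEGER PRIMARY KEY,
    genre_name TEXT NOT NULL UNIQUE
);
-- Bridge table: one row per (album, genre) relationship that actually exists.
CREATE TABLE album_genre (
    album_id INTEGER NOT NULL REFERENCES album(album_id),
    genre_id INTEGER NOT NULL REFERENCES genre(genre_id),
    PRIMARY KEY (album_id, genre_id)
);
INSERT INTO album VALUES
    (1, 'Rumours'), (2, 'Thriller'), (3, 'To the Moon and Back');
INSERT INTO genre VALUES (1, 'rock'), (2, 'pop'), (3, 'alternative');
INSERT INTO album_genre VALUES (1, 1), (1, 2), (2, 2), (3, 2), (3, 3);
""")


def genres_of(album_name):
    """Return the genre names linked to one album via the bridge table."""
    cur = conn.execute(
        "SELECT g.genre_name FROM album a"
        " JOIN album_genre ag ON ag.album_id = a.album_id"
        " JOIN genre g ON g.genre_id = ag.genre_id"
        " WHERE a.album_name = ?"
        " ORDER BY g.genre_id",
        (album_name,),
    )
    return [row[0] for row in cur]


# A new genre is just new rows; no schema change is needed.
conn.execute("INSERT INTO genre VALUES (4, 'jazz')")  # hypothetical genre
conn.execute("INSERT INTO album_genre VALUES (2, 4)")

print(genres_of("Rumours"))
print(genres_of("Thriller"))
```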
The relationship you defined is a many to many relationship. In general you don't want to be adding new columns when you add new data. So we need to look at another solution.
First we define tables for the Albums and Genres:
Album Table:
album_id | album_name
1 | album_1
2 | album_2
3 | album_3
Genre Table:
genre_id | genre_name
1 | genre_1
2 | genre_2
3 | genre_3
Now we need to link those two. We use a junction table to do that. Each instance of a genre belonging to an album will have a row in this table. So albums could be listed in this table multiple times.
Album Genres Junction Table:
album_genre_junction_id | album_id | genre_id
1 | 1 | 1
2 | 1 | 2
3 | 2 | 1
4 | 3 | 2
I'm trying to create a database for a friendship website I'm building. I want to store multiple attributes about the user, such as gender, education, pets, etc.
Solution #1 - User table:
id | age | birth day | City | Gender | Education | fav Pet | fav hobby . . .
--------------------------------------------------------------------------
0 | 38 | 1985 | New York | Female | University | Dog | Ping Pong
The problem I'm having is that the list of attributes goes on and on, and right now my user table has 20-something columns.
I feel I could normalize this by creating another table for each attribute (see below). However, this would create many joins, and I'm still left with a lot of columns in the user table.
Solution #2 - User table:
id | age | birth day | City | Gender | Education | fav Pet | fav hobbies
--------------------------------------------------------------------------
0 | 38 | 1985 | New York | 0 | 0 | 0 | 0
Pets table:
id | Pet Type
---------------
0 | Dog
Does anyone have any ideas how to approach this problem? It feels like both solutions are wrong. What is the proper table design for this database?
There is more to this than meets the eye. First of all, if you have tons of attributes, many of which will likely be null for any specific row, and a very dynamic selection of attributes (i.e. new attributes will appear quite frequently during the code's lifecycle), you might want to ask yourself whether an RDBMS is the best way to materialize this essentially schema-less data. Maybe a document store would be a better fit?
If you do want to stay in the RDBMS world, the canonical answer is to have either a single property table or one property table per data type, plus a table of property types:
Users.id | .name | .birthdate | .Gender | .someotherfixedattribute
----------------------------------------------------------
1743 | Me. | 01/01/1970 | M | indeed
Propertytypes.id | .name
------------------------
234 | pet
235 | hobby
Properties.uid | .pid | .content
-----------------------------
1743 | 234 | Husky dog
You have a comment and an answer that recommend (or at least suggest) an Entity-Attribute-Value (EAV) model.
There is nothing wrong with using EAV if your attributes need to be dynamic, and your system needs to allow adding new attributes post-deployment.
That said, if your columns and relationships are all known up front, and they don't need to be dynamic, you are much better off creating an explicit model. It will (generally) perform better and will be much easier to maintain.
Instead of a wide table with a field per attribute, or many attribute tables, you could make a skinny table with many rows, something like:
Attributes (id,user_id,attribute_type,attribute_value)
Ultimately the best solution depends greatly on how the data will be used. People can only have one DOB, but maybe you want to allow for multiple addresses (billing/mailing/etc.), so addresses might deserve a separate table.
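A skinny Attributes table like the one above might look as follows in practice. This is a sketch under assumptions: Python's sqlite3 stands in for the real database, the sample rows are invented, and profile and users_with are hypothetical helpers. Values are stored as text, which is one of the usual trade-offs of this layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attributes ("
    " id INTEGER PRIMARY KEY,"
    " user_id INTEGER NOT NULL,"
    " attribute_type TEXT NOT NULL,"
    " attribute_value TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO attributes (user_id, attribute_type, attribute_value)"
    " VALUES (?, ?, ?)",
    [
        (1743, "pet", "Husky dog"),
        (1743, "hobby", "Ping Pong"),
        (1744, "pet", "Cat"),
    ],
)


def profile(user_id):
    """Collect a user's attributes as {type: [values]}; a type may repeat."""
    result = {}
    rows = conn.execute(
        "SELECT attribute_type, attribute_value FROM attributes"
        " WHERE user_id = ? ORDER BY id",
        (user_id,),
    )
    for attr_type, value in rows:
        result.setdefault(attr_type, []).append(value)
    return result


def users_with(attr_type, value):
    """Find the users that have a given attribute value."""
    rows = conn.execute(
        "SELECT DISTINCT user_id FROM attributes"
        " WHERE attribute_type = ? AND attribute_value = ?"
        " ORDER BY user_id",
        (attr_type, value),
    )
    return [row[0] for row in rows]


print(profile(1743))
print(users_with("pet", "Cat"))
```

Searching on several attributes at once needs one self-join (or one EXISTS clause) per attribute, which is where the explicit model tends to be easier to work with.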
I have to generate a view that shows tracking across each month. The ultimate view will be something like this:
| Person | Task | Jan | Feb | Mar| Apr | May | June . . .
| Joe | Roof Work | 100% | 50% | 50% | 25% |
| Joe | Basement Work | 0% | 50% | 50% | 75% |
| Tom | Basement Work | 100% | 100% | 100% | 100% |
I already have the following tables:
Person
Task
I am now creating a new table with foreign keys into the above 2 tables, and I am trying to figure out the pros and cons of creating 1 or 2 tables.
Option 1:
Create a new table with the following Columns:
Id
PersonId
TaskId
Jan2012
Feb2012
Mar2012
Apr2013
or
Option 2:
Have 2 separate tables
One table for just
Id
PersonId
TaskId
and another table for just the following columns
Id
PersonTaskId (the id from table above)
MonthYearKey
MonthYearValue
So an example record would be
| 1 | 13 | Jan2011 | 100% |
where 13 would represent a specific unique Person and Task combination. This second way would avoid having to create new columns as time goes on (which seems right), but I also want to avoid overkill.
Which would be the more scalable way to design this schema? Any other suggestions or more elegant ways of doing this would be great as well.
You can have a many-to-many (m2m) table with data columns. I don't see a reason why you can't just put MonthYearKey and MonthYearValue on the same table as PersonId and TaskId:
Id
TaskId
PersonId
MonthYearKey
MonthYearValue
It's also possible that you would want to move MonthYearKey out into its own table; it really comes down to your common queries and what this data is used for.
I would note that you never want to design a schema where you add columns as time passes. The first option would require constant maintenance and would also become very difficult to query.
Option 2 is definitely more scalable and is not overkill.
Option 1 would require you to add a new column every month and simple date based queries of your data would not be possible, e.g. Show me all people who worked at least 90% in any month last year.
The ultimate view would be generated from a particular query or view of your data.
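As a sketch of how that view could be produced from the row-per-month design, the following uses Python's sqlite3 and conditional aggregation. Several things here are assumptions rather than part of the question: the table name person_task_month, storing the month key as ISO 'YYYY-MM' text so it sorts by date, storing the value as an integer percentage, and the sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE task   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE person_task_month (
    id        INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES person(id),
    task_id   INTEGER NOT NULL REFERENCES task(id),
    month_key TEXT NOT NULL,     -- assumed 'YYYY-MM' format
    percent   INTEGER NOT NULL,  -- assumed 0..100
    UNIQUE (person_id, task_id, month_key)
);
INSERT INTO person VALUES (1, 'Joe'), (2, 'Tom');
INSERT INTO task VALUES (1, 'Roof Work'), (2, 'Basement Work');
INSERT INTO person_task_month (person_id, task_id, month_key, percent) VALUES
    (1, 1, '2012-01', 100), (1, 1, '2012-02', 50),
    (1, 2, '2012-01', 0),   (1, 2, '2012-02', 50),
    (2, 2, '2012-01', 100), (2, 2, '2012-02', 100);
""")


def monthly_view(months):
    """Pivot to one column per requested month (months must be non-empty)."""
    cols = ", ".join(
        "MAX(CASE WHEN m.month_key = ? THEN m.percent END)" for _ in months
    )
    sql = (
        f"SELECT p.name, t.name, {cols}"
        " FROM person_task_month m"
        " JOIN person p ON p.id = m.person_id"
        " JOIN task t ON t.id = m.task_id"
        " GROUP BY p.id, t.id"
        " ORDER BY p.id, t.id"
    )
    return conn.execute(sql, list(months)).fetchall()


def busy_people(first_month, last_month, threshold=90):
    """People who reached the threshold in any month of the given range."""
    rows = conn.execute(
        "SELECT DISTINCT p.name FROM person_task_month m"
        " JOIN person p ON p.id = m.person_id"
        " WHERE m.percent >= ? AND m.month_key BETWEEN ? AND ?"
        " ORDER BY p.name",
        (threshold, first_month, last_month),
    )
    return [row[0] for row in rows]


view = monthly_view(["2012-01", "2012-02"])
for row in view:
    print(row)
print(busy_people("2012-01", "2012-12"))
```

New months only add rows, and the set of columns shown in the view is decided at query time.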
I am starting a new e-commerce web application (a pet project) that sells both t-shirts and shoes. My store has only free-size t-shirts, so a t-shirt has only a color column, while a shoe has columns for size and color.
Now it's time to create tables to store that data. Is it better to create separate tables for shoes and t-shirts, or to keep all of them in one table?
If there is a better way to store such data, please let me know.
You definitely don't want to create a Shoe table and a TShirt table. Your shop might grow, and one day you'll have a thousand such product tables. Writing SQL for that would be a nightmare. Plus, you might have different kinds of t-shirts eventually, some with color, some with size and color, and so on. If you create a new table for each, you'll lose track of them quickly, and if you don't, why have separate tables for t-shirts and shoes, but not for one-size t-shirts and multi-size t-shirts?
While designing your database, you should be asking yourself: what are the entities in my realm? what are the things that never change and are uniquely identifiable? In a shop, a particular item that can be sold at a particular price is one such entity. So you might have a products table that has a key for each particular item you sell, and maybe a name, a type, a size and a color column:
item
id | type | name | size | color
------------------------------------
1 | shoe | Marathon | 9 | white
2 | shoe | Marathon | 9 | black
Looking at this table, you notice that we have two entries for the highly successful Marathon running shoe, and that seems to be a normalization violation. Indeed, you probably have two entities: a shippable item and a catalog product. The shoe "Marathon" is probably something that has one picture and one description in your store, followed by an "available in the following colors and sizes:" line. So now you have two tables:
product
id | type | name | supplier | picture | description
--------------------------------------------------------------------------------------
1 | shoe | Marathon | TrackNField Co. | marathon.jpg | Run faster than light!
2 | tshirt | FlowerPower | SF Shirts | fpower.jpg | If you're going to San Francisco...
item
id | product_id | size | color | price
--------------------------------------
1 | 1 | 9 | white | 99.99
2 | 1 | 9 | black | 99.99
3 | 2 | | blue | 19.99
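The product/item split above can be sketched with Python's sqlite3 as follows. The column subset, the NULL size for one-size t-shirts, and the variants helper are assumptions made to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (
    id   INTEGER PRIMARY KEY,
    type TEXT NOT NULL,
    name TEXT NOT NULL
);
CREATE TABLE item (
    id         INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES product(id),
    size       TEXT,            -- assumption: NULL means one-size
    color      TEXT NOT NULL,
    price      REAL NOT NULL
);
INSERT INTO product VALUES (1, 'shoe', 'Marathon'), (2, 'tshirt', 'FlowerPower');
INSERT INTO item VALUES
    (1, 1, '9',  'white', 99.99),
    (2, 1, '9',  'black', 99.99),
    (3, 2, NULL, 'blue',  19.99);
""")


def variants(product_name):
    """Return the (size, color, price) of every sellable item of a product."""
    cur = conn.execute(
        "SELECT i.size, i.color, i.price FROM item i"
        " JOIN product p ON p.id = i.product_id"
        " WHERE p.name = ?"
        " ORDER BY i.id",
        (product_name,),
    )
    return cur.fetchall()


print(variants("Marathon"))
print(variants("FlowerPower"))
```

The catalog page reads one product row and lists its variants from item, so the description and picture are stored only once.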
The "type" column in the product table can be a tricky one. You'll probably want to display products by category, let the user click on "shoes" and get all products with type "shoe". Easy so far, but eventually someone will mistype an entry "sheo", and then you can't find that product under shoes anymore. So it's better to separate the categorization from the products, for example by having a product_type table:
product_type
id | name
---------
1 | shoe
product
id | type_id | name | supplier | picture | description
--------------------------------------------------------------------------------------
1 | 1 | Marathon | TrackNField Co. | marathon.jpg | Run faster than light!
with a reference to the type in the product table. That's ok as long as your type hierarchy stays shallow, but what if you want to have subcategories, like "sneaker", "basketball shoe", "suede shoe", and so on? One shoe might even belong to several of these subcategories. In that case you can try this:
category
id | name | supercategory_id
------------------------------------
1 | shoe |
2 | running shoe | 1
product_category
product_id | category_id
------------------------
1 | 2
product
id | name | supplier | picture | description
--------------------------------------------------------------------------------------
1 | Marathon | TrackNField Co. | marathon.jpg | Run faster than light!
And if you want to display multiple hierarchies of categorizations (as most big ecommerce sites do these days), you'll have to come up with something even more sophisticated.
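For the single-hierarchy category / product_category design shown above, one way to list everything under "shoe", including its subcategories, is a recursive query. The sketch below uses Python's sqlite3, which supports WITH RECURSIVE; the 'tshirt' category row and the products_in helper are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (
    id               INTEGER PRIMARY KEY,
    name             TEXT NOT NULL,
    supercategory_id INTEGER REFERENCES category(id)  -- NULL for top level
);
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE product_category (
    product_id  INTEGER NOT NULL REFERENCES product(id),
    category_id INTEGER NOT NULL REFERENCES category(id),
    PRIMARY KEY (product_id, category_id)
);
INSERT INTO category VALUES
    (1, 'shoe', NULL), (2, 'running shoe', 1), (3, 'tshirt', NULL);
INSERT INTO product VALUES (1, 'Marathon'), (2, 'FlowerPower');
INSERT INTO product_category VALUES (1, 2), (2, 3);
""")


def products_in(category_name):
    """Products in a category or in any of its (nested) subcategories."""
    sql = """
    WITH RECURSIVE subtree(id) AS (
        SELECT id FROM category WHERE name = ?
        UNION
        SELECT c.id FROM category c
        JOIN subtree s ON c.supercategory_id = s.id
    )
    SELECT DISTINCT p.name
    FROM product p
    JOIN product_category pc ON pc.product_id = p.id
    JOIN subtree s ON s.id = pc.category_id
    ORDER BY p.name
    """
    return [row[0] for row in conn.execute(sql, (category_name,))]


print(products_in("shoe"))
print(products_in("tshirt"))
```

Clicking "shoe" then finds Marathon even though it is only filed under "running shoe".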
Keep them all in one table and have a type field. The reason to do it this way is so that your data structure is scalable: i.e. if there is a new type of product then instead of adding a new table and having to drastically change your application code, you just use the same table and simply add a type.
If you don't want to make it too complex, you can keep all of them in the same table and create another table called "ProductType" that tells you whether a product is a shoe or a t-shirt.
The relationship will be one-to-many from "ProductType", as the same product type can be associated with more than one record in the product table (where you store all your products).