I'm new to programming and databases. I'm currently working on an inventory project that involves Departments (Physics, Chemistry, etc.), Categories (Physics -> Heat, Chemistry -> Organic) and the actual Lab Items (Physics -> Heat -> Match Sticks, Chemistry -> Organic -> Hexane Solution). What should my database diagram look like so that I can search for a list of items by Department and Category, and so that items are classified under the correct Department and Category when they are added? I'm thinking of creating a junction table (Department-Category-Item) linking to the Department, Category and Item Details (which doesn't contain DeptID and CatID) tables.
Am I on the right track?
Hope someone can help to clarify.
Much thanks in advance.
Chris
Yes, you are on the right track. If it's true that:
One department can have several categories, and each category always belongs to exactly one department
One category can have several items, and each item always belongs to exactly one category
Then you should use a model like this:
Department
Id | Name
---|-------
1 | Physics
2 | Chemistry
Category
Id | DepartmentId | Name
---|--------------|------
1 | 1 | Heat
2 | 1 | ...
Item
Id | CategoryId | Name
---|------------|------
1 | 2 | Match Sticks
2 | 1
You get your "junction" through the foreign keys (DepartmentId, CategoryId). You add items correctly by entering the respective foreign key.
P.S.: If one of your relations is n:n (for example, an item can be in several categories), then you'll need a new entity/table between those two.
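The model above can be tried out with a minimal sketch like the following. It assumes SQLite through Python's built-in sqlite3 module, which the original post does not mention. The table and column names follow the answer, the sample rows come from the question, and the helper `items_in` is a hypothetical name used only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
CREATE TABLE Department (
    Id   INTEGER PRIMARY KEY,
    Name TEXT NOT NULL UNIQUE
);
CREATE TABLE Category (
    Id           INTEGER PRIMARY KEY,
    DepartmentId INTEGER NOT NULL REFERENCES Department(Id),
    Name         TEXT NOT NULL
);
CREATE TABLE Item (
    Id         INTEGER PRIMARY KEY,
    CategoryId INTEGER NOT NULL REFERENCES Category(Id),
    Name       TEXT NOT NULL
);
INSERT INTO Department (Id, Name) VALUES (1, 'Physics'), (2, 'Chemistry');
INSERT INTO Category (Id, DepartmentId, Name) VALUES (1, 1, 'Heat'), (2, 2, 'Organic');
INSERT INTO Item (Id, CategoryId, Name) VALUES (1, 1, 'Match Sticks'), (2, 2, 'Hexane Solution');
""")


def items_in(department, category):
    """List item names for a department/category pair by following the foreign keys."""
    rows = conn.execute(
        """
        SELECT i.Name
        FROM Item i
        JOIN Category c   ON c.Id = i.CategoryId
        JOIN Department d ON d.Id = c.DepartmentId
        WHERE d.Name = ? AND c.Name = ?
        ORDER BY i.Name
        """,
        (department, category),
    ).fetchall()
    return [name for (name,) in rows]


physics_heat = items_in("Physics", "Heat")
```

Because an item only stores its CategoryId, its department follows from the category, so an item cannot end up under a category and a department that disagree with each other.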
So I am trying to wrap my head around the whole "normalization" thing. To understand it better, I have come up with this case of storing songs.
Suppose I have the following db:
Album Table:
album_name| genre
album_1| genre_1, genre_2
album_2| genre_1
album_3| genre_2
To normalize, I thought of the following approach
Album Table:
album_name| genre_id
album_1| 3
album_2| 1
album_3| 2
Genre Table:
genre_id| genre_1| genre_2
0| false| false
1| true| false
2| false| true
3| true| true
Thus, if a new genre pops up, all I need to do is create a new column in the genre table, and the new corresponding genre_id can be assigned. Admittedly, that requires filling in all possible combinations, but that only happens once for each new genre introduced.
Also, would what I came up with be considered "normalizing"? In the examples I have seen, I haven't come across tables created with columns that were originally data.
The canonical way of doing this would be to use three tables:
Album
album_id | album_name (and maybe other columns)
1 | Rumours
2 | Thriller
3 | To the Moon and Back
Genre
genre_id | genre_name (also maybe other columns)
1 | rock
2 | pop
3 | alternative
AlbumGenre
album_id | genre_id
1 | 1
1 | 2
2 | 2
3 | 2
3 | 3
Normalization is all about avoiding the storage of repetitive data. If you scrutinize this design, you will see that information about albums and genres is stored only once, in each respective table. Then, the AlbumGenre table stores the relationships between albums and the various genres. This table is usually called a "bridge" table, because it links albums to their genres.
The problem with your proposed Genre table is that it repeats information about relationships even if those relationships don't exist. Furthermore, this approach won't scale well at all if you need to add more genres to the database.
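As a rough sketch of the three-table design above, the following uses SQLite through Python's sqlite3 module (an assumption; the answer names no particular database). The tables and rows mirror the example data, and the composite primary key on AlbumGenre is an added assumption that keeps the same album/genre pair from being stored twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Album (album_id INTEGER PRIMARY KEY, album_name TEXT NOT NULL);
CREATE TABLE Genre (genre_id INTEGER PRIMARY KEY, genre_name TEXT NOT NULL UNIQUE);
CREATE TABLE AlbumGenre (
    album_id INTEGER NOT NULL REFERENCES Album(album_id),
    genre_id INTEGER NOT NULL REFERENCES Genre(genre_id),
    PRIMARY KEY (album_id, genre_id)
);
INSERT INTO Album VALUES (1, 'Rumours'), (2, 'Thriller'), (3, 'To the Moon and Back');
INSERT INTO Genre VALUES (1, 'rock'), (2, 'pop'), (3, 'alternative');
INSERT INTO AlbumGenre VALUES (1, 1), (1, 2), (2, 2), (3, 2), (3, 3);
""")

# Walk the bridge table to collect each album's genres.
genres_by_album = {}
for album_name, genre_name in conn.execute("""
    SELECT a.album_name, g.genre_name
    FROM Album a
    JOIN AlbumGenre ag ON ag.album_id = a.album_id
    JOIN Genre g       ON g.genre_id = ag.genre_id
    ORDER BY a.album_id, g.genre_id
"""):
    genres_by_album.setdefault(album_name, []).append(genre_name)
```

Each album name and each genre name appears in exactly one row; only the small integer pairs in AlbumGenre repeat.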
The relationship you defined is a many to many relationship. In general you don't want to be adding new columns when you add new data. So we need to look at another solution.
First we define tables for the Albums and Genres:
Album Table:
album_id | album_name
1 | album_1
2 | album_2
3 | album_3
Genre Table:
genre_id | genre_name
1 | genre_1
2 | genre_2
3 | genre_3
Now we need to link those two. We use a junction table to do that. Each instance of a genre belonging to an album will have a row in this table. So albums could be listed in this table multiple times.
Album Genres Junction Table:
album_genre_junction_id | album_id | genre_id
1 | 1 | 1
2 | 1 | 2
3 | 2 | 1
4 | 3 | 2
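To illustrate the point about not adding columns for new data, here is a small sketch of the junction table above, again assuming SQLite via Python's sqlite3 module. The UNIQUE constraint is an assumption added here so that an album cannot list the same genre twice, and `genre_4` is a made-up row used only to show the insert.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE album (album_id INTEGER PRIMARY KEY, album_name TEXT NOT NULL);
CREATE TABLE genre (genre_id INTEGER PRIMARY KEY, genre_name TEXT NOT NULL);
CREATE TABLE album_genre_junction (
    album_genre_junction_id INTEGER PRIMARY KEY,
    album_id INTEGER NOT NULL REFERENCES album(album_id),
    genre_id INTEGER NOT NULL REFERENCES genre(genre_id),
    UNIQUE (album_id, genre_id)  -- assumption: each genre at most once per album
);
INSERT INTO album VALUES (1, 'album_1'), (2, 'album_2'), (3, 'album_3');
INSERT INTO genre VALUES (1, 'genre_1'), (2, 'genre_2'), (3, 'genre_3');
INSERT INTO album_genre_junction (album_id, genre_id)
    VALUES (1, 1), (1, 2), (2, 1), (3, 2);
""")

columns_before = [row[1] for row in conn.execute("PRAGMA table_info(genre)")]

# A new genre is one new row; tagging an album with it is one more row.
conn.execute("INSERT INTO genre (genre_id, genre_name) VALUES (4, 'genre_4')")
conn.execute("INSERT INTO album_genre_junction (album_id, genre_id) VALUES (3, 4)")

columns_after = [row[1] for row in conn.execute("PRAGMA table_info(genre)")]

albums_with_genre_2 = [
    name
    for (name,) in conn.execute("""
        SELECT a.album_name
        FROM album a
        JOIN album_genre_junction j ON j.album_id = a.album_id
        WHERE j.genre_id = 2
        ORDER BY a.album_id
    """)
]
```

The column list of the genre table is the same before and after the new genre arrives, which is the property the boolean-column design in the question does not have.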
This is about database structure (inheritance).
Say you have Place, and Restaurant and Cafe are two subtypes of Place.
1. You can create a Place table to hold the common info of the subtypes, and create a foreign key to connect it to a Restaurant or Cafe instance.
or
2. You can duplicate the common columns in Restaurant and Cafe.
I'm coming from Django background, and many seem to prefer #2 over #1.
Is there a compelling scenario where you should pick one over another?
One scenario I think I need the #1 is when you are going to sort all Places collectively. (Can we use #2 for this?)
I think I would go for #2, as you don't have to think about relations and foreign keys and the model itself is complete, so you could just copy the database and use it for something else.
Further you just have to query one table instead of two.
If you need to sort Restaurant and Cafe, you can use the SQL UNION Operator.
Let's assume you have these two simple tables:
restaurant
id | name | likes
-------------------------
1 | Steakhouse | 5
2 | Italian Food | 3
cafe
id | name | likes
--------------------------
1 | Starbucks | 0
You can query them using the UNION operator like this:
SELECT * FROM cafe
UNION
SELECT * FROM restaurant
ORDER BY likes DESC
This returns a list of cafes and restaurants ordered by likes, as if they came from the same table.
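A runnable variation on that query, assuming SQLite via Python's sqlite3 module: it swaps in UNION ALL and adds a literal `kind` column, neither of which is in the answer. The reason for the change is that both tables have a row with id 1, so the extra column keeps the rows distinguishable, and UNION ALL skips the duplicate-removal step that plain UNION performs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE restaurant (id INTEGER PRIMARY KEY, name TEXT NOT NULL, likes INTEGER NOT NULL);
CREATE TABLE cafe       (id INTEGER PRIMARY KEY, name TEXT NOT NULL, likes INTEGER NOT NULL);
INSERT INTO restaurant VALUES (1, 'Steakhouse', 5), (2, 'Italian Food', 3);
INSERT INTO cafe VALUES (1, 'Starbucks', 0);
""")

# Sort both subtypes together; 'kind' records which table each row came from.
places = conn.execute("""
    SELECT 'cafe' AS kind, id, name, likes FROM cafe
    UNION ALL
    SELECT 'restaurant' AS kind, id, name, likes FROM restaurant
    ORDER BY likes DESC
""").fetchall()
```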
I'm trying to create a friendship site. The issue I'm having is when a user joins a website they have to fill out a form. This form has many fixed drop down items the user must fill out. Here is an example of one of the drop downs.
Drop Down (Favorite Pets)
Items in Favorite Pets
1. Dog
2. Cat
3. Bird
4. Hamster
What is the best way to store this info in a database? Right now the profile table has a column for each fixed drop down. Is this correct database design? See example:
User ID | Age | Country | Favorite Pet | Favorite Season
--------------------------------------------------------------
1 | 29 | United States | Bird | Summer
Is this the correct database design? Right now I have probably 30+ columns. Most of the columns are fixed because they are drop downs and the user has to pick one of the options.
What's the correct approach to this problem?
P.S. I also thought about creating a table for each drop down, but this would really complicate the queries and lead to lots of tables.
Another approach
Profile table
ID | username | age
-------------------
1 | jason | 27
profileDropDown table:
ID | userID | dropdownID
------------------------
1 | 1 | 2
2 | 1 | 7
Drop Down table:
ID | dropdown | option
---------------------
1 | pet | Bird
2 | pet | Cat
3 | pet | Dog
4 | pet | Hamster
5 | season | Winter
6 | season | Summer
7 | season | Fall
8 | season | Spring
"Best way to approach" or "correct way" will open up a lot of discussion here, which risks this question being closed. I would recommend creating a drop down table that has a column called "TYPE" or "NAME". You would then put a unique identifier of the drop down in that column to identify that set. Then have another column called "VALUE" that holds the drop down value.
For example:
ID | TYPE | VALUE
1 | PET | BIRD
2 | PET | DOG
3 | PET | FISH
4 | SEASON | FALL
5 | SEASON | WINTER
6 | SEASON | SPRING
7 | SEASON | SUMMER
Then, to get your PET drop down, you just select all rows from this table where TYPE = 'PET'.
Will the set of questions (dropdowns) asked of every user ever change? Will you (or your successor) ever need to add or remove questions over time? If no, then a table for users with one column per question is fine, but if yes, it gets complex.
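A minimal sketch of that single lookup table, assuming SQLite via Python's sqlite3 module. The table name `dropdown_option` and the helper `options_for` are hypothetical names chosen for this example; the rows are the ones listed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dropdown_option (
    id    INTEGER PRIMARY KEY,
    type  TEXT NOT NULL,
    value TEXT NOT NULL,
    UNIQUE (type, value)
);
INSERT INTO dropdown_option (id, type, value) VALUES
    (1, 'PET', 'BIRD'), (2, 'PET', 'DOG'), (3, 'PET', 'FISH'),
    (4, 'SEASON', 'FALL'), (5, 'SEASON', 'WINTER'),
    (6, 'SEASON', 'SPRING'), (7, 'SEASON', 'SUMMER');
""")


def options_for(dropdown_type):
    """Return the values of one drop down, in id order."""
    rows = conn.execute(
        "SELECT value FROM dropdown_option WHERE type = ? ORDER BY id",
        (dropdown_type,),
    )
    return [value for (value,) in rows]


pet_options = options_for("PET")
```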
Database purists would require two tables for each question:
One table containing a list of all valid answers for that question
One table containing the many to many relation between user and answer to “this” question
If a new question is added, create new tables; if a question is removed, drop those tables (and, of course, adjust all your code. Ugh.) This would work, but it's hardly efficient.
If, as seems likely, all the questions and answer sets are similar, then a three-table model suggests itself:
A table with one row per question (QuestionId, QuestionText)
A table with one row for each answer for each Question (QuestionId, AnswerId, AnswerText)
A table with one row for each user-answered question (UserId, QuestionId, AnswerId)
Adding and removing questions is straightforward, as is identifying skipped or unanswered questions (such as, if you add a new question a month after going live).
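The three-table model might be sketched as follows, assuming SQLite via Python's sqlite3 module. The column names come from the list above; the table names, the sample rows, and the `unanswered` helper are assumptions for illustration, and no users table is modelled, so UserId is left unconstrained here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Question (
    QuestionId   INTEGER PRIMARY KEY,
    QuestionText TEXT NOT NULL
);
CREATE TABLE Answer (
    QuestionId INTEGER NOT NULL REFERENCES Question(QuestionId),
    AnswerId   INTEGER NOT NULL,
    AnswerText TEXT NOT NULL,
    PRIMARY KEY (QuestionId, AnswerId)
);
CREATE TABLE UserAnswer (
    UserId     INTEGER NOT NULL,
    QuestionId INTEGER NOT NULL,
    AnswerId   INTEGER NOT NULL,
    PRIMARY KEY (UserId, QuestionId),  -- one answer per user per question
    FOREIGN KEY (QuestionId, AnswerId) REFERENCES Answer(QuestionId, AnswerId)
);
INSERT INTO Question VALUES (1, 'Favorite Pet'), (2, 'Favorite Season');
INSERT INTO Answer VALUES (1, 1, 'Dog'), (1, 2, 'Cat'), (1, 3, 'Bird'),
                          (2, 1, 'Summer'), (2, 2, 'Winter');
INSERT INTO UserAnswer VALUES (1, 1, 3);  -- user 1 picked Bird and skipped the season
""")


def unanswered(user_id):
    """Questions for which the given user has no row in UserAnswer."""
    rows = conn.execute(
        """
        SELECT q.QuestionText
        FROM Question q
        LEFT JOIN UserAnswer ua
               ON ua.QuestionId = q.QuestionId AND ua.UserId = ?
        WHERE ua.UserId IS NULL
        ORDER BY q.QuestionId
        """,
        (user_id,),
    )
    return [text for (text,) in rows]


skipped = unanswered(1)
```

The composite foreign key means a user can only store an answer that is actually valid for that question, and a question added after launch simply shows up in `unanswered` for every existing user.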
As with most everything, there’s a whole lot of “it depends” behind this, most of which depends on what you want your system to do.
Context: simple webapp game for personal learning purposes, using postgres. I can design it however I want.
Two tables and one view (the view references additional tables that aren't important here)
Table: Research
col: research_id (foreign key to an outside table)
col: category (integer foreign key to category table)
col: percent (integer)
constraint (unique combination of the three columns)
Table: Category
col: category_id (primary key auto inc)
col: name(varchar(255))
notes: this table exists to capture the 4 categories of research I want in business logic and which I assume is not best practice to hardcode as columns in the db
View: Research_view
col: research_id (from research table)
col: foo1 (one of the categories from category table)
col: foo2 (etc...)
col: other cols from other joins
notes: has insert/update/delete statements that use the above tables appropriately
I worry that the research table itself qualifies as a "Skinny Table" (I hadn't heard the term until I just saw it in the Manning iBATIS book). For example, test data within it looks like:
| research_id | percent | category |
| 1 | 25 | 1 |
| 1 | 25 | 2 |
| 1 | 25 | 3 |
| 1 | 25 | 4 |
| 2 | 20 | 1 |
| 2 | 30 | 2 |
| 2 | 25 | 3 |
| 2 | 25 | 4 |
1) Does it make sense to have all columns in a table collectively define unique entries?
2) Does this 'smell' to you?
Couple of notes to start:
constraint (unique combination of the three columns)
It makes no sense to have a unique constraint that includes a single-column primary key. Including that column will cause every row to be unique.
notes: this table exists to capture the 4 categories of research I want in business logic and which I assume is not best practice to hardcode as columns in the db
If a research item/entity is required to have all four categories defined for it to be valid, they should absolutely be columns in the research table. I can't tell definitively from your statement whether this is the case or not, but your assumption is faulty if looked at in isolation. Let your model reflect reality as closely as possible.
Another factor is whether it's a requirement that additional categories may be added to the system post-deployment. Whether the categories are intended to be flexible vs. fixed should absolutely influence the design.
1) Does it make sense to have all columns in a table collectively define unique entries?
I would say it's not common, but can imagine there are situations where it might be appropriate.
2) Does this 'smell' to you?
Hard to say without more details.
All that said, if the intent is to view and add research items with all four categories, I would say (again) that you should consider whether the four categories are semantically attributes of the research entity.
As a random example, things like height and weight might be considered categories of a person, but they would likely be stored flat on the person table, and not in a separate table.
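To make the two shapes concrete, the sketch below stores the question's test data in the skinny research table and exposes it flat, one column per category, through a view. It assumes SQLite via Python's sqlite3 module rather than the Postgres setup in the question, it leaves out the view's insert/update/delete handling, and the names `foo3` and `foo4` are hypothetical (the question only names foo1 and foo2). The composite primary key on (research_id, category) is also an assumption, not the three-column constraint from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (
    category_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE research (
    research_id INTEGER NOT NULL,
    category    INTEGER NOT NULL REFERENCES category(category_id),
    percent     INTEGER NOT NULL,
    PRIMARY KEY (research_id, category)  -- assumption: one percent per category per item
);
INSERT INTO category VALUES (1, 'foo1'), (2, 'foo2'), (3, 'foo3'), (4, 'foo4');
INSERT INTO research (research_id, percent, category) VALUES
    (1, 25, 1), (1, 25, 2), (1, 25, 3), (1, 25, 4),
    (2, 20, 1), (2, 30, 2), (2, 25, 3), (2, 25, 4);

-- Pivot the skinny rows into one flat row per research item.
CREATE VIEW research_view AS
SELECT research_id,
       MAX(CASE WHEN category = 1 THEN percent END) AS foo1,
       MAX(CASE WHEN category = 2 THEN percent END) AS foo2,
       MAX(CASE WHEN category = 3 THEN percent END) AS foo3,
       MAX(CASE WHEN category = 4 THEN percent END) AS foo4
FROM research
GROUP BY research_id;
""")

wide_rows = conn.execute(
    "SELECT * FROM research_view ORDER BY research_id"
).fetchall()
```

If the four categories are fixed attributes of every research item, the flat shape produced by the view is what the table itself would look like; the skinny table earns its keep only if categories are expected to change.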
I'm putting together a database that I need to normalize and I've run into an issue that I don't really know how to handle.
I've put together a simplified example of my problem to illustrate it:
Item ID___Mass___Procurement__Currency__________Amount
0__________2kg___inherited____null________________null
1_________13kg___bought_______US dollars_________47.20
2__________5kg___bought_______British Pounds______3.10
3_________11kg___inherited____null________________null
4__________9kg___bought_______US dollars__________1.32
(My apologies for the awkward table; new users aren't allowed to paste images)
In the table above I have a property (Amount) which is functionally dependent on the Item ID (I think), but which does not exist for every Item ID (since inherited items have no monetary cost). I'm relatively new to databases, but I can't find a similar issue to this addressed in any beginner tutorials or literature. Any help would be appreciated.
I would just create two new tables ItemProcurement and Currencies.
If I'm not wrong, as per the data presented, the amount is part of the procurement of the item itself (when the item has not been inherited), for that reason I would group the Amount and CurrencyID fields in the new entity ItemProcurement.
As you can see, an inherited item wouldn't have an entry in the ItemProcurement table.
Concerning the main Item table, if you expect just two different values for the kind of procurement, then I would use a char(1) column (with the values B => bought, I => inherited).
It would look like this:
The data would then look like this:
TABLE Items
+-------+-------+--------------------+
| ID | Mass | ProcurementMethod |
|-------+-------+--------------------+
| 0 | 2 | I |
+-------+-------+--------------------+
| 1 | 13 | B |
+-------+-------+--------------------+
| 2 | 5 | B |
+-------+-------+--------------------+
TABLE ItemProcurement
+--------+-------------+------------+
| ItemID | CurrencyID | Amount |
|--------+-------------+------------+
| 1 | 840 | 47.20 |
+--------+-------------+------------+
| 2 | 826 | 3.10 |
+--------+-------------+------------+
TABLE Currencies
+------------+---------+-----------------+
| CurrencyID | ISOCode | Description |
|------------+---------+-----------------+
| 840 | USD | US dollars |
+------------+---------+-----------------+
| 826 | GBP | British Pounds |
+------------+---------+-----------------+
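The three tables above could be declared roughly as follows. This is a sketch assuming SQLite via Python's sqlite3 module; the column types and the CHECK constraint are assumptions, and Amount is stored as REAL only to keep the example short (a real system would more likely use a decimal or money type).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Currencies (
    CurrencyID  INTEGER PRIMARY KEY,
    ISOCode     TEXT NOT NULL UNIQUE,
    Description TEXT NOT NULL
);
CREATE TABLE Items (
    ID                INTEGER PRIMARY KEY,
    Mass              REAL NOT NULL,
    ProcurementMethod TEXT NOT NULL CHECK (ProcurementMethod IN ('B', 'I'))
);
CREATE TABLE ItemProcurement (
    ItemID     INTEGER PRIMARY KEY REFERENCES Items(ID),
    CurrencyID INTEGER NOT NULL REFERENCES Currencies(CurrencyID),
    Amount     REAL NOT NULL
);
INSERT INTO Currencies VALUES (840, 'USD', 'US dollars'), (826, 'GBP', 'British Pounds');
INSERT INTO Items VALUES (0, 2, 'I'), (1, 13, 'B'), (2, 5, 'B');
INSERT INTO ItemProcurement VALUES (1, 840, 47.20), (2, 826, 3.10);
""")

# LEFT JOIN keeps inherited items, which have no ItemProcurement row.
item_report = conn.execute("""
    SELECT i.ID, i.Mass, i.ProcurementMethod, c.ISOCode, p.Amount
    FROM Items i
    LEFT JOIN ItemProcurement p ON p.ItemID = i.ID
    LEFT JOIN Currencies c      ON c.CurrencyID = p.CurrencyID
    ORDER BY i.ID
""").fetchall()
```

The NULLs still appear in the joined report for inherited items, but they are no longer stored anywhere; they only express that no procurement row exists.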
Not only Amount, everything is dependent on ItemID, as this seems to be a candidate key.
The dependence you have is that Currency and Amount are NULL (I guess this means Unknown/Invalid) when the Procurement is 'inherited' (or 0 cost, as pointed out by @XIVsolutions and as you mention: "inherited items have no monetary cost").
In other words, items are divided into two types (of procurement), and items of one of the two types do not have all attributes.
This can be solved with a supertype/subtype split. You have a supertype table (Item) and two subtype tables (ItemBought and ItemInherited), where each one of them has a 1::0..1 relationship with the supertype table. The attributes common to all items go in the supertype table, and every other attribute goes in the respective subtype table:
Item
----------------------------
ItemID Mass Procurement
0 2kg inherited
1 13kg bought
2 5kg bought
3 11kg inherited
4 9kg bought
ItemBought
---------------------------------
ItemID Currency Amount
1 US dollars 47.20
2 British Pounds 3.10
4 US dollars 1.32
ItemInherited
-------------
ItemID
0
3
If there is no attribute that only inherited items have, you can even skip the ItemInherited table altogether.
For other questions relating to this pattern, look up the tag: Class-Table-Inheritance. While you're at it, look up Shared-Primary-Key as well. For a more conceptual treatment, search for "ER Specialization".
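Here is one way the supertype/subtype split could be sketched, assuming SQLite via Python's sqlite3 module and skipping the ItemInherited table as just described. The composite foreign key on (ItemID, Procurement) is an addition that is not part of the answer: it is one common way to have the database itself reject a purchase row for an inherited item. The helper `try_add_purchase` is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Item (
    ItemID      INTEGER PRIMARY KEY,
    Mass        REAL NOT NULL,  -- kilograms
    Procurement TEXT NOT NULL CHECK (Procurement IN ('bought', 'inherited')),
    UNIQUE (ItemID, Procurement)  -- target for the composite foreign key below
);
CREATE TABLE ItemBought (
    ItemID      INTEGER PRIMARY KEY,  -- shared primary key with Item
    Procurement TEXT NOT NULL DEFAULT 'bought' CHECK (Procurement = 'bought'),
    Currency    TEXT NOT NULL,
    Amount      REAL NOT NULL,
    FOREIGN KEY (ItemID, Procurement) REFERENCES Item (ItemID, Procurement)
);
INSERT INTO Item (ItemID, Mass, Procurement) VALUES
    (0, 2, 'inherited'), (1, 13, 'bought'), (2, 5, 'bought'),
    (3, 11, 'inherited'), (4, 9, 'bought');
INSERT INTO ItemBought (ItemID, Currency, Amount) VALUES
    (1, 'US dollars', 47.20), (2, 'British Pounds', 3.10), (4, 'US dollars', 1.32);
""")


def try_add_purchase(item_id, currency, amount):
    """Return True if the purchase row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO ItemBought (ItemID, Currency, Amount) VALUES (?, ?, ?)",
                (item_id, currency, amount),
            )
        return True
    except sqlite3.IntegrityError:
        return False


# Item 3 is inherited, so a purchase row for it should be refused.
rejected_for_inherited = not try_add_purchase(3, "US dollars", 10.0)
```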
Here is my off-the-cuff suggestion:
UPDATE: Mass would be a Float/Decimal/Double depending upon your Db, Cost would be whatever the optimal type is for handling money (in SQL Server 2008, it is "Money" but these things vary).
ANOTHER UPDATE: The cost of an inherited item should be zero, not null (and in fact, there sometimes IS an indirect cost, in the form of taxes, but I digress . . .). Therefore, your Item table should require a value for cost, even if that cost is zero. It should not be null.
Let me know if you have questions . . .
Why do you need to normalise it?
I can see some data integrity challenges, but no obvious structural problems.
The implicit dependency between "procurement" and the presence or not of the value/currency is tricky, but has nothing to do with the keys and so is not a big deal, practically.
If we are to be purists (e.g. this is for homework purposes), then we are dealing with two types of item, inherited items and bought items. Since they are not the same type of thing, they should be modelled as two separate entities i.e. InheritedItem and BoughtItem, with only the columns they need.
In order to get a combined view of all items (e.g. to get a total weight), you would use a view or a UNION SQL query.
If we are looking to object model in the database, then we can factor out the common supertype (Item) and model the subtypes (InheritedItem, BoughtItem) with foreign keys to the supertype table (ypercube's explanation below is very good), but this is very complicated and less future-proof than only modelling the subtypes.
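A brief sketch of the two-entity variant with a combining view, assuming SQLite via Python's sqlite3 module. The view name `AllItems`, the column types, and the use of UNION ALL are assumptions for illustration; the rows are taken from the question's sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE InheritedItem (
    ItemID INTEGER PRIMARY KEY,
    Mass   REAL NOT NULL
);
CREATE TABLE BoughtItem (
    ItemID   INTEGER PRIMARY KEY,
    Mass     REAL NOT NULL,
    Currency TEXT NOT NULL,
    Amount   REAL NOT NULL
);
INSERT INTO InheritedItem VALUES (0, 2), (3, 11);
INSERT INTO BoughtItem VALUES (1, 13, 'US dollars', 47.20),
                              (2, 5, 'British Pounds', 3.10),
                              (4, 9, 'US dollars', 1.32);

-- Only the columns shared by both entities are exposed by the view.
CREATE VIEW AllItems AS
    SELECT ItemID, Mass, 'inherited' AS Procurement FROM InheritedItem
    UNION ALL
    SELECT ItemID, Mass, 'bought' AS Procurement FROM BoughtItem;
""")

(total_mass,) = conn.execute("SELECT SUM(Mass) FROM AllItems").fetchone()
```

One trade-off of this layout is that nothing stops the same ItemID from being used in both tables; keeping the IDs distinct is left to the application.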
This last point is the subject of much argument, but practically, in my experience, modelling concrete supertypes in the database leads to more pain later than leaving them abstract. Okay, that's probably waaay beyond what you wanted :).