One to many vs Many to many relationship confusion? - database

I am trying to create a proper relationship on Hibernate and I have the following relationship between Recipe and Ingredients entities:
I thought that:
One recipe can have multiple ingredients
One ingredient can also be part of different recipes
In this situation, I would create many to many relationship.
However, considering the unit and amount fields in the Ingredient entity, I think the amount of an ingredient for a specific recipe may be changed later. In that case, each ingredient should belong to a specific recipe. As a result, I created a one-to-many relationship as shown in the image.
1. Is the approach (one to many) explained above true?
2. I also think that for a Category entity (that describes recipe categories e.g. vegetarian, diabetic, ...), I should use a many-to-many relationship, as a category is not specific to a single recipe and, when any category is updated, all the related recipes should be affected. Is this true?

If you want to map a many to many relationship, then you need a relationship table.
So your database tables will be
recipe <-> recipe_ingredient <-> ingredient
Because you want attributes on the relationship, amount and unit will then go in the recipe_ingredient table.
This will also result in three classes: Recipe, RecipeIngredient and Ingredient.
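As a rough illustration of the three-table layout above, here is a minimal sketch using Python's built-in sqlite3 module rather than Hibernate. The column names (id, name, recipe_id, ingredient_id) and the sample rows are assumptions made up for the example; only recipe, recipe_ingredient, ingredient, amount and unit come from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default

# Hypothetical column names; amount and unit live on the relationship table.
conn.executescript("""
CREATE TABLE recipe (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE ingredient (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE recipe_ingredient (
    recipe_id     INTEGER NOT NULL REFERENCES recipe(id),
    ingredient_id INTEGER NOT NULL REFERENCES ingredient(id),
    amount        REAL NOT NULL,
    unit          TEXT NOT NULL,
    PRIMARY KEY (recipe_id, ingredient_id)
);
""")

# Made-up sample data: one ingredient row shared by two recipes.
conn.executemany("INSERT INTO recipe (id, name) VALUES (?, ?)",
                 [(1, "Pesto"), (2, "Salad dressing")])
conn.execute("INSERT INTO ingredient (id, name) VALUES (1, 'Olive Oil')")
conn.executemany("INSERT INTO recipe_ingredient VALUES (?, ?, ?, ?)",
                 [(1, 1, 100.0, "ml"), (2, 1, 2.0, "tbsp")])

# Each recipe keeps its own amount and unit for the same ingredient.
rows = conn.execute("""
    SELECT r.name, ri.amount, ri.unit
    FROM recipe r
    JOIN recipe_ingredient ri ON ri.recipe_id = r.id
    JOIN ingredient i ON i.id = ri.ingredient_id
    WHERE i.name = 'Olive Oil'
    ORDER BY r.id
""").fetchall()
ingredient_count = conn.execute("SELECT COUNT(*) FROM ingredient").fetchone()[0]
```

Changing the amount for one recipe is then an update to a single recipe_ingredient row, and the ingredient itself is stored only once.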

The answer to both of your questions is yes. However, you should still consider many-to-many as described in @Simon Martinelli's answer, because with one-to-many you will eventually have many duplicate entries for, say, Olive Oil across many different recipes, differing only in their IDs, amounts and units. However, this is just a suggestion and you are free to write your code however you wish.

Related

Do all relational database designs require a junction or associative table for many-to-many relationship?

I'm new to databases and trying to understand why a junction or association table is needed when creating a many-to-many relationship.
Most of what I'm finding on Stackoverflow and elsewhere describe it in either highly technical relational theory terms or it's just described as 'that's the way it's done' without qualifying why.
Are there any relational database designs out there that support having a many-to-many relationship without the use of an association table? Why is it not possible to have, for example, a column on one table that holds the relationships to another and vice versa?
For example, a Course table that holds a list of courses and a Student table that holds a bunch of student info — each course can have many students and each student can take many classes.
Why is it not possible to have a column on each row in either table (possibly in csv format) that contains the relationships to the others in a list or something similar?
In a relational database, no column holds more than a single value in each row. Therefore, you would never store data in a "CSV format" -- or any other multiple value system -- in a single column in a relational database. Making repeated columns that hold instances of the same item (Course1, Course2, Course3, etc) is also not allowed. This is the very first rule of relational database design and is referred to as First Normal Form.
There are very good reasons for the existence of these rules (it is enormously easier to verify, constrain, and query the data), but whether or not you believe in the benefits, the rules are nonetheless part of the definition of relational databases.
I do not know the answer to your question, but I can answer a similar question: Why do we use a junction table for many-to-many relationships in databases?
First, if the student table keeps track of which courses the student is in and the course table keeps track of which students are in it, then we have duplication. This can lead to problems. What if a student knows it is in a course, but the course doesn't know that it has that student? Every time you made a course change you would have to make sure to change it in both tables. Inevitably this will not happen every time and the data will become inconsistent.
Second, where would we store this information? A list is not a possible type for a field in a database. So do we put a course column in the student table? No, because that would only allow each student to take one course, a many-to-one relationship from students to courses. Do we put a student column in the courses table? No, because then we have one student in each course.
What does work is having a new table that has one student and one course per row. This tells us that a student is in a class without duplicating any data.
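To make that concrete, here is a small sketch with Python's sqlite3 module. The table name enrollment, the column names, and the sample students and courses are all invented for the example. The point is that each "student is in course" fact is one row, and both directions of the relationship are answered from that same table.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE course  (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
-- One row per (student, course) pair; the pair is the primary key.
CREATE TABLE enrollment (
    student_id INTEGER NOT NULL REFERENCES student(id),
    course_id  INTEGER NOT NULL REFERENCES course(id),
    PRIMARY KEY (student_id, course_id)
);
INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO course  VALUES (10, 'Algebra'), (20, 'Biology');
INSERT INTO enrollment VALUES (1, 10), (1, 20), (2, 10);
""")

def courses_of(student_id):
    """Titles of the courses a student takes, read from the junction table."""
    cur = db.execute(
        "SELECT c.title FROM enrollment e JOIN course c ON c.id = e.course_id "
        "WHERE e.student_id = ? ORDER BY c.title", (student_id,))
    return [title for (title,) in cur]

def students_in(course_id):
    """Names of the students in a course, read from the same junction table."""
    cur = db.execute(
        "SELECT s.name FROM enrollment e JOIN student s ON s.id = e.student_id "
        "WHERE e.course_id = ? ORDER BY s.name", (course_id,))
    return [name for (name,) in cur]
```

Because there is only one place where an enrollment is recorded, the student side and the course side cannot disagree.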
"Junction tables" come from ER/ORM presentations/methods/products that don't really understand the relational model.
In the relational model (and in original ER information modeling) application relationships are represented by relations/tables. Each table holds tuples of values that are in that relationship to each other, ie that are so related, ie that satisfy that relationship, ie that participate in the relationship.
A relationship is expressed independently of any particular situation as a predicate, a fill-in-the-(named-)blanks statement. Rows that fill in the named blanks to give a true statement from the predicate in a particular situation go in the table. We pick sufficient predicates (hence base tables) to describe every situation. Both many-to-1 and many-to-many application relationships get tables.
The reason why you don't see a lot of many-to-many relationships along with columns about the participants rather than about their participation in the relationship is that such tables are better split into ones about the participants and one for the relationship. Eg columns in a many-to-many table that are about participants 1. can't say anything about entities that don't participate and 2. say the same thing about an entity every time it participates. Information modeling techniques that focus on identifying independent entity types first then relationships between them tend to lead to designs with few such problems. The reason why you don't see many-to-many relationships in two tables is that that is redundant and susceptible to the error of the tables disagreeing. The problem with collection-valued columns (sequences/lists/arrays) is that you cannot generically query about their parts using usual query notation and implementation because the DBMS doesn't see the parts organized into a table.
See this recent answer or this one.

I'm unable to normalize my Product table as I have 4 different product types

So because I have 4 different product types (books, magazines, gifts, food) I can't just put all products in one "products" table without having a bunch of null values. So I decided to break each product up into their own tables but I know this is just wrong (https://c1.staticflickr.com/1/742/23126857873_438655b10f_b.jpg).
I also tried creating an EAV model for this (https://c2.staticflickr.com/6/5734/23479108770_8ae693053a_b.jpg), but I got stuck as I'm not sure how to link the publishers and authors tables.
I know this question has been asked a lot but I don't understand ANY of the answers I've seen. I think this is because I'm a very visual learner and this makes it hard to understand what's being talked about when not a lot of information is given.
Your model is on the right track, except that the product name should be sufficient; you don't need gift name, book name, etc. What you put in those tables is the information that is specific to the type of product and that the other products don't need. The Product table contains all the common fields. I would use productid in the child tables rather than renaming it giftID, magazineID, etc. It is easier to remember what things are called when you are consistent in naming them.
Now to be practical, you put as much as you can into the product table especially if you are going to do calculations. I prefer the child tables in this specific case to have what is mostly display information. So product contains the product name, the cost, the type of product, the units the product is sold in etc. The stuff that generally is needed to calculate the cost of an order or to have a report of what was ordered. There may be one or two fields that can contain nulls, but it simplifies the calculation type queries so much it might be worth it.
The meat of the descriptive details though would go in the child table for the type of product. These would usually only be referenced when displaying the product in the shopping area and only one at a time, so you can use the product type to let you only join to the one child table you need for display. So while the order cares about the product number and name and cost calculations, it probably doesn't need to go line by line describing the book ISBN number or the megapixels in a camera. But the description page of the product does need those things.
This approach is not purely relational, although it mostly is, but it does group the information by the meaning of the data and how it will be used, which will make the database easier to understand and query. I am a big fan of relational tables because databases just work better when they hit at least third normal form, but sometimes you can go too far for practicality, so the meaning of the data and the way you group it for use (not just for the user interface, but for later reporting as well) is almost always one of my considerations in design.
Breaking each product type into its own table is fine - let the child tables use the same id as the parent Product table, and create views for the child tables that join with Product
Your case is a classic case of types and subtypes. This is often called class/subclass in object modeling and generalization/specialization in ER modeling. It's a well understood pattern. There are known techniques for dealing with this pattern.
Visit the following tags, and read the description under the info tab (presented as "learn more"). Also look over the questions grouped under these tags.
single-table-inheritance class-table-inheritance shared-primary-key
If you want to read in more depth, use these buzzwords to search for articles on the web.
You've already discovered and discarded single table inheritance on your own. Other answers have pointed you at shared primary key. Class table inheritance involves a single table for generalized data as well as the four specialized tables. Shared primary key is generally used in conjunction with class table inheritance.
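For a visual learner, a tiny runnable sketch of class table inheritance with a shared primary key may help. It uses Python's sqlite3 module and shows only the book subtype; the column names, the product_type check, the book_view view and the sample rows (including the placeholder ISBN) are assumptions for illustration, not taken from the diagrams in the question.

```python
import sqlite3

shop = sqlite3.connect(":memory:")
shop.execute("PRAGMA foreign_keys = ON")
shop.executescript("""
-- Parent table: fields common to every product type.
CREATE TABLE product (
    product_id   INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    price        REAL NOT NULL,
    product_type TEXT NOT NULL
        CHECK (product_type IN ('book', 'magazine', 'gift', 'food'))
);
-- Child table: its primary key is also a foreign key to product
-- (the shared primary key); it holds only book-specific fields.
CREATE TABLE book (
    product_id INTEGER PRIMARY KEY REFERENCES product(product_id),
    isbn       TEXT NOT NULL
);
-- A view that presents a book as one complete row.
CREATE VIEW book_view AS
    SELECT p.product_id, p.name, p.price, b.isbn
    FROM product p JOIN book b ON b.product_id = p.product_id;

INSERT INTO product VALUES (1, 'SQL Basics', 30.0, 'book'),
                           (2, 'Gift Card', 25.0, 'gift');
INSERT INTO book VALUES (1, '000-0-00-000000-0');  -- placeholder ISBN
""")

# Display queries read the view; order-style calculations need only product.
book_rows = shop.execute(
    "SELECT product_id, name, price, isbn FROM book_view").fetchall()
total = shop.execute("SELECT SUM(price) FROM product").fetchone()[0]
```

The magazine, gift and food tables would follow the same shape as book, each with its own view if that is convenient.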

Is this one-to-many or many-to-many?

Having trouble figuring out relations in this scenario:
I want to create a checkbox list for income types. The UI will present as "What types of income do you receive?". The choices, to keep things simple, could be full-time, part-time and retirement.
Part of me thinks this is a one-to-many relation, and thereby won't necessitate an association table because one individual can have one or more income types. However, taking things literally, "full-time" employment can relate to many individuals. In this case, I won't be showing a summary table of how many of the individuals are "full-time", I am just dealing with one person and determining what their employment status is.
But I don't think of "full-time" as an entity, like, for example, actors and movies - where many actors can be in many movies and many movies can have many different actors.
I guess what's tripping me up is that a user can select more than one option, as opposed to a radio-button list or drop down list.
In this case, which is it?
many-to-many: Person to Employment Type.
Many Persons may share a single Employment Type.
A single Person may have several Employment Types.
Having said that, I've no idea how rich your business model is, but I'd attach Employment Type to an entity called Employment that would refer to Employment Type through a many-to-one association (rather than referring to it straight from Person).
From my point of view this is a many-to-many relationship.
Full-Time is an entity (suppose a INCOME_TYPES table), exactly like an actor or a movie.
Since you tell us you won't be showing things from the income-type side but only from the individual side, there are two alternatives:
De-normalize your schema and put 3-fields in the INDIVIDUALS table. This is not very nice.
If you do some of the things code-side, you can use a bitmask.
for example, 1 is for Full-time, 2 is for Part-Time and 4 is for retirement.
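The bitmask alternative could look roughly like this in application code. This is a sketch using Python's enum.IntFlag; the class and member names are made up, and only the values 1, 2 and 4 come from the answer.

```python
from enum import IntFlag

class IncomeType(IntFlag):
    # Each choice gets its own bit so that choices can be combined.
    FULL_TIME = 1
    PART_TIME = 2
    RETIREMENT = 4

# The user ticks two checkboxes.
selected = IncomeType.PART_TIME | IncomeType.RETIREMENT

# A single integer is what would be stored in the individual's row.
stored = int(selected)

# Reading it back and testing individual choices.
restored = IncomeType(stored)
has_part_time = IncomeType.PART_TIME in restored
has_full_time = IncomeType.FULL_TIME in restored
```

The trade-off is that the database itself can no longer easily answer questions such as "who is part-time?" without bit arithmetic, which is why this only fits when the logic stays on the code side.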
It depends on whether you have the income type as a separate table or whether it is just a string.
For separate table it is many-to-many: Each person has multiple income types. Each income type has multiple persons.

Database design

I am building a music streaming site, where users will be able to purchase and stream mp3's. I have a subset entity diagram which can be described as follows:
I want to normalise the data to 3NF. How many tables would I need? Obviously I want to avoid including partial dependencies, which would require more tables than just album, artist, songs, but I'm not sure what else to add. Any thoughts from experience?
Well, you've done the ER level. You need to identify Keys and Attributes before you can work out Functional Dependencies. There is a fair amount of work to do before you get to 3NF. Eg. Song Titles are duplicated.
Also, there are questions:
is the site selling Albums, Songs, or both ? (I've modelled both)
if both, how do you track a sale or download ?
do you care about the same Song title recorded by different Artists ?
Anyway, here is a resolved ▶Entity Relation Diagram◀, at least for the info provided. It is closer to 5NF than 3NF, but I cannot declare it as such, because it is not complete.
Readers who are unfamiliar with the Standard for Modelling Relational Databases may find ▶IDEF1X Notational◀ useful.
It uses a simple Supertype-Subtype structure, the Principle of Orthogonal Design. The Item that is sold is either an Album xor a Song.
Feel free to ask clarifying questions.
You will need 4 tables: Artists, Songs, Albums, and AlbumSongs.
The last one is required since the exact same song (=same edit/version...) could be included in several albums, so you have there a m-to-m relationship.
I agree with iDevelop but with 1 extra table. Here is how I would model it.
Tables: Artist, Song, Album, AlbumSongMap, SingleInfo
If the song was released as a single on a different date, you can get that from SingleInfo. The single may have been released with some cover art that is different from the album art. You would store the single's art in SingleInfo. MAYBE a song can be released as a single multiple times, with new cover art or something, so it could possibly be a 1-many relation. Otherwise it is 1-1.
If you can join Song with SingleInfo, that means it was released as a single. If you can join Song with Album (using the map), then you will find all the albums it was released under.
A digital enhancement to an old song is a new song. (or at least a different binary). You may want to further normalize Song to allow storage of digital enhancements without duplicating songName, etc.
When you switch over from ER modeling to relational modeling (tables), you need one table for each entity. You also need a table for some relationships.
In the diagram you've given us, both relationships are many to one. Many to one relationships do not require a table. You can get away with adding foreign keys to entity tables. Therefore the answer to your question is 3 tables: Artists, Albums and Songs.
However, I question your ER diagram. It seems to me that the "contains" relationship is really many to many. An album clearly contains many songs. But a given song can appear on more than one album. So there should be an arrowhead on the line that connects "contains" to "album".
If you accept this revision to your ER model, then the number of tables increases to 4: Artists, Albums, Songs, and Contains.
A similar argument might be made for Artist and Song. If two artists collaborate on a single song (e.g. Dolly Parton and Kenny Rogers singing "Islands in the Stream" together), then you might want to model "produces" as a many to many relationship. Now you need 5 tables: Artists, Albums, Songs, Contains and Produces.
Artists, Albums, and Songs are going to require a PK that identifies the corresponding entity. Entity integrity demands that the correspondence between entity instances and table rows be one-to-one.
The Contains and Produces tables can be built without a separate Id attribute. You will need a pair of FKs in each of these tables, and you can declare a compound PK for each table consisting of the two FKs.
Referential integrity demands that you enforce the validity of FK references, either in your programs or by declaring a references constraint in the DB. I strongly prefer declaring the constraint in the DB.
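Here is a sketch of what declaring those constraints in the DB can look like, using Python's sqlite3 module and only the Albums, Songs and Contains tables. The column names, the sample rows and the try_insert helper are invented for the example; note that SQLite in particular needs foreign key enforcement switched on per connection.

```python
import sqlite3

music = sqlite3.connect(":memory:")
music.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce FKs
music.executescript("""
CREATE TABLE albums (album_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE songs  (song_id  INTEGER PRIMARY KEY, title TEXT NOT NULL);
-- No separate Id column: the compound PK is the pair of FKs.
CREATE TABLE contains (
    album_id INTEGER NOT NULL REFERENCES albums(album_id),
    song_id  INTEGER NOT NULL REFERENCES songs(song_id),
    PRIMARY KEY (album_id, song_id)
);
INSERT INTO albums VALUES (1, 'Album A'), (2, 'Album B');
INSERT INTO songs  VALUES (1, 'Song X');
INSERT INTO contains VALUES (1, 1), (2, 1);  -- same song on two albums
""")

def try_insert(album_id, song_id):
    """Hypothetical helper: True if the row was accepted, False if a constraint rejected it."""
    try:
        with music:  # commits on success, rolls back on error
            music.execute("INSERT INTO contains VALUES (?, ?)",
                          (album_id, song_id))
        return True
    except sqlite3.IntegrityError:
        return False

dangling_ok = try_insert(1, 99)   # song 99 does not exist: FK violation
duplicate_ok = try_insert(1, 1)   # pair already present: PK violation
row_count = music.execute("SELECT COUNT(*) FROM contains").fetchone()[0]
```

Both bad inserts are rejected by the database itself, so no application code path can leave a dangling or duplicated Contains row.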

Is there a general term for a pairing of tables where one has header information and the other has detail lines?

Is there a general term for a pairing of tables where one has header information and the other has detail lines?
For example, a pair of tables describing sales orders, or a pair storing bill of materials data.
One-To-Many describes the "numbers".
But one could prefer Parent-Child in some contexts, typically when the Child always has a Parent...
I think it is called a one-to-many relationship.
Also, Master-Detail.
Remember that it is a good practice to use consistent naming conventions in your database for these types of tables. Doesn't really matter what names, consistency is really the key. The goal is to have a general idea of what the table holds based on the name. Some I've seen:
(Assume the table holds Products)
Product -- ProductDetail
ProductHeader -- ProductDetail
Product -- ProductLines
For me, Parent-Child names imply a hierarchical relationship, which is a whole 'nother ball of wax.

Resources