I'm struggling to find the best way to store entities with user-defined fields. I would like to be able to do queries on these fields, so I feel NoSQL may not be the best approach. Constant schema migrations seems like a pain, especially since different users may want different fields on similar entities.
For example, let's say we have an entity representing a village. The village has a name (West Town), a type (village), a population (114). The user may want to add their own attributes to the village, say, a nickname. This is not known up front, and may not be required for other villages.
The best technique I've come up with is a table for the entities, and then a separate table for "components" of the entities, consisting of: a component id, a foreign key to the entity it's on, the name of the component, and its value.
So, the village from the example would exist as:
Table 1 - Entity
ID
1
Table 2 - String Components
ID ENTITY_ID NAME VALUE
1 1 name West Town
2 1 type village
Table 3 - Integer Components
ID ENTITY_ID NAME VALUE
1 1 population 114
Then, if the user wanted to add a "nickname" to the village, they could push a button, select a string component, call it "nickname" and give it a value of "Wesson":
Table 2 - String Components
ID ENTITY_ID NAME VALUE
1 1 name West Town
2 1 type village
3 1 nickname Wesson
Then, when the entity needs to be displayed, we query the component tables for the entity ID, and display the information:
name: West Town
population: 114
type: village
nickname: Wesson
Is this crazy? It feels both sort of like an elegant way to represent a mutable schema in a relational database, and like trying to get around the whole point of a relational database. Is there a better way?
Answering my own question. This seems to generally be addressed using a pattern known as "entity-attribute-value" which is similar to what I've suggested.
The entities table could be a little richer, storing also information common to all entities, like "name" and maybe a foreign key into an "entity_type" table.
At its simplest, the attributes tables could be as above, with one for each data type.
https://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%93value_model
Related
I want you to know your opnion about this situation:
I have a table named "movie" with this colums
movie_id
name
price
...... etc
A movie can be available to rent, purchased or both.
If I want a movie available to rent and purchase the price change, for example:
Price for rent: $2.50
Price for purchase: $15.45
The question is:
Is better to make a duplicate in the table movie?
movie_id name price available_for ...... ........
1 300 $2.50 rent
2 300 $15.45 purchase
Or make another table adding the info of price and available_for? Like this:
Table Movie
movie_id name ...... .......... ..........
1 300
2 300
Table Movie_available_for
Id movie_id available_for price
1 1 rent $2.50
2 1 purchase $15.45
I want to know which is the best solution for this
Thanks!
Your relational approach might depend on what level of normalization you hope to achieve. Your question reminds me a lot of the Boyce–Codd normal form (BCNF) vs the 3rd normal form (3NF).
In fact, there is an example similar to your question on this wiki page: Boyce–Codd normal form (Wikipedia)
There is a lot of theory here, but it can many times come down to either what you feel the most comfortable with or whichever technique you can perform the most accurately.
Personally, in this specific case, I would go with the slightly more normalized form (your 2nd example). This is because, the "available_for" and "price" are related variables. If you end up adding more info about movies, that info is potentially going to be duplicated many times. If you add a third "availible_for" or different pricing schemes (1 day for $1.50, 5 days for $4), you will have very significant data duplication.
Besides, when it comes to code, it would be nice to have a movie object that has an array of nested "availible_for" (might name this something else like "offering" or something) objects.
I would suggest you normalize your available_for column as it is repeated and contains few fields only.Store that in another table and create a relation between two tables.
Movie_Available_type
id int, available_for varchar(50)
Then you can use either of two as pointed out by thoughtarray in above post.
I would go with:
Movie (movie_id PK, name, purchase_price, rent_price)
and make the pricing columns nullable. If you don't like nulls, you can decompose it into:
Movie (movie_id PK, name)
PurchasePrice (movie_id PK/FK, price)
RentPrice (movie_id PK/FK, price)
I am creating a database for a DVD rental shop, I have various entities that are related to this question, such as Film, FilmStar.
For each film, you record its unique number, title, the year in which it was made, its category (action adventure, science fiction, horror, romance, comedy, classic, children's), its director, and all stars that appeared in it. For each film, you also want to store the type of DVD hire (new release, classics, other).
I am mostly unsure about "all the stars that appeared in it". I first thought just having an attribute in the 'Film' Entity, for example filmStar and then each star would be inserted into that attribute, for example: "John Doe, Jane Doe" for each film. But then I realised that this wouldn't be 1NF as : "the domains of attributes must include only atomic values, the value of an attribute must be a single value from the domain of that attribute", as it contains more than one value and isn't atomic.
I then thought about having a separate entity that contains certain attributes such as: filmID, filmStarID. So John Doe would have the filmStarID of '0001' (all of this would be in the FilmStar entity, which is a separate entity). But then the same problem would occur, for example the filmID attribute would have all of the filmID's that the filmStar has starred in, for example: John Doe would have "101, 115, 009". Which again wouldn't be 1NF.
I was just wondering what your thoughts are on this?
What you're describing is a many-to-many relationship. Storing such a relationship would require a connecting table between the two related entities.
So you have two essential entities here:
Film
--------
ID
Title
etc.
CastMember
--------
ID
Name
etc.
Neither of these can store their relations to the other, because that would be a list of values rather than a single value. So the relationship itself essentially becomes an entity independent of the main entities. Something like this:
FilmCastMember
--------
FilmID
CastMemberID
NameInFilm
etc.
This relationship entity would be where you store any information specific to the relationship itself, but not descriptive of the entities being related. The lines above, for example, include NameInFilm which would be the character name played by that cast member in that film.
When should I put an attribute in a separate table? I mean is I have an attribute but whether I should put this with the rest of the attributes of a table person or whether I should put it in a separate table with person_ID as FK?
Secondly, when does an association class is formed? Can it form between a class and its multivariate attribute? Ex class book has attribute author. An author can write many books and a book can be written by many authors
You should put an attribute in a separate table whenever you expect that one person could have multiple of that attribute. Otherwise, there's not much reason to separate it, and there can be some conceptual overhead in doing so. (It can be very annoying when you have to write a query that retrieves five different attributes of a person, if each attribute is in its own table unnecessarily.)
You should create an association table between two tables whenever the relationship between them is many-to-many. Your author–book example is a good one.
It depends on what that attribute is dependant of. If it's an attribute of that entity you're creating (person) then it should go in the same table, but as you said in the case of a book where 1 book can have many authors and 1 author can write many books, you have to take into consideration the relationship between the entity and the attribute. Is it a 1 to 1 relationship, 1 to many or many to 1, etc.
This being said, if your person can only have 1 value for that attribute and that only 1 person can have that attribute, then it should go in the same table.
You should have an association table in the following situation:
1 person CAN live in more than 1 house at a time, (home and holiday house) but more people CAN also live in those houses.
Obviously, in the example above, we're ignoring the fact that 1 person cannot be in more than 1 place at a time.
As a general rule, the attribute should be dependant on "the key, the whole key and nothing but the key"
UPDATE for what #ruakh said:
It can create overhead if you separate attributes but a tool which you can use to accommodate this overhead is creating Views of the table. I'm not sure what database system you're using but MySQL has this feature. A View is a "virtual" table that you can create using SQL queries on the current database. You can combine multiple tables and you will query that View just as you would do with a normal table.
Imagine that you have to model incidents. There's four types of incidents (could be more). They all share some characteristics. So, the main shared fields would be in a supertype table called incidents and there will be one table per incident-type having incident_id as a FK. I could even have a incident type table to help me enforce that one incident id will be of one type only. This is all very text book, BUT, I'd like to know what's the best way to model the case when 3 of these incident types share a subset of fields and these fields are mandatory to them (so I can't put then in the supertype table, can I?). On top of that, some of those fields shared only by 3 of the incident types have a limited set of values (i.e. types of rock) and that's a typical look-up table.
So, should I repeat all these fields in the 3 incident tables and have a look-up table being Foreign keyed to these 3 tables? Should I have intermediary tables?
Thanks in advance,
Joe
I was simply wondering, how an ISA relationship in an ER diagram would translate into tables in a database.
Would there be 3 tables? One for person, one for student, and one for Teacher?
Or would there be 2 tables? One for student, and one for teacher, with each entity having the attributes of person + their own?
Or would there be one table with all 4 attributes and some of the squares in the table being null depending on whether it was a student or teacher in the row?
NOTE: I forgot to add this, but there is full coverage for the ISA relationship, so a person must be either a studen or a teacher.
Assuming the relationship is mandatory (as you said, a person has to be a student or a teacher) and disjoint (a person is either a student or a teacher, but not both), the best solution is with 2 tables, one for students and one for teachers.
If the participation is instead optional (which is not your case, but let's put it for completeness), then the 3 tables option is the way to go, with a Person(PersonID, Name) table and then the two other tables which will reference the Person table, e.g.
Student(PersonID, GPA), with PersonID being PK and FK referencing Person(PersonID).
The 1 table option is probably not the best way here, and it will produce several records with null values (if a person is a student, the teacher-only attributes will be null and vice-versa).
If the disjointness is different, then it's a different story.
there are 4 options you can use to map this into an ER,
option 1
Person(SIN,Name)
Student(SIN,GPA)
Teacher(SIN,Salary)
option 2 Since this is a covering relationship, option 2 is not a good match.
Student(SIN,Name,GPA)
Teacher(SIN,Name,Salary)
option 3
Person(SIN,Name,GPA,Salary,Person_Type)
person type can be student/teacher
option 4
Person(SIN,Name,GPA,Salary,Student,Teacher) Student and Teacher are bool type fields, it can be yes or no,a good option for overlapping
Since the sub classes don't have much attributes, option 3 and option 4 are better to map this into an ER
This answer could have been a comment but I am putting it up here for the visibility.
I would like to address a few things that the chosen answer failed to address - and maybe elaborate a little on the consequences of the "two table" design.
The design of your database depends on the scope of your application and the type of relations and queries you want to perform. For example, if you have two types of users (student and teacher) and you have a lot of general relations that all users can part take, regardless of their type, then the two table design may end up with a lot of "duplicated" relations (like users can subscribe to different newsletters, instead of having one M2M relationship table between "users" and newsletters, you'll need two separate tables to represent that relation). This issue worsens if you have three different types of users instead of two, or if you have an extra layer of IsA in your hierarchy (part-time vs full-time students).
Another issue to consider - the types of constraints you want to implement. If your users have emails and you want to maintain a user-wide unique constraint on emails, then the implementation is trickier for a two-table design - you'll need to add an extra table for every unique constraint.
Another issue to consider is just duplications, generally. If you want to add a new common field to users, you'll need to do it multiple times. If you have unique constraints on that common field, you'll need a new table for that unique constraint too.
All of this is not to say that the two table design isn't the right solution. Depending on the type of relations, queries and features you are building, you may want to pick one design over the other, like is the case for most design decisions.
It depends entirely on the nature of the relationships.
IF the relationship between a Person and a Student is 1 to N (one to many), then the correct way would be to create a foreign key relationship, where the Student has a foreign key back to the Person's ID Primary Key Column. Same thing for the Person to Teacher relationship.
However, if the relationship is M to N (many to many), then you would want to create a separate table containing those relationships.
Assuming your ERD uses 1 to N relationships, your table structure ought to look something like this:
CREATE TABLE Person
(
sin bigint,
name text,
PRIMARY KEY (sin)
);
CREATE TABLE Student
(
GPA float,
fk_sin bigint,
FOREIGN KEY (fk_sin) REFERENCES Person(sin)
);
and follow the same example for the Teacher table. This approach will get you to 3rd Normal Form most of the time.