When modelling large systems with some class graph or entity relationship diagram, there are often nodes at the edges of the graph. These may serve to describe the values that some attribute can take, but do not reference any other classes/tables themselves.
An example: say we want to model the gender of a person using a database table. We can then add any number of genders to this table; this allows for a flexible system. This table does not reference any others, but is referenced by person.
Is there a general name for such a concept? It's like a value object, but even more limited (because it's used to describe only a single selectable label)
They're commonly called lookup tables, or dimension tables in data warehouse terminology. In relational terms, they would correspond to user-defined domains.
Related
I'm new to databases and trying to understand why a junction or association table is needed when creating a many-to-many relationship.
Most of what I'm finding on Stackoverflow and elsewhere describe it in either highly technical relational theory terms or it's just described as 'that's the way it's done' without qualifying why.
Are there any relational database designs out there that support having a many-to-many relationship without the use of an association table? Why is it not possible to have, for example, a column on on table that holds the relationships to another and vice a versa.
For example, a Course table that holds a list of courses and a Student table that holds a bunch of student info — each course can have many students and each student can take many classes.
Why is it not possible to have a column on each row in either table (possibly in csv format) that contains the relationships to the others in a list or something similar?
In a relational database, no column holds more than a single value in each row. Therefore, you would never store data in a "CSV format" -- or any other multiple value system -- in a single column in a relational database. Making repeated columns that hold instances of the same item (Course1, Course2, Course3, etc) is also not allowed. This is the very first rule of relational database design and is referred to as First Normal Form.
There are very good reasons for the existence of these rules (it is enormously easier to verify, constrain, and query the data) but whether or not you believe in the benefits the rules are, none-the-less, part of the definition of relational databases.
I do not know the answer to your question, but I can answer a similar question: Why do we use a junction table for many-to-many relationships in databases?
First, if the student table keeps track of which courses the student is in and the course keeps track of which students are in it, then we have duplication. This can lead to problems. What if a student knows it is in a course, but the course doesn't know that it has that student. Every time you made a course change you would have to make sure to change it in both tables. Inevitably this will not happen every time and the data will become inconsistent.
Second, where would we store this information? A list is not a possible type for a field in a database. So do we put a course column in the student table? No, because that would only allow each student to take one course, a many-to-one relationship from students to courses. Do we put a student column in the courses table? No, because then we have one student in each course.
What does work is having a new table that has one student and one course per row. This tells us that a student is in a class without duplicating any data.
"Junction tables" come from ER/ORM presentations/methods/products that don't really understand the relational model.
In the relational model (and in original ER information modeling) application relationships are represented by relations/tables. Each table holds tuples of values that are in that relationship to each other, ie that are so related, ie that satisfy that relationship, ie that participate in the relationship.
A relationship is expressed independently of any particular situation as a predicate, a fill-in-the-(named-)blanks statement. Rows that fill in the named blanks to give a true statement from the predicate in a particular situation go in the table. We pick sufficient predicates (hence base tables) to describe every situation. Both many-to-1 and many-to-many application relationships get tables.
The reason why you don't see a lot of many-to-many relationships along with columns about the participants rather than about their participation in the relationship is that such tables are better split into ones about the participants and one for the relationship. Eg columns in a many-to-many table that are about participants 1. can't say anything about entities that don't participate and 2. say the same thing about an entity every time it participates. Information modeling techniques that focus on identifying independent entity types first then relationships between them tend to lead to designs with few such problems. The reason why you don't see many-to-many relationships in two tables is that that is redundant and susceptible to the error of the tables disagreeing. The problem with collection-valued columns (sequences/lists/arrays) is that you cannot generically query about their parts using usual query notation and implementation because the DBMS doesn't see the parts organized into a table.
See this recent answer or this one.
While studying for my IT exam I came across the following sentence:
"A collection of fields that store information about a certain entity, is a record. A record is a whole row of fields."
..but I have always thought that the correct term for an "object" in a database is an "entity".
So is the correct term an "entity" or a "record"? Or are they the same?
In that sentence, entity doesn't refer to anything in the database. It's using entity to refer to a conceptual object, whatever thing in the real world the database record represents. For instance, if you have an inventory database, each row stands for a product in the warehouse, and that's the entity.
An entity is defined as “something that exists as a particular and
discrete unit.” In terms of identity management, an entity is the
logical relationship between two or more records. [...] An entity is
also called a “linkage set.” There can be an unlimited number of
records in an entity or linkage set.
Source
Along these lines, an entity can be a set of records in a table or even across different tables.
I would say that an entity concept is physicalised by 1 or more tables e.g.
a product concept might be encapsulated entirely in 1 table
a person concept might be spread across several tables, for example due to normalisation - all information relating to a person might not exist in the same table.
Every time I program I recognize this relationship between classes and tables, or am I imagining it.
You can have a class per database table or a table per class i.e. :
tables: customer, products, order.
classes: customer, products, order, may have methods such as addRecord, deleteRecord, updateRecord.
what is this called? Object-Relational? I am not a DBA.
It all depends on the type of database you're using. If you're using an object oriented database (OODB), then there is no relationship, as the objects and the persisted data are the same thing. For example, if you have a Customer class, and you save it in an OODB, then that instance of the customer is what is stored in the DB.
If you are using a relational database, then the class instances, and the persisted representation of them in the DB, can be the same thing, but many times they aren't. This is because most folks use normalization to represent their data in an efficient way (in a relational DB). This means, instead of having a table per class, you can have a class represented by more than one table. In the Customer example, the tables might now be Customer (with Name, date of birth, and other properties), and Order (with order pointing to products in yet another table). The reason for this has to do with cardinality, and the ability for Customers to have more than one order. When your business logic needs this information from the DB, the data access layer's job is to map the data (called ORM) from the DB into your classes.
If you are using yet another type of DB, then there will be a different relationship between the classes (domain model) and what's persisted in the DB.
But, as far as having a name for this relationship? No, there is no name.
In additon to Bob's answer, the following.
In object modeling, the relationship between classes and subclasses is taken care of by inheritance, and object modelers know how to use inheritance to good advantage. The relational data model and by extension the SQL databases do not implement inheritance for you. You have to design tables to give you some of the same results.
In ER (Entity-Relationship) modeling, the corresponding concept is called generalization/specialization. This tells you how to model a class/subclass relationship, but it doesn't tell you how to design the tables when you go to build your database.
There are three techniques that are pretty well understood that can be really helpful when dealing with classes and subclasses. Here are their tags: single-table-inheritance class-table-inheritance shared-primary-key. Unfortunately, many tutorials on database design never cover these techniques. They can be enormously useful to people who know object modeling and want to come up to speed on relational modeling.
I am designing a database and as i do not have much experience in this subject, i am faced with a problem which i do not know how to go about solving.
In my conceptual model i have an object known as "Vehicle" which the customer orders and the stock system monitors. This supertype has two subtypes "Motorcar" and "Motorcycle". The user can order one or the other or even both.
Now that i am at the logical design stage, i need to know how i can have the system allow for two different types of products. The problem i have is that if i put each of the objects separate attributes into the same relation, then i will have columns that are of no use to some objects.
For example, if i just have a generic table holding both "Motorcars" and "Motorcycles" which i call "Vehicles" and all of their attributes, the cars will not need some of the motorcycle attributes and the motorcycle will not need all of the car attributes.
Is there a way to solve this issue?
The decision will need to be guided by the amount of shared information. I would start by identifying all the attributes and the rules about them.
If the majority of information is shared, you might not split into multiple tables. On the other hand, you can always split tables and then join into a view for ease of use.
For instance, you might have a vehicle table with only share information, and then a motorcar table with a foreign key to the vehicles table and a motorcycle table with a foreign key to the vehicles table. There is a certain difficulty ensuring that you don't have a motorocar row AND a motorcycle row referring to the same vehicle, and so there are other possibilities to mitigate that - but all that is unnecessary if the majority of information is common, you just have unused columns in a single vehicles table. You can even enforce with constraints to ensure that columns are NULL for types where they should not be filled in.
Excuse the poor title, i can not think of the correct term.
I have a database structure that represents objects, objects have types and properties.
Only certain properties are available to certain types.
i.e.
Types - House, Car
Properties - Colour, Speed, Address
Objects of type car can have both colour and speed properties, but objects of type House can only have colour, address. The Value for the combination of object, type, property is stored in a values table.
All this works, relationships enforce the above nicely.
My dilemma is that I have another table i.e Addresses. This table has AddressID.
I want to somehow join my address table to my object values table.. is there a neat way to achieve this??
[UPDATE] - More detail
I already have 5 tables. i.e.
Object
Properties
ObjectTypes
ObjectPropertyValues
ObjectTypeProperties
These tables have relationships which lock which property values can be assigned to each type of object.
An object maybe have a name of 'Ferrari' and the type would be 'car' and because the type is car I can set a value for the colour property.
The value though is numeric and I want to be able to join to a colourcodes table to match the id.
First, a "relation" in Relational Databases is a table - it does not refer to the relationships between tables. A relation defines how pieces of data are related - to a key.
In relational modeling, each entity is normalized, so one model for you would be 4 tables:
Car (Colour-FK, Address-FK)
House (Colour-FK, Speed)
Colour (Colour-PK)
Address (Address-PK, Address-Data)
In relational model, cars are not houses and you typically would be extremely unlikely to model them in the same table.
One might argue, that in fact, the valid colours for houses and cars are very different (since the paints are not equivalent), and thus one would not ever combine the two tables based on colour in a real world application.
Possible other modelling considerations might be where the car is garaged - i.e. an FK to a House or an FK to an Address - interesting problem there. Then if you had keys to cars and houses, they could both be part of key rings, in which case you would probably model with link-tables representing the keys.