Terms for related DB records - database

Here's a DB schema:
users(id, name);
items(id, user, name), user is a foreign key
tags(item, name), item is a foreign key
Based on that schema:
each item has one user (one to one relationship)
each item may have multiple tags (one to many relationship)
Pretend you are a record from items.
What term would you use to refer to a record from users, based on your relationship? Would you call it a "one to one relative" or something else?
Similarly, what term would you use to refer to a bunch of tags? Would you call them "one to many relatives" or something else?

If the class is specific to items, you could call the user for an item an "owner", and the tags simply the "tags".
If the class is a generic record handler (e.g. an active record or data mapper), there aren't many English words that would apply. "Relative" and "relatives" are two, though they may not add much meaning to method names. There's a functional relationship between items and users, where the item is the argument/input and the user is the value/output, but these terms are poor descriptions. I suspect most any generic terms will suffer the same problem of having low descriptive power.
Taking inspiration from the term codomain (of a function), you could try using the prefix "co-".
In the OO world, the relationship between items and tags is an aggregation, which suggests that you could refer to the tags as an "aggregate" in method names.

I would say "the corresponding record".

Related

Weak entity with 2 different owner entities

Weak entity with 2 owner entities
I have a database I am trying to design with 3 entities: Store, Item, and Wishlist. Stores and Wishlists exist on their own, and are therefore strong. Items, however, will not exist unless they are either in stock at a store or on someone's wishlist.
What's the best practice in designing the ER diagram ?
What you are describing is really two one-many relationships with an extra constraint on participation ("must be in stock or on a wishlist"). This does not mean that an Item is a weak entity type. I think Item in this case must be strong. I don't think there is any standard ER notation for a participation constraint across mutiple entities unless the entities are all subtypes. I suggest you add a text notation to the diagram if it's important to explain that an item must either be in stock or on a wishlist but doesn't have to be both.
It is sometimes said that a weak entity cannot exist independently of a related entity. That alone is not what makes something weak however. An entity type is weak if it does not have its own identifier independent of another entity type. In your situation I would expect that an item could appear on more than one wishlist or be stocked in more than one store and therefore items must have a common identifier (e.g. a name or a UPC) independently of the wishlists and stores that include them. If that was not the case then wishlist items and stock would presumably have to be subtypes of item, which seems a bit unconventional.
If for some reason you did want to treat items of stock and items on wishlists as subtypes of items then the answer to your question would be a subtype relationship with mandatory participation, e.g. represented with double lines using Chen notation:

Should I create a reference field or a third content type to represent a many-to-many relationship in Drupal 8?

I have a student table and a course table that have a many-to-many relationship (a student can take many courses, and a course can be taken by many students).
If I am implementing the above data model as a database, I would create a third table to represent the many-to-many relationship.
But I want to implement the above data model in Drupal 8. I think that in Drupal 8 there are two ways to implement the above data model:
I can create a reference field in one of the two content types
(student or course) that points to the other content type.
I can create a third content type that have two reference fields
that points to the student and course content types.
Am I correct that these two ways are valid? and if I am correct, which one should I choose?
I think you're right, I would have suggested both ways.
As long as their connection doesn't have any additional parameters (e.g. date of subscription etc.), I would choose the entity reference within the students' content type.
Both of these are valid, and there's an additional option of having a corresponding reference field on each of your content types.
The best option comes down to maintain ability for your editors.
As per #c1u31355, if there is additional "connection" metadata, then a third content type is the way to go (or maybe a paragraph).
If it's a straight connection A <> B, and you only want the reference in one place, then ask yourself where is most convenient to add that data? Is it easier to create a course and then link to 30 students, or are you wanting to add courses to students as you create them? Quicker at creation, but harder to maintain.
Either way, using IEF as I suggested in one of your other questions will help.
As a final thought, having a 3rd content type could cause all sorts of issues unless you control the fields well, something like limiting it to only having one student with a bunch of courses, or vise versa, and in that case maintenance will be easier by just putting a reference field on the content type where you have restricted the field to one value.

When should I put an attribute in a separate table?

When should I put an attribute in a separate table? I mean is I have an attribute but whether I should put this with the rest of the attributes of a table person or whether I should put it in a separate table with person_ID as FK?
Secondly, when does an association class is formed? Can it form between a class and its multivariate attribute? Ex class book has attribute author. An author can write many books and a book can be written by many authors
You should put an attribute in a separate table whenever you expect that one person could have multiple of that attribute. Otherwise, there's not much reason to separate it, and there can be some conceptual overhead in doing so. (It can be very annoying when you have to write a query that retrieves five different attributes of a person, if each attribute is in its own table unnecessarily.)
You should create an association table between two tables whenever the relationship between them is many-to-many. Your author–book example is a good one.
It depends on what that attribute is dependant of. If it's an attribute of that entity you're creating (person) then it should go in the same table, but as you said in the case of a book where 1 book can have many authors and 1 author can write many books, you have to take into consideration the relationship between the entity and the attribute. Is it a 1 to 1 relationship, 1 to many or many to 1, etc.
This being said, if your person can only have 1 value for that attribute and that only 1 person can have that attribute, then it should go in the same table.
You should have an association table in the following situation:
1 person CAN live in more than 1 house at a time, (home and holiday house) but more people CAN also live in those houses.
Obviously, in the example above, we're ignoring the fact that 1 person cannot be in more than 1 place at a time.
As a general rule, the attribute should be dependant on "the key, the whole key and nothing but the key"
UPDATE for what #ruakh said:
It can create overhead if you separate attributes but a tool which you can use to accommodate this overhead is creating Views of the table. I'm not sure what database system you're using but MySQL has this feature. A View is a "virtual" table that you can create using SQL queries on the current database. You can combine multiple tables and you will query that View just as you would do with a normal table.

When I should use one to one relationship?

Sorry for that noob question but is there any real needs to use one-to-one relationship with tables in your database? You can implement all necessary fields inside one table. Even if data becomes very large you can enumerate column names that you need in SELECT statement instead of using SELECT *. When do you really need this separation?
1 to 0..1
The "1 to 0..1" between super and sub-classes is used as a part of "all classes in separate tables" strategy for implementing inheritance.
A "1 to 0..1" can be represented in a single table with "0..1" portion covered by NULL-able fields. However, if the relationship is mostly "1 to 0" with only a few "1 to 1" rows, splitting-off the "0..1" portion into a separate table might save some storage (and cache performance) benefits. Some databases are thriftier at storing NULLs than others, so a "cut-off point" where this strategy becomes viable can vary considerably.
1 to 1
The real "1 to 1" vertically partitions the data, which may have implications for caching. Databases typically implement caches at the page level, not at the level of individual fields, so even if you select only a few fields from a row, typically the whole page that row belongs to will be cached. If a row is very wide and the selected fields relatively narrow, you'll end-up caching a lot of information you don't actually need. In a situation like that, it may be useful to vertically partition the data, so only the narrower, more frequently used portion or rows gets cached, so more of them can fit into the cache, making the cache effectively "larger".
Another use of vertical partitioning is to change the locking behavior: databases typically cannot lock at the level of individual fields, only the whole rows. By splitting the row, you are allowing a lock to take place on only one of its halfs.
Triggers are also typically table-specific. While you can theoretically have just one table and have the trigger ignore the "wrong half" of the row, some databases may impose additional limits on what a trigger can and cannot do that could make this impractical. For example, Oracle doesn't let you modify the mutating table - by having separate tables, only one of them may be mutating so you can still modify the other one from your trigger.
Separate tables may allow more granular security.
These considerations are irrelevant in most cases, so in most cases you should consider merging the "1 to 1" tables into a single table.
See also: Why use a 1-to-1 relationship in database design?
My 2 cents.
I work in a place where we all develop in a large application, and everything is a module. For example, we have a users table, and we have a module that adds facebook details for a user, another module that adds twitter details to a user. We could decide to unplug one of those modules and remove all its functionality from our application. In this case, every module adds their own table with 1:1 relationships to the global users table, like this:
create table users ( id int primary key, ...);
create table users_fbdata ( id int primary key, ..., constraint users foreign key ...)
create table users_twdata ( id int primary key, ..., constraint users foreign key ...)
If you place two one-to-one tables in one, its likely you'll have semantics issue. For example, if every device has one remote controller, it doesn't sound quite good to place the device and the remote controller with their bunch of characteristics in one table. You might even have to spend time figuring out if a certain attribute belongs to the device or the remote controller.
There might be cases, when half of your columns will stay empty for a long while, or will not ever be filled in. For example, a car could have one trailer with a bunch of characteristics, or might have none. So you'll have lots of unused attributes.
If your table has 20 attributes, and only 4 of them are used occasionally, it makes sense to break the table into 2 tables for performance issues.
In such cases it isn't good to have everything in one table. Besides, it isn't easy to deal with a table that has 45 columns!
If data in one table is related to, but does not 'belong' to the entity described by the other, then that's a candidate to keep it separate.
This could provide advantages in future, if the separate data needs to be related to some other entity, also.
The most sensible time to use this would be if there were two separate concepts that would only ever relate in this way. For example, a Car can only have one current Driver, and the Driver can only drive one car at a time - so the relationship between the concepts of Car and Driver would be 1 to 1. I accept that this is contrived example to demonstrate the point.
Another reason is that you want to specialize a concept in different ways. If you have a Person table and want to add the concept of different types of Person, such as Employee, Customer, Shareholder - each one of these would need different sets of data. The data that is similar between them would be on the Person table, the specialist information would be on the specific tables for Customer, Shareholder, Employee.
Some database engines struggle to efficiently add a new column to a very large table (many rows) and I have seen extension-tables used to contain the new column, rather than the new column being added to the original table. This is one of the more suspect uses of additional tables.
You may also decide to divide the data for a single concept between two different tables for performance or readability issues, but this is a reasonably special case if you are starting from scratch - these issues will show themselves later.
First, I think it is a question of modelling and defining what consist a separate entity. Suppose you have customers with one and only one single address. Of course you could implement everything in a single table customer, but if, in the future you allow him to have 2 or more addresses, then you will need to refactor that (not a problem, but take a conscious decision).
I can also think of an interesting case not mentioned in other answers where splitting the table could be useful:
Imagine, again, you have customers with a single address each, but this time it is optional to have an address. Of course you could implement that as a bunch of NULL-able columns such as ZIP,state,street. But suppose that given that you do have an address the state is not optional, but the ZIP is. How to model that in a single table? You could use a constraint on the customer table, but it is much easier to divide in another table and make the foreign_key NULLable. That way your model is much more explicit in saying that the entity address is optional, and that ZIP is an optional attribute of that entity.
not very often.
you may find some benefit if you need to implement some security - so some users can see some of the columns (table1) but not others (table2)..
of course some databases (Oracle) allow you to do this kind of security in the same table, but some others may not.
You are referring to database normalization. One example that I can think of in an application that I maintain is Items. The application allows the user to sell many different types of items (i.e. InventoryItems, NonInventoryItems, ServiceItems, etc...). While I could store all of the fields required by every item in one Items table, it is much easier to maintain to have a base Item table that contains fields common to all items and then separate tables for each item type (i.e. Inventory, NonInventory, etc..) which contain fields specific to only that item type. Then, the item table would have a foreign key to the specific item type that it represents. The relationship between the specific item tables and the base item table would be one-to-one.
Below, is an article on normalization.
http://support.microsoft.com/kb/283878
As with all design questions the answer is "it depends."
There are few considerations:
how large will the table get (both in terms of fields and rows)? It can be inconvenient to house your users' name, password with other less commonly used data both from a maintenance and programming perspective
fields in the combined table which have constraints could become cumbersome to manage over time. for example, if a trigger needs to fire for a specific field, that's going to happen for every update to the table regardless of whether that field was affected.
how certain are you that the relationship will be 1:1? As This question points out, things get can complicated quickly.
Another use case can be the following: you might import data from some source and update it daily, e.g. information about books. Then, you add data yourself about some books. Then it makes sense to put the imported data in another table than your own data.
I normally encounter two general kinds of 1:1 relationship in practice:
IS-A relationships, also known as supertype/subtype relationships. This is when one kind of entity is actually a type of another entity (EntityA IS A EntityB). Examples:
Person entity, with separate entities for Accountant, Engineer, Salesperson, within the same company.
Item entity, with separate entities for Widget, RawMaterial, FinishedGood, etc.
Car entity, with separate entities for Truck, Sedan, etc.
In all these situations, the supertype entity (e.g. Person, Item or Car) would have the attributes common to all subtypes, and the subtype entities would have attributes unique to each subtype. The primary key of the subtype would be the same as that of the supertype.
"Boss" relationships. This is when a person is the unique boss or manager or supervisor of an organizational unit (department, company, etc.). When there is only one boss allowed for an organizational unit, then there is a 1:1 relationship between the person entity that represents the boss and the organizational unit entity.
The main time to use a one-to-one relationship is when inheritance is involved.
Below, a person can be a staff and/or a customer. The staff and customer inherit the person attributes. The advantage being if a person is a staff AND a customer their details are stored only once, in the generic person table. The child tables have details specific to staff and customers.
In my time of programming i encountered this only in one situation. Which is when there is a 1-to-many and an 1-to-1 relationship between the same 2 entities ("Entity A" and "Entity B").
When "Entity A" has multiple "Entity B" and "Entity B" has only 1 "Entity A"
and
"Entity A" has only 1 current "Entity B" and "Entity B" has only 1 "Entity A".
For example, a Car can only have one current Driver, and the Driver can only drive one car at a time - so the relationship between the concepts of Car and Driver would be 1 to 1. - I borrowed this example from #Steve Fenton's answer
Where a Driver can drive multiple Cars, just not at the same time. So the Car and Driver entities are 1-to-many or many-to-many. But if we need to know who the current driver is, then we also need the 1-to-1 relation.
Another use case might be if the maximum number of columns in the database table is exceeded. Then you could join another table using OneToOne

a layman's term for identifying relationship

There are couples of questions around asking for difference / explanation on identifying and non-identifying relationship in relationship database.
My question is, can you think of a simpler term for these jargons? I understand that technical terms have to be specific and unambiguous though. But having an 'alternative name' might help students relate more easily to the concept behind.
We actually want to use a more layman term in our own database modeling tool, so that first-time users without much computer science background could learn faster.
cheers!
I often see child table or dependent table used as a lay term. You could use either of those terms for a table with an identifying relationship
Then say a referencing table is a table with a non-identifying relationship.
For example, PhoneNumbers is a child of Users, because a phone number has an identifying relationship with its user (i.e. the primary key of PhoneNumbers includes a foreign key to the primary key of Users).
Whereas the Users table has a state column that is a foreign key to the States table, making it a non-identifying relationship. So you could say Users references States, but is not a child of it per se.
I think belongs to would be a good name for the identifying relationship.
A "weak entity type" does not have its own key, just a "partial key", so each entity instance of this weak entity type has to belong to some other entity instance so it can be identified, and this is an "identifying relationship". For example, a landlord could have a database with apartments and rooms. A room can be called kitchen or bathroom, and while that name is unique within an apartment, there will be many rooms in the database with the name kitchen, so it is just a partial key. To uniquely identify a room in the database, you need to say that it is the kitchen in this particular apartment. In other words, the rooms belong to apartments.
I'm going to recommend the term "weak entity" from ER modeling.
Some modelers conceptualize the subject matter as being made up of entities and relationships among entities. This gives rise to Entity-Relationship Modeling (ER Modeling). An attribute can be tied to an entity or a relationship, and values stored in the database are instances of attributes.
If you do ER modeling, there is a kind of entity called a "weak entity". Part of the identity of a weak entity is the identity of a stronger entity, to which the weak one belongs.
An example might be an order in an order processing system. Orders are made up of line items, and each line item contains a product-id, a unit-price, and a quantity. But line items don't have an identifying number across all orders. Instead, a line item is identified by {item number, order number}. In other words, a line item can't exist unless it's part of exactly one order. Item number 1 is the first item in whatever order it belongs to, but you need both numbers to identify an item.
It's easy to turn an ER model into a relational model. It's also easy for people who are experts in the data but know nothing about databases to get used to an ER model of the data they understand.
There are other modelers who argue vehemently against the need for ER modeling. I'm not one of them.
Nothing, absolutely nothing in the kind of modeling where one encounters things such as "relationships" (ER, I presume) is "technical", "precise" or "unambiguous". Nor can it be.
A) ER modeling is always and by necessity informal, because it can never be sufficient to capture/express the entire definition of a database.
B) There are so many different ER dialects out there that it is just impossible for all of them to use exactly the same terms with exactly the same meaning. Recently, I even discovered that some UK university that teaches ER modeling, uses the term "entity subtype" for the very same thing that I always used to name "entity supertype", and vice-versa !
One could use connection.
You have Connection between two tables, where the IDs are the same.
That type of thing.
how about
Association
Link
Correlation

Resources