Archiving a one-to-one relationship in Ecto - database

I have two models: Actor and Room. This is just a minimal example.
# `Actor` Model
defmodule MyApp.Actor
use Ecto.Schema
schema "actors" do
field :actor_name, :string
field :status, Ecto.Enum, values: [:Active, :Archived]
belongs_to :room, Room
end
end
# `Room Model`
defmodule MyApp.Room
use Ecto.Schema
schema "rooms" do
field :room_number, :integer
has_one :actor, Actor
end
end
I want to model a one-to-one relationship i.e. A Room can have only one Actor at a given time. An Actor can come and leave a Room. What I want is that when an Actor leaves a Room, I still want to retain the reference to its previous Room relationship for archival purposes. In my case, an Actor is created immediately before it occupies a room, and then it has to be "archived" after leaving the room. Additionally, I have the field Actor.status to help me with my "archival" tasks.
Currently, if I add to the Room relationship for my Actor using %Room{} |> Ecto.build_assoc(:actor, %{actor_name: "Test"}), the relevant Actor is created with right pointer to the right Room; however, Room instance will always point to the first Actor object that was added to it.
If I use put_assoc, it fails due to foreign key constraints (and that of course makes sense because I have defaul behavior for :on_delete)
My constraint:
I want to be able to access the current/ most recent Actor that was build_assoc with a Room by something like some_room.actor and likewise some_actor.room. (At this point other dependencies rely on this architecture).
Ideally I stay within one-to-one association. One possible solution for me would be to add another column and manually manage it to map to it's original Room association when detaching from its Room and do some trickery when I do some_actor.room since this will fail for the old Actors.

Related

Dealing with 3NF in terms of database domain modelling where attributes are added knowing they create transitive dependency

I am currently working on setting up a database domain model, where in terms of normalization I will be challenged due to transitive dependency. However, for this particular model it is a choice, that we choose to add such transitive dependency for a reason, and I am wondering how you would go about dealing with such cases in the aspect of normalization?
Let me show what I mean:
I have a table called UserSubscription that have the following attributes:
id {dbgenerated}
created
user
price
currency
subscriptionid
The values for:
price
currency
Depend on the subscriptionid, which points to a second table Subscription (in which the subscriptionid is a FK reference to this tables PK). One might say why, would I even consider including duplicate values from the Subscription table into the UserSubscription table? Well the reason is that the Subscription might change at any point in time, and for reference we want to store the original value of the subscription in the UserSubscription so that even if it changes we still have the values that the user signed up for originally.
I know from the perspective of normalization, that this transitive dependency I create should be fixed, and ideally I would move the values back into the subscription table, and just not allow the values to be modified, and instead create a new subscription whenever it is necessary.
But ideally I do not want to create new subscriptions every time something needs to change in those that exist, simply because it is expected these change often - following say market competition values. At the same time for every new subscription created any user will have more to choose from.
This also means that if we no longer want to use a subscription, we would need to: Remove it, and Create a new one. This can be fixed by simply Updating, since we will no longer need the old one anyways.
The above is a school project, I just wonder whether it would ever be "ok" in terms of normalization to choose such approach, when I choose to do so by choice, and to reduce the tasks associated with removing and creating new subscriptions when I expect these would change frequently.
why don't you instead create a M:N table (mapping table) USER_SUBSCRIPTION where you will have the relationships between USER and SUBSCRIPTION ? You can store all values there historically, and don't have to remove/create anything with the change.. it the user decides to opt-out, you only update the flag_active, flag_deleted, flag_dtime_end, whatever works for you...
Here is a simple model for demonstration:
USER
id_user PK
name
... other details
SUBSCRIPTION
id_subscription PK
name
details
flag_active (TRUE|FALSE or 1|0 values)
... other details
USER_SUBSCRIPTION
id_user FK
id_subscription FK
dtime_start -- when the subscription started
dtime_end -- when the subscription ended
flag_valid (T|F or 1|0) -- optional, will give you a quick headsup about active subscriptions ... but this is sort of redundant, for you can get it from the dtime_start vs dtime_end .. up to you
This will give you a very generic (and therefore flexibile / scalable) model to work with users' subscriptions ... no duplications, all handled by FK/PK referential constraints, ... etc

How do I define a one-to-one relationship over a one-to-many relationship in a relational database?

I'm creating a database schema with the following tables (sorry for the bad pseudocode):
User
====
user_id, PK
Collection
==========
collection_id, PK
user_id, FK(User->user_id)
Issue
=====
issue_id, PK
collection_id, FK(Collection->collection_id)
There is a one-to-many relationship from User to Collection, and also from Collection to Issue. So, a single user may maintain multiple collections, each with many issues.
The problem: I would like to designate a "default" collection to be displayed when the user first logs in to the application. For the record, I'm doing this in the Django framework, but I'm more interested in the elegant platform-independent solution. When I try to make a column in User that is a Foreign Key to Collection, it complains that Collection does not exist yet (I suppose because User is created first). I could add a "default" boolean column to Collection and enforce through my application that only one record per User be "true", but that seems inelegant. I could also have a separate table, say, User_Default_Collection, which has user_id as a Foreign, Unique Key, and a collection_id column which is a Foreign Key to Collection. But I'm certain this is also less than 3rd normal form. Any suggestions?
If you want to enforce that every user must and will always have his "default" collection, then because of the obvious cycle in the inclusion dependencies you are forced into either deferred constraint checking (if your DBMS allows the FK cycle to be declared in the first place) or application-enforced integrity.
If you can tolerate users not having any default collection at all, then create a separate table DFT_COLL(userid, dft_coll_id) with key userid and FK's to both USER and COLLECTION.
If it gives you trouble in cases when a user has no default collection, maybe this can still be addressed by having the system just pick one (e.g. the one with the lowest [or highest] id) and implement this with a UNION view (so that if you need the default then you read the UNION view and you're guaranteed (*) to get some result).
(*) If the user has a collection at all, that is. Note that requiring a default collection and requiring that to exist, implies requiring at least one collection per user. (And the corollary of this is that if it must be allowed for a user to have no collection at all, it is nonsensical and a contradiction to require him to have a default one.)
The most plausible solution i think would be:
Add nullable "default" field to Collection table
Create UNIQUE constraint for used_id and default
Keep "true"-s and NULLs (no false's) in "default" column.
This will not allow for multiple Collections associated with the same user_id to have the same "default" value other than null. You don't need to develop any application logic. However, this design would not force you to always have a default collection for a user.

ER diagram that implements a database for trainee

I edited and remade the ERD. I have a few more questions.
I included participation constraints(between trainee and tutor), cardinality constraints(M means many), weak entities (double line rectangles), weak relationships(double line diamonds), composed attributes, derived attributes (white space with lines circle), and primary keys.
Questions:
Apparently to reduce redundant attributes I should only keep primary keys and descriptive attributes and the other attributes I will remove for simplicity reasons. Which attributes would be redundant in this case? I am thinking start_date, end_date, phone number, and address but that depends on the entity set right? For example the attribute address would be removed from Trainee because we don't really need it?
For the part: "For each trainee we like to store (if any) also previous companies (employers) where they worked, periods of employment: start date and end date."
Isn't "periods of employment: start date, end date" a composed attribute? because the dates are shown with the symbol ":" Also I believe I didn't make an attribute for "where they worked" which is location?
Also how is it possible to show previous companies (employers) when we already have an attribute employers and different start date? Because if you look at the Question Information it states start_date for employer twice and the second time it says start_date and end_date.
I labeled many attributes as primary keys but how am I able to distinguish from derived attribute, primary key, and which attribute would be redundant?
Is there a multivalued attribute in this ERD? Would salary and job held be a multivalued attribute because a employer has many salaries and jobs.
I believe I did the participation constraints (there is one) and cardinality constraints correctly. But there are sentences where for example "An instructor teaches at least a course. Each course is taught by only one instructor"; how can I write the cardinality constraint for this when I don't have a relationship between course and instructor?
Do my relationship names make sense because all I see is "has" maybe I am not correctly naming the actions of the relationships? Also I believe schedules depend on the actual entity so they are weak entities.... so does that make course entity set also a weak entity (I did not label it as weak here)?
For the company address I put a composed attribute, street num, street address, city... would that be correct? Also would street num and street address be primary keys?
Also I added the final mark attribute to courses and course_schedule is this in the right entity set? The statement for this attribute is "Each trainee identified by: unique code, social security number, name, address, a unique telephone number, the courses attended and the final mark for each course."
For this part: "We store in the database all classrooms available on the site" do i make a composed attribute that contains site information?
Question Information:
A trainee may be self-employed or employee in a company
Each trainee identified by:
unique code, social security number, name, address, a unique
telephone number, the courses attended and the final mark for each course.
If the trainee is an employee in a company: store the current company (employer), start date.
For each trainee we like to store (if any) also previous companies (employers) where they worked, periods of employment: start date and end date.
If a trainee is self-employed: store the area of expertise, and title.
For a trainee that works for a company: we store the salary and job
For each company (employer): name (unique), the address, a unique telephone number.
We store in the database all known companies in the
city.
We need also to represent the courses that each trainee is attending.
Each course has a unique code and a title.
For each course we have to store: the classrooms, dates, and times (start time, and duration in minutes) the course is held.
A classroom is characterized by a building name and a room number and the maximum places’ number.
A course is given in at least a classroom, and may be scheduled in many classrooms.
We store in the database all classrooms
available on the site.
We store in the database all courses given at least once in the company.
For each instructor we will store: the social security number, name, and birth date.
An instructor teaches at least a course.
Each course is taught by only one instructor.
All the instructors’ telephone numbers must also be stored (each instructor has at least a telephone number).
A trainee can be a tutor for one or many trainees for a specific
period of time (start date and end date).
For a trainee it is not mandatory to be a tutor, but it is mandatory to have a tutor
The attribute ‘Code’ will be your PK because it’s only use seems to be that of a Unique Identifier.
The relationship ‘is’ will work but having a reference to two tables like that can get messy. Also you have the reference to "Employers" in the Trainee table which is not good practice. They should really be combined. See my helpful hints section to see how to clean that up.
Company looks like the complete table of Companies in the area as your details suggest. This would mean table is fairly static and used as a reference in your other tables. This means that the attribute ‘employer’ in Employed would simply be a Foreign Key reference to the PK of a specific company in Company. You should draw a relationship between those two.
It seems as though when an employee is ‘employed’ they are either an Employee of a company or self-employed.
The address field in Company will be a unique address your current city, yes, as the question states the table is a complete list of companies in the city. However because this is a unique attribute you must have specifics like street address because simply adding the city name will mean all companies will have the same address which is forbidden in an unique field.
Some other helpful hints:
Stay away from adding fields with plurals on them to your diagram. When you have a plural field it often means you need a separate table with a Foreign Key reference to that table. For example in your Table Trainee, you have ‘Employers’. That should be a Employer table with a foreign key reference to the Trainee Code attribute. In the Employer Table you can combine the Self-employed and Employed tables so that there is a single reference from Trainee to Employer.
ERD Link http://www.imagesup.net/?di=1014217878605. Here's a quick ERD I created for you. Note the use of linker tables to prevent Many to Many relationships in the table. It's important to note there are several ways to solve this schema problem but this is just as I saw your problem laid out. The design is intended to help with normalization of the db. That is prevent redundant data in the DB. Hope this helps. Let me know if you need more clarification on the design I provided. It should be fairly self explanatory when comparing your design parameters to it.
Follow Up Questions:
If you are looking to reduce attributes that might be arbitrary perhaps phone_number and address may be ones to eliminate, but start and end dates are good for sorting and archival reasons when determining whether an entry is current or a past record.
Yes, periods_of_employment does not need to be stored as you can derive that information with start and end dates. Where they worked I believe is just meant to say previous employers, so no location but instead it’s meant that you should be able to get a list all the employers the trainee has had. You can get that with the current schema if you query the employer table for all records where trainee code equals requested trainee and sort by start date. The reason it states start_date twice is to let you know that for all ‘previous’ employers the record will have a start and end date. Hence the previous. However, for current employers the employment hasn't ended which means there will be no end_date so it will null. That’s what the problem was stating in my opinion.
To keep it simple PK’s are unique values used to reference a record within another table. Redundant values are values that you essentially don’t need in a table because the same value can be derived by querying another table. In this case most of your attributes are fine except for Final_Mark in the Course table. This is redundant because Course_Schedule will store the Final_Mark that was received. The Course table is meant to simply hold a list of all potential courses to be referenced by Course_Schedule.
There is no multivalued attributes in this design because that is bad practice Job and salary are singular and if and job or salary changes you would add a new record to the employer table not add to that column. Multivalued attributes make querying a db difficult and I would advise against it. That’s why I mentioned earlier to abstract all attributes with plurals into their own tables and use a foreign key reference.
You essentially do have that written here because Course_Schedule is a linker table meaning that it is meant to simplify relationships between tables so you don’t have many to many relationships.
All your relationships look right to me. Also since the schedules are linker tables and cannot exist without the supporting tables you could consider them weak entities. Course in this schema is a defined list of all courses available so can be independent of any other table. This by definition is not a weak entity. When creating this db you’d probably fill in the course table and it probably wouldn’t change after that, except rarely when adding or removing an available course option.
Yes, you can make address a composite attribute, and that would be right in your diagram. To be clear with your use of Primary key, just because an attribute is unique doesn’t make it a primary key. A table can have one and only one primary key so you must pick a column that you are certain will not be repeated. In this example you may think street number might be unique but what if one company leaves an address and another company moves into that spot. That would break that tables primary key. Typically a company name is licensed in a city or state so cannot be repeated. That would be a better choice for your primary key. You can also make composite primary keys, but that is a more advanced topic that I would recommend reading about at a later date.
Take final_mark out of courses. That’s table will contain rows of only courses, those courses won’t be linked to any trainee except by course_schedule table. The Final_Mark should only be in that table. If you add final_mark to Course table then, if you have 10 trainees in a course, You’d have 10 duplicate rows in the course table with only differing final_marks. Instead only hold the course_code and title that way you can assign different instructors, trainees and classrooms using the linker tables.
No composite attribute is needed using this schema. You have a Classroom table that will hold all available classrooms and their relevant information. You then use the Classroom_Schedule linker table to assign a given Classroom to a Course_Schedule. No attributes of Classroom can be broken down to simpler attributes.

When I should use one to one relationship?

Sorry for that noob question but is there any real needs to use one-to-one relationship with tables in your database? You can implement all necessary fields inside one table. Even if data becomes very large you can enumerate column names that you need in SELECT statement instead of using SELECT *. When do you really need this separation?
1 to 0..1
The "1 to 0..1" between super and sub-classes is used as a part of "all classes in separate tables" strategy for implementing inheritance.
A "1 to 0..1" can be represented in a single table with "0..1" portion covered by NULL-able fields. However, if the relationship is mostly "1 to 0" with only a few "1 to 1" rows, splitting-off the "0..1" portion into a separate table might save some storage (and cache performance) benefits. Some databases are thriftier at storing NULLs than others, so a "cut-off point" where this strategy becomes viable can vary considerably.
1 to 1
The real "1 to 1" vertically partitions the data, which may have implications for caching. Databases typically implement caches at the page level, not at the level of individual fields, so even if you select only a few fields from a row, typically the whole page that row belongs to will be cached. If a row is very wide and the selected fields relatively narrow, you'll end-up caching a lot of information you don't actually need. In a situation like that, it may be useful to vertically partition the data, so only the narrower, more frequently used portion or rows gets cached, so more of them can fit into the cache, making the cache effectively "larger".
Another use of vertical partitioning is to change the locking behavior: databases typically cannot lock at the level of individual fields, only the whole rows. By splitting the row, you are allowing a lock to take place on only one of its halfs.
Triggers are also typically table-specific. While you can theoretically have just one table and have the trigger ignore the "wrong half" of the row, some databases may impose additional limits on what a trigger can and cannot do that could make this impractical. For example, Oracle doesn't let you modify the mutating table - by having separate tables, only one of them may be mutating so you can still modify the other one from your trigger.
Separate tables may allow more granular security.
These considerations are irrelevant in most cases, so in most cases you should consider merging the "1 to 1" tables into a single table.
See also: Why use a 1-to-1 relationship in database design?
My 2 cents.
I work in a place where we all develop in a large application, and everything is a module. For example, we have a users table, and we have a module that adds facebook details for a user, another module that adds twitter details to a user. We could decide to unplug one of those modules and remove all its functionality from our application. In this case, every module adds their own table with 1:1 relationships to the global users table, like this:
create table users ( id int primary key, ...);
create table users_fbdata ( id int primary key, ..., constraint users foreign key ...)
create table users_twdata ( id int primary key, ..., constraint users foreign key ...)
If you place two one-to-one tables in one, its likely you'll have semantics issue. For example, if every device has one remote controller, it doesn't sound quite good to place the device and the remote controller with their bunch of characteristics in one table. You might even have to spend time figuring out if a certain attribute belongs to the device or the remote controller.
There might be cases, when half of your columns will stay empty for a long while, or will not ever be filled in. For example, a car could have one trailer with a bunch of characteristics, or might have none. So you'll have lots of unused attributes.
If your table has 20 attributes, and only 4 of them are used occasionally, it makes sense to break the table into 2 tables for performance issues.
In such cases it isn't good to have everything in one table. Besides, it isn't easy to deal with a table that has 45 columns!
If data in one table is related to, but does not 'belong' to the entity described by the other, then that's a candidate to keep it separate.
This could provide advantages in future, if the separate data needs to be related to some other entity, also.
The most sensible time to use this would be if there were two separate concepts that would only ever relate in this way. For example, a Car can only have one current Driver, and the Driver can only drive one car at a time - so the relationship between the concepts of Car and Driver would be 1 to 1. I accept that this is contrived example to demonstrate the point.
Another reason is that you want to specialize a concept in different ways. If you have a Person table and want to add the concept of different types of Person, such as Employee, Customer, Shareholder - each one of these would need different sets of data. The data that is similar between them would be on the Person table, the specialist information would be on the specific tables for Customer, Shareholder, Employee.
Some database engines struggle to efficiently add a new column to a very large table (many rows) and I have seen extension-tables used to contain the new column, rather than the new column being added to the original table. This is one of the more suspect uses of additional tables.
You may also decide to divide the data for a single concept between two different tables for performance or readability issues, but this is a reasonably special case if you are starting from scratch - these issues will show themselves later.
First, I think it is a question of modelling and defining what consist a separate entity. Suppose you have customers with one and only one single address. Of course you could implement everything in a single table customer, but if, in the future you allow him to have 2 or more addresses, then you will need to refactor that (not a problem, but take a conscious decision).
I can also think of an interesting case not mentioned in other answers where splitting the table could be useful:
Imagine, again, you have customers with a single address each, but this time it is optional to have an address. Of course you could implement that as a bunch of NULL-able columns such as ZIP,state,street. But suppose that given that you do have an address the state is not optional, but the ZIP is. How to model that in a single table? You could use a constraint on the customer table, but it is much easier to divide in another table and make the foreign_key NULLable. That way your model is much more explicit in saying that the entity address is optional, and that ZIP is an optional attribute of that entity.
not very often.
you may find some benefit if you need to implement some security - so some users can see some of the columns (table1) but not others (table2)..
of course some databases (Oracle) allow you to do this kind of security in the same table, but some others may not.
You are referring to database normalization. One example that I can think of in an application that I maintain is Items. The application allows the user to sell many different types of items (i.e. InventoryItems, NonInventoryItems, ServiceItems, etc...). While I could store all of the fields required by every item in one Items table, it is much easier to maintain to have a base Item table that contains fields common to all items and then separate tables for each item type (i.e. Inventory, NonInventory, etc..) which contain fields specific to only that item type. Then, the item table would have a foreign key to the specific item type that it represents. The relationship between the specific item tables and the base item table would be one-to-one.
Below, is an article on normalization.
http://support.microsoft.com/kb/283878
As with all design questions the answer is "it depends."
There are few considerations:
how large will the table get (both in terms of fields and rows)? It can be inconvenient to house your users' name, password with other less commonly used data both from a maintenance and programming perspective
fields in the combined table which have constraints could become cumbersome to manage over time. for example, if a trigger needs to fire for a specific field, that's going to happen for every update to the table regardless of whether that field was affected.
how certain are you that the relationship will be 1:1? As This question points out, things get can complicated quickly.
Another use case can be the following: you might import data from some source and update it daily, e.g. information about books. Then, you add data yourself about some books. Then it makes sense to put the imported data in another table than your own data.
I normally encounter two general kinds of 1:1 relationship in practice:
IS-A relationships, also known as supertype/subtype relationships. This is when one kind of entity is actually a type of another entity (EntityA IS A EntityB). Examples:
Person entity, with separate entities for Accountant, Engineer, Salesperson, within the same company.
Item entity, with separate entities for Widget, RawMaterial, FinishedGood, etc.
Car entity, with separate entities for Truck, Sedan, etc.
In all these situations, the supertype entity (e.g. Person, Item or Car) would have the attributes common to all subtypes, and the subtype entities would have attributes unique to each subtype. The primary key of the subtype would be the same as that of the supertype.
"Boss" relationships. This is when a person is the unique boss or manager or supervisor of an organizational unit (department, company, etc.). When there is only one boss allowed for an organizational unit, then there is a 1:1 relationship between the person entity that represents the boss and the organizational unit entity.
The main time to use a one-to-one relationship is when inheritance is involved.
Below, a person can be a staff and/or a customer. The staff and customer inherit the person attributes. The advantage being if a person is a staff AND a customer their details are stored only once, in the generic person table. The child tables have details specific to staff and customers.
In my time of programming i encountered this only in one situation. Which is when there is a 1-to-many and an 1-to-1 relationship between the same 2 entities ("Entity A" and "Entity B").
When "Entity A" has multiple "Entity B" and "Entity B" has only 1 "Entity A"
and
"Entity A" has only 1 current "Entity B" and "Entity B" has only 1 "Entity A".
For example, a Car can only have one current Driver, and the Driver can only drive one car at a time - so the relationship between the concepts of Car and Driver would be 1 to 1. - I borrowed this example from #Steve Fenton's answer
Where a Driver can drive multiple Cars, just not at the same time. So the Car and Driver entities are 1-to-many or many-to-many. But if we need to know who the current driver is, then we also need the 1-to-1 relation.
Another use case might be if the maximum number of columns in the database table is exceeded. Then you could join another table using OneToOne

App Engine: how would you... snapshotting entities

Let's say you have two kinds, Message and Contact, related by a
db.ListProperty of keys on Message. A user creates a message, adds
some contacts as recipients, and emails the message. Later, the user
deletes one of the contact entities that was a recipient of the
message. Our application should delete the appropriate Contact
entity, but we want to preserve the original recipient list for the
message that was sent for the user's records. In essence, we want a
snapshot of the message entity at the time it was sent. If we naively
delete the contact entity, though, we lose snapshot integrity; if not,
we are left with an invalid key.
How would you handle this situation,
either in controller logic or model changes?
class User(db.Model):
email = db.EmailProperty(required=True)
class Contact(db.Model):
email = db.EmailProperty(required=True)
user = db.ReferenceProperty(User, collection_name='contacts')
class Message(db.Model):
recipients = db.ListProperty(db.Key) # contacts
sender = db.ReferenceProperty(User, collection_name='messages')
body = db.TextProperty()
is_emailed = db.BooleanProperty(default=False)
I would add a boolean field "deleted" (or something spiffier, such as the date and time of deletion) to the Contact model -- so that contacts are never physically deleted, but rather only "logically" deleted when that field is set. (This also lets you offer other cool features such as "show my old now-deleted contacts", "undelete" functionality, etc, if you wish).
This is a common approach in all storage systems that are required to maintain historical integrity (and/or similar requirements such as "auditability").
In cases where the sheer amount of logically deleted entities is threatening to damage system performance, the classic alternative is to have a separate, identical model "DeletedContacts", but foreign key constraints require more work, e.g. the Message class would have to have both recipients and deleted_recipients fiels if you needed foreign key integrity (but using just keys, as you're doing, this extra work would not be needed).
I doubt the average user will delete such a huge percentage of their contacts as to warrant the optimization explained in the last paragraph, so in this case I'd go with the simple "deleted" field.
Alternately, you could refactor your Contact model by moving the email address into the key name and setting the user as the parent entity. Your recipients property would change to a string list of raw email addresses. This gives you a static list of email recipients without having to fetch a set of corresponding entities for each one, or requiring that such entities still exist. If you want to fetch the contact entities, you can easily construct their keys from the user and the recipient address.
One limitation here is that the email address on an existing contact entity cannot be changed, but I think you have that problem anyway. Changing a contact address with your existing model would retroactively change the recipients of a sent message, which we know is a problem.

Resources