I need to track the version history of several entities, to be able to see the state of all its properties at any given point in its history, that includes their one to many relationships.
Entities & Relationships:
Person
+++++++++++
id
person_code
name
other
Address
+++++++++++
id
person_code
street
city
state
address_type
Person and address has a one to many relationship, using person_code column as the FK.
address_type is an enum, so one person can have many types of addresses. (e.g. residential, billing, permanent)
Address record with each address_type can be updated separately (e.g. billing address can be updated without updating the residential address)
The Problem:
Need to maintain history for "Person" entity as well as "Address" entity
Along with that some mechanism is needed (revision or identifier) that could pinpoint the exact state of a Person and all her Address at that point of time.
Approaches tried:
If we use adjacency list for keeping the history for both the entities, then person.getAddresses() will give a lot of extra (historical) records.
Creating history tables will might increase the complexity of the solution.
If Person and Address tables are versioned separately, then a new table (say State) can be introduced to keep these versions and id of that new table would pinpoint the exact state at a given point of time.
But as the Person-Address relationship is one to many keeping the version of all addresses in a single record seems inappropriate.
Final approach that I've thought of:
Add unique (person_code, revision) to Person
Change FK of Address form (person_code) to (person_code, revision)
On each Person update, duplicate (with new revision) all Address entries for that person_code
Use Person revision everywhere in the system
Can we somehow avoid the unnecessary duplication in step no. 3 ?
Please suggest what other approaches should be explored in this scenario.
Thanks.
Consider
PersonSuper
+++++++++++
personsuper_key
(do not put person attributes in the above table, as weird as that sounds) (the above exists so you can keep referential integrity with address tables)
Person. (<<assumes "non historical)
+++++++++++
person_key
personsuper_key FK
lastname
firstname
PersonArchive
+++++++++++
personarchive_key
personsuper_key FK
person_key (the original value from Person. person_key if you ever need that)
create_datestamp
version_ordinal
lastname
firstname
....
Address. (<<assumes "non historical)
+++++++++++
address_key
personsuper_key FK
AddressArchive
+++++++++++
addressarchive_key
address_key (<< the original Address.address_key, just in case)
personsuper_key FK
create_datestamp
version_ordinal
This will allow referential integrity with your archive (history) tables...and with a child table.
You can create a view to join the Person and PersonArchive tables (if ever needed to see it all together)
Related
I have two tables in my database, Department and Academic_staff. the primary key in the Department table is depId and the primary key in the Academic_staff table is aNo.
Each department is managed by only one member of academic staff, so the relation between the two tables is one-to-one.
I need to record the date when someone of the academic staff starts managing a department, so the relation must have it's own attribute(mStartDate).
How can I implement this new attribute?
At first I was thinking to create a new table with three attributes (depId, aNo, mStartDate) and make two relations between the new table and the other two tables, but I then realized that it's not many-to-many relationship.
So how can I add the attribute mStartDate to the one-to-one relation between the two tables?
There's more than one relation between the two tables, and some of those relations are one-to-many (the department employs more than one academic staff), so I can't merge the two tables.
Your proposed new table (which I shall refer to as DepartmentManagement) could in principle record the history of managers for each department, in which case it would be a many-to-many (temporal) relationship between Department and Academic.
However, if you want to record only the current manager, it's reasonable to "absorb" DepartmentManager into the Department table, giving two columns there (Manager_aNo and Manager_StartDate). Conceptually the object "DepartmentManagement" still exists, but it's absorbed, it doesn't have its own table.
You could also absorb it in the other direction (into Academic) but that wouldn't allow an Academic ever to manage more than one department. You might not need that now, but in principle it's more likely than having a Department with two managers.
Departments
depId
fk_aNo (Unique )
*****- Primary key (depId, fk_aNo)*****
academicStuffs
aNo (PK)
newTable
depId --- Important "Both depId,aNo are from Departments, be sure"
aNo ---
mStartDate
- Primary key (depId, aNo)
Constraints--> a academicStuff can't start to manage a department more
than one time. If this structure proper for you and you want to enable
a aacademic stuff manage a department more than one time inform me.
I edited and remade the ERD. I have a few more questions.
I included participation constraints(between trainee and tutor), cardinality constraints(M means many), weak entities (double line rectangles), weak relationships(double line diamonds), composed attributes, derived attributes (white space with lines circle), and primary keys.
Questions:
Apparently to reduce redundant attributes I should only keep primary keys and descriptive attributes and the other attributes I will remove for simplicity reasons. Which attributes would be redundant in this case? I am thinking start_date, end_date, phone number, and address but that depends on the entity set right? For example the attribute address would be removed from Trainee because we don't really need it?
For the part: "For each trainee we like to store (if any) also previous companies (employers) where they worked, periods of employment: start date and end date."
Isn't "periods of employment: start date, end date" a composed attribute? because the dates are shown with the symbol ":" Also I believe I didn't make an attribute for "where they worked" which is location?
Also how is it possible to show previous companies (employers) when we already have an attribute employers and different start date? Because if you look at the Question Information it states start_date for employer twice and the second time it says start_date and end_date.
I labeled many attributes as primary keys but how am I able to distinguish from derived attribute, primary key, and which attribute would be redundant?
Is there a multivalued attribute in this ERD? Would salary and job held be a multivalued attribute because a employer has many salaries and jobs.
I believe I did the participation constraints (there is one) and cardinality constraints correctly. But there are sentences where for example "An instructor teaches at least a course. Each course is taught by only one instructor"; how can I write the cardinality constraint for this when I don't have a relationship between course and instructor?
Do my relationship names make sense because all I see is "has" maybe I am not correctly naming the actions of the relationships? Also I believe schedules depend on the actual entity so they are weak entities.... so does that make course entity set also a weak entity (I did not label it as weak here)?
For the company address I put a composed attribute, street num, street address, city... would that be correct? Also would street num and street address be primary keys?
Also I added the final mark attribute to courses and course_schedule is this in the right entity set? The statement for this attribute is "Each trainee identified by: unique code, social security number, name, address, a unique telephone number, the courses attended and the final mark for each course."
For this part: "We store in the database all classrooms available on the site" do i make a composed attribute that contains site information?
Question Information:
A trainee may be self-employed or employee in a company
Each trainee identified by:
unique code, social security number, name, address, a unique
telephone number, the courses attended and the final mark for each course.
If the trainee is an employee in a company: store the current company (employer), start date.
For each trainee we like to store (if any) also previous companies (employers) where they worked, periods of employment: start date and end date.
If a trainee is self-employed: store the area of expertise, and title.
For a trainee that works for a company: we store the salary and job
For each company (employer): name (unique), the address, a unique telephone number.
We store in the database all known companies in the
city.
We need also to represent the courses that each trainee is attending.
Each course has a unique code and a title.
For each course we have to store: the classrooms, dates, and times (start time, and duration in minutes) the course is held.
A classroom is characterized by a building name and a room number and the maximum places’ number.
A course is given in at least a classroom, and may be scheduled in many classrooms.
We store in the database all classrooms
available on the site.
We store in the database all courses given at least once in the company.
For each instructor we will store: the social security number, name, and birth date.
An instructor teaches at least a course.
Each course is taught by only one instructor.
All the instructors’ telephone numbers must also be stored (each instructor has at least a telephone number).
A trainee can be a tutor for one or many trainees for a specific
period of time (start date and end date).
For a trainee it is not mandatory to be a tutor, but it is mandatory to have a tutor
The attribute ‘Code’ will be your PK because it’s only use seems to be that of a Unique Identifier.
The relationship ‘is’ will work but having a reference to two tables like that can get messy. Also you have the reference to "Employers" in the Trainee table which is not good practice. They should really be combined. See my helpful hints section to see how to clean that up.
Company looks like the complete table of Companies in the area as your details suggest. This would mean table is fairly static and used as a reference in your other tables. This means that the attribute ‘employer’ in Employed would simply be a Foreign Key reference to the PK of a specific company in Company. You should draw a relationship between those two.
It seems as though when an employee is ‘employed’ they are either an Employee of a company or self-employed.
The address field in Company will be a unique address your current city, yes, as the question states the table is a complete list of companies in the city. However because this is a unique attribute you must have specifics like street address because simply adding the city name will mean all companies will have the same address which is forbidden in an unique field.
Some other helpful hints:
Stay away from adding fields with plurals on them to your diagram. When you have a plural field it often means you need a separate table with a Foreign Key reference to that table. For example in your Table Trainee, you have ‘Employers’. That should be a Employer table with a foreign key reference to the Trainee Code attribute. In the Employer Table you can combine the Self-employed and Employed tables so that there is a single reference from Trainee to Employer.
ERD Link http://www.imagesup.net/?di=1014217878605. Here's a quick ERD I created for you. Note the use of linker tables to prevent Many to Many relationships in the table. It's important to note there are several ways to solve this schema problem but this is just as I saw your problem laid out. The design is intended to help with normalization of the db. That is prevent redundant data in the DB. Hope this helps. Let me know if you need more clarification on the design I provided. It should be fairly self explanatory when comparing your design parameters to it.
Follow Up Questions:
If you are looking to reduce attributes that might be arbitrary perhaps phone_number and address may be ones to eliminate, but start and end dates are good for sorting and archival reasons when determining whether an entry is current or a past record.
Yes, periods_of_employment does not need to be stored as you can derive that information with start and end dates. Where they worked I believe is just meant to say previous employers, so no location but instead it’s meant that you should be able to get a list all the employers the trainee has had. You can get that with the current schema if you query the employer table for all records where trainee code equals requested trainee and sort by start date. The reason it states start_date twice is to let you know that for all ‘previous’ employers the record will have a start and end date. Hence the previous. However, for current employers the employment hasn't ended which means there will be no end_date so it will null. That’s what the problem was stating in my opinion.
To keep it simple PK’s are unique values used to reference a record within another table. Redundant values are values that you essentially don’t need in a table because the same value can be derived by querying another table. In this case most of your attributes are fine except for Final_Mark in the Course table. This is redundant because Course_Schedule will store the Final_Mark that was received. The Course table is meant to simply hold a list of all potential courses to be referenced by Course_Schedule.
There is no multivalued attributes in this design because that is bad practice Job and salary are singular and if and job or salary changes you would add a new record to the employer table not add to that column. Multivalued attributes make querying a db difficult and I would advise against it. That’s why I mentioned earlier to abstract all attributes with plurals into their own tables and use a foreign key reference.
You essentially do have that written here because Course_Schedule is a linker table meaning that it is meant to simplify relationships between tables so you don’t have many to many relationships.
All your relationships look right to me. Also since the schedules are linker tables and cannot exist without the supporting tables you could consider them weak entities. Course in this schema is a defined list of all courses available so can be independent of any other table. This by definition is not a weak entity. When creating this db you’d probably fill in the course table and it probably wouldn’t change after that, except rarely when adding or removing an available course option.
Yes, you can make address a composite attribute, and that would be right in your diagram. To be clear with your use of Primary key, just because an attribute is unique doesn’t make it a primary key. A table can have one and only one primary key so you must pick a column that you are certain will not be repeated. In this example you may think street number might be unique but what if one company leaves an address and another company moves into that spot. That would break that tables primary key. Typically a company name is licensed in a city or state so cannot be repeated. That would be a better choice for your primary key. You can also make composite primary keys, but that is a more advanced topic that I would recommend reading about at a later date.
Take final_mark out of courses. That’s table will contain rows of only courses, those courses won’t be linked to any trainee except by course_schedule table. The Final_Mark should only be in that table. If you add final_mark to Course table then, if you have 10 trainees in a course, You’d have 10 duplicate rows in the course table with only differing final_marks. Instead only hold the course_code and title that way you can assign different instructors, trainees and classrooms using the linker tables.
No composite attribute is needed using this schema. You have a Classroom table that will hold all available classrooms and their relevant information. You then use the Classroom_Schedule linker table to assign a given Classroom to a Course_Schedule. No attributes of Classroom can be broken down to simpler attributes.
I have 2 questions regarding a project. Would appreciate if I get clarifications on that.
I have decomposed Address into individual entities by breaking down to the smallest
unit. Bur addresses are repeated in a few tables. Like Address fields are there in the
Client table as well as Employee table. Should we separate the Address into a separate table with just a linking field
For Example
Create an ADDRESS Table with the following attributes :
Entity_ID ( It could be a employee ID(Home Address) or a client ID(Office Address) )
Unit
Building
Street
Locality
City
State
Country
Zipcode
Remove all the address fields from the Employee table and the Client table
We can obtain the address by getting the employee ID and referring the ADDRESS table for the address
Which approach is better ? Having the address fields in all tables or separate as shown above. Any thoughts on which design in better ?
Ya definitely separating address is better Because people can have multiple addresses so it will be increasing data redundancy.
You can design the database for this problem in two ways according to me.
A. Using one table
Table name --- ADDRESS
Column Names
Serial No. (unique id or primary key)
Client / Employee ID
Address.
B. Using Two tables
Table name --- CLIENT_ADDRESS
Column Names
Serial No. (unique id or primary key)
Client ID (foreign key to client table)
Address.
Table name --- EMPLOYEE_ADDRESS
Column Names
Serial No. (unique id or primary key)
Client ID (foreign key to employee table)
Address.
Definitely you can use as many number of columns instead of address like what you mentioned Unit,Building, Street e.t.c
Also there is one suggestion from my experience
Please add this five Columns in your each and every table.
CREATED_BY (Who has created this row means an user of the application)
CREATED_ON (At what time and date table row was created)
MODIFIED_ON (Who has modified this row means an user of the application)
MODIFIED_BY (At what time and date table row was modified)
DELETE_FLAG (0 -- deleted and 1 -- Active)
The reason for this from point of view of most of the developers is, Your client can any time demand records of any time period. So If you are deleting in reality then it will be a serious situation for you. So every time when a application user deleted an record from gui you have to set the flag as 0 instead of practically deleting it. The default value is 1 which means the row is still active.
At time of retrieval you can select with where condition like this
select * from EMPOLOYEE_TABLE where DELETE_FLAG = 1;
Note : This is an suggestion from my experience. I am not at all enforcing you to adopt this. So please add it according to your requirement.
ALSO tables which don't have any significant purpose doesn't need this.
Separating address into a seperate table is a better design decision as it means any db-side validation logic etc. only needs to be maintained in one place.
I remember when - a long time ago - I was messing around with the Java ActiveObjects ORM, I came across a database pattern it claimed to support.
However, it is very difficult to find the pattern's name, by search for the general idea, thus I would really appreciate it if someone could give me the name of this pattern, and some thoughts on the "cleanness" of using it.
The pattern was defined as such:
Table:
reference_type <enum>
reference <integer>
...
... where the value of the field reference_type would determine the type (and thus the table) to which was being referred. Thus:
User:
location_type <l&l, address, city, country>
location <integer>
...
... where depending on the value of the location_type field, the foreign key location would refer to either the l&l, address, city or country table.
You're having difficulty finding it because it's not a real (in the sense of widely adopted and encouraged) database design pattern.
Stay away from patterns like this. While ORM's make mapping database tables to types easier, tables are not types, and vice versa. While it's not clear what the model you've described is supposed to do, you should not have columns that serve as fake foreign keys to multiple tables (when I say "fake", I mean that you're storing a simple identifier value that corresponds to the primary key of another table, but you can't actually define the column as a foreign key).
Model your database to represent the data, model your objects to represent the process, and use your ORM and intermediate layers to do the translation; don't try to push the database into your code, and don't push your code into the database.
Edit in reponse to comment
You're mixing database and OO terminology; while I'm not familiar with the syntax you're using to define that function, I'm assuming it's an instance function on the User type called getLocation that takes no parameters and returns a Location object. Databases don't support the concepts of instance (or any type-based) functions; relational databases can have user-defined functions, but these are simple procedural functions that take parameters and return either values or result sets. They do not correspond to particular tables or field in any way, other than the fact that you can use them within the body of the function.
That being said, there are two questions to answer here: how to do what you've asked, and what might be a better solution.
For what you've asked, it sounds like you have a supertype-subtype relationship, which is a standard database design pattern. In this case, you have a single supertype table that represents the parent:
Location
---------------
LocationID (PK)
...other common attributes
(Note here that I'm using LocationID for the sake of simplicity; you should have more specific and logical attributes to define the primary key, if possible)
Then you have one or more tables that define subtypes:
Address
-----------
LocationID (PK, FK to Location)
...address-specific attributes
Country
-----------
LocationID (PK, FK to Location)
...country-specific attributes
If a specific instance of Location can only be one of the subtypes, then you should add a discriminator value to the parent table (Location) that indicates which of the subtypes it corresponds to. You can use CHECK constraints to ensure that only valid values are in this field for a given row.
In the end, though, it sounds like you might be better served with a hybrid approach. You're fundamentally representing two different types of locations, from what I can see:
Coordinate-based locations (L&L)
Municipal/Postal/Etc.-based locations (Country, City, Address), and each of these is simply a more specific version of the previous
Given this, a simple model would look like this:
Location
------------
LocationID (PK)
LocationType (non-nullable) ('C' for coordinate, 'P' for postal)
LocationCoordinate
------------------
LocationID (PK; FK to Location)
Latitude (non-nullable)
Longitude (non-nullable)
LocationPostal
------------------
LocationID (PK, FK to Location)
Country (non-nullable)
City (nullable)
Address (nullable)
Now the only problem that remains is that we have nullable columns. If you want to keep your queries simple but take (justified!) flak from people about leaving nullable columns, then you can leave it as-is. If you want to go to what most people would consider a better-designed database, you can move to 6NF for our two nullable columns. Doing this will also have the nice side-effect of giving us a little more control over how these fields are populated without having to do anything extra.
Our two nullable fields are City and Address. I am going to assume that having an Address without a City would be nonsense. In this case, we remove these two attributes from the LocationPostal table and create two more tables:
LocationPostalCity
------------------
LocationID (PK; FK to LocationPostal)
City (non-nullable)
LocationPostalCityAddress
-------------------------
LocationID (PK; FK to LocationPostalCity)
Address (non-nullable)
Seems to me that city and country would be part of the address table, and that L&L wouldn't be mutually exclusive with address (you might have both...), so, why limit yourself like that to one or the other?
Further more, this would prevent the location column from enforcing referential integrity, would it not, since it wouldn't always reference the same table?
For a database assignment I have to model a system for a school. Part of the requirements is to model information for staff, students and parents.
In the UML class diagram I have modelled this as those three classes being subtypes of a person type. This is because they will all require information on, among other things, address data.
My question is: how do I model this in the database (mysql)?
Thoughts so far are as follows:
Create a monolithic person table that contains all the information for each type and will have lots of null values depending on what type is being stored. (I doubt this would go down well with the lecturer unless I argued the case very convincingly).
A person table with three foreign keys which reference the subtypes but two of which will be null - in fact I'm not even sure if that makes sense or is possible?
According to this wikipage about django it's possible to implement the primary key on the subtypes as follows:
"id" integer NOT NULL PRIMARY KEY REFERENCES "supertype" ("id")
Something else I've not thought of...
So for those who have modelled inheritance in a database before; how did you do it? What method do you recommend and why?
Links to articles/blog posts or previous questions are more than welcome.
Thanks for your time!
UPDATE
Alright thanks for the answers everyone. I already had a separate address table so that's not an issue.
Cheers,
Adam
4 tables staff, students, parents and person for the generic stuff.
Staff, students and parents have forign keys that each refer back to Person (not the other way around).
Person has field that identifies what the subclass of this person is (i.e. staff, student or parent).
EDIT:
As pointed out by HLGM, addresses should exist in a seperate table, as any person may have multiple addresses. (However - I'm about to disagree with myself - you may wish to deliberately constrain addresses to one per person, limiting the choices for mailing lists etc).
Well I think all approaches are valid and any lecturer who marks down for shoving it in one table (unless the requirements are specific to say you shouldn't) is removing a viable strategy due to their own personal opinion.
I highly recommend that you check out the documentation on NHibernate as this provides different approaches for performing the above. Which I will now attempt to poorly parrot.
Your options:
1) One table with all the data that has a "delimiter" column. This column states what kind of person the person is. This is viable in simple scenarios and (seriously) high performance where the joins will hurt too much
2) Table per class which will lead to duplication of columns but will avoid joins again, so its simple and a lil faster (although only a lil and indexing mitigates this in most scenarios).
3) "Proper" inheritence. The normalised version. You are almost there but your key is in the wrong place IMO. Your Employee table should contain a PersonId so you can then do:
select employee.id, person.name from employee inner join person on employee.personId = person.personId
To get all the names of employees where name is only specified on the person table.
I would go for #3.
Your goal is to impress a lecturer, not a PM or customer. Academics tend to dislike nulls and might (subconciously) penalise you for using the other methods (which rely on nulls.)
And you don't necessarily need that django extension (PRIMARY KEY ... REFERENCES ...) You could use an ordinary FOREIGN KEY for that.
"So for those who have modelled inheritance in a database before; how did you do it? What method do you recommend and why?
"
Methods 1 and 3 are good. The differences are mostly in what your use cases are.
1) adaptability -- which is easier to change? Several separate tables with FK relations to the parent table.
2) performance -- which requires fewer joins? One single table.
Rats. No design accomplishes both.
Also, there's a third design in addition to your mono-table and FK-to-parent.
Three separate tables with some common columns (usually copy-and-paste of the superclass columns among all subclass tables). This is very flexible and easy to work with. But, it requires a union of the three tables to assemble an overall list.
OO databases go through the same stuff and come up with pretty much the same options.
If the point is to model subclasses in a database, you probably are already thinking along the lines of the solutions I've seen in real OO databases (leaving fields empty).
If not, you might think about creating a system that doesn't use inheritance in this way.
Inheritance should always be used quite sparingly, and this is probably a pretty bad case for it.
A good guideline is to never use inheritance unless you actually have code that does different things to the field of a "Parent" class than to the same field in a "Child" class. If business code in your class doesn't specifically refer to a field, that field absolutely shouldn't cause inheritance.
But again, if you are in school, that may not match what they are trying to teach...
The "correct" answer for the purposes of an assignment is probably #3 :
Person
PersonId Name Address1 Address2 City Country
Student
PersonId StudentId GPA Year ..
Staff
PersonId StaffId Salary ..
Parent
PersonId ParentId ParentType EmergencyContactNumber ..
Where PersonId is always the primary key, and also a foreign key in the last three tables.
I like this approach because it makes it easy to represent the same person having more than one role. A teacher could very well also be a parent, for example.
I suggest five tables
Person
Student
Staff
Parent
Address
WHy - because people can have multiple addesses and people can also have multiple roles and the information you want for staff is different than the information you need to store for parent or student.
Further you may want to store name as last_name, Middle_name, first_name, Name_suffix (like jr.) instead of as just name. Belive me you willwant to be able to search on last_name! Name is not unique, so you will need to make sure you have a unique surrogate primary key.
Please read up about normalization before trying to design a database. Here is a source to start with:
http://www.deeptraining.com/litwin/dbdesign/FundamentalsOfRelationalDatabaseDesign.aspx
Super type Person should be created like this:
CREATE TABLE Person(PersonID int primary key, Name varchar ... etc ...)
All Sub types should be created like this:
CREATE TABLE IF NOT EXISTS Staffs(StaffId INT NOT NULL ,
PRIMARY KEY (StaffId) ,
CONSTRAINT FK_StaffId FOREIGN KEY (StaffId) REFERENCES Person(PersonId)
)
CREATE TABLE IF NOT EXISTS Students(StudentId INT NOT NULL ,
PRIMARY KEY (StudentId) ,
CONSTRAINT FK_StudentId FOREIGN KEY (StudentId) REFERENCES Person(PersonId)
)
CREATE TABLE IF NOT EXISTS Parents(PersonID INT NOT NULL ,
PRIMARY KEY (PersonID ) ,
CONSTRAINT FK_PersonID FOREIGN KEY (PersonID ) REFERENCES Person(PersonId)
)
Foreign key in subtypes staffs,students,parents adds two conditions:
Person row cannot be deleted unless corresponding subtype row will
not be deleted. For e.g. if there is one student entry in students
table referring to Person table, without deleting student entry
person entry cannot be deleted, which is very important. If Student
object is created then without deleting Student object we cannot
delete base Person object.
All base types have foreign key "not null" to make sure each base
type will have base type existing always. For e.g. If you create
Student object you must create Person object first.