Many employees have many courses which have expiry dates

I'm looking for the best way to store this information. Not every course has an expiry date.
The easiest way I've found so far is:
tblEmployee
-----------
ID (pk)
Expiry1
Expiry2
tblCourseCatalog
----------------
CourseID(pk)
Name
For every course in tblCourseCatalog, a new Expiry is created in tblEmployee to match tblCourseCatalog.CourseID.
I tried to have:
tblCourseExpiryDates
--------------------
EmployeeID (pk) 1:1 with tblEmployee.ID
FirstAid
UnderWaterBasketWeaving
Anytime a new course was added to tblCourseCatalog, a new column was added to tblCourseExpiryDates to match. This became tricky when trying to query some info. Does my current way (Expiry columns in tblEmployee) change things much from having tblCourseExpiryDates? To me, having an Expiry2 column is a waste if tblCourseCatalog.CourseID=2 (UnderWaterBasketWeaving) does not expire.

The standard normalised way to store something like this is to have a table where every row covers just one course/employee combination and holds any data that is specific to that combination. It is usually called something like EmployeeCourse (or CourseEmployee):
EmployeeID
CourseID
ExpiryDate
You only put records in this table where you actually have a date that is valid. For a course that has no dates, no employee would ever get a record. If a given employee has never done a course, they get no record. If you want to remove a date, you can remove the record, or just remove the date but leave the record (I'd probably remove it). If you want to add a new course, you just put a record into the Course table and you're done - you don't have to change anything else.
When you need to look up the record, you need to join to the table with an outer join, so that you get any records in the main table that have no CourseEmployee record.
The downside of this normalised data is that it does make it harder to get a list of all the expiry dates for an employee in one row of output - this is where pivot tables come in (and I'm not sure how they work in Access).
If you want to read more about this, look up database normalisation.
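
A minimal sketch of the EmployeeCourse layout described above, using Python's built-in sqlite3 module purely so the example is runnable (the question is about Access, where the syntax will differ). The Name column on Employee, the sample rows, and the dates are assumptions made for illustration. The second query shows one portable way to get the "one row per employee" output by hand with conditional aggregation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Course (CourseID INTEGER PRIMARY KEY, Name TEXT);
    -- One row per employee/course combination that actually has a date.
    CREATE TABLE EmployeeCourse (
        EmployeeID INTEGER REFERENCES Employee(ID),
        CourseID   INTEGER REFERENCES Course(CourseID),
        ExpiryDate TEXT,
        PRIMARY KEY (EmployeeID, CourseID)
    );
    INSERT INTO Employee VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Course VALUES (1, 'FirstAid'), (2, 'UnderWaterBasketWeaving');
    INSERT INTO EmployeeCourse VALUES (1, 1, '2025-06-30');
""")

# Outer join: every employee is listed, with NULL where no record exists.
first_aid = conn.execute("""
    SELECT e.Name, ec.ExpiryDate
    FROM Employee AS e
    LEFT JOIN EmployeeCourse AS ec
           ON ec.EmployeeID = e.ID AND ec.CourseID = 1
    ORDER BY e.ID
""").fetchall()

# Hand-written pivot: one output column per course, one row per employee.
pivot = conn.execute("""
    SELECT e.Name,
           MAX(CASE WHEN ec.CourseID = 1 THEN ec.ExpiryDate END) AS FirstAid,
           MAX(CASE WHEN ec.CourseID = 2 THEN ec.ExpiryDate END) AS UnderWaterBasketWeaving
    FROM Employee AS e
    LEFT JOIN EmployeeCourse AS ec ON ec.EmployeeID = e.ID
    GROUP BY e.ID, e.Name
    ORDER BY e.ID
""").fetchall()
```

Adding a new course only needs a new row in Course; the pivot query is the one place that has to name the courses explicitly.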

Related

Avoid duplicating fields across multiple tables

Let me describe briefly the table structures:
Customer Table
id | name | address_line_one | address_line_two | contact_no_one
SaleInvoice Table
id | id_Customer (Foreign Key) | invoice_no
If I have to print a Sale invoice, I have to use the Customer information (like name, address) from the Customer table.
Assume that after a year, some customer data changes (like name or address), and I update the new data in my customer table. Now, if the customer asks for an old invoice, it will be printed with the new customer data, which would be legally wrong.
Does that mean, I have to create
name_customer
address_line_one_customer
...
and all these fields in the Sale Invoice table too?
If yes, is there a better way to get the data from these fields in the Customer table into the Sale Invoice table than to write a SQL query to get the values and then set the values?

This is really up to you. In some cases, where it is a legal document, you will save all the details so that you can always bring it up the way it was created. Alternatively if you are producing pdf invoices then save them to be 100% sure.
The other alternative is to create a CustomerHistory table, so that past versions are always saved with a date range, so that you can go back to the old version.
It depends on the use cases, but those are your main options.
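
As a rough sketch of the "save all the details" option, the copy can be done in a single INSERT ... SELECT when the invoice is created, rather than reading the values and setting them in a second step. This uses Python's sqlite3 module only to stay runnable; the create_invoice helper, the sample customer, and the invoice numbers are hypothetical, and only two of the customer fields are copied to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (
        id INTEGER PRIMARY KEY, name TEXT,
        address_line_one TEXT, address_line_two TEXT, contact_no_one TEXT
    );
    CREATE TABLE SaleInvoice (
        id INTEGER PRIMARY KEY,
        id_Customer INTEGER REFERENCES Customer(id),
        invoice_no TEXT,
        name_customer TEXT,               -- snapshot taken at invoice time
        address_line_one_customer TEXT    -- snapshot taken at invoice time
    );
    INSERT INTO Customer VALUES (1, 'Acme Ltd', '1 Old Road', '', '555-0100');
""")

def create_invoice(customer_id, invoice_no):
    """Hypothetical helper: insert an invoice and copy the customer's current details."""
    conn.execute("""
        INSERT INTO SaleInvoice
            (id_Customer, invoice_no, name_customer, address_line_one_customer)
        SELECT id, ?, name, address_line_one FROM Customer WHERE id = ?
    """, (invoice_no, customer_id))

create_invoice(1, "INV-001")
# The customer moves; the first invoice keeps the address it was issued with.
conn.execute("UPDATE Customer SET address_line_one = '9 New Street' WHERE id = 1")
create_invoice(1, "INV-002")

invoices = conn.execute(
    "SELECT invoice_no, address_line_one_customer FROM SaleInvoice ORDER BY id"
).fetchall()
```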

It sounds like a problem easily solved by placing the Employee table in version normal form (VNF). This is actually just a flavor of 2NF but done in a way that provides the ability to query current data and past data using the same query.
A datetime parameter is used to provide the distinction. When the value is set to NOW, the current data is returned. When the value is set to a specific datetime value in the past, the data that was current at that date and time is returned.
A brief discussion of the particulars can be found here. That answer also contains links to more information if you think it is something that would work for you.
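
The datetime-parameter idea can be sketched roughly as follows. This is not necessarily the exact layout the linked answer describes: the CustomerVersion table, its effective column, and the name_as_of helper are assumptions made for illustration, applied to the question's Customer table. The same query serves both cases; only the parameter changes.

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stable identity in one table, changeable attributes in a versioned one.
    CREATE TABLE Customer (id INTEGER PRIMARY KEY);
    CREATE TABLE CustomerVersion (
        customer_id INTEGER REFERENCES Customer(id),
        effective   TEXT,   -- when this version became current
        name        TEXT,
        PRIMARY KEY (customer_id, effective)
    );
    INSERT INTO Customer VALUES (1);
    INSERT INTO CustomerVersion VALUES
        (1, '2020-01-01 00:00:00', 'Acme Ltd'),
        (1, '2023-05-01 00:00:00', 'Acme Holdings');
""")

def name_as_of(customer_id, as_of):
    """Return the name that was current at the given 'YYYY-MM-DD HH:MM:SS' time."""
    row = conn.execute("""
        SELECT name FROM CustomerVersion
        WHERE customer_id = ? AND effective <= ?
        ORDER BY effective DESC
        LIMIT 1
    """, (customer_id, as_of)).fetchone()
    return row[0] if row else None

past = name_as_of(1, "2021-06-15 12:00:00")
current = name_as_of(1, datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
```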

Keep referenced field data changes

I have a table Salary with a column PersonalId and a table Person with a column Name.
In the first table, salary data is saved with a PersonalId that relates it to the Person table. For a salary bill, all the data is gathered together and the person's name is looked up in the Person table.
After one year, a specific person's name changes from Michael to Maic. I want last year's salary bills to keep the previous name, Michael, and new salary bills to be generated with the new name, Maic.
How can we do that?

It could depend on what type of operation you need to do most and on how often people change their names, because the number of joins you may need to make could vary a lot. Some options:
1. Keep a field in Person that points to the next Person row, which represents a change of name.
2. Keep another key in Person that varies only for the physical person.
3. Keep a limited number of names in Person that someone could have, with an index of the current name.
4. Keep the relations between the various names of the Person in another table.
It could depend on what rules of normalization you follow, for now I'm not thinking about that.
Anyway, with the first case you don't need to change Salary, but to reconstruct the identity of a Person you need multiple requests or at least a stored procedure.
In the second case you still don't need to change Salary because you add a field to Person, but to get all the Salary entries for that physical person you'll need some work, again probably a stored procedure to get the added field and then something that joins all the Salary entries.
The third is maybe the simplest, but also the most limited, and you need another field in Salary that tells the index of the name to use in that entry.
The last case gives you a stable identity, but it may need some work because of the added table, and still there are multiple implementations. You could have salary reference that table instead of Person, or you could consult that table only when you need all the data, but you cannot reference its primary key from Salary because it would not permit to discriminate the name.

Lunadir's right in a certain way -- but all of those approaches are complex, and of rather great difficulty.
The other way -- simpler, and perhaps more correct & robust -- is to keep NAME and PAID_DATE columns in Salary or SalaryPaid, and write the actual name & date paid at the time the payment is made.
Good old batch-processing style -- and it has the benefit of actually capturing the key financial facts, of what payment was made & what name it was made to, which are the actual auditable transaction history.
Do you pay each Salary entry individually, or in bunch (PaySlip or SalaryPaid)? Put the NAME column wherever you record the actual payment & timestamp it occurred.
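
A small sketch of that approach, assuming the payment is recorded directly in Salary. The Amount column, the pay_salary helper, and the sample dates are hypothetical; NAME and PAID_DATE are the columns suggested above. Python's sqlite3 module is used only to keep the example runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (PersonalId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Salary (
        Id         INTEGER PRIMARY KEY,
        PersonalId INTEGER REFERENCES Person(PersonalId),
        Amount     REAL,
        NAME       TEXT,   -- name the payment was actually made to
        PAID_DATE  TEXT    -- when the payment was actually made
    );
    INSERT INTO Person VALUES (1, 'Michael');
""")

def pay_salary(personal_id, amount, paid_date):
    """Hypothetical helper: record a payment with the name as it is today."""
    conn.execute("""
        INSERT INTO Salary (PersonalId, Amount, NAME, PAID_DATE)
        SELECT PersonalId, ?, Name, ? FROM Person WHERE PersonalId = ?
    """, (amount, paid_date, personal_id))

pay_salary(1, 3000.0, "2023-12-31")
conn.execute("UPDATE Person SET Name = 'Maic' WHERE PersonalId = 1")
pay_salary(1, 3100.0, "2024-12-31")

# The salary bill reads the name from Salary, so no join to Person is needed.
bills = conn.execute(
    "SELECT PAID_DATE, NAME, Amount FROM Salary ORDER BY PAID_DATE"
).fetchall()
```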

Employee table (Master and detail table)

I am wondering if it is okay to have master and detail table for employees?
As per the requirements, data can be filtered by department, by country and by employee code at report level.
If an employee's department or country code changes, the change will go in the detail table and the old record will be set to IS_ACTIVE = 'T'.
---------------------Master Table--------------------------------------
**EMPLOYEE_CODE** VARCHAR2(20 BYTE) NOT NULL,
EMAIL VARCHAR2(100 BYTE)
FIRST_NAME VARCHAR2(50 BYTE)
LAST_NAME VARCHAR2(50 BYTE)
WORKING_HOURS NUMBER
---------------------Detail Table--------------------------------------
**PK_USER_DETAIL_ID** NUMBER,
FK_EMPLOYEE_CODE VARCHAR2(20 BYTE),
FK_GROUP NUMBER,
FK_DEPARTMENT_CODE NUMBER,
FK_EMPLOYER_COUNTRY_CODE VARCHAR2(5 BYTE),
FK_MANAGER_ID VARCHAR2(20 BYTE),
FK_ROLE_CODE VARCHAR2(6 BYTE),
START_DATE DATE,
END_DATE DATE,
IS_ACTIVE VARCHAR2(1 BYTE),
INACTIVE_DATE DATE
Employee table will be linked with Timesheet table and for timesheet reports data can be filtered by department, country and by employee code.
OPTION : I
Have one employee table with one Primary Key and create a new entry whenever department or role is updated for an employee.
Add country and department code in the timesheet table.
--> This way I don't need to search the employee table.
OPTION : II
Have master and detail table.
Add country and department code in timesheet table.
--> This way I don't need to search the employee table, plus I will have a master-detail table.
OPTION : NEW
Have master and detail table.
Timesheet table will have EmpCode.
If a user moves to a new location or changes department, insert a new row in the detail table with the new dept code and the same Emp No.
Update the old row and set its End Date field, so the End Date needs to be updated whenever the location or department changes.
Which one is the best option, and is there any better option available?

This is one way of implementing this requirement, and it's an approach many people take. However, it has one major drawback: every time you query the current employee status you need to filter the details on start and end date. This may seem like a trivial thing, but you wouldn't believe how much confusion it can cause, and it has performance implications too.
These things matter, because most of the time you will want only the current details, with queries on history being a relatively rare occurrence. Consequently you are hampering the implementation of your most common use case to make it easier to implement a less-used one. (Of course I am making assumptions about your business requirements, and perhaps yours is not a run-of-the-mill employee application...)
The better solution would be to have two tables, an EMPLOYEES table with all the detail columns too and an EMPLOYEES_HISTORY table with the same columns plus the start and end date. When you change an employee's record insert a copy of the old record in the History table, probably by a trigger. Your standard processes have just the one table to query, and your history needs are met fully.
By the way, your proposed data model is wrong. Working_hours, email_address and last_name are definitely things which can change and perhaps even first name (e.g. through changes in personal circumstances such as getting married). So all those columns should be held in your details table.
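
The EMPLOYEES / EMPLOYEES_HISTORY arrangement might look something like the sketch below. It runs on SQLite through Python's sqlite3 module so it can be tried as-is; the question targets Oracle, where the trigger syntax differs. Only a few of the question's columns are included, and closing the old row with the new row's START_DATE is an assumption about how the dates would be maintained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEES (
        EMPLOYEE_CODE      TEXT PRIMARY KEY,
        LAST_NAME          TEXT,
        FK_DEPARTMENT_CODE INTEGER,
        START_DATE         TEXT
    );
    -- Same columns as EMPLOYEES plus the end of the validity period.
    CREATE TABLE EMPLOYEES_HISTORY (
        EMPLOYEE_CODE      TEXT,
        LAST_NAME          TEXT,
        FK_DEPARTMENT_CODE INTEGER,
        START_DATE         TEXT,
        END_DATE           TEXT
    );
    -- On every update, copy the outgoing values into the history table.
    CREATE TRIGGER employees_keep_history
    AFTER UPDATE ON EMPLOYEES
    BEGIN
        INSERT INTO EMPLOYEES_HISTORY
        VALUES (OLD.EMPLOYEE_CODE, OLD.LAST_NAME, OLD.FK_DEPARTMENT_CODE,
                OLD.START_DATE, NEW.START_DATE);
    END;
    INSERT INTO EMPLOYEES VALUES ('E001', 'Smith', 10, '2022-01-01');
""")

# A department change is a plain UPDATE; the trigger takes care of the history.
conn.execute("""
    UPDATE EMPLOYEES
    SET FK_DEPARTMENT_CODE = 20, START_DATE = '2024-03-01'
    WHERE EMPLOYEE_CODE = 'E001'
""")

current = conn.execute(
    "SELECT EMPLOYEE_CODE, FK_DEPARTMENT_CODE FROM EMPLOYEES"
).fetchall()
history = conn.execute(
    "SELECT FK_DEPARTMENT_CODE, START_DATE, END_DATE FROM EMPLOYEES_HISTORY"
).fetchall()
```

Everyday queries touch only EMPLOYEES and need no date filter; history reports read EMPLOYEES_HISTORY.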

Option 3 - Please note that this option is useful only for the reports Point of View.
Whenever you insert the data, create a De-Normalized entry in a new table.
Whenever an entry will be updated, the De-Normalized entry will be updated in the new table.
The New Table will have all De-Normalized columns of Employee.
So while Performing the search, this will benefit you as the results will be calculated without using Joins. Thus, the access time will be reduced.
Records in the new table will be Created/Updated in The Insert/Update Trigger.
Improvements in Option - 2 and Option 1
Don't create redundancy by adding duplicate columns.

Invoice database design

The invoice database design, might look something like this...
http://www.databaseanswers.org/data_models/invoices_and_payments/index.htm
Now if the user decides to change/revise the product code/description,
it will change the product code/description on previous orders and invoices :(
What do you do? Copy the product code description to the invoice table instead?

You basically have two options:
either you make your Products table "time-enabled" (also known as "temporal database"), e.g. you keep the "previous" state of your individual product in your table, and you give every entry a ValidFrom / ValidTo pair of dates. That way, if you change your product, you get a new entry, and the previous one remains untouched, referenced from those invoices that used it; only the ValidTo date for the product gets updated
or:
you could copy the products (at least those bits you need for your invoice) to the invoice - that'll make sure you always know what the product looked like when you created the invoice - but this will cause lots of data duplication (not recommended)
See this other Stack Overflow question on temporal databases as another input, and also check out this article on Simple-Talk: Database Design: A Point in Time Architecture
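
A minimal sketch of the first option, run on SQLite via Python's sqlite3 module. ValidFrom and ValidTo come from the description above; the ProductRowID surrogate key, the InvoiceLines table, and the revise_product helper are assumptions made for illustration. Invoices point at a specific product row, so revising the product leaves old invoices untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (
        ProductRowID INTEGER PRIMARY KEY,  -- one id per version of a product
        ProductCode  TEXT,
        Description  TEXT,
        ValidFrom    TEXT,
        ValidTo      TEXT                  -- NULL while the row is current
    );
    CREATE TABLE InvoiceLines (
        InvoiceNo    TEXT,
        ProductRowID INTEGER REFERENCES Products(ProductRowID)
    );
    INSERT INTO Products VALUES (1, 'P-100', 'Blue widget', '2023-01-01', NULL);
    INSERT INTO InvoiceLines VALUES ('INV-001', 1);
""")

def revise_product(code, new_description, change_date):
    """Hypothetical helper: close the current row and add a new version."""
    with conn:  # both statements commit together or not at all
        conn.execute(
            "UPDATE Products SET ValidTo = ? WHERE ProductCode = ? AND ValidTo IS NULL",
            (change_date, code),
        )
        conn.execute(
            "INSERT INTO Products (ProductCode, Description, ValidFrom, ValidTo) "
            "VALUES (?, ?, ?, NULL)",
            (code, new_description, change_date),
        )

revise_product("P-100", "Navy widget", "2024-06-01")

# The old invoice still resolves to the description it was issued with.
old_invoice_description = conn.execute("""
    SELECT p.Description
    FROM InvoiceLines AS il
    JOIN Products AS p ON p.ProductRowID = il.ProductRowID
    WHERE il.InvoiceNo = 'INV-001'
""").fetchone()[0]

current_description = conn.execute(
    "SELECT Description FROM Products WHERE ProductCode = 'P-100' AND ValidTo IS NULL"
).fetchone()[0]
```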

Where should I break up my user records to keep track of revisions

I am putting together a staff database and I need to be able to revise the staff member information, but also keep track of all the revisions. How should I structure the database so that I can have multiple revisions of the same user data but be able to query against the most recent revision? I am looking at information that changes rarely, like Last Name, but that I will need to be able to query for out of date values. So if Jenny Smith changes her name to Jenny James I need to be able to find the user's current information when I search against her old name.
I assume that I will need at least 2 tables, one that contains the uid and another that contains the revisions. Then I would join them and query against the most recent revision. But should I break it out even further, depending on how often the data changes or the type of data? I am looking at about 40 fields per record and only one or two fields will probably change per update. Also I cannot remove any data from the database, I need to be able to look back on all previous records.

A simple way of doing this is to add a deleted flag and instead of updating records you set the deleted flag on the existing record and insert a new record.
You can of course also write the existing record to an archive table, if you prefer. But if changes are infrequent and the table is not big I would not bother.
To get the active record, query with 'where deleted = 0'; the speed impact will be minimal when there is an index on this field.
Typically this is augmented with some other fields like a revision number, when the record was last updated, and who updated it. The revision number is very useful to get the previous versions and also to do optimistic locking. The 'who updated this last and when' questions usually come once the system is running instead of during requirements gathering, and are useful fields to put in any table containing 'master' data.
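
The deleted-flag approach could be sketched like this, using Python's sqlite3 module to keep it runnable. The staff table, its column names (including the person_id that ties revisions of the same person together), and the revise_last_name helper are assumptions for illustration, and only one of the roughly 40 fields is shown. The last query finds the current record when searching by an old name, as the question asks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff (
        record_id  INTEGER PRIMARY KEY,  -- new id for every revision
        person_id  INTEGER,              -- stays the same across revisions
        last_name  TEXT,
        revision   INTEGER,
        updated_by TEXT,
        updated_at TEXT,
        deleted    INTEGER DEFAULT 0
    );
    CREATE INDEX staff_deleted ON staff (deleted);
    INSERT INTO staff (person_id, last_name, revision, updated_by, updated_at)
    VALUES (1, 'Smith', 1, 'hr', '2020-01-01');
""")

def revise_last_name(person_id, new_last_name, user, when):
    """Hypothetical helper: flag the active row as deleted and insert the next revision."""
    with conn:
        (rev,) = conn.execute(
            "SELECT revision FROM staff WHERE person_id = ? AND deleted = 0",
            (person_id,),
        ).fetchone()
        conn.execute(
            "UPDATE staff SET deleted = 1 WHERE person_id = ? AND deleted = 0",
            (person_id,),
        )
        conn.execute(
            "INSERT INTO staff (person_id, last_name, revision, updated_by, updated_at) "
            "VALUES (?, ?, ?, ?, ?)",
            (person_id, new_last_name, rev + 1, user, when),
        )

revise_last_name(1, "James", "hr", "2024-02-14")

# Search on the old name, return the person's current record.
current_for_old_name = conn.execute("""
    SELECT cur.last_name, cur.revision
    FROM staff AS old
    JOIN staff AS cur ON cur.person_id = old.person_id AND cur.deleted = 0
    WHERE old.last_name = 'Smith'
""").fetchall()
```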

I would use the separate table because then you can have a unique identifier that points to all the other child records that is also the PK of the table which I think makes it less likely you will have data integrity issues. For instance, you have Mary Jones who has records in the address table and the email table and performance evaluation table, etc. If you add a change record to the main table, how are you going to relink all the existing information? With a separate history table, it isn't a problem.
With a deleted field in one table, you then have to have a non-autogenerated person id and an autogenerated record id.
You also have the possibility of people forgetting to use the where deleted = 0 clause that is needed for almost every query. (If you do use the deleted flag field, do yourself a favor and set up a view with where deleted = 0 and require developers to use the view in queries, not the original table.)
With the deleted flag field you will also need a trigger to ensure one and only one record is marked as active.
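
If the deleted flag is used anyway, the view suggested above might look like the following sketch (SQLite via Python's sqlite3 module; table and column names are assumptions). Note that a partial unique index is swapped in here for the trigger: it only guarantees at most one active row per person, not exactly one, so it is a partial substitute, and not every database supports partial indexes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff (
        record_id INTEGER PRIMARY KEY,
        person_id INTEGER,
        last_name TEXT,
        deleted   INTEGER DEFAULT 0
    );
    -- Developers query the view, so the filter cannot be forgotten.
    CREATE VIEW active_staff AS
        SELECT record_id, person_id, last_name FROM staff WHERE deleted = 0;
    -- At most one non-deleted row per person (stand-in for the trigger).
    CREATE UNIQUE INDEX one_active_per_person
        ON staff (person_id) WHERE deleted = 0;
    INSERT INTO staff (person_id, last_name, deleted)
    VALUES (1, 'Smith', 1), (1, 'James', 0);
""")

active = conn.execute("SELECT person_id, last_name FROM active_staff").fetchall()

# A second active row for the same person is rejected by the index.
try:
    conn.execute(
        "INSERT INTO staff (person_id, last_name, deleted) VALUES (1, 'Jones', 0)"
    )
    second_active_rejected = False
except sqlite3.IntegrityError:
    second_active_rejected = True
```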

@Peter Tillemans' suggestion is a common way to accomplish what you're asking for. But I don't like it.
The structure of a database should reflect the real-world facts that are being modeled.
I would create a separate table for obsolete_employee, and just store the historical information that would need to be searched in the future. This way you can keep your real employee data table clean and keep only the old data that is necessary. This approach will also simplify reporting and other features of the application that are not related to searching historical data.
Just think of that warm feeling you'll get when you type select * from employee and nothing but current, correct goodness comes flowing back!
