Fiscal year handling strategies in database design

By fiscal year I mean all the data in the database (in all tables) that was recorded in a particular year. Let's say that we are building an application that allows the user to choose from different years.
What way of implementing this would you prefer, and why:
Separate fiscal year data based on multiple separate database instances (for example, on every fiscal year start you could create a new instance with no data)
Have everything in one database, but with logic that automatically separates records from different years.
Personally, I have "seen" both methods, and I would choose the second. The only argument I can think of for the first method is to have fewer records in case these are really big databases - but still, you could "archive" old records by rolling them up into summaries or in some other way. What do you think?

Separate fiscal year data based on multiple separate database instances (for example, on every fiscal year start you could create a new instance with no data)
No. Do not create a separate database instance, database, or table per fiscal year.
Besides not being normalized, you would be unnecessarily duplicating the supporting infrastructure: constraints, triggers, stored procedures and functions would all have to be updated to work with the new, current fiscal year. It would also complicate handling data for future years for budgeting and planning.
Have everything in one database, but with logic that automatically separates records from different years.
There's no need for separation, just make sure that records contain a timestamp, which can then be used to determine what fiscal year it took place in.
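As a rough sketch of deriving the fiscal year from a timestamp: the function name, the start_month parameter and the labelling rule (a fiscal year is named after the calendar year in which it ends) are all assumptions for illustration, so adjust them to your company's convention.

```python
from datetime import date


def fiscal_year(d: date, start_month: int = 1) -> int:
    """Return the fiscal year containing d.

    Assumption: a fiscal year is labelled by the calendar year in which it
    ends, so with start_month=7 the year running July 2009 to June 2010 is
    fiscal year 2010. With start_month=1 it is simply the calendar year.
    """
    if start_month == 1 or d.month < start_month:
        return d.year
    return d.year + 1


calendar_fy = fiscal_year(date(2009, 8, 14))               # calendar-year fiscal year
july_fy = fiscal_year(date(2009, 8, 14), start_month=7)    # fiscal year starting 1 July
```

The point is only that the rule lives in one function (or one SQL expression) instead of in a separate database per year.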

There is a third alternative.
Create a table, let's call it "Almanac", that has one row per day, keyed by date.
In that table you can have a whole lot of attributes that are determined by the date.
Among them could be some attributes for which there is a function, like the day of the week. Some attributes could be company specific, like whether or not the day is a workday at the company.
Among the attributes could be the fiscal year, the fiscal quarter, and the fiscal month, if your company has such things. It's not particularly important to normalize this table.
Write a program that populates this table. All the convoluted logic that goes into calculating the fiscal year from the date can thus be in one place, instead of scattered throughout your system. Ten years' worth of dates is only going to be about 3,650 rows, a tiny table by today's standards.
Then, cutting all of your date driven data by fiscal year, fiscal quarter, or whatever is just a matter of joining and grouping. You can even automate the production of different time frame views of the same data.
I've done this and it works. It's especially good in reporting databases and data warehouses.
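A minimal sketch of such a populating program, using Python with an in-memory SQLite database. The column names, the Sale table and the 1 April fiscal start are made up for the example, and the fiscal year is labelled here by the calendar year in which it starts.

```python
import sqlite3
from datetime import date, timedelta

FISCAL_START_MONTH = 4  # hypothetical: the fiscal year starts on 1 April


def almanac_row(d):
    # Months elapsed since the fiscal year started (0..11).
    offset = (d.month - FISCAL_START_MONTH) % 12
    # Assumption: the fiscal year is labelled by the calendar year it starts in.
    fy = d.year if d.month >= FISCAL_START_MONTH else d.year - 1
    is_workday = 1 if d.weekday() < 5 else 0  # Mon-Fri; company holidays ignored
    return (d.isoformat(), d.strftime("%A"), is_workday,
            fy, offset // 3 + 1, offset + 1)


conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Almanac (
    day TEXT PRIMARY KEY, day_of_week TEXT, is_workday INTEGER,
    fiscal_year INTEGER, fiscal_quarter INTEGER, fiscal_month INTEGER)""")

# One row per day; two years here, but ten years is still a tiny table.
d = date(2009, 1, 1)
while d <= date(2010, 12, 31):
    conn.execute("INSERT INTO Almanac VALUES (?, ?, ?, ?, ?, ?)", almanac_row(d))
    d += timedelta(days=1)

# Cutting date-driven data by fiscal year is then a join and a GROUP BY.
conn.execute("CREATE TABLE Sale (day TEXT, amount INTEGER)")
conn.executemany("INSERT INTO Sale VALUES (?, ?)",
                 [("2009-03-31", 100), ("2009-04-01", 250), ("2010-03-31", 50)])
totals = conn.execute("""
    SELECT a.fiscal_year, SUM(s.amount)
    FROM Sale AS s JOIN Almanac AS a ON a.day = s.day
    GROUP BY a.fiscal_year ORDER BY a.fiscal_year""").fetchall()
```

With this data the sale on 31 March 2009 lands in fiscal 2008 and the other two in fiscal 2009, without any date arithmetic in the reporting query.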

No need for duplication. A timestamp may be good enough, but to borrow from data warehousing, you could create a "date dimension". It is a table with one row per day and one column per date attribute. Some of those columns may be fiscal year, fiscal quarter, etc. Then you add a DateKey to the transactions table and join the date dimension when querying.
Something like:
select sum(t.Total)
from Transactions as t
join dimDate as d on d.DateKey = t.DateKey
where d.FiscalYearQuarter = 'F2009-Q3';
The date dimension table may look something like:
CREATE TABLE dimDate
(
    DateKey             int          -- 20090814
   ,FullDate            date         -- 2009-8-14
   ,FullDateDescription varchar(50)  -- Friday August 14, 2009
   ,SQLDateStamp        varchar(10)  -- 2009-08-14
   ,DayOfWeek           varchar(10)  -- Friday
   ,DayNumberInWeek     int          -- 6
   ,DayNumberInMonth    int          -- 14
   ,DayNumberInYear     int          -- 226
   -- many more here
   ,FiscalYear          int          -- 2009
   ,FiscalQuarter       char(3)      -- FQ3
   ,FiscalHalf          char(3)      -- FH2
   ,FiscalYearQuarter   varchar(8)   -- F2009-Q3
   ,FiscalYearHalf      varchar(8)   -- F2009-H2
);
You would pre-load the dimDate, from way back in past to forward in future; 100 years requires 36.5k rows -- not much for any DB.
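For the pre-load, something along these lines could compute one dimDate row per date. This is only a sketch: the function name is made up, and it assumes the fiscal year coincides with the calendar year and that weeks are numbered from Sunday = 1, which is what the sample values in the table comments suggest.

```python
from datetime import date


def dim_date_row(d: date) -> dict:
    # Assumption: fiscal year == calendar year, as in the sample row above.
    quarter = (d.month - 1) // 3 + 1
    half = 1 if d.month <= 6 else 2
    return {
        "DateKey": d.year * 10000 + d.month * 100 + d.day,
        "SQLDateStamp": d.isoformat(),
        "DayOfWeek": d.strftime("%A"),
        "DayNumberInWeek": d.isoweekday() % 7 + 1,  # Sunday = 1 ... Saturday = 7
        "DayNumberInMonth": d.day,
        "DayNumberInYear": d.timetuple().tm_yday,
        "FiscalYear": d.year,
        "FiscalQuarter": f"FQ{quarter}",
        "FiscalHalf": f"FH{half}",
        "FiscalYearQuarter": f"F{d.year}-Q{quarter}",
        "FiscalYearHalf": f"F{d.year}-H{half}",
    }


sample = dim_date_row(date(2009, 8, 14))
```

Looping this over every date in the range you care about and bulk-inserting the results is all the load program needs to do.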

Each entity should have its fiscal year as part of the metadata/staticdata.
From that you can easily handle the fiscal year breaks, and typically Databases can handle VERY LARGE amounts of data, so you should not have a problem.
Using the correct indexing will greatly improve the performance of your queries, so worry about performance once you hit a snag. Until then, worry about the code.

Related

SQL Schema - merge data from two tables

I'm trying to create a schema that will allow me to define times when a supplier website is non-operational (planned, not unplanned).
I've gone for non-operational as opposed to operational because many suppliers work 24/7, so non-operating times represent the least number of rows.
For example, a supplier might not work:
On a Sunday
On a recognised holiday date - '1/1/2015'
On a Saturday after 5pm
I'm not overly confident with SQL Server, but have come up with a schema that 'does the job'. However, as we all know, there are good ways, not so good ways, and bad ways, that all work in a fashion, so would appreciate comments and advice on what I have to date.
One of the key features is to use data from WorkingDays and Holidays together to represent a WorkingPeriod entity.
I would appreciate comments no matter how small.
Holiday
Contains all recognised holidays - Easter Monday, Good Friday etc.
HolidayDate
Contains dates of holidays. For instance, this year Easter Monday is 6th Apr 2015.
WorkingDay
Sunday through to Saturday, mapped to ASP.NET day of week enums.
WorkingPeriodType
A lookup table containing 2 rows - Holiday, or Day of Week
WorkingPeriod
Merges the Holiday table and the WorkingDay table to represent a single WorkingPeriod entity that can be used in the SupplierNonWorkingTimes table.
SupplierNonWorkingTimes
Contains the ID representing the WorkingDay/Holiday and the times of non-operation.
This is a very subjective question, as you've already observed there's no right and wrong, just different ways. I'm a database guy but I don't know your specific circumstances, so these are just some observations - you'll have to judge for yourself whether any of them are appropriate to you.
I like my naming to be crystal clear, it saves all the misunderstanding by other people later on. If [WorkingDay] holds the 7 days of the week I would call it [WeekDay]. If you intend [Holiday] to hold whole-day holidays I would call it [HolidayDay]. The main table [SupplierNonWorkingTime] is about 'non-working' so I would call the [WorkingPeriod] table [NonWorkingPeriod]. The term 'period' always refers to a whole day, so I would replace 'period' with 'day' (let's ignore start/stop time for now).
My first impression was that your design is over-normalised. The [WorkingPeriodType] table has 2 rows that will never change, [WorkingDay] has 7. For these very low numbers I sometimes prefer a char(1) with a check constraint. Normalisation is generally good, but lots of JOINs for trivial queries is not so good. You could eliminate [WorkingPeriodType] and [WorkingDay] but you've mentioned .Net enums in your question so if you've got some sort of ORM in your .Net code this level of normalisation might be right for you.
I'd add a Year field to the [HolidayDate] table, then the PK becomes a better HolidayID+Year - unless you know somewhere that has lots of Christmases :)
I'd add an IsAllDay field to the [SupplierNonWorkingTime] table, otherwise you have to use 'magic values' to represent 'all day' and magic values are bad. There should be a check constraint to enforce that start/stop times can only be entered if IsAllDay = false.
Like I said, just my thoughts, hope it's helpful.
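To make the IsAllDay check constraint concrete, here is a small sketch run against SQLite from Python. The column names (SupplierId, NonWorkingPeriodId, StartTime, EndTime) are guesses at the asker's schema, and the exact CHECK syntax may need adapting for SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE SupplierNonWorkingTime (
    SupplierId         INTEGER NOT NULL,
    NonWorkingPeriodId INTEGER NOT NULL,
    IsAllDay           INTEGER NOT NULL CHECK (IsAllDay IN (0, 1)),
    StartTime          TEXT,
    EndTime            TEXT,
    -- Times are allowed only when the row is not an all-day entry.
    CHECK (
        (IsAllDay = 1 AND StartTime IS NULL AND EndTime IS NULL)
        OR (IsAllDay = 0 AND StartTime IS NOT NULL AND EndTime IS NOT NULL
            AND StartTime < EndTime)
    )
)""")


def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO SupplierNonWorkingTime VALUES (?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


all_day_ok = try_insert((1, 1, 1, None, None))           # e.g. closed all Sunday
partial_ok = try_insert((1, 2, 0, "17:00", "23:59"))      # e.g. Saturday after 5pm
mixed_rejected = not try_insert((1, 3, 1, "09:00", "10:00"))  # all-day plus times
```

No magic values are needed: an all-day row simply has no times, and the database refuses anything in between.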

Can an accumulating snapshot table have multiple dates in it?

I am trying to make sense of dimensional modeling. While reading a dimensional modeling book, I have created a star schema.
The fact table is an accumulating snapshot table and it has multiple date columns which are linked to a date dimension using a surrogate key.
FactApplicants
{
Interview_No_Show_Date_Key (FK)
Cancel_Date_Key (FK)
Interviewed_Date_Key (FK)
. ....
Applicant_Key(FK)
InquiryCount int
}
DimDate
{
Date_Key (PK, int),
FullDateUSA (char(10))
Date (datetime)
}
I do have a well-defined process for which I am trying to build this star schema. I have a date field in the fact table for each of its steps, as I need to prepare funnel-like reports and activity reports. So the question really is:
Is this correct? Can a fact table refer to the same date dimension table multiple times?
The examples I am seeing all over the internet seem to indicate this is correct, but I am having a hard time making it work with Pentaho reporting, so I am not sure if it's a design problem or something I am not doing correctly in Pentaho.
Yes, it is correct to refer to the date dimension multiple times.
Yes, a fact can refer to the same dimension multiple times. However, given only what I see in your example, I am not sure why you need the date dimension. The date in applicants is just a date and can be used as an attribute without referring to a separate date dimension. It's just the attribute "date". You would need a separate date dimension if, for example, (1) you want to ensure that only valid dates are used, or (2) you want to elevate date to a full calendar in which other attributes are used to describe a date, such as day of the week, weekday/weekend, holiday, etc. or (3) you want to rollup date to other levels, such as week, month, year.
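Here is a small sketch of what "referring to the same dimension multiple times" looks like at query time, using SQLite from Python. The trick is simply to join DimDate once per date column under a different alias. The MonthName column and the sample rows are invented, and leaving a not-yet-reached milestone as NULL is an assumption (some designs point it at a placeholder row instead).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimDate (Date_Key INTEGER PRIMARY KEY, Date TEXT, MonthName TEXT);
CREATE TABLE FactApplicants (
    Applicant_Key        INTEGER,
    Interviewed_Date_Key INTEGER REFERENCES DimDate (Date_Key),
    Cancel_Date_Key      INTEGER REFERENCES DimDate (Date_Key),
    InquiryCount         INTEGER
);
INSERT INTO DimDate VALUES (20150105, '2015-01-05', 'January'),
                           (20150210, '2015-02-10', 'February');
INSERT INTO FactApplicants VALUES (1, 20150105, 20150210, 3),
                                  (2, 20150105, NULL, 1);
""")

# The same dimension table plays two roles, one alias per role.
rows = conn.execute("""
    SELECT f.Applicant_Key, interviewed.MonthName, cancelled.MonthName
    FROM FactApplicants AS f
    LEFT JOIN DimDate AS interviewed
           ON interviewed.Date_Key = f.Interviewed_Date_Key
    LEFT JOIN DimDate AS cancelled
           ON cancelled.Date_Key = f.Cancel_Date_Key
    ORDER BY f.Applicant_Key""").fetchall()
```

If a reporting tool cannot alias the same table twice, a common workaround is to expose one view per role over the single physical DimDate table.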

Implementing Date Range in OLAP systems

Please bear with me if this is a trivial question, I am a newbie.
I am in the design phase of an OLAP system where I need to show cost for a date range.
I have three other dimensions: product, vendor and language.
Should I add date as one more dimension?
My queries are mostly cost on a date range like from 5-11-1997 to 01-09-2-13
Which is the best way to do it?
You do need to add a Time Dimension. If all the Date/Time facts are just Dates (no Time part as in the example range) then you need to create a table/view which consists of a row for each Date in the domain range.
This table can also have extra fields for things like week, month, quarter, season, year that your users may be interested in querying. (If there are none of these, then just have one column with the date.)
You would need to tell the OLAP data model that this date column in the Time table is the PK, and that the dates in other tables are FKs to it. The OLAP engine will then allow this new Time Dimension to be used in queries just like any other dimension.
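Outside of any particular OLAP engine, the relational shape of this is easy to sketch with SQLite from Python. The table and column names (TimeDim, CostFact, day, cost) and the sample rows are invented for illustration.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TimeDim (day TEXT PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE CostFact (day TEXT REFERENCES TimeDim (day), product TEXT, cost REAL);
""")

# One row for each date in the domain range.
d = date(1997, 1, 1)
while d <= date(1998, 12, 31):
    conn.execute("INSERT INTO TimeDim VALUES (?, ?, ?)",
                 (d.isoformat(), d.year, d.month))
    d += timedelta(days=1)

conn.executemany("INSERT INTO CostFact VALUES (?, ?, ?)", [
    ("1997-11-04", "widget", 10.0),  # just before the queried range
    ("1997-11-05", "widget", 20.0),
    ("1998-02-01", "gadget", 5.0),
])

# Cost over a date range, rolled up by an attribute of the time dimension.
by_year = conn.execute("""
    SELECT t.year, SUM(c.cost)
    FROM CostFact AS c JOIN TimeDim AS t ON t.day = c.day
    WHERE t.day BETWEEN ? AND ?
    GROUP BY t.year ORDER BY t.year""", ("1997-11-05", "1998-12-31")).fetchall()
```

The range filter and the roll-up both go through the time table, which is what lets the engine treat it like any other dimension.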

How to design table of schedule with start date and end date and year

I have a table training_schedule which holds data such as start_date, end_date, year, course, title, etc.
My current design is:
course
title
start_date (date)
end_date (date)
year_id (fk)
Now my app needs to fetch schedules based on year or month.
(I've added year table because it is used on different parts like when we are making campaign on specific year.)
I think this design is not good because I am just ignoring the year on start_date and end_date. I am thinking about these options:
a) remove year and base the year on the start_date
b) create extra column start_month, start_day, end_day, end_month, year
or
Can someone suggest a better design? Also, is it good to create tables for year and month, and use a join statement to get the name of the month?
Thanks in advance.
Splitting the dates out into component parts is usually a bad route to go down. As soon as you want to do date range queries, you need to reassemble them back into a date.
If you're worried about the duplication, you could make the year column a computed column (or whatever the moral equivalent term is in your database system) which is automatically extracted from the start_date column - that way you know it's always correct.
So I would go (a) with a computed column.
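A sketch of option (a), run against SQLite from Python. To keep it portable the derived year is exposed through a view, which stands in for a computed column here; most engines (SQL Server, MySQL, PostgreSQL, recent SQLite) have a real computed or generated column feature you could use instead. The view name and sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE training_schedule (
    course TEXT, title TEXT, start_date TEXT, end_date TEXT
);
-- year and start_month are derived from start_date, never stored separately.
CREATE VIEW training_schedule_v AS
SELECT course, title, start_date, end_date,
       CAST(strftime('%Y', start_date) AS INTEGER) AS year,
       CAST(strftime('%m', start_date) AS INTEGER) AS start_month
FROM training_schedule;
INSERT INTO training_schedule VALUES
    ('SQL101', 'Intro to SQL', '2015-03-02', '2015-03-06'),
    ('SQL201', 'Query tuning', '2016-01-11', '2016-01-15');
""")

# Fetch by year through the derived column.
in_2015 = conn.execute(
    "SELECT course FROM training_schedule_v WHERE year = 2015").fetchall()

# Date-range queries keep working on the real date columns.
overlapping = conn.execute(
    "SELECT course FROM training_schedule WHERE start_date <= ? AND end_date >= ?",
    ("2015-03-31", "2015-03-01")).fetchall()
```

Because the year is computed from start_date, it can never disagree with it.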

storing weekly targets in database

I have the following requirement:
Sales Officer: Bob
        Week1  Week2  Week3  .................  Week52
Prod1   10     15     12     .................  14
Prod2   20     14     10     .................  17
.       .
.       .
.       .
Sales supervisor will set the targets for each sales officer on weekly basis.
Sales officer may enter actual sales on daily basis for each product through a similar grid against the set targets e.g.
Edit
In the above case Supervisor has set target of 10 units for week 1 Now the sales Officer will enter the sales on daily basis as 1,2,0,1,3,2=9(Actual Sale for Week 1) so against the target of 10 unit he has sold 9 units in week one.
I have already created Employee and Product tables. Can anyone advise on the best practice for storing days and weeks in the database, against which the targets are stored and actual sales can be recorded?
I am thinking storing data in following table
EmpSales (EmployeeID,ProductID,SaleTarget,Actual Sale,Date,WeekNo,Month)
Thanks in advance
This one is really easy in pure Relational modelling terms. I do not see the need for "denormalisation" of any kind.
Sales Data Model.
If you are unfamiliar with the Standard for modelling Relational databases, the IDEF1X Notation may be helpful.
Pure 5NF; full Declarative referential Integrity; no Nulls, no Update Anomalies; no GROUP BYs; pure Date arithmetic.
The SaleTarget is compared against SaleActual by projection, and may be in the same result set.
If you have Monthly and Annual Sales accounting, the extension required is a common calendar table with a bit of control or structure; eg. similar to Week, including rows for each Month and Year. Just let me know, and I will update the model.
I say 5NF because that is the minimum I provide in order to eliminate Update Anomalies, and most modellers are familiar with it. But if it does not scare you off, the two Sales tables are actually Sixth Normal Form.
This allows full Pivoting (weeks or months across the top; Products or Employees down the side; vice versa; any combination) without temporary tables or complex SQL. (Just ask.)
I think it may even be self-explanatory, but I will supply the Verb Phrases which spell out the Business Rules, only because there are three Parents involved in each:
Each Employee is scheduled SaleTarget of Product for Week
Each Product is scheduled SaleTarget By Employee for Week
Each Employee did SaleActual of Product on Day
Each Product did SaleActual by Employee on Day
Comparison
I should have mentioned. Notice there is no vertical (rows) or horizontal (columns) duplication. When columns are duplicated eg, StartDate and EndDate, you have broken 3NF (introduced Functional Dependencies), and introduced an Update Anomaly. The EndDate in any row, is the StartDate in the next row (that, minus 1 second counts as a dupe, is a contrivance); when updating, now two rows instead of one have to be changed. More important, this structure is so simple (it is not a Time Series, or "temporal" requirement), the EndDate is not required.
Response to Comments
The Data Model has been updated to include Month and Year requirements. You now need a Check Constraint on SaleTarget to ensure that DateType is W for week. Loading the Date table is simple, you do not need the nonsense code (manually repeated cut-and-paste) that is posted on SQLTeam; they are famous for being stupid and sub-standard.
The SaleActual table now contains Daily, Weekly, Monthly, and Annual values, which of course you summarise programmatically on the first day of each Week, Month, and Year. First add the new row to Date.
5NF is pretty much the minimum required for standard compliance these days, so you need to get used to it. Basically there was a lot of argument among the academics (plus places like Wikipedia posting completely incorrect entries) of the NFs between 3NF and 5NF. The short and sweet definition of 5NF is that it is what 3NF was intended to be, with zero data duplication, zero Update Anomalies (no duplicated columns to be updated transactionally).
Forget about 6NF for now. Any table that is in 6NF, is in 5NF (and 4NF and BCNF and 3NF). Just treat the two Sales tables as 5NF. When you have to write a pivoted report, say a year from now, that's when you will realise the value of this structure.
I personally would store the targets and actuals in separate rows, and most probably in separate tables:
Targets:
EmployeeId, PeriodId, ProductId, TargetValue
Sales:
EmployeeId, PeriodId, ProductId, SalesValue
In fact, in an integrated system, the second table is usually unnecessary (assuming that you have a complete sales recording system, this should be a projection/view of the actual recorded sales - with appropriate assignment of employee, period and product based on the model of that subsystem).
In order to fit your calendar requirements, I would almost certainly have a date table which will allow you to ensure all your various business rules for definitions of weeks and months without complex date logic. Determining periods and aggregating is then just facilitated with joins to the calendar table.
So the ActualSales would look something like this (with just a generic Period table, which might itself be a period and date table):
SELECT sp.EmployeeId
, p.ProductId
, pd.PeriodType
, pd.PeriodId
, SUM(id.Quantity * id.UnitPrice) AS TotalSales
FROM Invoice AS i
INNER JOIN InvoiceDetail AS id
ON id.InvoiceId = i.InvoiceId
INNER JOIN Employee AS sp
ON sp.EmployeeId = i.SalesPersonId
INNER JOIN Product AS p
ON id.ProductId = p.ProductId
INNER JOIN Period AS pd
ON pd.StartDate <= i.InvoiceDate
AND pd.EndDate > i.InvoiceDate
GROUP BY sp.EmployeeId, p.ProductId, pd.PeriodType, pd.PeriodId
In this case, data would be duplicated if you had overlapping periods (like daily, weekly, monthly), so you would need to aggregate ONLY one type of period - that's why I've specifically included it in this example view although it's redundant here.
I expect a generic Period table would look like:
PeriodId
PeriodType
StartDate
EndDate
This would be prepopulated with the various periods you want to report on:
'Q', 1/1/2010, 4/1/2010
'M', 1/1/2010, 2/1/2010
'M', 2/1/2010, 3/1/2010
'M', 3/1/2010, 4/1/2010
'W', 1/3/2010, 1/10/2010
'W', 1/10/2010, 1/17/2010
etc.
'D', 1/1/2010, 1/2/2010
'D', 1/2/2010, 1/3/2010
etc.
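Prepopulating such a table is mechanical. Below is a rough Python sketch that generates the rows for one year, using the same half-open convention as the list above (EndDate is the first day after the period, matching the `StartDate <= date AND EndDate > date` join). The function names are made up, and weeks are assumed to start on Sunday as in the sample rows.

```python
from datetime import date, timedelta


def add_months(d, n):
    """First day of the month n months after the month containing d."""
    m = d.month - 1 + n
    return date(d.year + m // 12, m % 12 + 1, 1)


def periods_for_year(year):
    rows = []
    start, end = date(year, 1, 1), date(year + 1, 1, 1)
    # Quarters and months step by whole months.
    for ptype, step in (("Q", 3), ("M", 1)):
        d = start
        while d < end:
            rows.append((ptype, d, add_months(d, step)))
            d = add_months(d, step)
    # Weeks: begin on the first Sunday of the year (an assumption).
    d = start + timedelta(days=(6 - start.weekday()) % 7)
    while d < end:
        rows.append(("W", d, d + timedelta(days=7)))
        d += timedelta(days=7)
    # Days.
    d = start
    while d < end:
        rows.append(("D", d, d + timedelta(days=1)))
        d += timedelta(days=1)
    return rows


def periods_containing(rows, ptype, d):
    # Half-open test: StartDate <= d < EndDate, so boundaries never double count.
    return [r for r in rows if r[0] == ptype and r[1] <= d < r[2]]


periods_2010 = periods_for_year(2010)
feb_first = periods_containing(periods_2010, "M", date(2010, 2, 1))
```

A boundary date such as 1 February falls into exactly one month row, which is the reason for keeping EndDate exclusive.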
It makes very little sense to worry about holidays, except that you probably aren't going to assign a target if they aren't working, and this is mainly about managing the assignments so that they are presumably realistic. You can have a calendar table of days with various flags:
Calendar
DateId
Date
IsHoliday
Then you can include that when you join to count the number of holidays/weekends in a period etc.
This is typically an accounting/business thing, but you may want to look into standardizing your calendar. For instance, in media buys for TV advertising, they make each "quarter" equal and make each "month" standardized - 4 weeks, 4 weeks, 5 weeks. Obviously they make exceptions for holiday and special TV events, but this helps to smooth out the accounting and compare like periods more easily.
Personally I would go for a more generic "period" table.
period(periodId,startDate,endDate,weekNo,Month,Year)
and then add
empSales(EmployeeId,ProductId,SaleTarget,ActualSale,periodId).
This is a bit more flexible (you can easily introduce different time spans and either make the relative week field null, or define some rule that maps the period on a "standard" week), there is less redundancy (note how the month and week have been moved away from the empSales table) and it allows you to do reporting and calculation (btw, you didn't include a Year field, is there a reason?).
Tallying up stuff should be easier, because assuming you have sales sorted by day, summing these up between intervals is easier unless you want to duplicate the "week" field all over the DB.
Note also that you can easily have targets on different, overlapping periods.
For example, you can set a weekly target for the week of 22-28 November (I am using the European convention of having the week start on Monday) and have a special one-day period set on Black Friday.
So:
Period:
periodId|startDate |endDate |weekNo|Month | Year|
0030020|22-NOV-2010|28-NOV-2010| 43 |November | 2010 |
0030026|26-NOV-2010|26-NOV-2010| null |November | 2010 |
empSales:
EmployeeId|ProductId|SaleTarget|ActualSale|periodId|
567689| 788585| 58 | 42 | 0030020|
567689| 788585| 28 | 32 | 0030026|
Note how Employee 567689 missed his weekly target but managed to go over his Black Friday target.
Btw, while working on this example I realised you would do better to drop the ActualSale column from the "empSales" table and rename the table to "empTargets":
empTargets(EmployeeId,ProductId,SaleTarget,periodId).
because the Actual Sales is easily calculated on the fly either with a UDF or placed in a view - after all, it's just a
select t.EmployeeId, t.ProductId, t.periodId,
       (select sum(s.items_sold)
        from sales s
        where s.employeeId = t.EmployeeId and
              s.ProductId = t.ProductId and
              s.saleDate between p.startDate and
              p.endDate) as ActualSale
from empTargets t
join period p on p.periodId = t.periodId
so no need to store it directly in the table (in fact it could become a burden in case of returned items or other future corrections).
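To check that idea end to end, here is a sketch using SQLite from Python with the Black Friday example data. The individual sales rows are invented so that the totals match the figures above, dates are stored as ISO strings (an assumption, so that BETWEEN compares correctly), and the view is named empSales only for convenience.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE period (periodId TEXT PRIMARY KEY, startDate TEXT, endDate TEXT);
CREATE TABLE empTargets (EmployeeId INTEGER, ProductId INTEGER,
                         SaleTarget INTEGER, periodId TEXT REFERENCES period);
CREATE TABLE sales (employeeId INTEGER, ProductId INTEGER,
                    saleDate TEXT, items_sold INTEGER);

INSERT INTO period VALUES ('0030020', '2010-11-22', '2010-11-28'),
                          ('0030026', '2010-11-26', '2010-11-26');
INSERT INTO empTargets VALUES (567689, 788585, 58, '0030020'),
                              (567689, 788585, 28, '0030026');
INSERT INTO sales VALUES (567689, 788585, '2010-11-23', 10),
                         (567689, 788585, '2010-11-26', 32);

-- Actual sales are computed on the fly, never stored next to the target.
CREATE VIEW empSales AS
SELECT t.EmployeeId, t.ProductId, t.periodId, t.SaleTarget,
       COALESCE(SUM(s.items_sold), 0) AS ActualSale
FROM empTargets AS t
JOIN period AS p ON p.periodId = t.periodId
LEFT JOIN sales AS s
       ON s.employeeId = t.EmployeeId
      AND s.ProductId = t.ProductId
      AND s.saleDate BETWEEN p.startDate AND p.endDate
GROUP BY t.EmployeeId, t.ProductId, t.periodId, t.SaleTarget;
""")

rows = conn.execute(
    "SELECT periodId, SaleTarget, ActualSale FROM empSales ORDER BY periodId"
).fetchall()
```

The Black Friday sale counts towards both the weekly and the one-day period, and a later correction to the sales table shows up in the view without touching the targets.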

Resources