Storing weekly targets in database - sql-server

I have the following requirement:
Sales Officer: Bob
       Week1  Week2  Week3  ...  Week52
Prod1  10     15     12     ...  14
Prod2  20     14     10     ...  17
...
The sales supervisor sets the targets for each sales officer on a weekly basis.
The sales officer may enter actual sales on a daily basis for each product, through a similar grid, against the set targets.
Edit
In the above case the supervisor has set a target of 10 units for week 1. The sales officer will then enter the daily sales as 1, 2, 0, 1, 3, 2 = 9 (actual sales for week 1), so against the target of 10 units he has sold 9 units in week 1.
I have already created the Employee and Product tables. Can anyone advise on the best practice for storing days and weeks in the database, against which the targets are stored and actual sales can be recorded?
I am thinking of storing the data in the following table:
EmpSales (EmployeeID,ProductID,SaleTarget,Actual Sale,Date,WeekNo,Month)
Thanks in advance

This one is really easy in pure Relational modelling terms. I do not see the need for "denormalisation" of any kind.
Sales Data Model.
If you are unfamiliar with the Standard for modelling Relational databases, the IDEF1X Notation may be helpful.
Pure 5NF; full Declarative referential Integrity; no Nulls, no Update Anomalies; no GROUP BYs; pure Date arithmetic.
The SaleTarget is compared against SaleActual by projection, and may be in the same result set.
If you have Monthly and Annual Sales accounting, the extension required is a common calendar table with a bit of control or structure; eg. similar to Week, including rows for each Month and Year. Just let me know, and I will update the model.
I say 5NF because that is the minimum I provide in order to eliminate Update Anomalies, and most modellers are familiar with it. But if it does not scare you off, the two Sales tables are actually Sixth Normal Form.
This allows full Pivoting (weeks or months across the top; Products or Employees down the side; vice versa; any combination) without temporary tables or complex SQL. (Just ask.)
I think it may even be self-explanatory, but I will supply the Verb Phrases which spell out the Business Rules, only because there are three Parents involved in each:
Each Employee is scheduled SaleTarget of Product for Week
Each Product is scheduled SaleTarget By Employee for Week
Each Employee did SaleActual of Product on Day
Each Product did SaleActual by Employee on Day
Comparison
I should have mentioned: notice there is no vertical (rows) or horizontal (columns) duplication. When columns are duplicated, e.g. StartDate and EndDate, you have broken 3NF (introduced Functional Dependencies) and introduced an Update Anomaly. The EndDate in any row is the StartDate in the next row (making it that value minus 1 second still counts as a dupe, and is a contrivance); when updating, two rows instead of one now have to be changed. More important, this structure is so simple (it is not a Time Series or "temporal" requirement) that the EndDate is not required.
Response to Comments
The Data Model has been updated to include Month and Year requirements. You now need a Check Constraint on SaleTarget to ensure that DateType is W for week. Loading the Date table is simple; you do not need the nonsense code (manually repeated cut-and-paste) that is posted on SQLTeam; they are famous for being stupid and sub-standard.
The SaleActual table now contains Daily, Weekly, Monthly, and Annual values, which of course you summarise programmatically on the first day of each Week, Month, and Year. First add the new row to Date.
5NF is pretty much the minimum required for standard compliance these days, so you need to get used to it. Basically there was a lot of argument among the academics (plus places like Wikipedia posting completely incorrect entries) about the NFs between 3NF and 5NF. The short and sweet definition of 5NF is that it is what 3NF was intended to be, with zero data duplication and zero Update Anomalies (no duplicated columns to be updated transactionally).
Forget about 6NF for now. Any table that is in 6NF is in 5NF (and 4NF and BCNF and 3NF). Just treat the two Sales tables as 5NF. When you have to write a pivoted report, say a year from now, that's when you will realise the value of this structure.

I personally would store the targets and actuals in separate rows, and most probably in separate tables:
Targets:
EmployeeId, PeriodId, ProductId, TargetValue
Sales:
EmployeeId, PeriodId, ProductId, SalesValue
In fact, in an integrated system, the second table is usually unnecessary (assuming that you have a complete sales recording system, this should be a projection/view of the actual recorded sales - with appropriate assignment of employee, period and product based on the model of that subsystem).
In order to fit your calendar requirements, I would almost certainly have a date table which will allow you to ensure all your various business rules for definitions of weeks and months without complex date logic. Determining periods and aggregating is then just facilitated with joins to the calendar table.
So the ActualSales would look something like this (with just a generic Period table, which might itself be a period and date table):
SELECT sp.EmployeeId
, p.ProductId
, pd.PeriodType
, pd.PeriodId
, SUM(id.Quantity * id.UnitPrice) AS TotalSales
FROM Invoice AS i
INNER JOIN InvoiceDetail AS id
ON id.InvoiceId = i.InvoiceId
INNER JOIN Employee AS sp
ON sp.EmployeeId = i.SalesPersonId
INNER JOIN Product AS p
ON id.ProductId = p.ProductId
INNER JOIN Period AS pd
ON pd.StartDate <= i.InvoiceDate
AND pd.EndDate > i.InvoiceDate
GROUP BY sp.EmployeeId, p.ProductId, pd.PeriodType, pd.PeriodId
In this case, data would be duplicated if you had overlapping periods (like daily, weekly, monthly), so you would need to aggregate ONLY one type of period - that's why I've specifically included it in this example view although it's redundant here.
I expect a generic Period table would look like:
PeriodId
PeriodType
StartDate
EndDate
This would be prepopulated with the various periods you want to report on:
'Q', 1/1/2010, 4/1/2010
'M', 1/1/2010, 2/1/2010
'M', 2/1/2010, 3/1/2010
'M', 3/1/2010, 4/1/2010
'W', 1/3/2010, 1/10/2010
'W', 1/10/2010, 1/17/2010
etc.
'D', 1/1/2010, 1/2/2010
'D', 1/2/2010, 1/3/2010
etc.
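Prepopulating such a Period table is easy to script. Here is a minimal Python sketch, not anything from the original system: the function name build_periods and its parameters are made up for illustration. It assumes exclusive end dates (matching the pd.EndDate > i.InvoiceDate join above) and weeks starting on Sunday, as in the sample rows.

```python
from datetime import date, timedelta

def build_periods(first_day, last_day, week_start=6):
    """Return (period_type, start, end_exclusive) tuples.

    week_start uses date.weekday() numbering (Monday=0 ... Sunday=6);
    the default of Sunday is an assumption taken from the sample rows.
    """
    periods = []
    # Daily periods: one row per calendar day.
    day = first_day
    while day <= last_day:
        periods.append(("D", day, day + timedelta(days=1)))
        day += timedelta(days=1)
    # Weekly periods: begin at the first week_start day on or after first_day.
    day = first_day + timedelta(days=(week_start - first_day.weekday()) % 7)
    while day <= last_day:
        periods.append(("W", day, day + timedelta(days=7)))
        day += timedelta(days=7)
    # Monthly periods: first of the month to first of the next month.
    day = first_day.replace(day=1)
    while day <= last_day:
        nxt = date(day.year + day.month // 12, day.month % 12 + 1, 1)
        periods.append(("M", day, nxt))
        day = nxt
    return periods

periods = build_periods(date(2010, 1, 1), date(2010, 3, 31))
```

The resulting tuples could then be bulk-inserted into the Period table, with PeriodId assigned by the database.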
It makes very little sense to worry about holidays, except that you probably aren't going to assign a target if they aren't working, and this is mainly about managing the assignments so that they are presumably realistic. You can have a calendar table of days with various flags:
Calendar
DateId
Date
IsHoliday
Then you can include that when you join to count the number of holidays/weekends in a period etc.
This is typically an accounting/business thing, but you may want to look into standardizing your calendar. For instance, in media buys for TV advertising, they make each "quarter" equal and make each "month" standardized - 4 weeks, 4 weeks, 5 weeks. Obviously they make exceptions for holiday and special TV events, but this helps to smooth out the accounting and compare like periods more easily.
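To make the 4-4-5 idea concrete, here is a small Python sketch under stated assumptions: the function name four_four_five is invented, the year start date is arbitrary, and it ignores the exceptions mentioned above. It only shows how the twelve standardized "months" fall out of a start date.

```python
from datetime import date, timedelta

def four_four_five(year_start):
    """Return 12 (start, end_exclusive) periods in a 4-4-5 week pattern."""
    periods = []
    start = year_start
    for i in range(12):
        # Every third period of a quarter gets 5 weeks, the others 4.
        weeks = 5 if i % 3 == 2 else 4
        end = start + timedelta(weeks=weeks)
        periods.append((start, end))
        start = end
    return periods

# Assumed start date for illustration only (a Sunday).
periods_445 = four_four_five(date(2010, 1, 3))
```

Each quarter comes out at 13 weeks (91 days), so the year covers 364 days; a real calendar would need an occasional extra week to stay aligned, which this sketch does not handle.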

Personally I would go for a more generic "period" table.
period(periodId,startDate,endDate,weekNo,Month,Year)
and then add
empSales(EmployeeId,ProductId,SaleTarget,ActualSale,periodId).
This is a bit more flexible (you can easily introduce different time spans and either make the relative week field null, or define some rule that maps the period on a "standard" week), there is less redundancy (note how the month and week have been moved away from the empSales table) and it allows you to do reporting and calculation (btw, you didn't include a Year field, is there a reason?).
Tallying up stuff should be easier, because assuming you have sales sorted by day, summing these up between intervals is easier unless you want to duplicate the "week" field all over the DB.
Note also that you can easily have targets on different, overlapping periods.
For example, you can set a weekly target for the week of 22-28 November (I am using the European convention of having the week start on Monday) and have a special one-day period set on Black Friday.
So:
Period:
periodId|startDate  |endDate    |weekNo|Month   |Year|
0030020 |22-NOV-2010|28-NOV-2010|47    |November|2010|
0030026 |26-NOV-2010|26-NOV-2010|null  |November|2010|
empSales:
EmployeeId|ProductId|SaleTarget|ActualSale|periodId|
567689| 788585| 58 | 42 | 0030020|
567689| 788585| 28 | 32 | 0030026|
Note how Employee 567689 missed his weekly target but managed to go over his Black Friday target.
Btw, while working on this example I realised you had better drop the "empSales" table, renaming it to "empTargets":
empTargets(EmployeeId,ProductId,SaleTarget,periodId).
because the actual sales are easily calculated on the fly, either with a UDF or in a view. After all, it's just a
(select sum(items_sold)
from sales, period
where period.periodId = empTargets.periodId and
sales.employeeId = empTargets.employeeId and
sales.ProductId = empTargets.ProductId and
sales.saleDate between period.startDate and period.endDate)
so no need to store it directly in the table (in fact it could become a burden in case of returned items or other future corrections).
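Here is a runnable sketch of that idea, using Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption; the SQL is kept generic). The view name empTargetVsActual and the individual sales rows are hypothetical, chosen only so that the totals match the example above.

```python
import sqlite3

con_targets = sqlite3.connect(":memory:")
con_targets.executescript("""
CREATE TABLE period (periodId TEXT PRIMARY KEY, startDate TEXT, endDate TEXT);
CREATE TABLE empTargets (EmployeeId INTEGER, ProductId INTEGER,
                         SaleTarget INTEGER,
                         periodId TEXT REFERENCES period(periodId));
CREATE TABLE sales (employeeId INTEGER, ProductId INTEGER,
                    saleDate TEXT, items_sold INTEGER);

INSERT INTO period VALUES ('0030020', '2010-11-22', '2010-11-28'),
                          ('0030026', '2010-11-26', '2010-11-26');
INSERT INTO empTargets VALUES (567689, 788585, 58, '0030020'),
                              (567689, 788585, 28, '0030026');
-- Hypothetical daily sales rows, picked to reproduce the example totals.
INSERT INTO sales VALUES (567689, 788585, '2010-11-23', 10),
                         (567689, 788585, '2010-11-26', 32);

-- Actual sales are derived per target row instead of being stored.
CREATE VIEW empTargetVsActual AS
SELECT t.EmployeeId, t.ProductId, t.periodId, t.SaleTarget,
       COALESCE((SELECT SUM(s.items_sold)
                 FROM sales AS s
                 WHERE s.employeeId = t.EmployeeId
                   AND s.ProductId = t.ProductId
                   AND s.saleDate BETWEEN p.startDate AND p.endDate), 0)
           AS ActualSale
FROM empTargets AS t
JOIN period AS p ON p.periodId = t.periodId;
""")

rows = con_targets.execute(
    "SELECT periodId, SaleTarget, ActualSale "
    "FROM empTargetVsActual ORDER BY periodId").fetchall()
```

Because the Black Friday sale falls inside both periods, it counts towards both targets, which is exactly the overlapping-period behaviour described above. Dates are stored as ISO strings here so that BETWEEN compares them correctly in SQLite.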

Related

Dimensional model to capture Sales weighting on different date schedules

We have a requirement to come up with a strategy to show Sales revenue data weighted by dates differently on different schedules.
We currently have a FactSales table with a grain of one row per order, with the measure of sales amount. We have separate DimDate and DimTime dimensions, and a DimBusinessUnit dimension with one row for each entity within the organization.
In DimDate we have a flag for the major US holidays so we know reduced sales revenue may be expected. This flag would apply globally.
The ask is that different business units might have slow revenue days. For example, Mondays might be slow in one business unit, and Fridays slow in another. For analysis it is desirable to capture these different schedules with a flag or a weighting.
Ultimately this will probably be reflected as a projected sales amount in a calculated measure.
How can I best add this weighting? Does it belong in the Date dimension, the Business Unit dimension, or maybe a degenerate dimension in the Fact table, or something else altogether?
DimDate is probably not a good place to keep this information, as each Business Unit (BU) may have a different schedule, so quite possibly you would need a flag on each date for every combination of BU and slow day. For example, if BU1 and BU2 both have a slow day on Monday, each Monday in your DimDate would need a way of showing that it is slow for BU1 and BU2.
The BU dimension might be a better place, as the schedule is specific to each unit. You may opt for extending your dimension by adding the 7 days as attributes and flagging each as slow or not, for example with true or false flags. You could also have one attribute holding a bit mask, e.g. 0100000, where the position of each value corresponds to a day (M T W T F S S), 0 is not slow and 1 is slow; in this example Tuesday is the slow day.
This also lets you track history, if you wish, by selecting the relevant SCD process.
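As an illustration of the bit mask option, here is a tiny Python sketch. The function name is_slow_day is made up, and it assumes the mask is stored as a 7-character string with Monday first, as in the M T W T F S S layout above.

```python
from datetime import date

def is_slow_day(mask, day):
    """Return True if `day` is flagged slow in a Monday-first 7-char mask."""
    if len(mask) != 7 or set(mask) - {"0", "1"}:
        raise ValueError("mask must be exactly 7 characters of '0' or '1'")
    # date.weekday() returns Monday=0 ... Sunday=6, matching the mask order.
    return mask[day.weekday()] == "1"

bu1_mask = "0100000"  # hypothetical BU attribute value: Tuesday is slow
tuesday_is_slow = is_slow_day(bu1_mask, date(2015, 1, 6))  # a Tuesday
```

The same lookup could be done in a calculated measure; the point is only that the schedule lives on the BU row and is resolved against the date's day of week.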
Another option may be a separate Dimension i.e. DimSchedule and Factless Fact Table.
http://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dimensional-modeling-techniques/factless-fact-table/
I hope this helps.
Your situation seems to be the same as the Multiple National Calendars problem described by Kimball:
http://www.kimballgroup.com/1998/12/think-globally-act-locally/
Where Kimball is describing holidays in the left-most table, you could also add a "slow day" flag.

Can a join table be combined with another table?

The first image below is my database schema for a project that will use psql, ruby and active record.
While writing my schema, things got a bit complex. My "special_days" table ended up becoming a join table for "days_of_week" and "organizations". I'm assuming that this is not best practice and will end up causing me trouble.
In the second schema below, I made a separate join table for "days of week" and "organizations". My special_days table still needs to be associated with a day_of_week and an organization, so I think I have to keep the joining information in the special_days table. Is there a better way to do this? It seems that my second attempt is too repetitive.
These are my relationships:
days of week & organizations | many to many
city & organizations | one to many
organization & special days | one to many
day of week & special days | one to many
Some more information about the requirements might be useful. Are you required to do something like define which days are holidays, which days are paydays, etc. (e.g. many organizations would define Sat and Sun as non-work days and many U.S. organizations would define July 4 as a non-work day)? I'm not sure what days_of_week represents. Is this a table with 7 records in it (M, T, W, Th, F, S, Sun)? If I'm guessing correctly at the requirements, it might be better to have a table called special_day that has a date column and a recurrence column (e.g. weekly, monthly, yearly, etc.). You could then have an organization_special_day table that is a many-to-many join table on organization and special_day.

SQL Schema - merge data from two tables

I'm trying to create a schema that will allow me to define times when a supplier website is non-operational (planned, not unplanned).
I've gone for non-operational as opposed to operational because many suppliers work 24/7, so non-operating times represent the least number of rows.
For example, a supplier might not work:
On a Sunday
On a recognised holiday date - '1/1/2015'
On a Saturday after 5pm
I'm not overly confident with SQL Server, but have come up with a schema that 'does the job'. However, as we all know, there are good ways, not so good ways, and bad ways, that all work in a fashion, so would appreciate comments and advice on what I have to date.
One of the key features is to use data from WorkingDays and Holidays together to represent a WorkingPeriod entity.
I would appreciate comments no matter how small.
Holiday
Contains all recognised holidays - Easter Monday, Good Friday etc.
HolidayDate
Contains dates of holidays. For instance, this year Easter Monday is 6th Apr 2015.
WorkingDay
Sunday through to Saturday, mapped to Asp.Net day of week enums.
WorkingPeriodType
A lookup table containing 2 rows - Holiday, or Day of Week
WorkingPeriod
Merges the Holiday table and the WorkingDay table to represent a single WorkingPeriod entity that can be used in the SupplierNonWorkingTimes table.
SupplierNonWorkingTimes
Contains the ID representing the WorkingDay/Holiday and the times of non-operation.
This is a very subjective question; as you've already observed, there's no right and wrong, just different ways. I'm a database guy but I don't know your specific circumstances, so these are just some observations. You'll have to judge for yourself whether any of them are appropriate to you.
I like my naming to be crystal clear; it saves all the misunderstanding by other people later on. If [WorkingDay] holds the 7 days of the week I would call it [WeekDay]. If you intend [Holiday] to hold whole-day holidays I would call it [HolidayDay]. The main table [SupplierNonWorkingTime] is about 'non-working' so I would call the [WorkingPeriod] table [NonWorkingPeriod]. The term 'period' always refers to a whole day, so I would replace 'period' with 'day' (let's ignore start/stop time for now).
My first impression was that your design is over-normalised. The [WorkingPeriodType] table has 2 rows that will never change, [WorkingDay] has 7. For these very low numbers I sometimes prefer a char(1) with a check constraint. Normalisation is generally good, but lots of JOINs for trivial queries is not so good. You could eliminate [WorkingPeriodType] and [WorkingDay], but you've mentioned .Net enums in your question, so if you've got some sort of ORM in your .Net code this level of normalisation might be right for you.
I'd add a Year field to the [HolidayDate] table, then the PK becomes a better HolidayID+Year, unless you know somewhere that has lots of Christmases :)
I'd add an IsAllDay field to the [SupplierNonWorkingTime] table, otherwise you have to use 'magic values' to represent 'all day', and magic values are bad. There should be a check constraint to enforce that start/stop times can only be entered if IsAllDay = false.
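To show what that IsAllDay check constraint could look like, here is a sketch run through Python's sqlite3 module (standing in for SQL Server, so treat the exact DDL as an assumption). The column names SupplierId, NonWorkingPeriodId, StartTime and StopTime are guesses, since the actual schema was only shown as an image, and the helper try_insert is made up for the demo.

```python
import sqlite3

con_nwt = sqlite3.connect(":memory:")
con_nwt.execute("""
CREATE TABLE SupplierNonWorkingTime (
    SupplierId         INTEGER NOT NULL,
    NonWorkingPeriodId INTEGER NOT NULL,
    IsAllDay           INTEGER NOT NULL CHECK (IsAllDay IN (0, 1)),
    StartTime          TEXT,
    StopTime           TEXT,
    -- Times are only allowed (and then required) when IsAllDay is false.
    CHECK ((IsAllDay = 1 AND StartTime IS NULL AND StopTime IS NULL)
        OR (IsAllDay = 0 AND StartTime IS NOT NULL AND StopTime IS NOT NULL))
)""")

def try_insert(row):
    """Insert a row and report whether the constraints accepted it."""
    try:
        con_nwt.execute(
            "INSERT INTO SupplierNonWorkingTime VALUES (?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

all_day_ok = try_insert((1, 7, 1, None, None))          # closed all day
part_day_ok = try_insert((1, 6, 0, "17:00", "23:59"))   # closed after 5pm
contradictory = try_insert((2, 7, 1, "09:00", "10:00")) # all day AND times
```

With this in place no 'magic value' such as 00:00 to 23:59 is needed to mean 'all day'; the contradictory row is rejected by the database itself.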
Like I said, just my thoughts, hope it's helpful.

Change Data Capture and SQL Server Analysis Services

I'm designing a database application where data is going to change over time. I want to persist historical data and allow my users to analyze it using SQL Server Analysis Services, but I'm struggling to come up with a database schema that allows this. I've come up with a handful of schemas that could track the changes (including relying on CDC) but then I can't figure out how to turn that schema into a working BISM within SSAS. I've also been able to create a schema that translates nicely in to a BISM but then it doesn't have the historical capabilities I'm looking for. Are there any established best practices for doing this sort of thing?
Here's an example of what I'm trying to do:
I have a fact table called Sales which contains monthly sales figures. I also have a regular dimension table called Customers which allows users to look at sales figures broken down by customer. There is a many-to-many relationship between customers and sales representatives, so I can make a reference dimension called Responsibility that refers to the customer dimension and a Sales Representative reference dimension that refers to the Responsibility dimension. I now have the Sales facts linked to Sales Representatives by the chain of reference dimensions Sales -> Customer -> Responsibility -> Sales Representative, which allows me to see sales figures broken down by sales rep.
The problem is that the Sales facts aren't the only things that change over time. I also want to be able to maintain a history of which Sales Representative was Responsible for a Customer at the time of a particular Sales fact. I also want to know where the Sales Representative's office was located at the time of a particular Sales fact, which may be different from his current location. I might also want to know the size of a customer's organization at the time of a particular Sales fact, which might also be different from what it is currently. I have no idea how to model this in a BISM-friendly way.
You mentioned that you currently have a fact table which contains monthly sales figures. So one record per customer per month. So each record in this fact table is actually an aggregation of individual sales "transactions" that occurred during the month for the corresponding dimensions.
So in a given month, there could be 5 individual sales transactions for $10 each for customer 123...and each individual sales transaction could be handled by a different Sales Rep (A, B, C, D, E). In the fact table you describe there would be a single record for $50 for customer 123...but how do we model the SalesReps (A-B-C-D-E)?
Based on your goals...
to be able to maintain a history of which Sales Representative was Responsible for a Customer at the time of a particular Sales fact
to know where the Sale Representative's office was located at the time of a particular sales fact
to know the size of a customer's organization at the time of a particular Sales fact
...I think it would be easier to model at a lower granularity...specifically a sales-transaction fact table which has a grain of 1 record per sales transaction. Each sales transaction would have a single customer and single sales rep.
FactSales
DateKey (date of the sale)
CustomerKey (customer involved in the sale)
SalesRepKey (sales rep involved in the sale)
SalesAmount (amount of the sale)
Now for the historical change tracking...any dimension with attributes for which you want to track historical changes will need to be modeled as a "Slowly Changing Dimension" and will therefore require the use of "Surrogate Keys". So for example, in your customer dimension, Customer ID will not be the primary key...instead it will simply be the business key...and you will use an arbitrary integer as the primary key...this arbitrary key is referred to as a surrogate key.
Here's how I'd model the data for your dimensions...
DimCustomer
CustomerKey (surrogate key, probably generated via IDENTITY function)
CustomerID (business key, what you will find in your source systems)
CustomerName
Location (attribute we wish to track historically)
-- the following columns are necessary to keep track of history
BeginDate
EndDate
CurrentRecord
DimSalesRep
SalesRepKey (surrogate key)
SalesRepID (business key)
SalesRepName
OfficeLocation (attribute we wish to track historically)
-- the following columns are necessary to keep track of historical changes
BeginDate
EndDate
CurrentRecord
FactSales
DateKey (this is your link to a date dimension)
CustomerKey (this is your link to DimCustomer)
SalesRepKey (this is your link to DimSalesRep)
SalesAmount
What this does is allow you to have multiple records for the same customer.
Ex. CustomerID 123 moves from NC to GA on 3/5/2012...
CustomerKey | CustomerID | CustomerName | Location | BeginDate | EndDate | CurrentRecord
1 | 123 | Ted Stevens | North Carolina | 01-01-1900 | 03-05-2012 | 0
2 | 123 | Ted Stevens | Georgia | 03-05-2012 | 01-01-2999 | 1
The same applies with SalesReps or any other dimension in which you want to track the historical changes for some of the attributes.
So when you slice the sales transaction fact table by CustomerID, CustomerName (or any other non-historically-tracked attribute) you should see a single record with the facts aggregated across all transactions for the customer. And if you instead decide to analyze the sales transactions by CustomerName and Location (the historically tracked attribute), you will see a separate record for each "version" of the customer location, corresponding to the sales amount while the customer was in that location.
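The two ways of slicing can be tried out with a small sketch. It uses Python's sqlite3 module rather than SQL Server or SSAS (an assumption for the sake of a runnable example), leaves out DimSalesRep for brevity, and the three fact rows are hypothetical: two sales before the move and one after.

```python
import sqlite3

dw = sqlite3.connect(":memory:")
dw.executescript("""
CREATE TABLE DimCustomer (
    CustomerKey   INTEGER PRIMARY KEY,  -- surrogate key
    CustomerID    INTEGER,              -- business key
    CustomerName  TEXT,
    Location      TEXT,                 -- tracked historically (type 2)
    BeginDate     TEXT,
    EndDate       TEXT,
    CurrentRecord INTEGER);
CREATE TABLE FactSales (
    DateKey INTEGER,
    CustomerKey INTEGER REFERENCES DimCustomer(CustomerKey),
    SalesAmount REAL);

INSERT INTO DimCustomer VALUES
    (1, 123, 'Ted Stevens', 'North Carolina', '1900-01-01', '2012-03-05', 0),
    (2, 123, 'Ted Stevens', 'Georgia',        '2012-03-05', '2999-01-01', 1);
-- Hypothetical transactions: each points at the version current at sale time.
INSERT INTO FactSales VALUES (20120110, 1, 10), (20120220, 1, 10),
                             (20120401, 2, 10);
""")

# Slicing by the business key collapses both versions into one row.
by_customer = dw.execute("""
    SELECT c.CustomerID, SUM(f.SalesAmount)
    FROM FactSales AS f JOIN DimCustomer AS c USING (CustomerKey)
    GROUP BY c.CustomerID""").fetchall()

# Slicing by the tracked attribute gives one row per version.
by_location = dw.execute("""
    SELECT c.Location, SUM(f.SalesAmount)
    FROM FactSales AS f JOIN DimCustomer AS c USING (CustomerKey)
    GROUP BY c.Location ORDER BY c.Location""").fetchall()
```

The fact rows never change when the customer moves; only a new dimension row is added, and later facts reference the new surrogate key.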
By the way, if you have some time and are interested in learning more, I highly recommend the Kimball bible "The Data Warehouse Toolkit"...which should provide a solid foundation on dimensional modeling scenarios.
The established best-practice way of doing what you want is a dimensional model with slowly changing dimensions. Sales reps are frequently used to describe the usefulness of SCDs. For example, sales managers with bonuses tied to the performance of their teams don't want their totals to go down if a rep transfers to a new territory. SCDs are perfect for tracking this sort of thing (and the situations you describe) and allow you to see what things looked like at any point historically.
Spend some time on Ralph Kimball's website to get started. The first 3 articles I'd recommend you read are Slowly Changing Dimensions, Slowly Changing Dimensions Part 2, and The 10 Essential Rules of Dimensional Modeling.
Here are a few things to focus on in order to be successful:
You are not designing a 3NF transactional database. Get comfortable with denormalization.
Make sure you understand what grain means and explicitly define the grain of your database.
Do not use natural keys as keys, and do not bake any intelligence into your surrogate keys (with the exception of your time keys).
The goals of your application should be query speed and ease of understanding and navigation.
Understand type 1 and type 2 slowly changing dimensions and know where to use them.
Make sure you have a sponsor on the business side with the power to "break ties". You will find different people in the organization with different definitions of the same thing, and you need an enforcer with the power to make decisions. To see what I mean, ask 5 different people in your organization to define "customer" or "gross profit". You'll be lucky to get 2 people to define either the same way.
Don't try to wing it. Read The Data Warehouse Lifecycle Toolkit and embrace the ideas, even if they seem strange at first. They work.
OLAP is powerful and can be life changing if implemented skillfully. It can be an absolute nightmare if it isn't.
Have fun!

Fiscal year handling strategies in database design

By fiscal year I mean all the data in the database (in all tables) that occurred in a particular year. Let's say that we are building an application that allows the user to choose from different years.
What way of implementing this would you prefer, and why:
Separate fiscal year data based on multiple separate database instances (for example, on every fiscal year start you could create a new instance with no data)
Have everything in one database, but with logic that automatically separates records from different years.
Personally, I have "seen" both methods, and I would choose the second. The only argument I can think of for the first method is to have fewer records in case these are really big databases, but even then you could "archive" old records by rolling them up into summaries or in some other way. What do you think?
Separate fiscal year data based on multiple separate database instances (for example, on every fiscal year start you could create a new instance with no data)
No. Do not create a separate database instance, database, or table per fiscal year.
Besides not being normalized, you would be unnecessarily duplicating the supporting infrastructure: constraints, triggers, stored procedures and functions would all have to be updated to work with the new, current fiscal year. It would also complicate handling data for future years for budgeting and planning.
Have everything in one database, but with logic that automatically separates records from different years.
There's no need for separation, just make sure that records contain a timestamp, which can then be used to determine what fiscal year it took place in.
There is a third alternative.
Create a table, let's call it "Almanac", that has one row per day, keyed by date.
In that table you can have a whole lot of attributes that are determined by the date.
Among them could be some attributes for which there is a function, like the day of the week. Some attributes could be company specific, like whether or not the day is a workday at the company.
Among the attributes could be the fiscal year, the fiscal quarter, and the fiscal month, if your company has such things. It's not particularly important to normalize this table.
Write a program that populates this table. All the convoluted logic that goes into calculating the fiscal year from the date can thus be in one place, instead of scattered throughout your system. Ten years' worth of dates is only going to be about 3,650 rows, a tiny table by today's standards.
Then, cutting all of your date driven data by fiscal year, fiscal quarter, or whatever is just a matter of joining and grouping. You can even automate the production of different time frame views of the same data.
I've done this and it works. It's especially good in reporting databases and data warehouses.
No need for duplication. A time-stamp may be good enough, but to borrow from data-warehousing, you could create a "date dimension". It is a table with a row per day and a column per date attribute. Some of those columns may be fiscal year, fiscal quarter etc. Then you add a DateKey to the transactions table and join the date dimension when querying.
Something like:
select sum(t.Total)
from Transactions as t
join dimDate as d on d.DateKey = t.DateKey
where d.FiscalYearQuarter = 'F2009-Q3';
The date dimension table may look something like:
CREATE TABLE dimDate
(
DateKey int -- 20090814
,FullDate date -- 2009-8-14
,FullDateDescription varchar(50) -- Friday August 14, 2009
,SQLDateStamp varchar(10) -- 2009-08-14
,DayOfWeek varchar(10) -- Friday
,DayNumberInWeek int -- 6
,DayNumberInMonth int -- 14
,DayNumberInYear int -- 226
-- many more here
,FiscalYear int -- 2009
,FiscalQuarter char(3) -- FQ3
,FiscalHalf char(3) -- FH2
,FiscalYearQuarter varchar(8) -- F2009-Q3
,FiscalYearHalf varchar(8) -- F2009-H2
);
You would pre-load the dimDate, from way back in past to forward in future; 100 years requires 36.5k rows -- not much for any DB.
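Pre-loading can be done with a short script. Below is a hedged Python sketch that computes a subset of the columns above for each day; the function names and the fiscal_start_month parameter are made up, and the rule that a fiscal year is labelled by the calendar year in which it starts is an assumption (your company's rule may differ). With the default of January it reproduces the sample values in the table comments.

```python
from datetime import date, timedelta

DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")

def dim_date_row(d, fiscal_start_month=1):
    """Build one dimDate row (as a dict) for date d."""
    # Months elapsed since the start of the fiscal year (0..11).
    offset = (d.month - fiscal_start_month) % 12
    # Assumption: the fiscal year is named after the year in which it starts.
    fiscal_year = d.year if d.month >= fiscal_start_month else d.year - 1
    quarter = offset // 3 + 1
    half = offset // 6 + 1
    return {
        "DateKey": d.year * 10000 + d.month * 100 + d.day,
        "FullDate": d.isoformat(),
        "DayOfWeek": DAY_NAMES[d.weekday()],
        "DayNumberInMonth": d.day,
        "DayNumberInYear": d.timetuple().tm_yday,
        "FiscalYear": fiscal_year,
        "FiscalQuarter": f"FQ{quarter}",
        "FiscalHalf": f"FH{half}",
        "FiscalYearQuarter": f"F{fiscal_year}-Q{quarter}",
        "FiscalYearHalf": f"F{fiscal_year}-H{half}",
    }

def dim_date_rows(first, last, fiscal_start_month=1):
    """Yield one row per day from first to last, inclusive."""
    d = first
    while d <= last:
        yield dim_date_row(d, fiscal_start_month)
        d += timedelta(days=1)

sample = dim_date_row(date(2009, 8, 14))
```

The rows can then be inserted in bulk; all fiscal-calendar logic stays in this one loader instead of being repeated in queries.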
Each entity should have its fiscal year as part of the metadata/staticdata.
From that you can easily handle the fiscal year breaks, and typically Databases can handle VERY LARGE amounts of data, so you should not have a problem.
Using the correct indexing will greatly improve the performance of your queries, so worry about the performance once you hit the snag. Until then, worry about the code.

Resources