Linking sales persons (Regional Manager, Zonal Manager, Country Head) to a fact table - analytics

I have a small data warehouse for sales. It has a fact table for sales invoices and dimensions such as Customer, Date Time, Sales Geo and Product Code.
Fact table (Sales):
Invoice Date, Customer Code, Product Code, Sales Geo Code, Billing Qty, Amount, Tax, Total Amount
Sales Geo dimension:
Sales Geo Code, City Name, Regional Code, Zone Code, Regional Manager Code, Zonal Manager Code
I am confused about how to link my sales persons, such as Regional Managers and Zonal Managers.
A Regional Manager leads one region made up of multiple cities;
a Zonal Manager leads multiple regions.
Sometimes we change the regional or zonal areas, and managers get promoted, leave, etc.
How should I design the dimensions and link the sales team to the Sales fact so that the sales reports are correct?
regards

There are a few options I can think of:
1. Denormalise the Regional and Zonal Manager information into your Sales Geo dimension.
2. Create a hierarchical Manager dimension keyed on Regional Manager and including their Zonal Manager details.
3. Create a Person dimension and associate it twice with the fact: once in the role of Regional Manager and once in the role of Zonal Manager.
If you will never want to link manager information to a fact except in the context of the Sales Geo, then option 1 probably makes more sense, as there are fewer potential joins in queries using this fact table.
Option 2 is more flexible, as you can associate manager information with a fact without also using the Sales Geo.
Option 3 is the most flexible, but it is also likely to give the worst query performance (for any query that needs both types of manager), and the only link between Regional and Zonal Managers is via the fact; no hierarchical information is held in the dimensions. Therefore option 3 is the one I would be least likely to choose.
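To make the join trade-off between options 1 and 3 concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for the warehouse database. All table names, column names and sample rows are hypothetical. Option 1 reaches both manager levels through one join to the Sales Geo dimension; option 3 needs the Person dimension joined twice, once per role.

```python
import sqlite3

# Hypothetical minimal schema; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_person (person_key INTEGER PRIMARY KEY, person_name TEXT);

-- Option 1: manager details denormalised into the Sales Geo dimension
CREATE TABLE dim_sales_geo (
    sales_geo_key INTEGER PRIMARY KEY,
    city_name TEXT,
    region_code TEXT,
    zone_code TEXT,
    regional_manager_name TEXT,
    zonal_manager_name TEXT
);

-- Option 3: the fact references the person dimension twice (role-playing)
CREATE TABLE fact_sales (
    sales_geo_key INTEGER REFERENCES dim_sales_geo(sales_geo_key),
    regional_manager_key INTEGER REFERENCES dim_person(person_key),
    zonal_manager_key INTEGER REFERENCES dim_person(person_key),
    amount REAL
);

INSERT INTO dim_person VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Zed');
INSERT INTO dim_sales_geo VALUES
    (10, 'Pune',   'R1', 'Z1', 'Asha', 'Zed'),
    (11, 'Nagpur', 'R2', 'Z1', 'Ben',  'Zed');
INSERT INTO fact_sales VALUES (10, 1, 3, 100.0), (11, 2, 3, 50.0), (10, 1, 3, 25.0);
""")

# Option 1: a single join to the Sales Geo dimension gives both manager levels.
option1 = conn.execute("""
    SELECT g.regional_manager_name, g.zonal_manager_name, SUM(f.amount)
    FROM fact_sales f JOIN dim_sales_geo g ON g.sales_geo_key = f.sales_geo_key
    GROUP BY g.regional_manager_name, g.zonal_manager_name
    ORDER BY g.regional_manager_name
""").fetchall()

# Option 3: the person dimension is joined twice, once per role.
option3 = conn.execute("""
    SELECT rm.person_name, zm.person_name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_person rm ON rm.person_key = f.regional_manager_key
    JOIN dim_person zm ON zm.person_key = f.zonal_manager_key
    GROUP BY rm.person_name, zm.person_name
    ORDER BY rm.person_name
""").fetchall()

print(option1)  # [('Asha', 'Zed', 125.0), ('Ben', 'Zed', 50.0)]
print(option3)  # same totals, via two joins
```

Both queries return the same totals here; the difference is the extra join in option 3, and the fact that in option 3 the Regional-to-Zonal relationship exists only in the fact rows.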

Denormalize the Regional Manager Code and Zonal Manager Code into your fact table.
In each fact row you store, along with the Sales Geo Code, the manager assignments that were current at the time of the sale (more precisely, at the time the record was loaded).
This model supports both types of reports:
managers assigned at transaction time (read directly from the fact table), and
current managers (join from the fact table to the Sales Geo dimension to get the codes of both managers).
Your current setup supports only the second type of reporting, which may be inadequate if managers are reassigned frequently.
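A minimal sketch of the denormalised fact table, again using SQLite through Python's sqlite3 module with hypothetical names and data. The load step copies the geo's current manager codes into each fact row, so a later reassignment in Sales Geo changes the "current managers" report but not the "at transaction time" report.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Sales Geo holds the *current* manager assignment (overwritten on change).
CREATE TABLE sales_geo (
    sales_geo_code TEXT PRIMARY KEY,
    regional_manager_code TEXT,
    zonal_manager_code TEXT
);
-- The fact also records the managers assigned when the row was loaded.
CREATE TABLE sales (
    invoice_date TEXT,
    sales_geo_code TEXT,
    regional_manager_code TEXT,
    zonal_manager_code TEXT,
    amount REAL
);
INSERT INTO sales_geo VALUES ('G1', 'RM1', 'ZM1');
""")

def load_sale(invoice_date, geo, amount):
    """Hypothetical load step: copy the geo's current manager codes into the fact row."""
    conn.execute("""
        INSERT INTO sales
        SELECT ?, sales_geo_code, regional_manager_code, zonal_manager_code, ?
        FROM sales_geo WHERE sales_geo_code = ?
    """, (invoice_date, amount, geo))

load_sale("2012-01-10", "G1", 100.0)
# The region is handed over to a new regional manager...
conn.execute("UPDATE sales_geo SET regional_manager_code = 'RM2' WHERE sales_geo_code = 'G1'")
load_sale("2012-02-10", "G1", 40.0)

# 1) Managers assigned at transaction time: read straight from the fact.
at_transaction = conn.execute("""
    SELECT regional_manager_code, SUM(amount) FROM sales
    GROUP BY regional_manager_code ORDER BY regional_manager_code
""").fetchall()

# 2) Current managers: join the fact to Sales Geo on the geo code.
current = conn.execute("""
    SELECT g.regional_manager_code, SUM(s.amount)
    FROM sales s JOIN sales_geo g ON g.sales_geo_code = s.sales_geo_code
    GROUP BY g.regional_manager_code
""").fetchall()

print(at_transaction)  # [('RM1', 100.0), ('RM2', 40.0)]
print(current)         # [('RM2', 140.0)]
```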
If you prefer not to denormalize the fact table, you can instead switch the Sales Geo dimension to SCD type 2, which introduces a historical view of the dimension and of the manager assignments.
You'll then have to join the fact table to the Sales Geo dimension not only on the Sales Geo Code but also on the transaction date:
Invoice_Date between sales_geo.validfrom_date and sales_geo.validto_date
This returns the managers assigned at the time of the transaction.
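A sketch of the SCD type 2 variant under the same assumptions (SQLite via Python's sqlite3 module, hypothetical names and rows). One deliberate deviation from the BETWEEN condition above: it uses a half-open range (validfrom_date inclusive, validto_date exclusive). When one version's validto_date equals the next version's validfrom_date, BETWEEN matches both versions for a sale on the changeover date and double-counts it; if you keep BETWEEN, set validto_date to the day before the change instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- SCD type 2: one row per version of a Sales Geo's manager assignment.
CREATE TABLE sales_geo (
    sales_geo_code TEXT,
    regional_manager_code TEXT,
    validfrom_date TEXT,   -- ISO dates compare correctly as text
    validto_date TEXT
);
CREATE TABLE sales (invoice_date TEXT, sales_geo_code TEXT, amount REAL);

INSERT INTO sales_geo VALUES
    ('G1', 'RM1', '1900-01-01', '2012-02-01'),
    ('G1', 'RM2', '2012-02-01', '9999-12-31');
INSERT INTO sales VALUES
    ('2012-01-10', 'G1', 100.0),
    ('2012-02-01', 'G1', 30.0),   -- falls exactly on the changeover date
    ('2012-02-10', 'G1', 40.0);
""")

# Join on the geo code AND the transaction date. The half-open interval
# (>= validfrom, < validto) matches exactly one version per sale.
rows = conn.execute("""
    SELECT g.regional_manager_code, SUM(s.amount)
    FROM sales s
    JOIN sales_geo g
      ON g.sales_geo_code = s.sales_geo_code
     AND s.invoice_date >= g.validfrom_date
     AND s.invoice_date <  g.validto_date
    GROUP BY g.regional_manager_code
    ORDER BY g.regional_manager_code
""").fetchall()
print(rows)  # [('RM1', 100.0), ('RM2', 70.0)]
```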
The decision is a typical tradeoff: extra storage and the effort of keeping the fact table consistent on one side, versus more complex joins and maintaining the dimension history on the other.

Related

SQL Server Database Design - Separate Table for Sale and Purchase

I am building a new business application for my own business, which has roughly 100 sale and purchase transactions per day. I am thinking of having separate tables to record sales and purchases, with one linked table for the items that were sold and a separate linked table for the items that were purchased.
Example:
**SaleTable**
InvoiceNo
TotalAmt
**SaleTableDetail**
LinkedInvNo
ProductID
Quantity
Amount
etc.
Would this design be better, or would it be more efficient to have one transaction table with a column stating sale or purchase,
from an application, database, query and reporting perspective?
An invoice is not the same as a sales order. An invoice is a request for payment. A sales order is an agreement to sell products to a party at a price on a date.
A sales order is almost exactly the same as a purchase order, except that in a purchase order you are the customer, and a sales order line item can reference a purchase order line item. You can put them in separate tables, but then you should probably use Class Table Inheritance (CTI), with both extending an abstract Order. Putting them in the same table with a "type" column is called Single Table Inheritance (STI) and is nice and simple.
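A minimal sketch of Class Table Inheritance as described above, in SQLite via Python's sqlite3 module. The table names and the subtype-specific columns (sales_channel, supplier_reference) are hypothetical, chosen only to show where fields that apply to one kind of order would live. Note that the CHECK constraint alone does not stop a SALE order from also getting a purchase_orders row; enforcing that needs a trigger or a composite key on (order_id, order_type).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Abstract "order": columns shared by sales and purchase orders.
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    order_type TEXT NOT NULL CHECK (order_type IN ('SALE', 'PURCHASE')),
    party_id INTEGER NOT NULL,       -- customer or supplier
    order_date TEXT NOT NULL
);
-- Subtype tables share the parent's key and add type-specific columns.
CREATE TABLE sales_orders (
    order_id INTEGER PRIMARY KEY REFERENCES orders(order_id),
    sales_channel TEXT
);
CREATE TABLE purchase_orders (
    order_id INTEGER PRIMARY KEY REFERENCES orders(order_id),
    supplier_reference TEXT
);
-- Line items hang off the abstract order, so one table serves both kinds.
CREATE TABLE order_lines (
    order_id INTEGER NOT NULL REFERENCES orders(order_id),
    line_no INTEGER NOT NULL,
    product_id INTEGER NOT NULL,
    quantity INTEGER NOT NULL,
    unit_price REAL NOT NULL,
    PRIMARY KEY (order_id, line_no)
);
INSERT INTO orders VALUES (1, 'SALE', 501, '2024-05-01'), (2, 'PURCHASE', 900, '2024-05-02');
INSERT INTO sales_orders VALUES (1, 'web');
INSERT INTO purchase_orders VALUES (2, 'SUP-778');
""")

# Queries about sales only join the sales subtype; no type filter needed.
sales = conn.execute("""
    SELECT o.order_id, s.sales_channel
    FROM orders o JOIN sales_orders s ON s.order_id = o.order_id
""").fetchall()
print(sales)  # [(1, 'web')]
```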
Don't store totals in your operational database. You can put them in your analytic database (the warehouse), though.
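One way to follow the "don't store totals" advice in the operational database is to derive the invoice total from the detail lines on demand. A minimal SQLite sketch with hypothetical names, loosely based on the SaleTable/SaleTableDetail example in the question: the view sums quantity times unit price per invoice, so there is no TotalAmt column that could drift out of sync with the lines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sale (invoice_no INTEGER PRIMARY KEY, invoice_date TEXT);
CREATE TABLE sale_detail (
    invoice_no INTEGER REFERENCES sale(invoice_no),
    product_id INTEGER,
    quantity INTEGER,
    unit_price REAL
);
-- The total is derived on demand instead of being stored in a TotalAmt column,
-- so it can never disagree with the detail lines.
CREATE VIEW sale_total AS
    SELECT s.invoice_no, COALESCE(SUM(d.quantity * d.unit_price), 0) AS total_amt
    FROM sale s LEFT JOIN sale_detail d ON d.invoice_no = s.invoice_no
    GROUP BY s.invoice_no;

INSERT INTO sale VALUES (1, '2024-05-01'), (2, '2024-05-01');
INSERT INTO sale_detail VALUES (1, 10, 2, 5.0), (1, 11, 1, 7.5);
""")

totals = conn.execute("SELECT * FROM sale_total ORDER BY invoice_no").fetchall()
print(totals)  # [(1, 17.5), (2, 0)]  (invoice 2 has no lines yet)
```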
You are starting small, and that's a quick way to do it. But I am sure you will soon run into differences between sale and purchase transactions: some fields will describe only a sale and some will apply only to purchases.
In due course, you may want to track modifications or keep a modification audit. Then you either start having multiple rows for the same transaction, with fields marking rows as obsolete, or you have to move history records to another table.
Also consider the overhead in all your queries: you have to specify the transaction type (sale or purchase) even in simple queries.
It is better to design your database with a model that maps most closely to business reality. At the highest level, everything may abstract to a "transaction", with a date, an amount and some kind of tag indicating in what context the amount is paid or received. That doesn't mean you should have a single table with Tag, Date, Amount and PayOrReceive columns to handle all the diverse transactions.

Structuring a cube with paths from a fact table to a dimension via two alternative intermediate dimensions

I'm unsure how to configure a cube in SSAS for a complex case that I can simplify as follows:
A fact table stores data about a Sale.
A dimension called Promotion records details of the marketing activity that generated the Sale.
A dimension called Customer records details of the person we sold to.
We also have a table holding data about an Organisation.
In some, but not all cases, a Promotion is targeted at an Organisation. There is an optional one-to-one relationship from Promotion to Organisation.
In some, but not all cases, a Customer is associated with an Organisation. There is an optional many-to-one relationship from Customer to Organisation.
We want to be able to analyse Sales by Organisation. For instance, if I report the number of Sales by Organisation, the count for each Organisation should include both the sales through Promotions targeted at that Organisation and the sales to Customers associated with that Organisation.
Note that with this data structure, each Sale may be associated with 0, 1 or 2 Organisations depending on the Promotion and the Customer. So if I report on number of sales by organisation, the grand total will not necessarily equal the total number of sales.
How would you structure the cube? I don't think it can work by simply setting up a referenced relationship from Sales->Promotion->Organisation and another from Sales->Customer->Organisation because SSAS won't know which path to use (and certainly won't know that it should aggregate across both paths together). Do I create two Organisation dimensions? Do I disconnect Organisation from the other dimensions and define some direct linkage between Organisation and Sales? Do I scrap the Organisation dimension and include organisation details as attributes in both Promotion and Customer?
You are correct in that SSAS won't be able to handle referencing the Organisation dimension through Promotion on one path, and through Customer on another path. This will give you an error when you try to build the cube.
Since each sale can be associated with 0, 1 or 2 organisations, I would recommend modelling this with a bridge-table (many-to-many) between the Organisation-dimension and the Sale-fact. This assumes that you have a unique ID on each Sale-transaction, so that you can create a fact-dimension on the Sale-fact (which need not be visible in the cube).
You construct the bridge table in your ETL flow. It should simply contain 2 columns, which relate the Organisation IDs to the Sale IDs. No Sale ID should have more than 2 Organisation IDs. Your final model should look something like this:
DimCustomer  <---.
                 |
FactSale <---- BridgeSaleOrganisation ----> DimOrganisation
                 |
DimPromotion <---´
In the dimension usage of SSAS, you set up a many-to-many relationship between FactSale and DimOrganisation, using BridgeSaleOrganisation as the intermediary table. Once this is in place, filtering sales by the Organisation dimension will give you all sales belonging to that organisation via the bridge table, whether they come through Promotion or Customer.
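The bridge-table construction and the many-to-many behaviour can be sketched in plain SQL. This is an illustration in SQLite via Python's sqlite3 module, not SSAS configuration: SSAS resolves the many-to-many relationship itself once the dimension usage is set up. Names and data are hypothetical. The UNION keeps a sale that reaches the same organisation through both paths from being counted twice for that organisation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_organisation (org_id INTEGER PRIMARY KEY, org_name TEXT);
CREATE TABLE dim_promotion (promotion_id INTEGER PRIMARY KEY, org_id INTEGER); -- nullable
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, org_id INTEGER);   -- nullable
CREATE TABLE fact_sale (sale_id INTEGER PRIMARY KEY, promotion_id INTEGER, customer_id INTEGER);

INSERT INTO dim_organisation VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO dim_promotion VALUES (10, 1), (11, NULL);
INSERT INTO dim_customer VALUES (20, 2), (21, 1), (22, NULL);
INSERT INTO fact_sale VALUES
    (100, 10, 20),   -- promotion -> Acme, customer -> Globex: 2 organisations
    (101, 10, 21),   -- both paths lead to Acme: one bridge row for Acme
    (102, 11, 22);   -- no organisation at all

-- ETL step: build the bridge from both paths; UNION removes duplicate pairs.
CREATE TABLE bridge_sale_organisation AS
    SELECT f.sale_id, p.org_id FROM fact_sale f
    JOIN dim_promotion p ON p.promotion_id = f.promotion_id WHERE p.org_id IS NOT NULL
    UNION
    SELECT f.sale_id, c.org_id FROM fact_sale f
    JOIN dim_customer c ON c.customer_id = f.customer_id WHERE c.org_id IS NOT NULL;
""")

# Resolve the many-to-many relationship: count distinct sales per organisation.
by_org = conn.execute("""
    SELECT o.org_name, COUNT(DISTINCT b.sale_id)
    FROM dim_organisation o
    JOIN bridge_sale_organisation b ON b.org_id = o.org_id
    GROUP BY o.org_name ORDER BY o.org_name
""").fetchall()
print(by_org)  # [('Acme', 2), ('Globex', 1)]
```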
For more examples of many-to-many modelling, check out this excellent paper by "the gurus", Marco Russo and Alberto Ferrari.

Change Data Capture and SQL Server Analysis Services

I'm designing a database application where data is going to change over time. I want to persist historical data and allow my users to analyze it using SQL Server Analysis Services, but I'm struggling to come up with a database schema that allows this. I've come up with a handful of schemas that could track the changes (including relying on CDC), but then I can't figure out how to turn that schema into a working BISM within SSAS. I've also been able to create a schema that translates nicely into a BISM, but then it doesn't have the historical capabilities I'm looking for. Are there any established best practices for doing this sort of thing?
Here's an example of what I'm trying to do:
I have a fact table called Sales which contains monthly sales figures. I also have a regular dimension table called Customers which allows users to look at sales figures broken down by customer. There is a many-to-many relationship between customers and sales representatives, so I can make a reference dimension called Responsibility that refers to the Customer dimension and a Sales Representative reference dimension that refers to the Responsibility dimension. I now have the Sales facts linked to Sales Representatives by the chain of reference dimensions Sales -> Customer -> Responsibility -> Sales Representative, which allows me to see sales figures broken down by sales rep. The problem is that the Sales facts aren't the only things that change over time. I also want to be able to maintain a history of which Sales Representative was responsible for a Customer at the time of a particular Sales fact. I also want to know where the Sales Representative's office was located at the time of a particular Sales fact, which may be different from his current location. I might also want to know the size of a customer's organization at the time of a particular Sales fact, which also might be different from what it is currently. I have no idea how to model this in a BISM-friendly way.
You mentioned that you currently have a fact table containing monthly sales figures, i.e. one record per customer per month. So each record in this fact table is actually an aggregation of the individual sales "transactions" that occurred during the month for the corresponding dimensions.
So in a given month, there could be 5 individual sales transactions for $10 each for customer 123...and each individual sales transaction could be handled by a different Sales Rep (A, B, C, D, E). In the fact table you describe there would be a single record for $50 for customer 123...but how do we model the SalesReps (A-B-C-D-E)?
Based on your goals...
to be able to maintain a history of which Sales Representative was Responsible for a Customer at the time of a particular Sales fact
to know where the Sales Representative's office was located at the time of a particular sales fact
to know the size of a customer's organization at the time of a particular Sales fact
...I think it would be easier to model at a lower granularity, specifically a sales-transaction fact table with a grain of one record per sales transaction. Each sales transaction would have a single customer and a single sales rep.
FactSales
DateKey (date of the sale)
CustomerKey (customer involved in the sale)
SalesRepKey (sales rep involved in the sale)
SalesAmount (amount of the sale)
Now for the historical change tracking: any dimension with attributes for which you want to track historical changes will need to be modeled as a "Slowly Changing Dimension" and will therefore require "surrogate keys". For example, in your customer dimension, CustomerID will not be the primary key; it will simply be the business key, and you will use an arbitrary integer as the primary key. This arbitrary key is referred to as a surrogate key.
Here's how I'd model the data for your dimensions...
DimCustomer
CustomerKey (surrogate key, probably generated via IDENTITY function)
CustomerID (business key, what you will find in your source systems)
CustomerName
Location (attribute we wish to track historically)
-- the following columns are necessary to keep track of history
BeginDate
EndDate
CurrentRecord
DimSalesRep
SalesRepKey (surrogate key)
SalesRepID (business key)
SalesRepName
OfficeLocation (attribute we wish to track historically)
-- the following columns are necessary to keep track of historical changes
BeginDate
EndDate
CurrentRecord
FactSales
DateKey (this is your link to a date dimension)
CustomerKey (this is your link to DimCustomer)
SalesRepKey (this is your link to DimSalesRep)
SalesAmount
This allows you to have multiple records for the same customer.
For example, CustomerID 123 moves from NC to GA on 3/5/2012:
CustomerKey | CustomerID | CustomerName | Location | BeginDate | EndDate | CurrentRecord
1 | 123 | Ted Stevens | North Carolina | 01-01-1900 | 03-05-2012 | 0
2 | 123 | Ted Stevens | Georgia | 03-05-2012 | 01-01-2999 | 1
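A minimal sketch of how an ETL step might maintain this type 2 customer dimension and pick the right surrogate key when loading facts, using SQLite via Python's sqlite3 module. The function names apply_scd2 and key_for are hypothetical, only Location is treated as a tracked attribute, and dates are stored as ISO strings (YYYY-MM-DD) rather than the MM-DD-YYYY format in the table above so that they compare correctly as text.

```python
import sqlite3

HIGH_DATE = "2999-01-01"  # open-ended EndDate for the current version

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        customer_id INTEGER NOT NULL,                     -- business key
        customer_name TEXT,
        location TEXT,                                    -- tracked (type 2) attribute
        begin_date TEXT,
        end_date TEXT,
        current_record INTEGER
    )
""")

def apply_scd2(customer_id, name, location, change_date):
    """Hypothetical ETL step: add a new version when Location changes."""
    current = conn.execute(
        "SELECT customer_key, location FROM dim_customer "
        "WHERE customer_id = ? AND current_record = 1",
        (customer_id,),
    ).fetchone()
    if current is not None and current[1] == location:
        return current[0]  # tracked attribute unchanged: keep the current version
    if current is not None:
        # Expire the old version on the change date.
        conn.execute(
            "UPDATE dim_customer SET end_date = ?, current_record = 0 WHERE customer_key = ?",
            (change_date, current[0]),
        )
    cur = conn.execute(
        "INSERT INTO dim_customer (customer_id, customer_name, location, "
        "begin_date, end_date, current_record) VALUES (?, ?, ?, ?, ?, 1)",
        (customer_id, name, location, change_date, HIGH_DATE),
    )
    return cur.lastrowid

def key_for(customer_id, sale_date):
    """Hypothetical fact-load lookup: surrogate key of the version valid on sale_date."""
    row = conn.execute(
        "SELECT customer_key FROM dim_customer "
        "WHERE customer_id = ? AND begin_date <= ? AND ? < end_date",
        (customer_id, sale_date, sale_date),
    ).fetchone()
    return row[0]

apply_scd2(123, "Ted Stevens", "North Carolina", "1900-01-01")
apply_scd2(123, "Ted Stevens", "Georgia", "2012-03-05")  # the move on 3/5/2012

versions = conn.execute(
    "SELECT customer_key, location, begin_date, end_date, current_record "
    "FROM dim_customer ORDER BY customer_key"
).fetchall()
print(versions)
print(key_for(123, "2012-02-01"), key_for(123, "2012-04-01"))  # 1 2
```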
The same applies with SalesReps or any other dimension in which you want to track the historical changes for some of the attributes.
So when you slice the sales transaction fact table by CustomerID, CustomerName (or any other non-historically-tracked attribute), you should see a single record with the facts aggregated across all transactions for the customer. If you instead analyze the sales transactions by CustomerName and Location (the historically tracked attribute), you will see a separate record for each "version" of the customer's location, showing the sales amount while the customer was in that location.
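The slicing behaviour described above can be checked with a small SQLite sketch (via Python's sqlite3 module; hypothetical sample rows, with two versions of customer 123 as in the example): grouping by a non-tracked attribute collapses the versions, while adding Location splits the totals by version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_id INTEGER,
                           customer_name TEXT, location TEXT);
CREATE TABLE fact_sales (date_key INTEGER, customer_key INTEGER, sales_amount REAL);
-- Two versions of customer 123, as in the example above.
INSERT INTO dim_customer VALUES (1, 123, 'Ted Stevens', 'North Carolina'),
                                (2, 123, 'Ted Stevens', 'Georgia');
INSERT INTO fact_sales VALUES (20120210, 1, 10.0), (20120301, 1, 15.0), (20120410, 2, 20.0);
""")

# Slicing by a non-tracked attribute: one row, all versions aggregated.
by_name = conn.execute("""
    SELECT c.customer_name, SUM(f.sales_amount)
    FROM fact_sales f JOIN dim_customer c ON c.customer_key = f.customer_key
    GROUP BY c.customer_name
""").fetchall()

# Adding the tracked attribute: one row per version of the customer.
by_name_location = conn.execute("""
    SELECT c.customer_name, c.location, SUM(f.sales_amount)
    FROM fact_sales f JOIN dim_customer c ON c.customer_key = f.customer_key
    GROUP BY c.customer_name, c.location
    ORDER BY c.location
""").fetchall()

print(by_name)           # [('Ted Stevens', 45.0)]
print(by_name_location)  # [('Ted Stevens', 'Georgia', 20.0), ('Ted Stevens', 'North Carolina', 25.0)]
```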
By the way, if you have some time and are interested in learning more, I highly recommend the Kimball "bible", The Data Warehouse Toolkit, which provides a solid foundation in dimensional modeling scenarios.
The established best practices way of doing what you want is a dimensional model with slowly changing dimensions. Sales reps are frequently used to describe the usefulness of SCDs. For example, sales managers with bonuses tied to the performance of their teams don't want their totals to go down if a rep transfers to a new territory. SCDs are perfect for tracking this sort of thing (and the situations you describe) and allow you to see what things looked like at any point historically.
Spend some time on Ralph Kimball's website to get started. The first 3 articles I'd recommend you read are Slowly Changing Dimensions, Slowly Changing Dimensions Part 2, and The 10 Essential Rules of Dimensional Modeling.
Here are a few things to focus on in order to be successful:
You are not designing a 3NF transactional database. Get comfortable with denormalization.
Make sure you understand what grain means and explicitly define the grain of your database.
Do not use natural keys as keys, and do not bake any intelligence into your surrogate keys (with the exception of your time keys).
The goals of your application should be query speed and ease of understanding and navigation.
Understand type 1 and type 2 slowly changing dimensions and know where to use them.
Make sure you have a sponsor on the business side with the power to "break ties". You will find different people in the organization with different definitions of the same thing, and you need an enforcer with the power to make decisions. To see what I mean, ask 5 different people in your organization to define "customer" or "gross profit". You'll be lucky to get 2 people to define either the same way.
Don't try to wing it. Read The Data Warehouse Lifecycle Toolkit and embrace the ideas, even if they seem strange at first. They work.
OLAP is powerful and can be life changing if implemented skillfully. It can be an absolute nightmare if it isn't.
Have fun!

SQL Server BI: Single cube, multiple fact tables

I'm new to creating cubes, so please be patient.
Here's an example of my data
I have multiple companies, each company has multiple stores.
Sales are each linked to a particular company, with a particular store on a particular date.
Example: 5 sales took place for Company A, Store 1, on 5/19/2011.
Returns are linked to a particular company on a particular date.
Example: 3 returns took place for Company A on 3/11/2012.
Sometimes my users want to see a list of stores, the date, and how many returns took place, and how many sales.
Sometimes they want to see a list of companies, the specific stores, and the number of sales.
I have a table that stores:
COMPANY - DATE - STORE - SALES - RETURNS
As a result, the value for returns is repeated for each store under a particular COMPANY - DATE pair.
So if I'm writing a query and want to find out returns, I just do a
select distinct company, date, returns from mytable
but I am not sure how to turn this into a cube (using SQL Server BI and Visual Studio).
(I've only made a couple of cubes so far)
Thanks! (also, feel free to point me at appropriate references)
It sounds like Company is an attribute of the Store and should be in the Store dimension rather than the fact table. There may have to be a transformation on returns to convert the Company to a store.
Am I missing anything?
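A minimal SQLite sketch (via Python's sqlite3 module) of keeping sales and returns in two fact tables at their own grain, with Company moved into the Store dimension. Rather than converting returns to stores, this variant keeps returns at company/day grain and "drills across" by aggregating each fact to the shared grain before joining; it is an alternative to the transformation suggested above, roughly what a second measure group related to the Store dimension at the Company attribute would do in SSAS. Names and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Company is an attribute of the Store dimension, not a fact column.
CREATE TABLE dim_store (store_key INTEGER PRIMARY KEY, store_name TEXT, company TEXT);
-- One fact table per grain: sales per store/day, returns per company/day.
CREATE TABLE fact_sales (txn_date TEXT, store_key INTEGER, sales_count INTEGER);
CREATE TABLE fact_returns (txn_date TEXT, company TEXT, return_count INTEGER);

INSERT INTO dim_store VALUES (1, 'Store 1', 'Company A'), (2, 'Store 2', 'Company A');
INSERT INTO fact_sales VALUES ('2012-03-11', 1, 5), ('2012-03-11', 2, 4);
INSERT INTO fact_returns VALUES ('2012-03-11', 'Company A', 3);
""")

# Drill across: aggregate each fact to the shared company/date grain, then join.
# Returns are counted once per company/day instead of once per store row.
# (The LEFT JOIN keeps only company/days that have sales; use a full outer
# join or a company/date spine if returns can occur without sales.)
result = conn.execute("""
    WITH s AS (
        SELECT d.company, f.txn_date, SUM(f.sales_count) AS sales_count
        FROM fact_sales f JOIN dim_store d ON d.store_key = f.store_key
        GROUP BY d.company, f.txn_date
    ), r AS (
        SELECT company, txn_date, SUM(return_count) AS return_count
        FROM fact_returns GROUP BY company, txn_date
    )
    SELECT s.company, s.txn_date, s.sales_count, COALESCE(r.return_count, 0)
    FROM s LEFT JOIN r ON r.company = s.company AND r.txn_date = s.txn_date
""").fetchall()
print(result)  # [('Company A', '2012-03-11', 9, 3)]
```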

Same Fact Table Column; Records with Multiple Reasons

I am in a situation similar to the one below:
Suppose, for instance, that we need to store customer sales in a fact table (in a data warehouse built with dimensional modelling). I have sales, discounts related to the sales, sales returns and cancellations to store.
Would it be advisable to store a day's sales to a customer for a particular product (when the day is the grain) as a positive value, while returns and discounts are stored as negative values?
Also, if a discount is applied to a customer at a level other than the product (for instance, the brand), is it alright to persist it with a key assigned to the brand (while product is the grain), giving the product column an N/A value for that record?
Thanks in advance.
If your sales are considered a good thing (I'm assuming they are) then recording sales as positive numbers makes perfect sense. Any transaction that reduces sales (i.e. discounts and returns) should therefore be recorded as negative numbers. This will make reporting your sales very natural.
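A minimal SQLite sketch (via Python's sqlite3 module) of the signed-measure convention, with hypothetical names and values. The transaction_type column is an assumption added for illustration; it keeps sales, discounts and returns distinguishable while a plain SUM over the single amount column yields net sales.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_sales (
    day TEXT, customer_key INTEGER, product_key INTEGER,
    transaction_type TEXT,   -- hypothetical: 'SALE', 'DISCOUNT', 'RETURN'
    amount REAL              -- positive for sales, negative for reductions
);
INSERT INTO fact_sales VALUES
    ('2024-05-01', 1, 7, 'SALE',     100.0),
    ('2024-05-01', 1, 7, 'DISCOUNT', -10.0),
    ('2024-05-01', 1, 7, 'RETURN',   -20.0);
""")

# Net sales is a plain SUM because reductions already carry a negative sign.
net = conn.execute("SELECT SUM(amount) FROM fact_sales").fetchone()[0]
# Each component is still available by grouping on the transaction type.
by_type = dict(conn.execute(
    "SELECT transaction_type, SUM(amount) FROM fact_sales "
    "GROUP BY transaction_type ORDER BY transaction_type").fetchall())
print(net)      # 70.0
print(by_type)  # {'DISCOUNT': -10.0, 'RETURN': -20.0, 'SALE': 100.0}
```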
If you have different dimensions that might account for a record, populate the dimensions that make sense. So yes, attribute a discount to a brand rather than a product if that is what happened in your business transaction. This way your reporting will be able to look at all discounts, at discounts for particular products and at discounts for entire brands. If your fact table shows the most direct "cause" of the discount (product or brand), your reports will be more useful than if you link the fact to the brand through a relationship to the product.
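A sketch of the brand-level discount case in SQLite via Python's sqlite3 module (hypothetical names and rows). It assumes a conventional "N/A" member with key -1 in each dimension, so a brand-level discount row points at that member instead of holding a NULL product key. Here the product-level row also carries its brand key; that is a design choice of this sketch, not something the answer prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_brand   (brand_key INTEGER PRIMARY KEY, brand_name TEXT);
-- Hypothetical "N/A" member so facts never hold a NULL foreign key.
INSERT INTO dim_product VALUES (-1, 'N/A'), (7, 'Cola 1L'), (8, 'Cola 2L');
INSERT INTO dim_brand   VALUES (-1, 'N/A'), (3, 'Cola');

CREATE TABLE fact_discount (day TEXT, product_key INTEGER, brand_key INTEGER, amount REAL);
INSERT INTO fact_discount VALUES
    ('2024-05-01', 7,  3, -5.0),   -- product-level discount (brand also filled in)
    ('2024-05-01', -1, 3, -12.0);  -- brand-level discount: product is N/A
""")

# All discounts for the brand, whatever level they were granted at.
by_brand = conn.execute("""
    SELECT b.brand_name, SUM(f.amount) FROM fact_discount f
    JOIN dim_brand b ON b.brand_key = f.brand_key GROUP BY b.brand_name
""").fetchall()
# Product view: brand-level discounts show up under the N/A member.
by_product = conn.execute("""
    SELECT p.product_name, SUM(f.amount) FROM fact_discount f
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY p.product_name ORDER BY p.product_name
""").fetchall()
print(by_brand)    # [('Cola', -17.0)]
print(by_product)  # [('Cola 1L', -5.0), ('N/A', -12.0)]
```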
