SSAS cube with date range records - sql-server

I have to build a cube based on date range records, and not sure about the best way to proceed.
Imagine say a cube of Cars and warranty periods. Each car has a start date, and an end of warranty periods. Then there may be extended warranty periods.. so imagine
CAR REG TYPE WARRANTY START WARRANTY END
CAR A PURCHASE 01/01/2016 31/01/2016
CAR A EXTENDED 01/01/2017 30/06/2017
CAR A EXTENDED 01/08/2017 30/01/2018 -- note, gap here
CAR B PURCHASE 01/01/2016 31/01/2016
CAR B EXTENDED 01/01/2017 30/06/2017
CAR B EXTENDED 01/08/2017 30/01/2018 -- note, gap here
So multiple items, with multiple date ranges. There is a main table (CARS) with car details (colour, model, etc).
Now I want to build a cube, which is reportable at month level, cars under warranty/warranty type, etc.
So plan 1 was to build a view which explodes the above out by a join to a date table, report by month, and feed this into a cube. But, the number of cars multiplied by the months covered leads to multi hundreds of milions of rows - which means sometimes the server runs out of TempDB space, and when it does run, the cube takes hours to build.
Is there a better way - such as a view for the car details, and then another view on the warranty table (how do I get SSAS to deal with months in a date range) - will the join in SSAS be more efficient than a join in a view in SQL?
Thanks all.

You can connect start and end columns to time dimension. And on the report you can use ":" operator to build date tange report.
More details you will find here: http://www.purplefrogsystems.com/blog/2013/04/mdx-between-start-date-and-end-date/

One approach which will work with drag-and-drop client tools like Excel or Power BI would be a many-to-many Date dimension. Since car A and B match, let's pretend there's a car C which has a warranty from 2015-07-30 to 2015-12-31.
Create a DimWarrantyDateRangeKey which represents a unique combination of dates during which a warranty is active. The surrogate key is WarrantyDateRangeKey. Certainly the ETL that builds this table will be a bit expensive, but given the size of your data I think it's a worthwhile investment that will produce better query performance than if your m2m bridge table has one row per active day per car.
Each car should be assigned one WarrantyDateRangeKey. Add the WarrantyDateRangeKey column to your fact tables...
CAR REG WarrantyDateRangeKey
A 1
B 1
C 2
m2mWarrantyDateRange
WarrantyDateRangeKey DateKey
1 20160101
1 20160102
1 ...
1 20170629
1 20170630
1 20170801
1 20170802
1 ...
1 20180129
1 20180130
2 20150701
2 20150702
2 ...
2 20151230
2 20151231
The tables relate together as follows...
FactTable -> DimWarrantyDateRange <- m2mWarrantyDateRange -> DimDate
Then in the cube you DimWarrantyDateRange should be a dimension, m2mWarrantyDateRange should be a measure group with a count measure. DimDate should be a dimension. Then you should relate DimDate to FactTable as a many-to-many (m2m) dimension using m2mWarrantyDateRange as the intermediate measure group.
Now in Excel or Power BI you should be able to filter to a particular date and it will filter down to the cars which had an active warranty on that day.

Related

DB Schema: Versioned price model vs invoice-related data

I am creating some db model for rental invoice generation.
The invoice consists of N booking time ranges.
Each booking belongs to a price model. A price model is a set of rules which determine a final price (base price + season price + quantity discout + ...).
That means the final price for the N bookings within an invoice can be a complex calculation, and of course I want to keep track of every aspect of the final price calculation for later review of an invoice.
The problem is, that a price model can change in the future. So upon invoice generation, there are two possibilities:
(a) Never change a price model. Just make it immutable by versioning it and refer to a concrete version from an invoice.
(b) Put all the price information, discounts and extras into the invoice. That would mean alot of data, as an invoice contains N bookings which may be partly in the range of a season price.
Basically, I would break down each booking into its days and for each day I would have N rows calculating the base price, discounts and extra fees.
Possible table model:
Invoice
id: int
InvoiceBooking # Each booking. One invoice has N bookings
id: int
invoiceId: int
(other data, e.g. guest information)
InvoiceBookingDay # Days of a booking. Each booking has N days
id: int
invoiceBookingId: id
date: date
InvoiceBookingDayPriceItem # Concrete discounts, etc. One days has many items
id: int
invoiceBookingDayId: int
price: decimal
title: string
My question is, which way should I prefer and why.
My considerations:
With solution (a), the invoice would be re-calculated using the price model information each time the data is viewed. I don't like this, as algorithms can change. It does not feel natural for the "read-only" nature of an invoice.
Also the version handling of price models is not a trivial task and the user needs to know about the version concept, which adds application complexity.
With solution (b), I generate a bunch of nested data and it adds alot of complexity to the schema.
Which way would you prefer? Am I missing something?
Thank you
There is a third option which I recommend. I call it temporal (time) versioning and the layout of the table is really quite simple. You don't describe your pricing data so I'll just show a simple example.
Table: DailyPricing
ID EffDate Price ...
A 01/01/2015 17.50 ...
B 01/01/2015 20.00 ...
C 01/01/2015 22.50 ...
B 01/01/2016 19.50 ...
C 07/01/2016 24.00 ...
This shows that all three price schedules (A, B and C just represent whatever method you use to distinguish between price levels) were given a price on Jan 1, 2015. On Jan 1, 2016, the price of plan B was reduced. In July, the price of plan C was increased.
To get the current price of a plan, the query is this:
select dp.Price
from DailyPricing dp
where dp.ID = 'A'
and dp.Effdate =(
select Max( dp2.EffDate )
from DailyPricing dp2
where dp2.ID = dp.ID
and dp2.EffDate >= :DateOfInterest);
The DateOfInterest variable would be loaded with the current date/time. This query returns the one price that is currently in effect. In this case, the price set Jan 1, 2015 as that has never changed since taking effect. If the search had been for plan B, the price set on Jan 1, 2016 would have been returned and for plan C, the price set on July 1, 2016. These are the latest prices set for each plan; that is, the current prices.
Such a query would more likely be in a join with probably the invoice table so you could perform the price calculation.
select ...
from Invoices i
join DailyPricing dp
on dp.ID = i.ID
and dp.Effdate =(
select Max( dp2.EffDate )
from DailyPricing dp2
where dp2.ID = dp.ID
and dp2.EffDate >= i.InvoiceDate )
where i.ID = 1234;
This is a little more complex than a simple query but you are asking for more complex data (or, rather, a more complex view of the data). However, this calculation is probably only executed once and the final price stored back in to the invoice data or elsewhere.
It would be calculated again only if the customer made some changes or you were going through an audit, rechecking the calculation for accuracy.
Notice something, however, that is subtle but very important. If the query above were being executed for an invoice that had just been created, the InvoiceDate would be the current date and the price returned would be the current price. If, however, the query was being run as a verification on an invoice that was two years old, the InvoiceDate would be two years ago and the price returned would be the price that was in effect two years ago.
In other words, the query to return current data and the query to return past data is the same query.
That is because current data and past data remain in the same table, differentiated only by the date the data takes effect. This, I think, is about the simplest solution to what you want to do.
How about A and B?
It's not best practice to re-calculate any component of an invoice, especially if the component was printed. An invoice and invoice details should be immutable, and you should be able to reproduce it without re-calculating.
If you ever have a problem with figuring out how you got to a certain amount, or if there is a bug in your program, you'll be glad you have the details, especially if the calculations are complex.
Also, it's a good idea to keep a history of your pricing models so you can validate how you got to a certain price. You can make this simple to your users. They don't have to see the history -- but you should record their changes in the history log.

SSAS- MDX Assign fact row to dimension member base on calculation

I am looking to calculate in the calc script something, so I can allocate a row from a fact table to a dimension member.
The business scenario is the following. I have a fact table that record customer credit and debit ( customer can do a lot of little loan) and a dimension Customer.I want to classify my customer base on his history of credit and debit on a given period.Classification of customer change over time.
Example
The rule is, if a customer balance (for a given period ) is over - 50 000, the classification is "large", if he have more than a record and have done a payement in the last 3 month he is a "P&P.If he doesn't own any money and have done a payement in the last 3 month its "regular".
My question is more about direction than a specific code,which way is the best to implement this kind of rule ?
Best Regards
Vincent Diallo-Nort
I'd create a fact table with a balance auto-updated status every day:
check the rolling balance yesterday plus today's records.
when the balance = 0, then remove a record.
Plus add a flow fact table with payments only.
Add measures:
LastChild aggregation for the first fact table.
Sum aggregation for the second fact table.
When it's done, you may apply a MDX calculation:
case
when [Measure].[Balance] > 50000
then "Large"
when [Measure].[Payments] + ([Date].[Calendar].CurrentMember.Lag(1),[Measure].[Payments]) + ([Date].[Calendar].CurrentMember.Lag(2),[Measure].[Payments]) > 0
then "P&P"
else "Regular"
end
In order to give you answer in detail you have to provide more information about your data structure.

Is it possible to write a "matchmaking" query in Access 2007?

I am trying to create a database that will essentially act as a matchmaking service. I have two tables with the same categories, Clients and Opportunities.
The idea is to enter buy-side Clients as we acquire them, filling out the applicable descriptive fields (EBITDA, Gross Revenue, etc).
Then, when a possible Opportunity comes along, I enter that info in its respective table (with EBITDA, Gross Revenue, etc).
The idea is then to run a query that will essentially "match" records from Clients with records from Opportunities, in a prioritized order, i.e. the items with most matching fields rank at the top.
So I've created the tables (Clients and Opportunities), a third table Values that contains the possible choices for several of the fields, and forms for entering the information into Opportunities and Clients.
I've entered a test record in both Opportunities and Clients tables with several matching fields. But my query isn't returning any hits. Per some advice I received, I tried to add in some SQL code to address a possible NULL issue, but Access won't accept the code. Can this query be written? Are the results I'm looking for possible?
My query, designed using the Wizard, looks like this and returns no hits, despite the sample records in each table having matching entries:
SELECT Clients.*, Opportunities.*
FROM Clients INNER JOIN Opportunities
ON (Clients.[NAICS Sector Codes]=Opportunities.[NAICS Sector Codes])
AND (Clients.[Business Model]=Opportunities.[Business Model])
AND (Clients.[Gross Revenue]=Opportunities.[Gross Revenue])
AND (Clients.EBITDA=Opportunities.EBITDA)
AND (Clients.[EBITDA Margin (%)]=Opportunities.[EBITDA Margin (%)])
AND (Clients.[Growth-Timeframe in Years]=Opportunities.[Growth-Timeframe in Years])
AND (Clients.[Growth-Revenue]=Opportunities.[Growth-Revenue])
AND (Clients.[Growth-EBITDA (%)]=Opportunities.[Growth-EBITDA (%)])
AND (Clients.[Geographical Operational Location Preference]=Opportunities.[Geographical Operational Location Preference])
AND (Clients.[Debt Load]=Opportunities.[Debt Load])
AND (Clients.[Primary Area of Sales Concentration (Geographic)]=Opportunities.[Primary Area of Sales Concentration (Geographic)])
AND (Clients.[% of total revenue from ≤ top 3 customers]=Opportunities.[% of total revenue from ≤ top 3 customers])
AND (Clients.[% of total revenue from ≤ top 10 customers]=Opportunities.[% of total revenue from ≤ top 10 customers])
AND (Clients.[% of total revenue from ≤ top 20 customers]=Opportunities.[% of total revenue from ≤ top 20 customers])
AND (Clients.[Current Management Post-Transaction Plan]=Opportunities.[Current Management Post-Transaction Plan])
AND (Clients.[Preferred Deal Structure]=Opportunities.[Preferred Deal Structure]) AND (Clients.NAPCS=Opportunities.NAPCS)
AND (Clients.Notes=Opportunities.Notes)
AND (Clients.[ZLD Sector Notes]=Opportunities.[ZLD Sector Notes]);

Database schema design for financial forecasting

I need to develop a web app that allows companies to forecast financials.
the app has different screens, one for defining employee salaries, another for sales projections etc..
basically turn an excel financial forecast model into an app.
question is, what would be the best way to design the database, so that financial reports (e.g. a profit and loss statement or balance sheet) could be quickly generated?
assuming the forecast period is for 5 years, would you have a table
with 5 years*12 months = 60 fields per each row? is that performant
enough?
would you use DB triggers to recalculate salary expenses whenever a single employee data is changed?
I'd think it would be better to store each month's forecast in its own row in a table that looks like this
month forecast
----- --------
1 30000
2 31000
3 28000
... ...
60 52000
Then you can use the aggregate functions to calculate forecast reports, discounted cash flow etc. ( Like if you want the un-discounted total for just 4 years):
SELECT SUM(forecast) from FORECASTS where month=>1 and month<=48
For salary expenses, I would think that having a view that does calculations on the fly (or if you DB engine supports "materialized views" should have sufficient performance unless we're talking some giant number of employees or really slow DB.
Maybe have a salary history table, that trigger populates when employee data changes/payroll runs
employeeId month Salary
---------- ----- ------
1 1 4000
2 1 3000
3 1 5000
1 2 4100
2 2 3100
3 2 4800
... ... ...
Then again, you can do SUM or other aggregate function to get to the reported data.

How to merge rows of SQL data on column-based logic?

I'm writing a margin report on our General Ledger and I've got the basics working, but I need to merge the rows based on specific logic and I don't know how...
My data looks like this:
value1 value2 location date category debitamount creditamount
2029 390 ACT 2012-07-29 COSTS - Widgets and Gadgets 0.000 3.385
3029 390 ACT 2012-07-24 SALES - Widgets and Gadgets 1.170 0.000
And my report needs to display the two columns together like so:
plant date category debitamount creditamount
ACT 2012-07-29 Widgets and Gadgets 1.170 3.385
The logic to join them is contained in the value1 and value 2 column. Where the last 3 digits of value 1 and all three digits of value 2 are the same, the rows should be combined. Also, the 1st digit of value 1 will always been 2 for sales and 3 for costs (not sure if that matters)
IE 2029-390 is money coming in for Widgets and Gadgets sold to customers, while 3029-390 is money being spent to buy the Widgets and Gadgets from suppliers.
How can I so this programmatically in my stored procedure? (SQL Server 2008 R2)
Edit: Would I load the 3000's into one variable table the and the 2000's into another, then join the two on value2 and right(value1, 3)? Or something like that?
Try this:
SELECT RIGHT(LTRIM(RTRIM(value1)),3) , value2, MAX(location),
MAX(date), MAX(category), SUM(debitamount), SUM(creditamount) FROM
table1 GROUP BY RIGHT(LTRIM(RTRIM(value1)),3), value2
It will sum the credit amount and debit amount. It will choose the maximum string value in the other columns, assuming they are always the same when value2 and the last 3 digits of value1 are the same it shouldn't matter.

Resources