Modeling a correct star schema for SSAS Tabular (SQL Server)

I'm using SSAS Tabular (PowerPivot) and need to design a data model and write some DAX.
I have 4 tables in my relational database model:
Orders(order_id, order_name, order_type)
Spots (spot_id,order_id, spot_name, spot_time, spot_price)
SpotDiscount (spot_id, discount_id, discount_value)
Discounts (discount_id, discount_name)
One order can include multiple spots, but one spot (spot_id 1) can only belong to one order.
One spot can include different discounts, and every discount has one discount_value.
Example:
Order_1 has spot_1 (spot_price 10), spot_2 (spot_price 20)
Spot_1 has discount_name_1(discount_value 10) and discount_name_2 (discount_value 20)
Spot_2 has discount_name_1(discount_value 15) and discount_name_3 (discount_value 30)
I need to write two measures: price (sum) and discount_value (average).
How do I correctly design a star schema with a fact table (or maybe two fact tables) so that in my PowerPivot cube I can get:
If I choose discount_name_1 I should get
order_1 with spot_1 and spot_2; price at the order_1 level will have the value 50 and discount_value = 12.5
If I choose discount_name_3 I should get
order_1 with only spot_2; price at the order level = 20 and discount_value = 30

Fact(OrderKey, SpotKey, DiscountKey, DateKey, TimeKey, Spot_Price, Discount_Value, ...)
DimOrder, DimSpot, DimDiscount, etc.
TotalPrice :=
SUMX (
    SUMMARIZE (
        Fact,
        Fact[OrderKey],
        Fact[SpotKey],
        Fact[Spot_Price]
    ),
    Fact[Spot_Price]
)
AverageDiscount :=
AVERAGE ( Fact[Discount_Value] )
The fact table is denormalized, and you end up with the simplest star schema you can have.
The first measure deserves some explanation. [Spot_Price] is duplicated for any spot with multiple discounts, so we would get wrong results with a simple SUM(). SUMMARIZE() does a group-by on all the columns passed to it, following relationships if necessary (we're looking at a single table here, so there is nothing to follow).
SUMX() iterates over this table and accumulates the value of the expression in its second argument. SUMMARIZE() has removed our duplicate [Spot_Price]s, so we accumulate only the unique ones (one per combination of [OrderKey] and [SpotKey]) into a sum.
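For a cross-check, here is an equivalent sketch (the measure name TotalPrice_Alt is made up; it assumes, as above, that [Spot_Price] is constant per [SpotKey]) that groups by the keys only and picks each spot's single price via context transition:
TotalPrice_Alt :=
SUMX (
    SUMMARIZE ( Fact, Fact[OrderKey], Fact[SpotKey] ),  -- one row per order/spot pair
    CALCULATE ( MAX ( Fact[Spot_Price] ) )  -- the spot's single price
)
Both versions return the same total; the SUMMARIZE in the original simply carries [Spot_Price] along as an extra grouping column instead.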

You say
One order can include multiple spots but one spot (spot_id 1) can only
belong to one order.
That holds for the table definitions you give just above that statement only if spot_id is the key of Spots: each row of Spots then carries exactly one order_id, so a spot belongs to exactly one order. Without a unique constraint on spot_id, the same spot could appear with multiple orders. Each spot can also have multiple discounts.
If you want to make the relationship described in your words explicit, the table definitions could be:
Orders(order_id, order_name, order_type)
OrderSpot(order_id, spot_id) -- with a unique index on spot_id
Spots (spot_id, spot_name, spot_time, price)
or:
Orders(order_id, order_name, order_type)
Spots (spot_id, spot_name, spot_time, order_id, price)
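A minimal T-SQL sketch of the junction-table variant (the column types are assumptions):
CREATE TABLE OrderSpot (
    order_id INT NOT NULL REFERENCES Orders(order_id),
    spot_id  INT NOT NULL REFERENCES Spots(spot_id),
    CONSTRAINT UQ_OrderSpot_spot UNIQUE (spot_id)  -- enforces: one order per spot
);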
You can create the SSAS cube with Orders as the fact table and one dimension built from the Spots table. If you then add the SpotDiscount and Discounts tables with their relationships (SpotDiscount to Spots, Discounts to SpotDiscount), you have a one-dimensional cube.
EDIT as per comments
Well, the fact table would have order_id, order_name, order_type.
The dimension would be made up of the other 3 tables and have the columns you're interested in: probably spot_name, spot_time, spot_price, discount_name, discount_value.
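One way to flatten those three tables into that dimension is a view along these lines (column names come from the question; the view name is made up):
CREATE VIEW DimSpotDiscount AS
SELECT
    s.spot_id,
    s.order_id,
    s.spot_name,
    s.spot_time,
    s.spot_price,
    d.discount_name,
    sd.discount_value
FROM Spots s
JOIN SpotDiscount sd ON sd.spot_id = s.spot_id
JOIN Discounts d ON d.discount_id = sd.discount_id;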

Related

Identify elements that do not appear in the period (Google Data Studio)

I have a table that shows the recurrence of purchases of a product, with the columns product_id, date, report_date, quantity.
I need to list in a table the products that have been unsold for more than 50 days. I managed to do the opposite (list those that were sold in the last 50 days), but I have not yet been able to implement the reverse logic.
Does anyone have any tips?
An example of the table:
product_id,date,report_date,quantity
329,2019-01-02 08:19:17,2019-01-02 14:34:12,6
243,2019-01-03 09:19:17,2019-01-03 15:34:12,6
238,2019-02-02 08:19:17,2019-03-02 14:34:12,84
170,2019-04-02 08:19:17,2019-04-02 14:34:12,84
238,2019-04-02 08:19:17,2019-04-02 14:34:12,8
238,2019-04-02 08:19:17,2019-04-02 14:34:12,100
238,2019-08-02 08:19:17,2019-08-02 14:34:12,100
238,2019-10-02 08:19:17,2019-10-02 14:34:12,100
170,2020-01-02 08:19:17,2020-01-02 14:34:12,84
170,2020-01-02 08:19:17,2020-01-02 14:34:12,84
There are several steps to this task. I assume the date column is the one to work with. Your example table includes duplicated entries; is it correct that the same order appears twice at the same time?
So here are the steps:
First, add a calculated field date_past to your dataset:
DATE_DIFF(CURRENT_DATE(),date)
To the dataset, add a filter SO_demo with:
include date_past < 50
Then blend the data with itself, using product_id as the join key. Only the second dataset gets the SO_demo filter. To that dataset's dimensions, add a calculated field sold_last_50_days with the formula "yes".
In the table/chart to display, add a filter: include sold_last_50_days Is Null.
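For reference, the same "not sold in the last 50 days" logic as a SQL sketch (the table name sales is an assumption, and the date arithmetic is BigQuery-flavored; other dialects spell it differently):
-- products whose most recent sale is more than 50 days old
SELECT product_id
FROM sales
GROUP BY product_id
HAVING MAX(CAST(date AS DATE)) < DATE_SUB(CURRENT_DATE(), INTERVAL 50 DAY);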

How to use CALCULATE with LOOKUPVALUE and USERELATIONSHIP

I have three tables, one dim table called "ISO_ccy" only showing the ISO acronyms of currencies, one dim table showing the "home currency" of an entity ("entities") and another (fact) table ("trades") showing foreign exchange (FX) trades. The thing with FX trades is, they always have two currencies (ccy) involved, hence the latter table has two columns with currency ISO codes (and corresponding amounts). The two dim tables both only have one column with ISO ccy codes (the table "ISO_ccy" having distinct values only).
I now have one (active) relationship for currency 1 (ccy1) and one inactive for currency 2 (ccy2) between the "ISO_ccy" and the "trades" table. There is also an active relationship between "ISO_ccy" and "entities" tables.
I need to calculate the sum for each currency and each entity where the currency is not equal to the "home currency" of that entity.
It seems to be pretty straightforward for the ccy with an active relationship (ccy1):
Sum_Hedges_activeRelation :=
CALCULATE (
    SUM ( [Amount_ccy1] );
    FILTER ( trades; trades[ccy1] <> LOOKUPVALUE ( entities[ccy]; entities[name]; trades[name] ) )
)
The filter expression ensures that only amounts are shown where the ccy of a trade is not equal to the "home" ccy of the entity.
Here, I'm getting the desired result.
Now I need to do the same with the inactive relation (ccy2).
This is what I tried:
Sum_Hedges_in-activeRelation :=
CALCULATE (
    SUM ( [Amount_ccy2] );
    USERELATIONSHIP ( trades[ccy2]; ISO_ccy[ccy] );
    FILTER ( trades; trades[ccy2] <> LOOKUPVALUE ( entities[ccy]; entities[name]; trades[name] ) )
)
However, I only get an empty result.
I also tried adding ALL(trades) to the CALCULATE function; no results there either.
So I am a bit at a loss as to how I can make this work. Can you please help?
UPDATE 08 April 2019 with a solution:
I found a solution to my problem here:
sqlbi: USERELATIONSHIP in a Measure
Now my formula looks like this:
Sum_Hedges_in-activeRelation :=
CALCULATE (
    CALCULATE (
        SUM ( [Amount_ccy2] );
        FILTER ( trades; trades[ccy2] <> LOOKUPVALUE ( entities[ccy]; entities[name]; trades[name] ) )
    );
    USERELATIONSHIP ( trades[ccy2]; ISO_ccy[ccy] )
)
This is slightly different from the solution provided (for a column-related context) in the referenced article, as I omitted the ALL() instruction in the outer CALCULATE(). I cannot fully explain it, but presumably the nesting matters because the outer CALCULATE activates the inactive relationship first, so the FILTER over trades is then evaluated with that relationship in effect.
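For comparison, an equivalent formulation (a sketch using the same tables; the measure name is made up) activates the relationship inside a CALCULATETABLE instead:
Sum_Hedges_inactiveRelation_v2 :=
SUMX (
    FILTER (
        CALCULATETABLE ( trades; USERELATIONSHIP ( trades[ccy2]; ISO_ccy[ccy] ) );  -- trades with the ccy2 relationship active
        trades[ccy2] <> LOOKUPVALUE ( entities[ccy]; entities[name]; trades[name] )  -- exclude the entity's home ccy
    );
    trades[Amount_ccy2]
)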

How to add a non-aggregated column to a fact table?

I'm working on an SSAS cube to allow users to analyze some sales.
So I created a fact table to record all sales and a few dimensions to browse the data (category, location & store, etc.).
This is an example of the fact table output (from SQL Server Management Studio):
When I browse my cube, I can review all sales including date, quantity, etc.
However, when I add fields like "unit price" or "unit cost", I get a strange result, probably due to aggregation behavior.
It seems to return the sum of all matching rows (the AggregateFunction property).
How can I simply display the unit price of a sale without applying any calculation to the unit price column? Setting AggregateFunction to None displays BLANK/NULL.
If your unitCost and unitPrice are the same for each product (I mean unitCost can only be 77.6 for product_id = 2), you can just use average (or even emulate average with SUM/COUNT), but only with the product dimension on an axis!
Other dimensions will show real average values.
Maybe it's better to model these 'static' fields as attribute properties in the 'Product' dimension? But you would still need to add some logic to choose one value when several (or all) product members are selected.
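A sketch of the SUM/COUNT emulation as an MDX calculated member (all measure, dimension, and cube names here are hypothetical):
WITH MEMBER Measures.[Unit Price Avg] AS
    IIF(Measures.[Sales Count] = 0,
        NULL,
        Measures.[Unit Price Sum] / Measures.[Sales Count])
SELECT { Measures.[Unit Price Avg] } ON COLUMNS,
       [Product].[Product].MEMBERS ON ROWS
FROM [Sales]
With the product dimension on rows, each product shows its single unit price; any other slice shows a (possibly meaningless) average.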

Inserting an artificial column in an MDX query

For some reason I need to insert an artificial (dummy) column into an MDX expression. (The reason is that I need to obtain a query with a specific number of columns.)
To illustrate, this is my sample query:
SELECT {[Measures].[AFR],[Measures].[IB],[Measures].[IC All],[Measures].[IC_without_material],[Measures].[Nonconformance_PO],[Measures].[Nonconformance_GPT],[Measures].[PM_GPT_Weighted_Targets],[Measures].[PM_PO_Weighted_Targets], [Measures].[AVG_LC_Costs],[Measures].[AVG_MC_Costs]} ON COLUMNS,
([dim_ProductModel].[PLA].&[SME])
* ORDER( {([dim_ProductModel].[Warranty Group].children)} , ([Measures].[Nonconformance_GPT],[Dim_Date].[Date Full].&[2014-01-01]) ,desc)
* ([dim_ProductModel].[PLA Text].members - [dim_ProductModel].[PLA Text].[All])
* {[Dim_Date].[Date Full].&[2013-01-01]:[Dim_Date].[Date Full].&[2014-01-01]} ON ROWS
FROM [cub_dashboard_spares]
It is not very important, just some measures and cross-joined dimensions. Now I need to add e.g. 2 extra columns; I don't care whether they are measures with null/0 values or another cross-joined dimension. Can I do this in some easy way without inserting any data into my cube?
In SQL I can just write SELECT 0 or SELECT 'dummy1', but here that is not possible, neither in the ON ROWS nor in the ON COLUMNS part of the query.
Thank you very much for your help,
Regards,
Peter
PS: so far I have just repeated some measure several times, but I am interested in whether there is a possibility to insert a really "dummy" column.
Your query just has the Measures dimension on columns. The easiest way to extend it by some columns would be to repeat the last measure as many times as needed to reach the correct number of columns.
Another possibility, which may be more efficient in case the last measure is complex to calculate, would be to use:
WITH MEMBER Measures.dummy AS NULL
SELECT {[Measures].[AFR],[Measures].[IB],[Measures].[IC All],[Measures].[IC_without_material],[Measures].[Nonconformance_PO],[Measures].[Nonconformance_GPT],[Measures].[PM_GPT_Weighted_Targets],[Measures].[PM_PO_Weighted_Targets], [Measures].[AVG_LC_Costs],[Measures].[AVG_MC_Costs],
Measures.dummy, Measures.dummy, Measures.dummy
}
ON COLUMNS,
([dim_ProductModel].[PLA].&[SME])
* ORDER( {([dim_ProductModel].[Warranty Group].children)} , ([Measures].[Nonconformance_GPT],[Dim_Date].[Date Full].&[2014-01-01]) ,desc)
* ([dim_ProductModel].[PLA Text].members - [dim_ProductModel].[PLA Text].[All])
* {[Dim_Date].[Date Full].&[2013-01-01]:[Dim_Date].[Date Full].&[2014-01-01]}
ON ROWS
FROM [cub_dashboard_spares]
i.e. appending a dummy measure, which should not need much computation, to the end of the columns as many times as you need.

DB Design: Sort Order for Lookup Tables

I have an application where the database back-end has around 15 lookup tables. For instance, there is a table for counties like this:
CountyID(PK) County
49001 Beaver
49005 Cache
49007 Carbon
49009 Daggett
49011 Davis
49015 Emery
49029 Morgan
49031 Piute
49033 Rich
49035 Salt Lake
49037 San Juan
49041 Sevier
49043 Summit
49045 Tooele
49049 Utah
49051 Wasatch
49057 Weber
The UI for this app has a number of combo boxes in various places for these lookup tables, and my client has asked that, in this case, the boxes list:
CountyID(PK) County
49035 Salt Lake
49049 Utah
49011 Davis
49057 Weber
49045 Tooele
(the rest alphabetically)
The best plan I have for accomplishing this is to add a numeric SortOrder column to each lookup table. A colleague told me he thought that would cause the tables to violate third normal form, but I think the sort order still depends on the key and only the key (even though the rest of the list is alphabetical).
Is adding the SortOrder column the best way to do this, or is there a better way I am just not seeing?
I agree with @cletus that a sort order column is a good way to go, and it does not violate 3NF (because, as you said, the sort order column entries are functionally dependent on the candidate keys of the table).
I'm not sure I agree that alphanumeric is better than numeric, though. In the specific case of counties, new ones are seldom created. And there is no requirement that the assigned numbers be sequential; you can allocate them as multiples of a hundred, for example, leaving ample room for insertions.
Yes, I agree a sort order column is the best solution when the requirements call for a custom sort order like the one you cite. I wouldn't go with a numeric column, however. If the data is alphanumeric, the sort order should be alphanumeric; that way you can seed the value with whatever is in the county field.
If you use a numeric field, you'll (potentially) have to resequence the entire table whenever you add a new entry. So:
Columns: ID, County, SortOrder
Seed:
UPDATE County SET SortOrder = CONCAT('M-', County)
and for the special cases:
UPDATE County
SET SortOrder = CONCAT('E-', County)
WHERE County IN ('Salt Lake', 'Utah', 'Davis', 'Weber', 'Tooele')
Arguably you may want to put another marker column in to indicate those entries are special.
I went with numeric and large multiples.
Even with the CONCAT('E-', ...) example, I don't get the required sort order: that would give me Davis, Salt Lake, Tooele..., and Salt Lake needs to be first.
I ended up using multiples of 10 for the special entries and assigned the non-special entries a value like 10000. That way the view for each lookup can have:
ORDER BY SortOrder ASC, OtherField ASC
Another programmer suggested using DECODE in Oracle, or CASE statements in SQL Server, but this is a more general solution. YMMV.
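A minimal T-SQL sketch of that scheme (table and column names follow the example above):
-- special entries get low multiples of 10; everything else a large default
UPDATE County SET SortOrder = 10000;
UPDATE County SET SortOrder = 10 WHERE County = 'Salt Lake';
UPDATE County SET SortOrder = 20 WHERE County = 'Utah';
UPDATE County SET SortOrder = 30 WHERE County = 'Davis';
UPDATE County SET SortOrder = 40 WHERE County = 'Weber';
UPDATE County SET SortOrder = 50 WHERE County = 'Tooele';

-- the lookup query the combo boxes bind to
SELECT CountyID, County
FROM County
ORDER BY SortOrder ASC, County ASC;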
