MDX Union Across Different Dimensions - union

I would like to combine two MDX queries from separate dimensions on columns. Example:
Number of Sales/Product Type vs. Gender and State:
      | CA  | OR  | WA  | Male | Female
------+-----+-----+-----+------+-------
food  | 125 | 343 | 130 | 570  | 459
drink | 123 | 465 | 135 | 678  | 343
State and Gender are their own respective dimensions, and I would like to do some aggregation (e.g. sales count) across different product types (food, drink). Below is an idea of how this might work, although the queries cannot be combined because they use different hierarchies. How might I go about tacking on Male and Female, for example, as columns in this result?
SELECT
NON EMPTY
{ [Store].[Store State].Members, [Gender].[Gender].Members } ON COLUMNS,
{ [Product].[Product Family].Members } ON ROWS
FROM [Sales]
WHERE { [Measures].[Sales Count] }
Example error:
MondrianEvaluationException: Expressions must have the same hierarchy
Is there a way to do this effectively in MDX? If so, can I specify a different aggregation for each column (e.g. aggregate state data by total sales and gender data by profit)?
Thank you for your help

In MDX, all tuples in a set must have the same dimensionality, so members of different hierarchies cannot simply be listed side by side on one axis. The best approach would be to run two queries and show them beside each other in the client tool.
However, you could do something similar to what @Vhteghem_Ph proposed:
SELECT
NON EMPTY
[Store].[Store State].Members * { [Gender].[Gender].[(All Gender)] }
+
{ [Store].[Store State].[(All Store State)] } * [Gender].[Gender].Members
ON COLUMNS,
{ [Product].[Product Family].Members } ON ROWS
FROM [Sales]
WHERE { [Measures].[Sales Count] }
Note that the + used here, with two sets as operands, is shorthand for Union(set1, set2). Again, Union requires both sets to have the same dimensionality: here, the first hierarchy of every tuple is the Store State hierarchy and the second is the Gender hierarchy.
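To make the "same dimensionality" rule concrete, here is a toy model in plain Python. It is not MDX and not how Mondrian evaluates the query; the fact rows and the ALL placeholder are hypothetical. Every column is a 2-tuple (state, gender): the first half pairs each state with the All gender, the second half pairs the All state with each gender, so both halves have the same shape and can be concatenated, just as set1 + set2 is in the query above.

```python
ALL = "(All)"  # hypothetical placeholder for a hierarchy's All member

# Hypothetical fact rows: (product_family, store_state, gender), one row per sale
FACTS = [
    ("Food", "CA", "M"),
    ("Food", "OR", "F"),
    ("Food", "CA", "F"),
    ("Drink", "WA", "M"),
]

def column_tuples(facts):
    """Union of two crossjoins; every tuple has the shape (state, gender)."""
    states = sorted({s for _, s, _ in facts})
    genders = sorted({g for _, _, g in facts})
    return [(s, ALL) for s in states] + [(ALL, g) for g in genders]

def sales_count(facts, family, column):
    """Count sales for one family; ALL in a position means 'do not filter on it'."""
    state, gender = column
    return sum(
        1
        for f, s, g in facts
        if f == family
        and (state == ALL or s == state)
        and (gender == ALL or g == gender)
    )

def grid(facts):
    """Rows = product families, columns = the unioned (state, gender) tuples."""
    cols = column_tuples(facts)
    families = sorted({f for f, _, _ in facts})
    return cols, {fam: [sales_count(facts, fam, c) for c in cols] for fam in families}

if __name__ == "__main__":
    cols, rows = grid(FACTS)
    print(cols)
    for fam, counts in rows.items():
        print(fam, counts)
```

Within each family, the state columns and the gender columns each add up to the family total, because each half of the union partitions the same facts.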

I don't know if the following will work with Mondrian (it is written against the Adventure Works cube):
SELECT
{[Measures].[Internet Sales Amount]} ON 0
,NON EMPTY
{
(
[Customer].[Gender].[Gender].MEMBERS
,[Customer].[Marital Status].[(All)]
)
,(
[Customer].[Gender].[(All)]
,[Customer].[Marital Status].[Marital Status].MEMBERS
)
} ON 1
FROM [Adventure Works];
Philip

Related

Use excel to summarise data from a column by identifier

I have a spreadsheet with a column called MRN (the identifier) and the drugs administered next to them. There are duplicates of the MRN in column A that correspond to different courses of drugs. What I'm hoping to do is to summarise all the drugs administered associated with one MRN in one line, removing all duplicates. It looks something like this.
|   | A   | B           |
| 1 | MRN | Item        |
| 2 | 1   | cefoTAXime  |
| 3 | 1   | ampicillin  |
| 4 | 1   | cefoTAXime  |
| 5 | 1   | vancomycin  |
| 6 | 1   | cefTRIaxone |
| 7 | 2   | ampicillin  |
| 8 | 2   | vancomycin  |
| 9 | 2   | vancomycin  |
I have 3 different formulas. The first is to produce a list of MRNs that are all unique. The second is to pull all drugs by MRN and list them in one line. The third is to remove duplicates from this list. They are below (in order).
{=IFERROR(INDEX($A$2:$A$2885, MATCH(0,COUNTIF(D$1:$D1, $A$2:$A$2885),0 )),"")}
{=INDEX($A$2:$B$2885,SMALL(IF($A$2:$A$2885=$D2,ROW($A$2:$A$2885)),COLUMN(D:D))-4,2)}
{=IFERROR(INDEX($E$2:$AE$2, MATCH(0,COUNTIF(D$3:$D3, $E$2:$AE$2),0 )),"")}
*I know that I can edit the second one by adding IF(ISERROR ...) to remove NA and print blanks if drug not found, but want to keep the formulas as simple as possible at this time.
My problem is that second formula isn't pulling all the drugs by MRN, and in an ideal world I would be able to combine the second and third formula into one, but I am not sure how to. Here is a link to a test file that shows my issue and the formulas in action.
https://1drv.ms/x/s!ApoCMYBhswHzhooXnumW2iV7yx-JaA
I appreciate that there may be a better way to do this using python/R, and if that's possible then I'm more than happy to try, but I couldn't make any headway. Thanks for your help and suggestions.
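Since Python was mentioned, here is a minimal sketch, assuming the sheet is saved as a CSV file with the header row MRN,Item (the file names drugs.csv and summary.csv are hypothetical). It collects the distinct drugs per MRN in first-seen order, which is what the second and third formulas together are trying to do.

```python
import csv
from collections import defaultdict

def drugs_by_mrn(rows):
    """Map each MRN to its distinct drugs, keeping first-seen order."""
    seen = defaultdict(dict)  # dict keys keep insertion order and drop duplicates
    for mrn, item in rows:
        seen[mrn][item] = None
    return {mrn: list(items) for mrn, items in seen.items()}

def read_rows(path):
    """Yield (MRN, Item) pairs from a CSV export with an 'MRN,Item' header."""
    with open(path, newline="") as f:
        for rec in csv.DictReader(f):
            yield rec["MRN"].strip(), rec["Item"].strip()

def write_summary(summary, path):
    """One output line per MRN: the MRN followed by its distinct drugs."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for mrn, items in summary.items():
            writer.writerow([mrn, *items])

if __name__ == "__main__":
    sample = [
        ("1", "cefoTAXime"), ("1", "ampicillin"), ("1", "cefoTAXime"),
        ("1", "vancomycin"), ("1", "cefTRIaxone"),
        ("2", "ampicillin"), ("2", "vancomycin"), ("2", "vancomycin"),
    ]
    for mrn, items in drugs_by_mrn(sample).items():
        print(mrn, ", ".join(items))
    # For the real file (hypothetical names):
    # write_summary(drugs_by_mrn(read_rows("drugs.csv")), "summary.csv")
```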
If you could deal with a count of the number of courses per drug per MRN, you can do this with Power Query (aka Get & Transform in Excel 2016)
Starting with the data you provided on your worksheet, the results would look like:
M-Code
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"MRN", Int64.Type}, {"Item", type text}}),
#"Grouped Rows" = Table.Group(#"Changed Type", {"MRN"}, {{"Count", each _, type table}}),
#"Expanded Count" = Table.ExpandTableColumn(#"Grouped Rows", "Count", {"MRN", "Item"}, {"Count.MRN", "Count.Item"}),
#"Pivoted Column" = Table.Pivot(#"Expanded Count", List.Distinct(#"Expanded Count"[Count.Item]), "Count.Item", "Count.MRN", List.NonNullCount)
in
#"Pivoted Column"
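For comparison, here is a rough Python equivalent of the pivot above: one row per MRN, one column per distinct drug, each cell the number of courses. It is a sketch, not a translation of Power Query internals; combinations with no courses are simply left blank here.

```python
from collections import Counter

def course_counts(rows):
    """Return {MRN: {drug: number_of_courses}}, like the pivoted table."""
    counts = Counter(rows)  # counts each (MRN, Item) pair, first-seen order kept
    table = {}
    for (mrn, item), n in counts.items():
        table.setdefault(mrn, {})[item] = n
    return table

def drug_columns(table):
    """Distinct drugs in first-seen order, similar to List.Distinct over Item."""
    cols = {}
    for per_mrn in table.values():
        for item in per_mrn:
            cols[item] = None
    return list(cols)

if __name__ == "__main__":
    rows = [
        (1, "cefoTAXime"), (1, "ampicillin"), (1, "cefoTAXime"),
        (1, "vancomycin"), (1, "cefTRIaxone"),
        (2, "ampicillin"), (2, "vancomycin"), (2, "vancomycin"),
    ]
    table = course_counts(rows)
    cols = drug_columns(table)
    print("MRN", *cols, sep="\t")
    for mrn, per_mrn in table.items():
        print(mrn, *(per_mrn.get(c, "") for c in cols), sep="\t")
```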

Pivot Table with merged date fields

I have a source data sheet in which each data item has two date fields, startDate and endDate. What I would like to do in Excel is generate a pivot table with row headers for each date from either of these columns, and two summary columns, one for Count Started and the other for Count Ended.
For example, the following source data:
ItemId | startDate | endDate
1 | 6/1/16 | 6/2/16
2 | 6/2/16 | 6/3/16
3 | 6/1/16 | 6/3/16
Would produce a pivot table like this:
Date | Started | Ended
6/1/16 | 2 | 0
6/2/16 | 1 | 1
6/3/16 | 0 | 2
I doubt I would choose a PivotTable solution for this (that's unlike me!) but I think possible with a PT:
1) Create a PT from multiple consolidation ranges (example here) with ranges A:B and A:C (assuming ItemID is in A1).
2) After step 7 (of the linked example), select Columns A:C (in the new sheet) and apply Remove Duplicates (with all Columns checked).
3) Create a new PT from what remains (Column for COLUMNS, Value for ROWS, Count of Row for VALUES)
4) Right-click on startDate, Move, and click on first option.
5) In PivotTable Options..., Totals & Filters uncheck both Grand Totals and in Layout & Format, Format, check For empty cells show and enter 0.
6) Adjust labels to suit.
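Outside Excel, the same summary is just two counts merged on a shared date axis. A minimal Python sketch using the sample data from the question, assuming the dates are month/day/year (June 2016):

```python
from collections import Counter
from datetime import date

# Sample data from the question: (ItemId, startDate, endDate)
SAMPLE = [
    (1, date(2016, 6, 1), date(2016, 6, 2)),
    (2, date(2016, 6, 2), date(2016, 6, 3)),
    (3, date(2016, 6, 1), date(2016, 6, 3)),
]

def started_ended(items):
    """Return [(date, started, ended)] for every date found in either column."""
    items = list(items)  # allow any iterable; we read it twice
    started = Counter(start for _, start, _ in items)
    ended = Counter(end for _, _, end in items)
    all_dates = sorted(set(started) | set(ended))
    # Counter returns 0 for dates that never occur in that column
    return [(d, started[d], ended[d]) for d in all_dates]

if __name__ == "__main__":
    print("Date | Started | Ended")
    for d, s, e in started_ended(SAMPLE):
        print(f"{d.month}/{d.day}/{d:%y} | {s} | {e}")
```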

using .where with an array of parameters [duplicate]

I have three tables offers, sports and the join table offers_sports.
class Offer < ActiveRecord::Base
has_and_belongs_to_many :sports
end
class Sport < ActiveRecord::Base
has_and_belongs_to_many :offers
end
I want to select offers that include a given array of sport names. They must contain all of the sports but may have more.
Let's say I have these three offers:
light:
- "Yoga"
- "Bodyboarding"
medium:
- "Yoga"
- "Bodyboarding"
- "Surfing"
all:
- "Yoga"
- "Bodyboarding"
- "Surfing"
- "Parasailing"
- "Skydiving"
Given the array ["Bodyboarding", "Surfing"] I would want to get medium and all but not light.
I have tried something along the lines of this answer but I get zero rows in the result:
Offer.joins(:sports)
.where(sports: { name: ["Bodyboarding", "Surfing"] })
.group("sports.name")
.having("COUNT(distinct sports.name) = 2")
Translated to SQL:
SELECT "offers".*
FROM "offers"
INNER JOIN "offers_sports" ON "offers_sports"."offer_id" = "offers"."id"
INNER JOIN "sports" ON "sports"."id" = "offers_sports"."sport_id"
WHERE "sports"."name" IN ('Bodyboarding', 'Surfing')
GROUP BY sports.name
HAVING COUNT(distinct sports.name) = 2;
An ActiveRecord answer would be nice but I'll settle for just SQL, preferably Postgres compatible.
Data:
offers
======================
id | name
----------------------
1 | light
2 | medium
3 | all
4 | extreme
sports
======================
id | name
----------------------
1 | "Yoga"
2 | "Bodyboarding"
3 | "Surfing"
4 | "Parasailing"
5 | "Skydiving"
offers_sports
======================
offer_id | sport_id
----------------------
1 | 1
1 | 2
2 | 1
2 | 2
2 | 3
3 | 1
3 | 2
3 | 3
3 | 4
3 | 5
4 | 3
4 | 4
4 | 5
Group by offers.id, not by sports.name (or sports.id):
SELECT o.*
FROM sports s
JOIN offers_sports os ON os.sport_id = s.id
JOIN offers o ON os.offer_id = o.id
WHERE s.name IN ('Bodyboarding', 'Surfing')
GROUP BY o.id -- !!
HAVING count(*) = 2;
Assuming the typical implementation:
offers.id and sports.id are defined as primary keys.
sports.name is defined unique.
(sport_id, offer_id) in offers_sports is defined unique (or PK).
You don't need DISTINCT in the count, and count(*) is even a bit cheaper.
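To check the query against the sample data, here is a self-contained sketch using Python's built-in sqlite3 module instead of Postgres (chosen only so it runs without a server). The constraints mirror the assumptions listed above, the sport names are passed as bound parameters, and ORDER BY is added only to make the result deterministic.

```python
import sqlite3

SCHEMA = """
CREATE TABLE offers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE sports (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE offers_sports (
    offer_id INTEGER NOT NULL REFERENCES offers(id),
    sport_id INTEGER NOT NULL REFERENCES sports(id),
    PRIMARY KEY (sport_id, offer_id)
);
"""

def build_db():
    """In-memory copy of the sample data from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO offers VALUES (?, ?)",
                     [(1, "light"), (2, "medium"), (3, "all"), (4, "extreme")])
    conn.executemany("INSERT INTO sports VALUES (?, ?)",
                     [(1, "Yoga"), (2, "Bodyboarding"), (3, "Surfing"),
                      (4, "Parasailing"), (5, "Skydiving")])
    conn.executemany("INSERT INTO offers_sports VALUES (?, ?)",
                     [(1, 1), (1, 2), (2, 1), (2, 2), (2, 3),
                      (3, 1), (3, 2), (3, 3), (3, 4), (3, 5),
                      (4, 3), (4, 4), (4, 5)])
    return conn

def offers_with_all_sports(conn, sport_names):
    """Offers linked to every sport in sport_names (they may have more)."""
    names = list(dict.fromkeys(sport_names))  # drop duplicates so the count matches
    placeholders = ", ".join("?" for _ in names)
    sql = f"""
        SELECT o.id, o.name
        FROM sports s
        JOIN offers_sports os ON os.sport_id = s.id
        JOIN offers o ON os.offer_id = o.id
        WHERE s.name IN ({placeholders})
        GROUP BY o.id, o.name
        HAVING count(*) = ?
        ORDER BY o.id
    """
    return conn.execute(sql, [*names, len(names)]).fetchall()

if __name__ == "__main__":
    conn = build_db()
    print(offers_with_all_sports(conn, ["Bodyboarding", "Surfing"]))
```

Because (sport_id, offer_id) is unique and sports.name is unique, each matching offer contributes exactly one row per requested sport, which is why count(*) can be compared with the number of names.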
Related answer with an arsenal of possible techniques:
How to filter SQL results in a has-many-through relation
Added by @max (the OP) - this is the above query rolled into ActiveRecord:
class Offer < ActiveRecord::Base
has_and_belongs_to_many :sports
def self.includes_sports(*sport_names)
joins(:sports)
.where(sports: { name: sport_names })
.group('offers.id')
.having("count(*) = ?", sport_names.size)
end
end
One way to do it is using arrays and the array_agg aggregate function.
SELECT "offers".*, array_agg("sports"."name") as spnames
FROM "offers"
INNER JOIN "offers_sports" ON "offers_sports"."offer_id" = "offers"."id"
INNER JOIN "sports" ON "sports"."id" = "offers_sports"."sport_id"
GROUP BY "offers"."id" HAVING array_agg("sports"."name")::text[] @> ARRAY['Bodyboarding','Surfing']::text[];
returns:
id | name | spnames
----+--------+---------------------------------------------------
2 | medium | {Yoga,Bodyboarding,Surfing}
3 | all | {Yoga,Bodyboarding,Surfing,Parasailing,Skydiving}
(2 rows)
The @> operator means that the array on the left must contain all the elements of the one on the right, but may contain more. The spnames column is just for show; you can safely remove it.
There are two things you must be very mindful of with this.
Even with Postgres 9.4 (I haven't tried 9.5 yet), type resolution when comparing arrays is sloppy and often errors out, saying it can't find a way to compare the two values, so, as you can see in the example, I've manually cast both sides using ::text[].
I have no idea what the level of support for array parameters is in Ruby or the RoR framework, so you may end up having to escape the strings manually (if they are input by the user) and build the array using the ARRAY[] syntax.
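The containment test itself is easy to model. Here is a small Python sketch of what left @> right checks; it models the semantics only and is not how Postgres implements it. The offer-to-sports lists come from the sample data. As in Postgres array containment, element order and duplicates do not matter.

```python
# Sports per offer, from the sample data in the question
SPNAMES = {
    "light": ["Yoga", "Bodyboarding"],
    "medium": ["Yoga", "Bodyboarding", "Surfing"],
    "all": ["Yoga", "Bodyboarding", "Surfing", "Parasailing", "Skydiving"],
    "extreme": ["Surfing", "Parasailing", "Skydiving"],
}

def contains_all(left, right):
    """Model of 'left @> right': every element of right appears in left."""
    return set(right) <= set(left)

def matching_offers(required):
    """Offers whose sport list contains all required sports (more are allowed)."""
    return [name for name, sports in SPNAMES.items() if contains_all(sports, required)]

if __name__ == "__main__":
    print(matching_offers(["Bodyboarding", "Surfing"]))
```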

Can I set rules for string comparison in SQL? (or do I need to hardcode using CASE WHEN)

I need to compare ratings at two points in time and indicate whether the change was upwards or downwards, or whether the rating stayed the same.
For example:
This would be a table with four columns:
ID T0 T0+1 Status
1 AAA AA Lower
2 BB A Higher
3 C C Same
However, this does not work when applying regular string comparison, because in SQL
A<B
B<BBB
I need
A>B
B<BBB
So my order(highest to lowest): AAA,AA,A,BBB,BB,B
SQL order(highest to lowest): BBB,BB,B,AAA,AA,A
Now I have 2 options in mind, but I wonder if someone knows a better one:
1) Use CASE WHEN statements for all the possibilities of ratings going up and down (I have more values than indicated above)
CASE WHEN T0=[T0+1] THEN 'Same'
WHEN T0='AAA' AND [T0+1]<>'AAA' THEN 'Lower'
.... address all other options for the rating going down
ELSE 'Higher'
END
However, this generates a very large number of CASE WHEN statements.
2) My other option requires generating 2 tables. In table 1 I use case when statements to assign values/rank to the ratings.
For example:
CASE WHEN T0='AAA' THEN 6
WHEN T0='AA' THEN 5
WHEN T0='A' THEN 4
WHEN T0='BBB' THEN 3
WHEN T0='BB' THEN 2
WHEN T0='B' THEN 1
END
The same for T0+1.
Then in table 2 I use a regular comparison between column T0 and column T0+1 on the numeric values.
However, I am looking for a solution where I can do it in one table (with as few lines as possible), and ideally never actually show the ranking column.
I think a nested statement would be the best option, but it did not work for me.
Does anybody have suggestions?
I use SQL Server 2008.
If you are dealing with credit ratings, it is very likely that this is not just about AAA > AA or BBB > BB.
Whether you are using one agency or another, it could also be AA+ or Aa1 for long term, F1+ for short term or something else in different contexts or with other agencies.
It is also often required to convert ratings from one agency's scale to another agency's scale.
Therefore it is better to use a mapping table such as:
Id | Rating
0 | AAA
1 | AA+
2 | AA
3 | AA-
4 | A+
5 | A
6 | A-
7 | BBB+
Using this table, you only have to join the rating in your data table with the rating in the mapping table:
SELECT d.Rating_T0, d.Rating_T1,
CASE WHEN d.Rating_T0 = d.Rating_T1 THEN '='
WHEN m0.id < m1.id THEN '<'
WHEN m0.id > m1.id THEN '>'
END AS Status
FROM yourData d
INNER JOIN RatingMapping m0
ON m0.Rating= d.Rating_T0
INNER JOIN RatingMapping m1
ON m1.Rating= d.Rating_T1
If you only store the rating Id in your data table, you will not only save space (1 byte for tinyint versus up to 4 chars) but will also be able to compare without the JOIN to the mapping table.
SELECT d.Rating_Id0, d.Rating_Id1,
CASE WHEN d.Rating_Id0 = d.Rating_Id1 THEN '='
WHEN d.Rating_Id0 < d.Rating_Id1 THEN '<'
WHEN d.Rating_Id0 > d.Rating_Id1 THEN '>'
END AS Status
FROM yourData d
The JOIN would only be required when you want to display the actual rating value, such as AAA for Rating_ID = 0.
You could also add an agency_Id to the mapping table. This way, you can easily choose which rating agency's notation to display and convert between Agency 1, Agency 2 and Agency 3 (i.e. Id 1 => S&P, Id 2 => Fitch, Id 3 => ...).
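Here is a runnable sketch of the mapping-table join, using Python's built-in sqlite3 module rather than SQL Server 2008, so treat it as an illustration of the idea, not tested T-SQL. The scale is the question's ordering (best first, lower Id = better rating); 'C' is an assumed extra entry ranked below B so the question's third row has a match. It outputs the question's labels: a larger Id at T1 means a downgrade, i.e. 'Lower'. Ratings missing from the mapping table are dropped by the inner joins.

```python
import sqlite3

# Question's scale, best first; 'C' is an assumed extra entry below 'B'
SCALE = ["AAA", "AA", "A", "BBB", "BB", "B", "C"]

def rating_changes(rows):
    """rows: (ID, Rating_T0, Rating_T1). Returns rows plus a Status label."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE RatingMapping (Id INTEGER PRIMARY KEY, Rating TEXT NOT NULL UNIQUE)")
        conn.execute("CREATE TABLE yourData (ID INTEGER PRIMARY KEY, Rating_T0 TEXT, Rating_T1 TEXT)")
        conn.executemany("INSERT INTO RatingMapping VALUES (?, ?)", list(enumerate(SCALE)))
        conn.executemany("INSERT INTO yourData VALUES (?, ?, ?)", rows)
        sql = """
            SELECT d.ID, d.Rating_T0, d.Rating_T1,
                   CASE WHEN m0.Id = m1.Id THEN 'Same'
                        WHEN m0.Id < m1.Id THEN 'Lower'
                        ELSE 'Higher'
                   END AS Status
            FROM yourData d
            JOIN RatingMapping m0 ON m0.Rating = d.Rating_T0
            JOIN RatingMapping m1 ON m1.Rating = d.Rating_T1
            ORDER BY d.ID
        """
        return conn.execute(sql).fetchall()
    finally:
        conn.close()

if __name__ == "__main__":
    for row in rating_changes([(1, "AAA", "AA"), (2, "BB", "A"), (3, "C", "C")]):
        print(row)
```

The ranking column never appears in the output; it only drives the CASE expression.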

How do I define a Calculated Measure in MDX based on a Dimension Attribute?

I would like to create a calculated measure that sums up only a specific subset of records in my fact table based on a dimension attribute.
Given:
Dimension
Date
LedgerLineItem {Charge, Payment, Write-Off, Copay, Credit}
Measures
LedgerAmount
Relationships
* LedgerLineItem is a degenerate dimension of FactLedger
If I break down LedgerAmount by LedgerLineItem.Type I can easily see how much is charged, paid, credited, etc., but when I do not break it down by LedgerLineItem.Type I cannot easily add the charged, paid, credit, etc. amounts into a pivot table. I would like to create separate calculated measures that sum only a specific type (or multiple types) of ledger facts.
An example of the desired output would be:
| Year | Charged | Total Paid | Amount - Ledger |
| 2008 | $1000 | $600 | -$400 |
| 2009 | $2000 | $1500 | -$500 |
| Total | $3000 | $2100 | -$900 |
I have tried to create the calculated measure a couple of ways, and each one works in some circumstances but not in others. Before anyone says to do this in ETL: I have already done it in ETL and it works just fine. As part of learning MDX better, I am trying to figure out how to duplicate in MDX what I have done in the ETL, and so far I have been unable to.
Here are two attempts I have made and the problems with them.
This works only when ledger type is in the pivot table. It returns the correct amount of the ledger entries (although in this case it is identical to [Amount - Ledger]), but when I try to remove type and just get the sum of all ledger entries, it returns unknown.
CREATE MEMBER CURRENTCUBE.[Measures].[Received Payment]
AS CASE WHEN ([Ledger].[Type].currentMember = [Ledger].[Type].&[Credit])
OR ([Ledger].[Type].currentMember = [Ledger].[Type].&[Paid])
OR ([Ledger].[Type].currentMember = [Ledger].[Type].&[Held Money: Copay])
THEN [Measures].[Amount - ledger]
ELSE 0
END
, FORMAT_STRING = "Currency"
, VISIBLE = 1
, ASSOCIATED_MEASURE_GROUP = 'Ledger' ;
This works only when ledger type is not in the pivot table. It always returns the total payment amount, which is incorrect when I am slicing by type as I would only expect to see the credit portion under credit, the paid portion, under paid, $0 under charge, etc.
CREATE MEMBER CURRENTCUBE.[Measures].[Received Payment]
AS sum({([Ledger].[Type].&[Credit]), ([Ledger].[Type].&[Paid])
, ([Ledger].[Type].&[Held Money: Copay])}
, [Measures].[Amount - Ledger])
, FORMAT_STRING = "Currency"
, VISIBLE = 1
, ASSOCIATED_MEASURE_GROUP = 'Ledger' ;
Is there any way to make this return the correct numbers regardless of whether Ledger.Type is included in my pivot table or not?
Try EXISTING:
CREATE MEMBER CURRENTCUBE.[Measures].[Received Payment]
AS sum(Existing({([Ledger].[Type].&[Credit]), ([Ledger].[Type].&[Paid])
, ([Ledger].[Type].&[Held Money: Copay])})
, [Measures].[Amount - Ledger])
, FORMAT_STRING = "Currency"
, VISIBLE = 1
, ASSOCIATED_MEASURE_GROUP = 'Ledger' ;
This should make the Sum pay attention to the Type members in play in the current context.
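Here is a toy model in plain Python of why Existing helps. It is not MDX and greatly simplifies how Analysis Services resolves context: "context" stands for the Type members currently in scope (all types when Type is not on the pivot table, a single type when slicing by it). The type names come from the question; the amounts are invented.

```python
# Hypothetical ledger facts: (type, amount); amounts are made up
FACTS = [
    ("Charge", 1000),
    ("Paid", -500),
    ("Credit", -100),
    ("Held Money: Copay", -20),
    ("Write-Off", -50),
]
ALL_TYPES = frozenset(t for t, _ in FACTS)
PAYMENT_TYPES = frozenset({"Credit", "Paid", "Held Money: Copay"})

def amount(types):
    """[Amount - Ledger] restricted to the given Type members."""
    return sum(a for t, a in FACTS if t in types)

def received_payment_sum(context):
    """Plain Sum over the fixed set: the Type in context is ignored (overridden)."""
    return sum(amount({t}) for t in PAYMENT_TYPES)

def received_payment_existing(context):
    """Sum(Existing {...}): only set members that also exist in the context count."""
    return sum(amount({t}) for t in PAYMENT_TYPES & context)

if __name__ == "__main__":
    contexts = [
        ("All types", ALL_TYPES),
        ("Credit", frozenset({"Credit"})),
        ("Charge", frozenset({"Charge"})),
    ]
    for label, ctx in contexts:
        print(label, received_payment_sum(ctx), received_payment_existing(ctx))
```

The plain Sum version gives the same total under every Type, matching the second attempt in the question, while the Existing version gives the total at the All level, only the credit portion under Credit, and 0 under Charge.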
Can't comment on Meff's answer, so I'll post my own.
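You should consider using Aggregate instead of Sum: Aggregate applies the measure's own aggregation function (which matters for, e.g., distinct-count or semi-additive measures), whereas Sum always adds the values, so Sum may not always return what you expect: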
CREATE MEMBER CURRENTCUBE.[Measures].[Received Payment]
AS Aggregate(Existing({([Ledger].[Type].&[Credit]), ([Ledger].[Type].&[Paid])
, ([Ledger].[Type].&[Held Money: Copay])})
, [Measures].[Amount - Ledger])
, FORMAT_STRING = "Currency"
, VISIBLE = 1
, ASSOCIATED_MEASURE_GROUP = 'Ledger' ;
I am assuming the Ledger dimension has a key in FactLedger, but the fact that the first calculated member doesn't roll up correctly leads me to believe that you may want to rethink the hierarchies for this.
Then a simple SUM based on your ledger type would work. Does that make sense?
