Distinct count over multiple dimensions - sql-server

I am completely new to MDX.
I have to implement a set of indicators. Example:
Discharges, for patients ages 18 years and older with either:
- any-listed ICD-9-CM procedure codes for esophageal resection; or
- any-listed ICD-9-CM procedure codes for gastrectomy and any-listed ICD-9-CM diagnosis codes for esophageal cancer.
Thus I think I need to create a measure (let's call it countX) that counts the number of facts (discharges) that belong to one computed set OR to another computed set. If a fact belongs to both sets, it should be counted only once.
A set definition may contain crossjoins and filters over multiple dimensions.
The idea is then to be able to slice countX along any dimensions (ideally including those used to compute the sets).
I already learned that the UNION operator only works when joining sets over the same dimension. So, is my approach feasible in MDX? Or can I formulate the problem differently, for example with calculated members?
Or would it be better to create specific fact or dimension tables populated with the correct information using SQL?
Thanks

You can UNION two sets of different dimensionality by creating tuples that use both dimensions, but basically ignore one in one case, and the other in the other case.
I don't know your cube or your data, so I'm going to use very simple pseudocode. Say you want to get all widgets that are red and all widgets that are small.
Another way to think of it: I want all widgets that are red, regardless of their size, and I want all widgets that are small, regardless of their color.
So pseudo-MDX for expressing this would be:
CrossJoin({[dimColor].&[Red]}, {[dimSize].[All]})
+
CrossJoin({[dimColor].[All]}, {[dimSize].&[Small]})
Get the specific member(s) you want from Dimension A, CrossJoined with ALL members of Dimension B. And then you can UNION that with ALL members of Dimension A, CrossJoined with the specific member(s) you want from Dimension B, because you have met UNION's requirement that the two sets have the same dimensionality.
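To see why the union of such tuples gives "in set A OR set B, counted once", here is a toy Python model (not MDX). It assumes each discharge is a fact row with a unique ID and uses hypothetical color/size attributes in place of the real dimensions. A plain sum over the two tuples counts the red-and-small fact twice; counting distinct fact IDs over the union counts it once. In a cube this would correspond to a distinct-count style measure over the fact key, which is an assumption about how your cube is built.

```python
from dataclasses import dataclass

ALL = "All"  # stands in for the [All] member of a dimension


@dataclass(frozen=True)
class Fact:
    fact_id: int  # hypothetical unique key of a discharge row
    color: str
    size: str


def matches(fact, tup):
    """True if the fact falls under a (color, size) tuple; ALL matches anything."""
    color, size = tup
    return color in (ALL, fact.color) and size in (ALL, fact.size)


def distinct_count(facts, tuples):
    """Count facts covered by at least one tuple, each fact only once."""
    return len({f.fact_id for f in facts for t in tuples if matches(f, t)})


facts = [
    Fact(1, "Red", "Small"),
    Fact(2, "Red", "Large"),
    Fact(3, "Blue", "Small"),
    Fact(4, "Blue", "Large"),
]

# {Red} x {All}  +  {All} x {Small}: both halves are (color, size) tuples
union = [("Red", ALL)] + [(ALL, "Small")]

naive_sum = sum(distinct_count(facts, [t]) for t in union)
print(distinct_count(facts, union), naive_sum)  # 3 4
```

Fact 1 is both red and small, so the naive per-tuple sum (4) overcounts it, while the distinct count over the union (3) does not.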

Related

Excel Dynamic Array Running Count of Duplicates

I've been retooling some older spreadsheet tools for filtering and formatting dynamic data outputs using Excel's newer dynamic array formulas. This has removed some of the need to pre-allocate cells and reduced the number of helper columns (which has meant smaller files and snappier performance).
One function type I am struggling to replace is pulling out dynamic, running duplicate counts.
For instance, say I have a column B of 20 names, where the list can vary in length from a handful to about 200 names. There is also related data in columns C, D, etc. that varies in size in the same way. To filter the data in the later columns, we currently use a helper column A containing the running count of duplicates in column B, built with a semi-anchored range (i.e., a range that starts at an anchored cell and expands as the formula is copied down the helper column, such as =COUNTIF($B$1:B1,B1)). The drawback compared with the new dynamic array formulas is that the helper column has to be pre-allocated for the data.
Despite attempts with Index(), Aggregate(), Filter(), and a few more involved notations like Sumproduct(--(...)), the most straightforward way I can find to build helper column A is still the running count via semi-anchored ranges, which unfortunately does not translate well to the new dynamic array formulas.
Has anyone had any luck adapting the use of semi-anchored ranges and formulas for use in dynamic array formulas?
To do this with a dynamic array formula, we need OFFSET, which is volatile:
=COUNTIFS(OFFSET(A1,0,0,SEQUENCE(COUNTA(A1#))),A1#)
I appreciate this is an old post, but for future reference (I couldn't find an answer elsewhere), the following seems to work as a non-volatile alternative:
=LET(InputArray,A1#,
RowCount,ROWS(InputArray),
Temp,1*(InputArray=TRANSPOSE(InputArray)),
MMULT(TRANSPOSE(IF(SEQUENCE(RowCount,1)>SEQUENCE(1,RowCount),0,Temp)),SEQUENCE(RowCount,1,1,0)))
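The LET formula works by building an n x n "same value" matrix, masking it so each row only sees earlier-or-same positions, and summing each row with MMULT against a column of ones. A minimal Python sketch that mirrors those steps, assuming the input is a plain list of values:

```python
def running_count(values):
    n = len(values)
    # Temp: 1 where values[i] equals values[j] (InputArray=TRANSPOSE(InputArray))
    temp = [[1 if values[i] == values[j] else 0 for j in range(n)] for i in range(n)]
    # Keep only positions j <= i (the IF(...>...,0,Temp) mask followed by TRANSPOSE)
    lower = [[temp[i][j] if j <= i else 0 for j in range(n)] for i in range(n)]
    # MMULT with a column of ones sums each row: matches of values[i] seen so far
    ones = [1] * n
    return [sum(lower[i][j] * ones[j] for j in range(n)) for i in range(n)]


print(running_count(["Ann", "Bob", "Ann", "Cy", "Bob", "Ann"]))  # [1, 1, 2, 1, 2, 3]
```

Like the formula, this uses n x n intermediate storage, which is fine for a couple of hundred names. One difference: Excel's text comparison with = ignores case, whereas Python's == does not.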

Can I make an array out of a range of countif functions?

A truncated version of my data has three columns containing 5 unique names. The names appear in any order and in any position, but never repeat within a single row.
My goal is to create an array that contains the number of times Adam appears in each row. I can fill the formula =countif(A2:C2,$I$2) down a new column, or, if I write the array manually for each row, it looks like:
={countif(A2:C2,$I$2);countif(A3:C3,$I$2);countif(A4:C4,$I$2);countif(A5:C5,$I$2);countif(A6:C6,$I$2)}
Where cell I2 contains "Adam". Of course, this is not feasible for large data sets.
I know that arrays are effectively cells turned into ranges, but my main issue is that the cell I'm trying to transform already references a range, and I don't know how to tell the software to apply the countif down each row (i.e. I intuitively would like to do something like countif((A2:C2):(A99:C99),"Adam") but understand that's not how spreadsheets work).
My goal is ultimately to perform some operations on the corresponding array but I think I'm comfortable enough with that once I can get the array formula I'm looking for.
Try this (ARRAYFORMULA is a Google Sheets function):
=ARRAYFORMULA(IF(A2:A="",,MMULT(IF(A2:C="Adam", 1, 0), {1;1;1})))
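The formula turns the name grid into a 0/1 grid and multiplies it by the column vector {1;1;1}, which amounts to a per-row sum. A small Python sketch of the same idea, using hypothetical rows and treating a blank first cell as "no data" (roughly what the IF(A2:A="",,...) wrapper does):

```python
def count_per_row(rows, name):
    """Per-row count of `name`, mirroring MMULT(IF(A2:C=name,1,0),{1;1;1})."""
    result = []
    for row in rows:
        if row[0] == "":
            result.append(None)  # blank row: no result, like IF(A2:A="",,...)
            continue
        indicator = [1 if cell == name else 0 for cell in row]  # IF(A2:C="Adam",1,0)
        ones = [1] * len(row)  # the {1;1;1} column vector
        result.append(sum(i * o for i, o in zip(indicator, ones)))  # one row of MMULT
    return result


rows = [
    ["Adam", "Beth", "Carl"],
    ["Dana", "Eve", "Beth"],
    ["Carl", "Dana", "Adam"],
    ["", "", ""],
]
print(count_per_row(rows, "Adam"))  # [1, 0, 1, None]
```

To use the name in I2 instead of a literal, the same formula pattern should work with A2:C=$I$2 in place of A2:C="Adam".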

How to use index match (array formula) to return corresponding values from a drop down list?

I have a very large list of features that are shared between certain groups. I want to pick a feature from a drop-down list and then have a formula output the group with the lowest cost for that feature, along with that cost.
(Also you will see that I purposefully ignore zero values. I do this because not every group has a certain feature and those cells default to zero).
I figured out how to get the cost of the feature to output, but I'm having trouble getting to output the group name. I am assuming there will be an array formula to do this, but I am just starting to learn those and I'm having trouble with this one.
You could use the same approach you used to pull in the value: get the position of the column heading that matches the computed minimum, using OFFSET to look in the right row:
=INDEX($B$1:$D$1,MATCH($B$10,OFFSET($B$1:$D$1,MATCH($A$7,$A$2:$A$4,0),0),0))
One thing I'm not sure about is how you want to handle ties: if two groups have the same price, this returns the first one in the list.
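The lookup logic (pick the feature's row, ignore zeros, take the minimum, return the first heading with that value) can be sketched in Python. The group names, features and costs below are hypothetical stand-ins for B1:D1 and A2:D4:

```python
GROUPS = ["Group A", "Group B", "Group C"]  # hypothetical headings, like B1:D1
COSTS = {  # hypothetical feature rows, like A2:D4; 0 means the group lacks the feature
    "Feature 1": [10, 0, 7],
    "Feature 2": [5, 5, 0],
    "Feature 3": [0, 0, 0],
}


def cheapest_group(groups, costs, feature):
    row = costs[feature]  # MATCH($A$7,$A$2:$A$4,0) + OFFSET: the feature's row
    nonzero = [c for c in row if c != 0]  # ignore zeros, as in the question
    if not nonzero:
        return None  # no group offers this feature
    best = min(nonzero)  # the computed minimum (the value in $B$10)
    # list.index returns the first position, so ties go to the leftmost group, like MATCH(...,0)
    return groups[row.index(best)], best


print(cheapest_group(GROUPS, COSTS, "Feature 2"))  # ('Group A', 5)
```

"Feature 2" shows the tie behavior: Group A and Group B both cost 5, and the first one wins.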

Array multiplication in Excel

My Excel workbook has two sheets. The first is a data set and the second is a matrix of the relationship between two of the variables in my data set; each possible value of the variable is a column in the matrix. I'm trying to get the sum of the products of the elements in two different arrays. Right now I'm using the formula {=SUM(N3:N20 * F3:F20)} and manually changing the columns each time, but my data set has over 800 items...
Ideally I'd like to know how to write a program that reads the value of the variable in my data set, looks up the corresponding columns in the matrix, multiplies them together, sums the products, and puts the result in the right place in my data set. However, just knowing the result for all possible combinations of columns would also save me a lot of time. It's an 18x18 matrix. Thanks for any feedback!
Your question is a little ambiguous, but as far as I understand, you want to multiply different pairs of columns in the same sheet and put the results into the next sheet. Is that right? If so, please post images of your work (all sheets). This is possible in Excel alone, without any VBA code.
You can also use =SUMPRODUCT(N3:N20,F3:F20) instead of {=SUM(N3:N20 * F3:F20)}; it gives the same result without having to be entered as an array formula.
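For the "all possible combinations of columns" part, the table of SUMPRODUCT results for every pair of columns is the matrix product transpose(M) x M. A minimal Python sketch, assuming the matrix is given as a list of rows (the 2x2 example is hypothetical; an 18x18 matrix yields an 18x18 table):

```python
def sumproduct(xs, ys):
    """Same result as =SUMPRODUCT(xs, ys) for two equally sized numeric ranges."""
    if len(xs) != len(ys):
        raise ValueError("ranges must be the same size")
    return sum(x * y for x, y in zip(xs, ys))


def all_column_pairs(matrix):
    """SUMPRODUCT for every pair of columns, i.e. transpose(M) x M."""
    cols = [list(col) for col in zip(*matrix)]
    return [[sumproduct(a, b) for b in cols] for a in cols]


m = [[1, 2],
     [3, 4]]
print(all_column_pairs(m))  # [[10, 14], [14, 20]]
```

Entry [a][b] is the sum of products of column a and column b, so the table is symmetric. In Excel the same table should be obtainable in one go with =MMULT(TRANSPOSE(range),range) over the matrix range.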

TI Process - how to merge two dimensions into a new one?

I have two dimensions - Invoice_In and Invoice_Out. I need to create a new dimension Invoice which combines both of these. Is there any easy way to do this with a TI process (or any other way using TI or Performance Modeler)? Thanks.
Have you consulted the Reference Guide (TM1 TurboIntegrator Functions chapter) about this?
You could use the All subsets of the two dimensions as data sources and iterate through both in the Metadata tab using two processes (or a master process that calls the same process and passes it parameters), but it would be just as easy, and more importantly would let you keep everything in one process, to do this in the Prolog tab with a data source of None:
Use DimensionExists as the condition of an If() block to determine whether the dimension Invoice exists;
If it doesn't, use DimensionCreate to create it. Add any consolidations that you want to add using DimensionElementInsert statements.
Use the DimSiz Rules function to get the number of elements in Invoice_In and Invoice_Out and store both in variables;
Your first loop iterates through Invoice_In, using a While block to count from 1 to the DimSiz value.
In your loop you would obtain the existing element using DimNm(). (You will also need to use ElLev or DType if you want to obtain only the N level elements.) You insert each element into Invoice through DimensionElementInsert. You may also need to use DimensionElementComponentAdd to add it to any top level consolidation.
Your second loop would do exactly the same but for Invoice_Out.
Where you may run into issues is if the same element name exists in both dimensions. DimensionElementInsert won't raise an error over that; it will simply ignore the insertion the second time the name is encountered.
Do NOT call, from the Prolog, any other process that is intended to refer to this new dimension. You need to cross the Metadata boundary to ensure that the new dimension is registered with the server.
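The effect of the two Prolog loops can be modelled in a few lines of Python (this is not TI code). Each source is a hypothetical list of (element name, type) pairs standing in for what DimNm and DType would return, and "All Invoices" is a hypothetical top consolidation. The model keeps only leaf (N) elements and skips a name that is already present, as described above:

```python
def merge_leaf_elements(sources, top_consolidation="All Invoices"):
    """Model of merging leaf elements of several dimensions into one new dimension."""
    elements = [top_consolidation]  # consolidation inserted first
    children = []  # elements added under the consolidation
    for source in sources:  # one loop per source dimension
        for name, el_type in source:  # the While loop from 1 to DimSiz
            if el_type != "N":
                continue  # only leaf elements, cf. the ElLev/DType check
            if name in elements:
                continue  # duplicate name: the second insertion is ignored
            elements.append(name)  # DimensionElementInsert
            children.append(name)  # DimensionElementComponentAdd
    return elements, children


invoice_in = [("IN-001", "N"), ("IN-002", "N"), ("Total In", "C")]
invoice_out = [("OUT-001", "N"), ("IN-002", "N")]
print(merge_leaf_elements([invoice_in, invoice_out]))
```

This model compares names exactly; TM1 normally treats element names as case- and space-insensitive, so real duplicates may be detected more broadly than shown here.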
Export the elements of both dimensions, then copy and paste both lists into one sheet.
Use the sheet as the data source, then use a single DimensionElementInsert line in your TI:
DimensionElementInsert(DimName, InsertionPoint, ElName, ElType);
Alternatively, use the existing dimensions as a source. Then you don't need to construct a file.
You can set the data source name in code and cycle through any number of dimensions.
(Note: the new dimension needs to exist, or you can create it within your TI; it depends how much you want to code. I gave you the solution with the least coding.)
