Counting rows in a table based on multiple array criterias - arrays

I am trying to count rows in a table based on multiple criteria in different columns of that table. The criteria are not directly in the formula though; they are arrays which I would like to refer to (and not list them in the formula directly).
Range table example:
Group Status
a 1
b 4
b 6
c 4
a 6
d 5
d 4
a 2
b 2
d 3
b 2
c 1
c 2
c 1
a 4
b 3
Criteria/arrays example:
group
a
b
status
1
2
3
I am able to do this if i only have one array search through a range (1 column) in that table:
{=SUM(COUNTIFS(data[Group],group[group]))}
Returns "9" as expected (=sum of rows in the group column of the data table which match any values in group[group])
But if I add a second different range and a different array I get an incorrect result:
{=SUM(COUNTIFS(data[Group],group[group], data[Status],status[status]))}
Returns "3" but should return "5" (=sum of rows which have 1, 2 or 3 in the status column and a or b in the group column)
I searched and googled for various ideas related to using sumproduct or defining arrays within the formula instead of classifying the whole formula as an array but I was not able to get expected results via those means.
Thank you for your help.

Because your group and status criteria are a different number of values (2 values for group, but 3 values for status), I'm not sure you can do this in a single formula. Best way I know of to do this would be to use a helper column (which can be hidden if preferred).
Put this array formula in a helper column and copy down the length of your data (array formulas must be confirmed with Ctrl+Shift+Enter):
=AND(OR(data[#Group]=group[group]),OR(data[#Status]=status[status]))
And then get the count with: =COUNTIF(helpercolumn,TRUE)

You could use a slightly different approach, using Power Query / Power Pivot.
Name your tables Data, Group and Status, then create the following query, named Filtered Data:
let
tbData = Excel.CurrentWorkbook(){[Name="Data"]}[Content],
tbGroup = Excel.CurrentWorkbook(){[Name="Group"]}[Content],
tbStatus = Excel.CurrentWorkbook(){[Name="Status"]}[Content],
#"Merged Group" = Table.NestedJoin(tbData,{"Group"},tbGroup,{"Group"},"tbGroup",JoinKind.Inner),
#"Merged Status" = Table.NestedJoin(#"Merged Group",{"Status"},tbStatus,{"Status"},"Merged Status",JoinKind.Inner),
#"Removed Columns" = Table.RemoveColumns(#"Merged Status",{"tbGroup", "Merged Status"}),
#"Changed Type" = Table.TransformColumnTypes(#"Removed Columns",{{"Status", type number}})
in
#"Changed Type"
Load To as connection only, and tick Load to Data Model
Now create a DAX measure:
Status Sum:=SUM ( 'Filtered Data'[Status] )
You can then use the following formula on your worksheet, to get the Sum of Status values, for rows matching the criteria specified in the Group and Status tables:
=CUBEVALUE("ThisWorkbookDataModel","[Measures].[Status Sum]")
Simply refresh the data connection to update the value.

Related

Using the window function "last_value", when the values of the sorted field are same, the value snowflake returns is not the last value

As we all known, the window function "last_value" returns the last value within an ordered group of values.
In the following example, group by field "A" and sort by field "B" in positive order.
In the group of "A = 1", the last value is returned, which is, the C value 4 when B = 2.
However, in the group of "A = 2", the values of field "B" are the same.
At this time, instead of the last value, which is, the C value 4 in line 6, the first C value 1 in B = 2 is returned.
This puzzles me why the last value within an ordered group of values is not returned when I encounter the value I want to use for sorting.
Example
row_number
A
B
C
LAST_VALUE(C) IGNORE NULLS OVER (PARTITION BY A ORDER BY B ASC)
1
1
1
2
4
2
1
1
1
4
3
1
1
3
4
4
1
2
4
4
5
2
2
1
1
6
2
2
4
1
This puzzles me why the last value within an ordered group of values is not returned when I encounter the value I want to use for sorting.
For partition A equals 2 and column B, there is a tie:
The sort is NOT stable. To achieve stable sort a column or a combination of columns in ORDER BY clause must be unique.
To ilustrate it:
SELECT C
FROM tab
WHERE A = 2
ORDER BY B
LIMIT 1;
It could return either 1 or 4.
If you sort by B within A then any duplicate rows (same A and B values) could appear in any order and therefore last_value could give any of the possible available values.
If you want a specific row, based on some logic, then you would need to sort by all columns within the group to reflect that logic. So in your case you would need to sort by B and C
Good day Bill!
Right, the sorting is not stable and it will return different output each time.
To get stable results, we can run something like below
select
column1,
column2,
column3,
last_value(column3) over (partition by column1 order by
column2,column3) as column2_last
from values
(1,1,2), (1,1,1), (1,1,3),
(1,2,4), (2,2,1), (2,2,4)
order by column1;

How to show #ERROR as Zero or as Nothing in matrix cell when groups and calculated field are based on switch-function in SSRS?

Imagine simple BI task:
We have two grouping columns and one calculated field.
like
select group1, group2, sum(sales) Sum_of_sales from dbo.table1
group by group1, group2
When for some combination of grouping columns there's no ROW in dataset - report shows me Empty. It's ok.
Now i'm building matrix with groups depending on parametrs. Some sort of programmatic matrix. User deciedes what to analyze by himself. He chooses X_dimension and Y_dimension and target CalculatedField
Grouping expression for first group now looks like
=Switch(
Parameters!param1.Value = "var1", Fields!groupingColumn1.Value,
Parameters!param1.Value = "var2", Fields!groupingColumn2.Value)
And for second group
=Switch(
Parameters!param2.Value = "var3", Fields!groupingColumn3.Value,
Parameters!param2.Value = "var4", Fields!groupingColumn4.Value)
With target value:
=Switch(
Parameters!param3.Value = "var5", Count(Fields!sales_sum.Value),
Parameters!param3.Value = "var6", Sum(Fields!sales_sum.Value)
)
When there are sales in combinations of groups - it's ok. It shows me exact value - like i would sum it with calculator. But when THERE'S NO DATA for some partition - it just raises error in that cell. And it seems like no way out. Pls help.
P.S. I tried all possible variants such as
iif(sum(...) is Nothing, 0 , sum(...))
sum(iif(value > 0 , value , 0))
and combination of them
i tried comparing with ZERO, Nothing , and checking isNothing
but i still get error in cell

EXCEL lookup an a row / array based on a cell value

I would like to look up a row (an array) based on the DATE value, such that an array of price value (instead of a single return if using VLOOKUP) is returned for a given DATE value. Below is the data
Column A Column B Column C Column D
Row1 DATE Product A Product B Product C
Row2 1/1/2017 1 5 7
Row3 7/1/2017 3 6 5
Row4 13/1/2017 2 8 3
Thank you in advance
So one way to do this would be to make the lookup row a relative value. So assuming that your sample data is on Sheet1 and the list of dates you are looking up against is on Sheet2 Col A with a header. I would first make the lookup range a named range so that Sheet1 Col A thru Col D is named something like "Data". Then in B2 place the below formula and copy it to over to Col D and then all the way down the list of dates.
=vlookup($A2,Data,Column(B1),False)
The $ for A2 allows it to always look at column A even when copying over the formula. The Column(B1) returns a value of 2 but when you copy that formula to the left it will change to Column(c1), Column(d1)... thus changing what column of data you want returned.

MATLAB Extract all rows between two variables with a threshold

I have a cell array called BodyData in MATLAB that has around 139 columns and 3500 odd rows of skeletal tracking data.
I need to extract all rows between two string values (these are timestamps when an event happened) that I have
e.g.
BodyData{}=
Column 1 2 3
'10:15:15.332' 'BASE05' ...
...
'10:17:33:230' 'BASE05' ...
The two timestamps should match a value in the array but might also be within a few ms of those in the array e.g.
TimeStamp1 = '10:15:15.560'
TimeStamp2 = '10:17:33.233'
I have several questions!
How can I return an array for all the data between the two string values plus or minus a small threshold of say .100ms?
Also can I also add another condition to say that all str values in column2 must also be the same, otherwise ignore? For example, only return the timestamps between A and B only if 'BASE02'
Many thanks,
The best approach to the first part of your problem is probably to change from strings to numeric date values. In Matlab this can be done quite painlessly with datenum.
For the second part you can just use logical indexing... this is were you put a condition (i.e. that second columns is BASE02) within the indexing expression.
A self-contained example:
% some example data:
BodyData = {'10:15:15.332', 'BASE05', 'foo';...
'10:15:16.332', 'BASE02', 'bar';...
'10:15:17.332', 'BASE05', 'foo';...
'10:15:18.332', 'BASE02', 'foo';...
'10:15:19.332', 'BASE05', 'bar'};
% create column vector of numeric times, and define start/end times
dateValues = datenum(BodyData(:, 1), 'HH:MM:SS.FFF');
startTime = datenum('10:15:16.100', 'HH:MM:SS.FFF');
endTime = datenum('10:15:18.500', 'HH:MM:SS.FFF');
% select data in range, and where second column is 'BASE02'
BodyData(dateValues > startTime & dateValues < endTime & strcmp(BodyData(:, 2), 'BASE02'), :)
Returns:
ans =
'10:15:16.332' 'BASE02' 'bar'
'10:15:18.332' 'BASE02' 'foo'
References: datenum manual page, matlab help page on logical indexing.

VBA two dimensional arrays connecting to a database

I am working with vba in excel and a database in access. The access database is a table that contains 3 columns; OrderIDs which is a column of numbers saying what order the particular item was in, OrderDescription which is a column that contains the description of the item, and Item # which is a column that gives a number to each particular item (if the item is the same as another, they both are the same item).
I need to build a 2-dimensional array in excel using VBA holding which items were purchased in which orders. The rows will be the Order ID and the columns will be the Item ID. The elements of this array will contain an indicator (like True or a “1”) that indicates that this order contains certain items. For example, row 6 (representing order ID 6) will have “True” in columns 1, 5, and 26 if that order purchased item IDs 1, 5, and 26. All other columns for that order will be blank.
In order to do this, i think I will have to determine the max order number (39) and the max item number(33). This information is available in the database which I can connect to using a .connection and .recordset. Some order numbers and some item numbers may not appear.
Note also that this will likely be a sparse array (not many entries) as most orders contain only a few items. We do not care how many of an item a customer purchased, only that the item was purchased on this order.
MY QUESTION is how can I set up this array? I tried a loop that would assign the values of the order numbers to an array and the items numbers to an array and then dimensioning the array to those sizes, but it wont work.
is there a way to make an element of an array return a value of True if it exists?
Thanks for your help
It seems to me that the best bet may be a cross tab query run on an access connection. You can create your array with the ADO method GetRows : http://www.w3schools.com/ado/met_rs_getrows.asp.
TRANSFORM Nz([Item #],0)>0 AS Val
SELECT OrderNo
FROM Table
GROUP BY OrderNo
PIVOT [Item #]
With a Counter table containing integers from 1 to maximum number of items in a column (field) Num.
TRANSFORM First(q.Val) AS FirstOfVal
SELECT q.OrderNo
FROM (SELECT t.OrderNo, c.Num, Nz([Item #],0)>0 AS Val
FROM TableX t RIGHT JOIN [Counter] c ON t.[Item #] = c.Num
WHERE c.Num<12) q
GROUP BY q.OrderNo
PIVOT q.Num
Output:
OrderNo 1 2 3 4 5 6 7 8 9 10 11
0 0 0 0 0 0
1 -1 -1 -1 -1
2 -1 -1 -1 -1

Resources