SUM and COUNTIFS with multiple arrays? - arrays

I am trying to obtain a count of rows where specific criteria are met. I have searched on SO for an answer to this, but have been unable to locate a solution thus far (NOTE: I have found solutions to using SUM and COUNTIFS with one array, but not multiple).
Here is the formula I am using:
=SUM(COUNTIFS('Sheet1'!$AA:$AA,"<="&$B1,'Sheet1'!$AE:$AE,"="&"",'Sheet1'!$Q:$Q,"<>"&{"Value1","Value2","Value3","Value4"},'Sheet1'!$S:$S,"<>"&{"ValueA","Value B","ValueC","ValueD","ValueE"}))
The data is customer case data. Cell B1 is a week ending date. Column AA in Sheet 1 is the Case Open Date, and Column AE is the Case Closed Date. Column Q and S are filter criteria.
The expected result is a count of cases that were opened during the current week or earlier (Criteria 1) and are still open (Criteria 2), where the case Category (Column Q) is not one of the values in Array 1 (Criteria 3) and the case Type (Column S) is not one of the values in Array 2 (Criteria 4). The result being returned is higher than expected (this was validated by manually filtering the data set using these same criteria).
My guess is that double counting is going on, since the formula returns 1,828 cases for one week tested but the manually filtered validation shows 1,241 (the expected result). Any thoughts or ideas on this are much appreciated.

The problem is that the arrayed count against Value1, Value2, Value3, Value4 is being treated as an OR condition, not an AND condition. Even if Q2 has Value1 in it, it doesn't have Value2, Value3 or Value4, so the row is still counted for those comparisons. You need to make sure that Q2 has none of Value1, Value2, Value3, Value4 in it. The same goes for column S and ValueA, Value B, ValueC, ValueD, ValueE. This OR behavior works well when you are trying to see if the values are included, but not when they are intended to be excluded.
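To make the overcounting concrete, here is a rough sketch in Python rather than Excel. It is simplified to a single column, and the sample categories are made up for illustration (only the excluded names come from the question). It compares adding up one "not equal" count per excluded value with counting rows that match none of them.

```python
# Hypothetical sample of column Q; only the excluded names come from the question.
categories = ["Value1", "Value2", "Other", "Value1", "Other"]
excluded = ["Value1", "Value2", "Value3", "Value4"]

# What summing an array of "<>" criteria adds up: one count per excluded value.
per_value_counts = [sum(1 for c in categories if c != v) for v in excluded]
summed = sum(per_value_counts)

# What was intended: rows whose category matches none of the excluded values.
intended = sum(1 for c in categories if c not in excluded)

print(per_value_counts, summed, intended)
```

With only five rows the summed figure already exceeds the number of rows, which is the same kind of inflation seen in the 1,828 versus 1,241 result.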
    
An old-fashioned SUMPRODUCT function can handle the OR conditions.
        
The formula with array constants in B2 is,
=SUMPRODUCT((Sheet1!$AA$2:$AA$999<=$B1)*SIGN(Sheet1!$AA$2:$AA$999)*(Sheet1!$AE$2:$AE$999="")*ISERROR(MATCH(Sheet1!$Q$2:$Q$999, {"Value1","Value2","Value3","Value4"}, 0))*ISERROR(MATCH(Sheet1!$S$2:$S$999, {"ValueA","Value B","ValueC","ValueD","ValueE"}, 0)))
The alternate formula in B3 using the values from the ranges in D4:D7 and E4:E8 is,
=SUMPRODUCT((Sheet1!$AA$2:$AA$999<=$B1)*SIGN(Sheet1!$AA$2:$AA$999)*(Sheet1!$AE$2:$AE$999="")*ISERROR(MATCH(Sheet1!$Q$2:$Q$999, $D$4:$D$7, 0))*ISERROR(MATCH(Sheet1!$S$2:$S$999, $E$4:$E$8, 0)))
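For anyone who finds the SUMPRODUCT hard to read, the sketch below mirrors its logic in Python: each condition becomes a 0/1 value, the four values are multiplied per row, and the products are summed. The case rows and the week ending date are invented for illustration; `None` stands in for a blank cell.

```python
from datetime import date

week_ending = date(2024, 1, 7)  # stands in for $B1 (hypothetical value)
excluded_categories = {"Value1", "Value2", "Value3", "Value4"}
excluded_types = {"ValueA", "Value B", "ValueC", "ValueD", "ValueE"}

# (open date, closed date, category, type); all rows are made-up sample data.
cases = [
    (date(2024, 1, 2), None, "Other", "TypeX"),              # counted
    (date(2024, 1, 3), date(2024, 1, 4), "Other", "TypeX"),  # already closed
    (date(2024, 1, 3), None, "Value1", "TypeX"),             # excluded category
    (date(2024, 1, 3), None, "Other", "ValueA"),             # excluded type
    (date(2024, 1, 20), None, "Other", "TypeX"),             # opened after week ending
    (None, None, "Other", "TypeX"),                          # blank open date
    (date(2024, 1, 5), None, "Misc", "TypeY"),               # counted
]

def row_product(opened, closed, category, case_type):
    # (AA<=B1)*SIGN(AA): a blank open date must not count as "on or before".
    opened_in_time = opened is not None and opened <= week_ending
    still_open = closed is None                           # AE=""
    category_ok = category not in excluded_categories     # ISERROR(MATCH(Q, ...))
    type_ok = case_type not in excluded_types             # ISERROR(MATCH(S, ...))
    return int(opened_in_time) * int(still_open) * int(category_ok) * int(type_ok)

open_cases = sum(row_product(*case) for case in cases)
print(open_cases)
```

A row contributes 1 only when every condition holds, so no row can be counted more than once.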

Related

Snowflake: Trouble getting numbers to return from a PIVOT function

I am moving a query from SQL Server to Snowflake. Part of the query creates a pivot table. The pivot table part works fine (I have run it in isolation, and it pulls numbers I expect).
However, the following parts of the query rely on the pivot table- and those parts fail. Some of the fields return as a string-type. I believe that the problem is Snowflake is having issues converting string data to numeric data. I have tried CAST, TRY_TO_DOUBLE/NUMBER, but these just pull up 0.
I will put the code down below, and I appreciate any insight as to what I can do!
CREATE OR REPLACE TEMP TABLE ATTR_PIVOT_MONTHLY_RATES AS (
SELECT
Market,
Coverage_Mo,
ZEROIFNULL(TRY_TO_DOUBLE('Starting Membership')) AS Starting_Membership,
ZEROIFNULL(TRY_TO_DOUBLE('Member Adds')) AS Member_Adds,
ZEROIFNULL(TRY_TO_DOUBLE('Member Attrition')) AS Member_Attrition,
((ZEROIFNULL(CAST('Starting Membership' AS FLOAT))
+ ZEROIFNULL(CAST('Member Adds' AS FLOAT))
+ ZEROIFNULL(CAST('Member Attrition' AS FLOAT)))-ZEROIFNULL(CAST('Starting Membership' AS FLOAT)))
/ZEROIFNULL(CAST('Starting Membership' AS FLOAT)) AS "% Change"
FROM
(SELECT * FROM ATTR_PIVOT
WHERE 'Starting Membership' IS NOT NULL) PT)
I realize this is a VERY big question with a lot of moving parts... So my main question is: How can I successfully change the data type to numeric value, so that hopefully the formulas work in the second half of the query?
Thank you so much for reading through it all!
EDITED FOR SHORTENING THE QUERY WITH UNNEEDED SYNTAX
What I have tried: CAST(), TRY_TO_DOUBLE(), TRY_TO_NUMBER(). I have also put the fields (Starting Membership, Member Adds) in single and double quotation marks.
Unless you are quoting your field names in this post just to highlight them for some reason, the way you've written this query would indicate that you are trying to cast a string value to a number.
For example:
ZEROIFNULL(TRY_TO_DOUBLE('Starting Membership'))
This is simply trying to cast a string literal value of Starting Membership to a double. This will always be NULL. And then your ZEROIFNULL() function is turning your NULL into a 0 (zero).
Without seeing the rest of your query that defines the column names, I can't provide you with a correction, but try using field names, not quoted string values, in your query and see if that gives you what you need.
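The literal-versus-column distinction can be checked quickly outside Snowflake. The sketch below uses Python's built-in sqlite3 module, so it only demonstrates the standard SQL quoting rule (single quotes for string literals, double quotes for identifiers), not Snowflake's conversion functions. The table contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE attr_pivot (market INTEGER, "Starting Membership" TEXT)')
conn.executemany("INSERT INTO attr_pivot VALUES (?, ?)", [(1, "120"), (2, None)])

# Single quotes: every row returns the same literal text.
literal_rows = conn.execute(
    "SELECT 'Starting Membership' FROM attr_pivot ORDER BY market"
).fetchall()

# Double quotes: the actual column values are returned.
column_rows = conn.execute(
    'SELECT "Starting Membership" FROM attr_pivot ORDER BY market'
).fetchall()

# A literal is never NULL, so this filter keeps every row.
literal_filter = conn.execute(
    "SELECT COUNT(*) FROM attr_pivot WHERE 'Starting Membership' IS NOT NULL"
).fetchone()[0]

# The column reference filters out the NULL row as intended.
column_filter = conn.execute(
    'SELECT COUNT(*) FROM attr_pivot WHERE "Starting Membership" IS NOT NULL'
).fetchone()[0]

print(literal_rows, column_rows, literal_filter, column_filter)
```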
Your first mistake is that all of your single-quoted column names are being treated as string literals (strings/text/char).
For example, your inner select:
with ATTR_PIVOT(id, studentname) as (
select * from values
(1, 'student_a'),
(1, 'student_b'),
(1, 'student_c'),
(2, 'student_z'),
(2, 'student_a')
)
SELECT *
FROM ATTR_PIVOT
WHERE 'Starting Membership' IS NOT NULL
There is no "Starting Membership" column, yet the query runs and returns all the rows, because the filter only tests a string literal:
ID STUDENTNAME
1 student_a
1 student_b
1 student_c
2 student_z
2 student_a
So you need to change 'Starting Membership' -> "Starting Membership", and do the same for the other quoted column names.
As Mike mentioned, the 0 results appear because TRY_TO_DOUBLE always fails on the literal, and thus the NULL is always turned into zero.
Now, with real "string" values, in real named columns:
with ATTR_PIVOT(Market, Coverage_Mo, "Starting Membership", "Member Adds", "Member Attrition") as (
select * from values
(1, 10 ,'student_a', '23', '150' )
)
SELECT
Market,
Coverage_Mo,
ZEROIFNULL(TRY_TO_DOUBLE("Starting Membership")) AS Starting_Membership,
ZEROIFNULL(TRY_TO_DOUBLE("Member Adds")) AS Member_Adds,
ZEROIFNULL(TRY_TO_DOUBLE("Member Attrition")) AS Member_Attrition
FROM ATTR_PIVOT
WHERE "Starting Membership" IS NOT NULL
We get what we would expect ('student_a' cannot be converted, so Starting_Membership falls back to 0):
MARKET COVERAGE_MO STARTING_MEMBERSHIP MEMBER_ADDS MEMBER_ATTRITION
1 10 0 23 150

google sheet : How to vlookup by matching value in between max and min?

I have two sheets like this:
In the second sheet, I want to look up the id (S/M/L/XL) by checking whether the value falls between the Min value and the Max value. The goal is to put a formula in the 'level' column of the second sheet that checks which range the value falls into and then retrieves the correct 'id' from the first sheet.
The rule is: value >= min value and value < max value.
How can I do this?
Thanks
use:
=INDEX(IF(A9:A="",,VLOOKUP(A9:A, {C2:C5, A2:A5}, 2, 1)))
Your first table has overlapping values, so I suggest you think more carefully about the rules you want to apply.
For example, according to your table, 1 can match both "S" and "M"; the same goes for 3, which can be "M" or "L".
Once you have resolved that, you can use the QUERY function.
Example:
=QUERY($A$2:$D$5,
"select A,D where C<="&A2&" AND D >="&A2&" ORDER BY D DESC LIMIT 1 ")
Working solution can be found here:
https://docs.google.com/spreadsheets/d/1oLVwQqihT_df2y_ZQnfx7By77HnKSFz0bcbOzMuWqOM/edit?usp=sharing
Rather than have min and max columns, you could just use one column to list incremental values that determine the next size, and use vlookup() with a sort option of true - this avoids overlapping values:
=arrayformula({"level";if(A2:A<>"",VLOOKUP(A2:A,{Source!C:C,Source!A:A},2,1),)})
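For readers wondering what a sorted VLOOKUP does with a single threshold column, here is a small sketch of the same idea in Python: find the last threshold that is less than or equal to the value and return its id. The thresholds below are invented, since the sheets from the question are not shown.

```python
from bisect import bisect_right

# Hypothetical thresholds; each id applies from its minimum up to the next one.
min_values = [0, 10, 20, 30]   # must be sorted ascending, like the lookup column
ids = ["S", "M", "L", "XL"]

def level(value):
    # Position of the last minimum that is <= value.
    position = bisect_right(min_values, value) - 1
    if position < 0:
        return None  # below the first threshold; VLOOKUP would give #N/A here
    return ids[position]

print([level(v) for v in (5, 10, 29.9, 100, -1)])
```

Because each value lands in exactly one interval, the overlap problem with separate min and max columns cannot occur.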

Weighted Average w/ Array Formula & Query That Pulls From A Separate Sheet

Link To Sheet
So I've got an array formula which I've included below. I need to adjust this so that it becomes a weighted average based on variables stored on a sheet titled Variables.
Current Formula:
=ARRAYFORMULA(QUERY(
{PROPER(ADP!A3:A),ADP!E3:S;
PROPER(ADP!J3:J),ADP!S3:S;
PROPER(ADP!Z3:Z),ADP!AG3:AG},
"select Col1, Sum(Col2)
where
Col2 is not null and
Col1 is not null
group by Col1
order by Sum(Col2)
label
Col1 'PLAYER',
Sum(Col2) 'ADP AVG'"))
Here's what I thought would work but doesn't:
=ARRAYFORMULA(QUERY(
{PROPER(ADP!A3:A),ADP!E3:E*(Variables!$F$11/Variables!$F$14);
PROPER(ADP!J3:J),ADP!S3:S*(Variables!$F$12/Variables!$F$14);
PROPER(ADP!Z3:Z),ADP!AG3:AG*(Variables!$F$13/Variables!$F$14)},
"select Col1, Sum(Col2)
where
Col2 is not null and
Col1 is not null
group by Col1
order by Sum(Col2)
label
Col1 'PLAYER',
Sum(Col2) 'ADP AVG'"))
What I'm trying to get is the value pulled in K multiplied by the value in Variables!F11, the value pulled in Y multiplied by the value in Variables!F12, and the value pulled in AL multiplied by the value in Variables!F13, with the sum of those products (the numerator) divided by the value in Variables!F14.
After our extensive chat, I'm providing here the answer we came up with, just on the chance it might somehow help someone else. But the issue in your case was less about the technicalities of the formula, and more about the structuring of multiple data sources, and the associated logic to pull the data together.
Here is the main formula:
={"Adjusted
Ranking
by " & Variables!F21;
arrayformula(
if(A2:A<>"",
( if(((D2:D>0) * Source1Used),D2:D,Variables!$F$21)*Variables!$F$12
+ if(((F2:F>0) * Source2Used),F2:F,Variables!$F$21)*Variables!$F$13
+ if(((H2:H>0) * Source3Used),H2:H,Variables!$F$21)*Variables!$F$14
+ if(((J2:J>0) * Source4Used),J2:J,Variables!$F$21)*Variables!$F$15
+ if(((L2:L>0) * Source5Used),L2:L,Variables!$F$21)*Variables!$F$16
+ if(((N2:N>0) * Source6Used),N2:N,Variables!$F$21)*Variables!$F$17 )) / Variables!$F$18) }
A2:A is the list of players' names. The D2:D>0 is a test of whether that player has a rating obtained from a particular data source.
Source1Used is a named range for a tickbox cell, where the user can indicate whether that data source is to be included in the calculations.
This formula creates an average value, using from 1 to 6 possible sources, user selectable.
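As a rough illustration of the arithmetic in the main formula, here is the same calculation for a single player in Python. All names and numbers are hypothetical; `default` plays the role of Variables!$F$21, the weights stand in for the per-source variables, and `divisor` stands in for Variables!$F$18 (assumed here to be the sum of the weights).

```python
def adjusted_ranking(ratings, used, weights, default, divisor):
    """Weighted average for one player across several sources."""
    total = 0
    for rating, is_used, weight in zip(ratings, used, weights):
        # Use the source's rating only if it exists (> 0) and the source is ticked;
        # otherwise fall back to the default value, as the sheet formula does.
        value = rating if (rating > 0 and is_used) else default
        total += value * weight
    return total / divisor

# Three hypothetical sources: the second has no rating, the third is unticked.
result = adjusted_ranking(
    ratings=[10, 0, 30],
    used=[True, True, False],
    weights=[5, 3, 2],
    default=100,
    divisor=10,
)
print(result)
```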
The formula that gave the rating value for one specific source is as follows:
={"Rating in
Source1";ArrayFormula(if(A2:A<>"",if(C2:C,vlookup(A2:A,indirect("ADP!$" & ADP!E3 & "$10:" & ADP!E5),ADP!E6-ADP!E4+1,0),0),""))}
This takes a name in column A, checks if it is listed in a specific source's data, and if so, it pulls back the rating value from the data source. INDIRECT is used since the column locations for each data source may vary, but are obtained from a fixed table, in cells ADP!E3 and E5. E4 and E6 are the numeric values of the column letters.

Counting rows in a table based on multiple array criterias

I am trying to count rows in a table based on multiple criteria in different columns of that table. The criteria are not directly in the formula though; they are arrays which I would like to refer to (and not list them in the formula directly).
Range table example:
Group Status
a 1
b 4
b 6
c 4
a 6
d 5
d 4
a 2
b 2
d 3
b 2
c 1
c 2
c 1
a 4
b 3
Criteria/arrays example:
group
a
b
status
1
2
3
I am able to do this if I only have one array to search through a range (1 column) in that table:
{=SUM(COUNTIFS(data[Group],group[group]))}
Returns "9" as expected (=sum of rows in the group column of the data table which match any values in group[group])
But if I add a second different range and a different array I get an incorrect result:
{=SUM(COUNTIFS(data[Group],group[group], data[Status],status[status]))}
Returns "3" but should return "5" (=sum of rows which have 1, 2 or 3 in the status column and a or b in the group column)
I searched and googled for various ideas related to using sumproduct or defining arrays within the formula instead of classifying the whole formula as an array but I was not able to get expected results via those means.
Thank you for your help.
Because your group and status criteria are a different number of values (2 values for group, but 3 values for status), I'm not sure you can do this in a single formula. Best way I know of to do this would be to use a helper column (which can be hidden if preferred).
Put this array formula in a helper column and copy down the length of your data (array formulas must be confirmed with Ctrl+Shift+Enter):
=AND(OR(data[#Group]=group[group]),OR(data[#Status]=status[status]))
And then get the count with: =COUNTIF(helpercolumn,TRUE)
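The helper-column logic can be sanity-checked against the sample data with a short Python sketch. It also includes a guess at where the "3" comes from: that number is what you get if the two criteria lists are paired position by position (a with 1, b with 2) instead of being combined in every possible way. That pairing is an assumption that happens to match the reported result, not something confirmed here.

```python
# Sample data and criteria copied from the question.
data = [("a", 1), ("b", 4), ("b", 6), ("c", 4), ("a", 6), ("d", 5), ("d", 4), ("a", 2),
        ("b", 2), ("d", 3), ("b", 2), ("c", 1), ("c", 2), ("c", 1), ("a", 4), ("b", 3)]
groups = ["a", "b"]
statuses = [1, 2, 3]

# Helper column: AND(OR(group matches), OR(status matches)) for each row.
helper = [(g in groups) and (s in statuses) for g, s in data]
matching_rows = helper.count(True)

# Assumed explanation of the wrong result: criteria paired by position.
paired = sum(1 for g, s in data
             for crit_group, crit_status in zip(groups, statuses)
             if g == crit_group and s == crit_status)

print(matching_rows, paired)
```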
You could use a slightly different approach, using Power Query / Power Pivot.
Name your tables Data, Group and Status, then create the following query, named Filtered Data:
let
tbData = Excel.CurrentWorkbook(){[Name="Data"]}[Content],
tbGroup = Excel.CurrentWorkbook(){[Name="Group"]}[Content],
tbStatus = Excel.CurrentWorkbook(){[Name="Status"]}[Content],
#"Merged Group" = Table.NestedJoin(tbData,{"Group"},tbGroup,{"Group"},"tbGroup",JoinKind.Inner),
#"Merged Status" = Table.NestedJoin(#"Merged Group",{"Status"},tbStatus,{"Status"},"Merged Status",JoinKind.Inner),
#"Removed Columns" = Table.RemoveColumns(#"Merged Status",{"tbGroup", "Merged Status"}),
#"Changed Type" = Table.TransformColumnTypes(#"Removed Columns",{{"Status", type number}})
in
#"Changed Type"
Load To as connection only, and tick Load to Data Model
Now create a DAX measure:
Status Sum:=SUM ( 'Filtered Data'[Status] )
You can then use the following formula on your worksheet, to get the Sum of Status values, for rows matching the criteria specified in the Group and Status tables:
=CUBEVALUE("ThisWorkbookDataModel","[Measures].[Status Sum]")
Simply refresh the data connection to update the value.

MATLAB Extract all rows between two variables with a threshold

I have a cell array called BodyData in MATLAB that has around 139 columns and 3500 odd rows of skeletal tracking data.
I need to extract all rows between two string values (these are timestamps when an event happened) that I have
e.g.
BodyData{}=
Column 1 2 3
'10:15:15.332' 'BASE05' ...
...
'10:17:33.230' 'BASE05' ...
The two timestamps should match a value in the array but might also be within a few ms of those in the array e.g.
TimeStamp1 = '10:15:15.560'
TimeStamp2 = '10:17:33.233'
I have several questions!
How can I return an array for all the data between the two string values plus or minus a small threshold of say .100ms?
Can I also add another condition so that all string values in column 2 must be the same, and rows are otherwise ignored? For example, only return the rows between timestamps A and B if column 2 is 'BASE02'.
Many thanks,
The best approach to the first part of your problem is probably to change from strings to numeric date values. In MATLAB this can be done quite painlessly with datenum.
For the second part you can just use logical indexing: this is where you put a condition (i.e. that the second column is 'BASE02') within the indexing expression.
A self-contained example:
% some example data:
BodyData = {'10:15:15.332', 'BASE05', 'foo';...
'10:15:16.332', 'BASE02', 'bar';...
'10:15:17.332', 'BASE05', 'foo';...
'10:15:18.332', 'BASE02', 'foo';...
'10:15:19.332', 'BASE05', 'bar'};
% create column vector of numeric times, and define start/end times
dateValues = datenum(BodyData(:, 1), 'HH:MM:SS.FFF');
startTime = datenum('10:15:16.100', 'HH:MM:SS.FFF');
endTime = datenum('10:15:18.500', 'HH:MM:SS.FFF');
% select data in range, and where second column is 'BASE02'
BodyData(dateValues > startTime & dateValues < endTime & strcmp(BodyData(:, 2), 'BASE02'), :)
Returns:
ans =
'10:15:16.332' 'BASE02' 'bar'
'10:15:18.332' 'BASE02' 'foo'
References: datenum manual page, matlab help page on logical indexing.
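The example above does not apply the plus-or-minus threshold from the question. One way to add it is to widen the window by the tolerance before comparing. The sketch below shows that idea in Python rather than MATLAB, reusing the sample rows from the answer; the function name and the 100 ms default are assumptions made for illustration.

```python
from datetime import datetime, timedelta

FMT = "%H:%M:%S.%f"

body_data = [
    ("10:15:15.332", "BASE05", "foo"),
    ("10:15:16.332", "BASE02", "bar"),
    ("10:15:17.332", "BASE05", "foo"),
    ("10:15:18.332", "BASE02", "foo"),
    ("10:15:19.332", "BASE05", "bar"),
]

def rows_between(rows, start, end, label, tolerance_ms=100):
    """Rows whose time lies in [start - tol, end + tol] and whose column 2 equals label."""
    tol = timedelta(milliseconds=tolerance_ms)
    lower = datetime.strptime(start, FMT) - tol
    upper = datetime.strptime(end, FMT) + tol
    return [row for row in rows
            if lower <= datetime.strptime(row[0], FMT) <= upper and row[1] == label]

# Both endpoints miss the recorded timestamps by less than 100 ms.
selected = rows_between(body_data, "10:15:16.400", "10:15:18.300", "BASE02")
print(selected)
```

With the tolerance set to 0 the same call returns no rows, which shows the widening is what picks up the near-miss timestamps.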
