Google Sheets SumIfs with left formula - arrays

I want to use the sumifs formula, but the sum interval range has text in it.
Example:
|Criteria|Sum Interval|
|--------|------------|
| A | 1 - Good |
| A | 2 - Regular|
| C | 3 - Bad |
So, I want to check the criteria field and, when met, sum the first character of the Sum Interval. I tried something like this:
= sumifs( arrayformula(left(suminterval, 1)) , criteria, 'A')
In this case, the formula should return 3 (1 + 2)
arrayformula(left(suminterval, 1)) = interval with only first character
This works when used alone, but when I use it as an argument, I receive a message saying that the argument must be a range.
P.S.: The whole solution has to be in a single formula.

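SUMIFS only accepts real ranges for its range arguments, so build the array yourself and aggregate it with QUERY instead. Try: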
=INDEX(QUERY({A2:A, REGEXEXTRACT(B2:B, "\d+")*1}, "select sum(Col2) where Col1 = 'A'"), 2)
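For readers who want to check the logic outside of Sheets, here is a minimal Python sketch of what the formula above computes: it takes the first run of digits from each text value and sums those numbers for the rows whose criteria match. The function name `sum_leading_numbers` and the `rows` list are illustrative assumptions, not part of the original answer.

```python
import re

# Sample rows from the question: (criteria, sum interval)
rows = [("A", "1 - Good"), ("A", "2 - Regular"), ("C", "3 - Bad")]

def sum_leading_numbers(rows, wanted):
    """Sum the first number found in each text value whose criteria matches."""
    total = 0
    for criteria, text in rows:
        match = re.search(r"\d+", text)  # first run of digits, like REGEXEXTRACT(..., "\d+")
        if criteria == wanted and match:
            total += int(match.group())
    return total

print(sum_leading_numbers(rows, "A"))  # 3
```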


Counting aggregated and then getting average

So, I am sure it's got to be simple, but, I cannot get Google Data Studio to provide an average of two aggregated columns. Example:
+----------+----------+
| Column 1 | Column 2 |
+----------+----------+
| A | TRUE |
| A | FALSE |
| B | TRUE |
| C | FALSE |
| A | TRUE |
| C | TRUE |
| B | FALSE |
| B | TRUE |
+----------+----------+
How can you get a count of the total value of A in Column 1 and divide it by the total number of TRUE in Column 2? I have tried Count(Column 1)/Count (Column 2) but it gives me the totals for the other values as well.
I have tried creating a new field with a CASE statement, but there is an error when trying to divide the two resulting CASE WHEN values:
CASE
WHEN Column 1 = A THEN 1
ELSE 0
END
Below are two approaches to the required calculation:
Approach 1: Ratio Metrics
Using Scorecards, Filters and Ratio Metrics:
1) Column 1 (A) Scorecard
- Add a Scorecard;
- Drag and Drop the field Column 1 onto the Metric field and change the aggregation to COUNT;
- Create and apply the Filter: Include Column 1 RegExp Match A
2) Column 2 (TRUE) Scorecard
- Add a Scorecard;
- Drag and Drop the field Column 2 onto the Metric field and change the aggregation to COUNT;
- Create and apply the Filter: Include Column 2 RegExp Match TRUE
3) Ratio Metric
- Select both Scorecards: Click on the Column 1 (A) Scorecard and then Ctrl + Click on the Column 2 (TRUE) Scorecard;
- Blend Data: Right click on one of the selected Scorecards and select Blend data from the Drop-down.
A Google Data Studio report demonstrates this approach, along with a GIF showing the process.
Approach 2: CASE Statements
An approach with CASE statements (create formula #1 and #2 at the Data Source-level; formula 3 can be created at the Data Source-level or Chart-level if required):
1) Column 1 (A)
CASE
WHEN REGEXP_MATCH(Column 1, "A") THEN "A"
ELSE NULL
END
2) Column 2 (TRUE)
CASE
WHEN REGEXP_MATCH(Column 2, "TRUE") THEN "TRUE"
ELSE NULL
END
3) Column 1 (A) / Column 2 (TRUE)
COUNT(Column 1 (A)) / COUNT(Column 2 (TRUE))
A new page was added to the Google Data Studio report to demonstrate this approach, along with a GIF showing the process above.
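The CASE approach works because COUNT ignores NULL values, so each helper field counts only the rows it keeps. A minimal Python sketch of that idea on the example data follows; the names `count_non_null` and `a_to_true_ratio` are illustrative assumptions, and `None` stands in for NULL.

```python
# Example rows from the question: (Column 1, Column 2)
rows = [
    ("A", True), ("A", False), ("B", True), ("C", False),
    ("A", True), ("C", True), ("B", False), ("B", True),
]

def count_non_null(values):
    """COUNT-style aggregation: count only the values that are not None."""
    return sum(1 for value in values if value is not None)

def a_to_true_ratio(rows):
    # Formula 1: keep "A", map everything else to NULL (None)
    col1_a = ["A" if col1 == "A" else None for col1, _ in rows]
    # Formula 2: keep TRUE, map everything else to NULL (None)
    col2_true = ["TRUE" if col2 else None for _, col2 in rows]
    # Formula 3: divide the two counts
    return count_non_null(col1_a) / count_non_null(col2_true)

print(a_to_true_ratio(rows))  # 0.6
```

In the sample table there are three rows with A and five rows with TRUE, hence 3 / 5.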

Extract last two elements of an array in HIVE

I have an array in a hive table, and I want to extract the two last elements of each array, something like this:
["a", "b", "c"] -> ["b", "c"]
I tried a code like this:
SELECT
*,
array[size] AS term_n,
array[size - 1] AS term_n_1
FROM
(SELECT *, size(array) AS size FROM MyTable);
But it didn't work. Does anyone have an idea, please?
array is a reserved word and should be quoted (with backticks).
An inner sub-query should be aliased.
Array indexes start at 0. If the array size is 5, then the last index is 4.
Demo
with MyTable as (select array('A','B','C','D','E') as `array`)
SELECT *
,`array`[size - 1] AS term_n
,`array`[size - 2] AS term_n_1
FROM (SELECT *
,size(`array`) AS size
FROM MyTable
) t
;
+-----------------------+--------+--------+----------+
| t.array | t.size | term_n | term_n_1 |
+-----------------------+--------+--------+----------+
| ["A","B","C","D","E"] | 5 | E | D |
+-----------------------+--------+--------+----------+
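The zero-based indexing rule is the key point here, and it can be sketched in Python, which also indexes from 0. The helper `last_two` is a hypothetical name, and returning the whole list when it has fewer than two elements is an assumption made for this sketch only.

```python
def last_two(items):
    """Return the last two elements using explicit zero-based indexes."""
    size = len(items)
    if size < 2:
        # Assumption for this sketch: short lists are returned unchanged
        return list(items)
    # The last index is size - 1, the one before it is size - 2
    return [items[size - 2], items[size - 1]]

print(last_two(["a", "b", "c"]))  # ['b', 'c']
```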
I don't know the error that you are getting, but since array indexes start at 0 it should be something like
select
yourarray[size(yourarray)-1],
yourarray[size(yourarray)-2]
from mytable
Here is a way to drop the last element of a delimited string within the same query. It is not optimal, but the same principle extends to dropping the last n elements. The logic is to compute the length of the last element plus its separator character, and then take a substring from the start of the string up to the total length minus that number of characters.
Table of example:
col1 | col2
--------------
row1 | aaa-bbb-ccc-ddd
You want to get (removing the last element, in this case "-ddd"):
row1 | aaa-bbb-ccc
The query you may need:
select col1, substr(col2,0,length(col2)-(length(reverse(split(reverse(col2),'-')[0]))+1)) as shorted_col2_1element from example_table
If you want to drop more elements, keep adding their lengths (plus one separator each) inside the subtracted part of the expression.
Example that drops the last 2 elements:
select col1, substr(col2,0,length(col2)-(length(reverse(split(reverse(col2),'-')[0]))+1 + length(reverse(split(reverse(col2),'-')[1]))+1)) as shorted_col2_2element from example_table
After executing this second query you will have something like:
row1 | aaa-bbb
*As said previously, this is not an optimal solution at all, but it may help you.
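The length arithmetic behind these queries can be checked with a small Python sketch: the number of characters to cut is the length of each dropped element plus one separator. The function `drop_last` and its parameters are illustrative assumptions, not part of the original answer.

```python
def drop_last(text, n=1, sep="-"):
    """Drop the last n separator-delimited elements using length arithmetic."""
    parts = text.split(sep)
    if n <= 0:
        return text
    if n >= len(parts):
        return ""  # nothing would remain
    # Characters to cut: each dropped element plus the separator in front of it
    cut = sum(len(part) + len(sep) for part in parts[-n:])
    return text[: len(text) - cut]

print(drop_last("aaa-bbb-ccc-ddd", 1))  # aaa-bbb-ccc
print(drop_last("aaa-bbb-ccc-ddd", 2))  # aaa-bbb
```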

sum divisor returning 0

I have this table that I will populate with random figures:
|--week--||-2016-||-2017-|
| 1 || 26734||6314916|
| 2 || 64565||9876768|
| 3 || 32243||9976757|
What I want to do is create a fourth column that is basically the variance of these numbers.
I created the script below, which I know works because I originally wrote it for another table. There is no difference between the tables apart from the figures in them.
select CAST(ROUND(sum (([2017]) /(([2016]))-1)*100, 0) as NUMERIC(36,0)) as [variance%]
from table2
I get the result below:
|--week--||-2016-||-2017-| variance%
| 1 || 26734||6314916|0
| 2 || 64565||9876768|0
| 3 || 32243||9976757|0
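Why am I getting zeros when the other table I had delivers the actual results for variances?
All you need to do is turn one of your INT operands into a non-integer value. Dividing one INT by another performs integer division, which discards the fractional part. Something like this: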
select
CAST(ROUND(sum (([2017]*1.0) /(([2016]))-1)*100, 0) as NUMERIC(36,0)) as [variance%]
Read this Stack Exchange answer to learn more about this.
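The effect of integer division can be illustrated with a short Python sketch. It assumes, as the answer implies, that the database discards the fractional part when both operands are integers; Python's `//` operator behaves the same way for positive numbers. The function names and the sample figures 150 and 200 are hypothetical.

```python
def variance_pct_int(new, old):
    """Variance using integer division: the fraction is lost before subtracting 1."""
    return (new // old - 1) * 100

def variance_pct_float(new, old):
    """Variance after multiplying by 1.0 first, as in the corrected query."""
    return round((new * 1.0 / old - 1) * 100)

# Hypothetical figures chosen so that the quotient is below 1
print(variance_pct_int(150, 200))    # -100
print(variance_pct_float(150, 200))  # -25
```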

Excel Lookup IP addresses in multiple ranges

I am trying to find a formula for column A that will check an IP address in column B and find if it falls into a range (or between) 2 addresses in two other columns C and D.
E.G.
A B C D
+---------+-------------+-------------+------------+
| valid? | address | start | end |
+---------+-------------+-------------+------------+
| yes | 10.1.1.5 | 10.1.1.0 | 10.1.1.31 |
| Yes | 10.1.3.13 | 10.1.2.16 | 10.1.2.31 |
| no | 10.1.2.7 | 10.1.1.128 | 10.1.1.223 |
| no | 10.1.1.62 | 10.1.3.0 | 10.1.3.127 |
| yes | 10.1.1.9 | 10.1.4.0 | 10.1.4.255 |
| no | 10.1.1.50 | … | … |
| yes | 10.1.1.200 | | |
+---------+-------------+-------------+------------+
This is supposed to represent an Excel table with 4 columns, a heading, and 7 rows as an example.
I can do a lateral check with
=IF(AND((B3>C3),(B3 < D3)),"yes","no")
which only checks 1 address against the range next to it.
I need something that will check the 1 IP address against all of the ranges. i.e. rows 1 to 100.
This is checking access list rules against routes to see if I can eliminate redundant rules... but has other uses if I can get it going.
To make it extra special I can not use VBA macros to get it done.
I'm thinking some kind of index match to look it up in an array but not sure how to apply it. I don't know if it can even be done. Good luck.
Ok, so I've been tracking this problem since my initial comment, but have not taken the time to answer because just like Lana B:
I like a good puzzle, but it's not a good use of time if i have to keep guessing
+1 to Lana for her patience and effort on this question.
However, IP addressing is something I deal with regularly, so I decided to tackle this one for my own benefit. Also, no offense, but getting the MIN of the start and the MAX of the end is wrong. This will not account for gaps in the IP white-list. As I mentioned, this required 15 helper columns and my result is simply 1 or 0 corresponding to In or Out respectively. Here is a screenshot (with formulas shown below each column):
The formulas in F2:J2 are:
=NUMBERVALUE(MID(B2,1,FIND(".",B2)-1))
=NUMBERVALUE(MID(B2,FIND(".",B2)+1,FIND(".",B2,FIND(".",B2)+1)-1-FIND(".",B2)))
=NUMBERVALUE(MID(B2,FIND(".",B2,FIND(".",B2)+1)+1,FIND(".",B2,FIND(".",B2,FIND(".",B2)+1)+1)-1-FIND(".",B2,FIND(".",B2)+1)))
=NUMBERVALUE(MID(B2,FIND(".",B2,FIND(".",B2,FIND(".",B2)+1)+1)+1,LEN(B2)))
=F2*256^3+G2*256^2+H2*256+I2
Yes, I used formulas instead of "Text to Columns" to automate the process of adding more information to a "living" worksheet.
The formulas in L2:P2 are the same, but replace B2 with C2.
The formulas in R2:V2 are also the same, but replace B2 with D2.
The formula for X2 is
=SUMPRODUCT(--($P$2:$P$8<=J2)*--($V$2:$V$8>=J2))
I also copied your original "valid" set in column A, which you'll see matches my result.
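The helper-column logic can be sketched in Python to verify the results: each address becomes a single integer (the same `256^3`, `256^2`, `256` weighting as the formula in J2), and the SUMPRODUCT step counts how many ranges contain it. The names `ip_to_int`, `matching_ranges`, and `RANGES` are illustrative assumptions; the ranges are the five complete rows from the question's table.

```python
def ip_to_int(address):
    """Convert a dotted IPv4 address to one integer, like F2*256^3+G2*256^2+H2*256+I2."""
    a, b, c, d = (int(part) for part in address.split("."))
    return a * 256**3 + b * 256**2 + c * 256 + d

# Start/end pairs from the five complete rows of the example table
RANGES = [
    ("10.1.1.0", "10.1.1.31"),
    ("10.1.2.16", "10.1.2.31"),
    ("10.1.1.128", "10.1.1.223"),
    ("10.1.3.0", "10.1.3.127"),
    ("10.1.4.0", "10.1.4.255"),
]

def matching_ranges(address, ranges=RANGES):
    """Count the ranges that contain the address (the SUMPRODUCT step)."""
    value = ip_to_int(address)
    return sum(1 for start, end in ranges if ip_to_int(start) <= value <= ip_to_int(end))

for ip in ["10.1.1.5", "10.1.3.13", "10.1.2.7", "10.1.1.62", "10.1.1.200"]:
    print(ip, "yes" if matching_ranges(ip) else "no")
```

Because each range is tested separately, gaps between ranges are handled correctly, which is the weakness of comparing against a single overall MIN and MAX.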
You will need helper columns.
Organise your data as outlined in the picture.
Split address, start and end into columns using the period as the delimiter (ribbon menu Data=>Text To Columns).
Above the start/end parts, calculate MIN FOR START and MAX FOR END for all split text parts (e.g. MIN(K5:K1000)).
FORMULAS:
VALIDITY formula - copy into cell D5, and drag down:
=IF(AND(B6>$I$1,B6<$O$1),"In",
IF(OR(B6<$I$1,B6>$O$1),"Out",
IF(B6=$I$1,
IF(C6<$J$1, "Out",
IF( C6>$J$1, "In",
IF( D6<$K$1, "Out",
IF( D6>$K$1, "In",
IF(E6>=$L$1, "In", "Out"))))),
IF(B6=$O$1,
IF(C6>$P$1, "Out",
IF( C6<$P$1, "In",
IF( D6>$Q$1, "Out",
IF( D6<$Q$1, "In",
IF(E6<=$R$1, "In", "Out") )))) )
)))

SPSS: using IF function with REPEAT when each case has multiple linked instances

I have a dataset as such:
Case #|DateA |Drug.1|Drug.2|Drug.3|DateB.1 |DateB.2 |DateB.3 |IV.1|IV.2|IV.3
------|------|------|------|------|--------|---------|--------|----|----|----
1 |DateA1| X | Y | X |DateB1.1|DateB1.2 |DateB1.3| 1 | 0 | 1
2 |DateA2| X | Y | X |DateB2.1|DateB2.2 |DateB2.3| 1 | 0 | 1
3 |DateA3| Y | Z | X |DateB3.1|DateB3.2 |DateB3.3| 0 | 0 | 1
4 |DateA4| Z | Z | Z |DateB4.1|DateB4.2 |DateB4.3| 0 | 0 | 0
For each case, there are linked variables i.e. Drug.1 is linked with DateB.1 and IV.1 (Indicator Variable.1); Drug.2 is linked with DateB.2 and IV.2, etc.
The variable IV.1 only = 1 if Drug.1 is the case that I want to analyze (in this example, I want to analyze each receipt of Drug "X"), and so on for the other IV variables. Otherwise, IV = 0 if the drug for that scenario is not "X".
I want to calculate the difference between DateA and DateB for each instance where Drug "X" is received.
e.g. In the example above I want to calculate a new variable:
DateDiffA1_B1.1 = DateA1 - DateB1.1
DateDiffA1_B2.1 = DateA1 - DateB2.1
DateDiffA1_B1.3 = DateA1 - DateB1.3
DateDiffA1_B2.3 = DateA1 - DateB2.3
DateDiffA1_B3.3 = DateA1 - DateB3.3
I'm not sure if this new variable would need to be linked to each instance of Drug "X" as for the other variables, or if it could be a single variable that COUNTS all the instances for each case.
The end goal is to COUNT how many times each case had a date difference of <= 2 weeks when they received Drug "X". If they did not receive Drug "X", I do not want to COUNT the date difference.
I will eventually want to compare those who did receive Drug "X" with a date difference <= 2 weeks to those who did not, so having another indicator variable to help separate out these specific patients would be beneficial.
I am unsure about the best way to go about this; I suspect it will require a combination of IF and REPEAT functions using the IV variable, but I am relatively new with SPSS and syntax and am not sure how this should be coded to avoid errors.
Thanks for your help!
EDIT: It seems like I may need to use IV as a vector variable to loop through the linked variables in each case. I've tried the syntax below to no avail:
DATASET ACTIVATE DataSet1.
vector IV = IV.1 to IV.3.
loop #i = .1 to .3.
do repeat DateB = DateB.1 to DateB.3
/ DrugDateDiff = DateDiff.1 to DateDiff.3.
if IV(#i) = 1
/ DrugDateDiff = datediff(DateA, DateB, "days").
end repeat.
end loop.
execute.
Actually, there is no need to add the vector and the loop. All you need can be done within one DO REPEAT:
compute N2W=0.
do repeat DateB = DateB.1 to DateB.3 /IV=IV.1 to IV.3 .
if IV=1 and datediff(DateA, DateB, "days")<=14 N2W = N2W + 1.
end repeat.
execute.
This syntax first puts a zero in the count variable N2W. It then loops through all the dates and, only when the matching IV is 1, compares the date to DateA and adds 1 to the count if the difference is <= 2 weeks.
If you prefer to keep the count variable missing when none of the IV variables is 1, then instead of compute N2W=0. start the syntax with:
If any(1, IV.1 to IV.3) N2W=0.
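For readers less familiar with DO REPEAT, the counting logic for a single case can be sketched in Python: the dates and indicators are walked in parallel, and a date is counted only when its indicator is 1 and DateA minus DateB is at most 14 days. The function name `count_within_two_weeks` and the sample dates are hypothetical, and missing values are not handled in this sketch.

```python
from datetime import date

def count_within_two_weeks(date_a, dates_b, indicators, max_days=14):
    """Count linked dates with indicator 1 and DateA - DateB <= max_days."""
    count = 0  # same role as compute N2W=0.
    for date_b, iv in zip(dates_b, indicators):
        # Same condition as the IF inside DO REPEAT
        if iv == 1 and (date_a - date_b).days <= max_days:
            count += 1
    return count

# Hypothetical case: three linked drug dates with indicators 1, 0, 1
n2w = count_within_two_weeks(
    date(2020, 1, 20),
    [date(2020, 1, 10), date(2020, 1, 15), date(2019, 12, 1)],
    [1, 0, 1],
)
print(n2w)  # 1
```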
