Excel: Problems w. INDIRECT, Arrays, and Aggregate Functions (SUM, MAX, etc.)

Objective
I have a Microsoft Excel spreadsheet containing a price list that may change over time (B2:B6 in the example). Separately, I have a budget that may also change over time (D2). I am attempting to construct a formula for E2 that outputs the number of items that can be purchased with the budget in D2. Thereafter, I'll attempt to construct formulas to output the change left over (F2) and a comma-delimited list of purchasable items (G2).
Note: It unfortunately isn't possible to add an intermediate calculation column to the list, such as a running total. As such, I'm trying for formulas for single cells (i.e., E2, F2, and G2).
Note: I'm using Excel for Mac 2019.
       A         B       C       D        E        F                  G
  +---------+---------+-----+---------+-------+---------+---------------------------+
1 | Label   | Price   |     | Budget  | Items | Change  | Item(s)                   |
  +---------+---------+-----+---------+-------+---------+---------------------------+
2 | Item #1 | $ 10.00 |     | $ 40.00 |     3 | $  4.50 | Item #1, Item #2, Item #3 |
  +---------+---------+-----+---------+-------+---------+---------------------------+
3 | Item #2 | $ 20.00 |     |         |       |         |                           |
  +---------+---------+-----+---------+-------+---------+---------------------------+
4 | Item #3 | $  5.50 |     |         |       |         |                           |
  +---------+---------+-----+---------+-------+---------+---------------------------+
5 | Item #4 | $ 25.00 |     |         |       |         |                           |
  +---------+---------+-----+---------+-------+---------+---------------------------+
6 | Item #5 | $ 12.50 |     |         |       |         |                           |
  +---------+---------+-----+---------+-------+---------+---------------------------+
For E2, I've attempted:
{=MAX(N(SUM(INDIRECT("$B$2:$B$"&ROW($B$2:$B$6)))<=$D2)*ROW($B$2:$B$6)-MIN(ROW($B$2:$B$6))+1)}
Though, the above values and this formula result in an output of -1.
Note: The formula for F2 and G2 seemingly easily follow E2; e.g. {=$D2-SUM(IF((ROW($B$2:$B$6)-MIN(ROW($B$2:$B$6))+1)<=$E2,$B$2:$B$6,0))} and {=TEXTJOIN(", ",TRUE,INDIRECT("$A$2:$A$"&(MIN(ROW($B$2:$B$6))+$E2-1)))} seem to work well, respectively.
Observations
{="$B$2:$B$"&ROW($B$2:$B$6)} evaluates to {"$B$2:$B$2";"$B$2:$B$3";...;"$B$2:$B$6"} (as desired);
{=INDIRECT("$B$2:$B$"&ROW($B$2:$B$6))} should evaluate to the equivalent of {{$B$2:$B$2},{$B$2:$B$3},...,{$B$2:$B$6}}; though, as a 1x5 multi-cell array formula, it evaluates to the equivalent of {#VALUE!,#VALUE!,#VALUE!,#VALUE!,#VALUE!} and, with F9, to {10;#N/A;#N/A;#N/A;12.5};
{=SUM(INDIRECT("$B$2:$B$"&ROW($B$2:$B$6)))<=$D2}, as a 1x5 multi-cell array formula, evaluates to the equivalent of {TRUE;TRUE;TRUE;FALSE;FALSE} (as desired); though, with F9 does to #VALUE!;
{=N(SUM(INDIRECT("$B$2:$B$"&ROW($B$2:$B$6)))<=$D2)}, as a 1x5 multi-cell array formula, evaluates to the equivalent of {1;1;1;0;0} (as desired); though, with F9 does again to #VALUE!;
{=N(SUM(INDIRECT("$B$2:$B$"&ROW($B$2:$B$6)))<=$D2)*ROW($B$2:$B$6)}, as a 1x5 multi-cell array formula, evaluates to the equivalent of {2,3,4,0,0} (as desired); though, with F9 does to {#VALUE!,#VALUE!,#VALUE!,#VALUE!,#VALUE!};
{=N(SUM(INDIRECT("$B$2:$B$"&ROW($B$2:$B$6)))<=$D2)*ROW($B$2:$B$6)-MIN(ROW($B$2:$B$6))+1}, as a 1x5 multi-cell array formula, evaluates to the equivalent of {1,2,3,-1,-1} (as desired); though, with F9 does again to {#VALUE!,#VALUE!,#VALUE!,#VALUE!,#VALUE!}; and,
{=MAX(N(SUM(INDIRECT("$B$2:$B$"&ROW($B$2:$B$6)))<=$D2)*ROW($B$2:$B$6)-MIN(ROW($B$2:$B$6))+1)} evaluates to -1
Interestingly:
If {=N(SUM(INDIRECT("$B$2:$B$"&ROW($B$2:$B$6)))<=$D2)*ROW($B$2:$B$6)-MIN(ROW($B$2:$B$6))+1} is placed as the multi-cell array formula in, say, E10:E14, a =MAX($E$10:$E$14) results in 3 (as desired).
Speculation
At present, I'm speculating that, when entered as a single cell array formula, the INDIRECT is not being assessed to be array producing and/or the SUM, as part of a single cell array formula, is not producing an array result.
Please assist. And, thank you in advance.
Solutions (Thanks to Contributors Below)
For E2, {=IF($B$2<=$D2,MATCH(1,0/(MMULT(N(ROW($B$2:$B$6)>=TRANSPOSE(ROW($B$2:$B$6))),$B$2:$B$6)<=$D2)),0)} (thank you Jos Woolley);
For F2, =IF($E2=0,MAX(0,$D2),$D2-SUM($B$2:INDEX($B$2:$B$6,$E2))) (thank you P.b); and,
For G2, =IF($E2=0,"",TEXTJOIN(", ",TRUE,$A$2:INDEX($A$2:$A$6,$E2))) (thank you P.b).

The first point to make, as I mentioned in the comments, is that it must be understood that piecemeal evaluation of a formula - via highlighting subsections of that formula and committing with F9 within the formula bar - will not necessarily correspond to the actual evaluation.
Evaluation via F9 in the formula bar always forces that part to be evaluated as an array. This can be misleading, since the overall construction may not actually evaluate that part as an array.
The second point to make is that SUM cannot iterate over an array of ranges, though SUBTOTAL, for example, can, so replacing SUM( with SUBTOTAL(9, in your current formula should work.
However, you would still be left with a construction which is volatile, so I would recommend this non-volatile alternative:
=MATCH(1,0/(MMULT(N(ROW(B2:B6)>=TRANSPOSE(ROW(B2:B6))),B2:B6)<=D2))
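
To see why the MMULT construction yields running totals, here is a minimal Python sketch of the same arithmetic. Python is used only for illustration, and the function name count_affordable is hypothetical. The comparison ROW(...)>=TRANSPOSE(ROW(...)) builds a lower-triangular matrix of ones, multiplying it by the prices gives cumulative sums, and MATCH(1,0/(...)) picks the last position whose cumulative sum fits the budget.

```python
def count_affordable(prices, budget):
    """Mimic MATCH(1, 0/(MMULT(N(ROW>=TRANSPOSE(ROW)), prices) <= budget))."""
    n = len(prices)
    # Lower-triangular matrix of ones: entry (i, j) is 1 when j <= i.
    tri = [[1 if i >= j else 0 for j in range(n)] for i in range(n)]
    # Matrix-vector product: row i sums prices[0..i], i.e. a running total.
    running = [sum(tri[i][j] * prices[j] for j in range(n)) for i in range(n)]
    # Last 1-based position whose running total fits the budget.
    # Excel's MATCH returns #N/A when nothing fits; this sketch returns 0.
    last = 0
    for pos, total in enumerate(running, start=1):
        if total <= budget:
            last = pos
    return last


prices = [10.0, 20.0, 5.5, 25.0, 12.5]
print(count_affordable(prices, 40.0))  # 3
```

With the prices from the question the running totals are 10, 30, 35.5, 60.5 and 73, so a budget of 40 covers the first three items.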

In E2 you can use:
=MATCH(TRUE,--SUBTOTAL(9,OFFSET(B2:B6,,,ROW(B2:B6)))>=D2,0)
In F2 you can use:
=D2-SUM(B2:INDEX(B2:B6,E2))
In G2 you can use:
=TEXTJOIN(", ",1,A2:INDEX(A2:A6,E2))
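
As a rough illustration of what the F2 and G2 formulas compute once the item count is known, here is a small Python sketch. The function name change_and_items is hypothetical, and the sketch assumes the count is at least 1 (a count of 0 needs separate handling in Excel).

```python
def change_and_items(labels, prices, budget, count):
    """Return (change, item list) for the first `count` rows."""
    # B2:INDEX(B2:B6, E2) is the range from the first price to the E2-th price.
    bought = prices[:count]
    change = budget - sum(bought)
    # TEXTJOIN(", ", 1, A2:INDEX(A2:A6, E2)) joins the matching labels.
    items = ", ".join(labels[:count])
    return change, items


labels = ["Item #1", "Item #2", "Item #3", "Item #4", "Item #5"]
prices = [10.0, 20.0, 5.5, 25.0, 12.5]
print(change_and_items(labels, prices, 40.0, 3))
# (4.5, 'Item #1, Item #2, Item #3')
```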

Related

Excel: create an array with n occurrences of a value x

I'm looking for a way to create an Excel array with n occurrences of an x value, n and x being vectors.
Desired behaviour :
|---------------------|------------------|------------------|
| occurences | value | result |
|---------------------|------------------|------------------|
| 3 | 4 | {4;4;4;1;1} |
|---------------------|------------------|------------------|
| 2 | 1 |
|---------------------|------------------|
This is a question similar to this one, except that I want one more dimension. I'm not interested in a VBA answer, I'm looking for a formula.
I've tried playing around with index and concatenation like in the answer to the previously linked question but with no luck until now.
This result will be used in a bigger formula that will sum the m greatest values (I already have that part figured out and working; the m value is irrelevant here). You can consider this question as if the occurrences are the storage amounts, and I want the sum of the m greatest individual values.

Here's another approach in O365:
=INDEX(B:B,MATCH(SEQUENCE(SUM(A1:A3),1,0),
MMULT(N(ROW(A1:A3)>=TRANSPOSE(ROW(A1:A3))),A1:A3)-A1:A3))
where you're looking up the row number of the output array in the running total of the input counts.
I think it could be modified to work over an arbitrary range but would then be a fairly long formula.
If the inputs aren't in the sheet but coming from an array formula, then still possible but it would be a very long formula.
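
The lookup idea can be sketched in Python as follows. This is only an illustration of the logic, not the Excel evaluation itself, and the function name repeat_values is hypothetical. Each output position is looked up in the running total of the counts that precede each row.

```python
from bisect import bisect_right


def repeat_values(counts, values):
    """Repeat values[i] counts[i] times via a running-total lookup."""
    # Running total of the counts *before* each row: [0, 3] for counts [3, 2].
    # This corresponds to MMULT(...) - A1:A3 in the formula.
    starts = []
    total = 0
    for c in counts:
        starts.append(total)
        total += c
    # For each 0-based output position, find the last start <= position,
    # which is what an approximate MATCH does on ascending data.
    return [values[bisect_right(starts, k) - 1] for k in range(total)]


print(repeat_values([3, 2], [4, 1]))  # [4, 4, 4, 1, 1]
```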

=FILTERXML("<t><s>" & TEXTJOIN("</s><s>",TRUE,SEQUENCE(3,,4,0),SEQUENCE(2,,1,0)) & "</s></t>","//s")
will return: {4;4;4;1;1} which can be used as part of a larger formula.

make x in a cell equal 8 and total

I need an Excel formula that will look at a cell and, if it contains an x, treat it as an 8 and add it to the total at the bottom of the table. I have done this in the past, but I am so rusty that I cannot remember how I did it.

Generally, I try and break this sort of problem into steps. In this case, that'd be:
Determine if a cell is 'x' or not, and create new value accordingly.
Add up the new values.
If your values are in column A (for example), in column B, fill in:
=if(A1="x", 8, 0) (or in R1C1 mode, =if(RC[-1]="x", 8, 0)).
Then just sum those values (e.g. sum(B1:B3)) for your total.
A | B
+---------+---------+
| VALUES | TEMP |
+---------+---------+
| 0 | 0 <------ '=if(A1="x", 8, 0)'
| x | 8 |
| fish | 0 |
+---------+---------+
| TOTAL | 8 <------ '=sum(B1:B3)'
+---------+---------+
If you want to be tidy, you could also hide the column with your intermediate values in.
(I should add that the way your question is worded, it almost sounds like you want to 'push' a value into the total; as far as I've ever known, you can really only 'pull' values into a total.)

Try this one for total sum:
=SUMIF(<range you want to sum>, "<>" & <x>, <range you want to sum>)+ <x> * COUNTIF(<range you want to sum>, <x>)
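
A minimal Python sketch of the same SUMIF plus COUNTIF idea, assuming the cells hold either numbers or text. The function name total_with_marker and its parameters are hypothetical.

```python
def total_with_marker(cells, marker="x", marker_value=8):
    """Sum the numeric cells and count each marker cell as marker_value."""
    # SUMIF part: only genuine numbers are added; text is ignored.
    numeric_sum = sum(
        c for c in cells
        if isinstance(c, (int, float)) and not isinstance(c, bool)
    )
    # COUNTIF part: COUNTIF ignores case, so compare lower-cased text.
    marker_count = sum(
        1 for c in cells
        if isinstance(c, str) and c.lower() == marker.lower()
    )
    return numeric_sum + marker_value * marker_count


print(total_with_marker([0, "x", "fish"]))  # 8
```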

How to force Postgresql "cartesian product" behavior when unnest'ing multiple arrays in select?

Postgresql behaves strangely when unnesting multiple arrays in the select list:
select unnest('{1,2}'::int[]), unnest('{3,4}'::int[]);
unnest | unnest
--------+--------
1 | 3
2 | 4
vs when arrays are of different lengths:
select unnest('{1,2}'::int[]), unnest('{3,4,5}'::int[]);
unnest | unnest
--------+--------
1 | 3
2 | 4
1 | 5
2 | 3
1 | 4
2 | 5
Is there any way to force the latter behaviour without moving stuff to the from clause?
The SQL is generated by a mapping layer and it will be very much easier for me to implement the new feature I am adding if I can keep everything in the select.

https://www.postgresql.org/docs/10/static/release-10.html
Set-returning functions are now evaluated before evaluation of scalar
expressions in the SELECT list, much as though they had been placed in
a LATERAL FROM-clause item. This allows saner semantics for cases
where multiple set-returning functions are present. If they return
different numbers of rows, the shorter results are extended to match
the longest result by adding nulls. Previously the results were cycled
until they all terminated at the same time, producing a number of rows
equal to the least common multiple of the functions' periods.
(emphasis mine)
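
The two behaviours described in the release note can be modelled in a few lines of Python. This is a sketch of the row-pairing rules only, not of PostgreSQL internals, and the function names srf_pre10 and srf_10plus are hypothetical.

```python
from math import gcd


def srf_pre10(a, b):
    """Before version 10: cycle both sets until they end together (LCM rows)."""
    if not a or not b:
        return []
    rows = len(a) * len(b) // gcd(len(a), len(b))
    return [(a[i % len(a)], b[i % len(b)]) for i in range(rows)]


def srf_10plus(a, b):
    """Version 10 and later: pad the shorter set with NULL (None here)."""
    rows = max(len(a), len(b))
    return [
        (a[i] if i < len(a) else None, b[i] if i < len(b) else None)
        for i in range(rows)
    ]


print(srf_pre10([1, 2], [3, 4, 5]))   # six rows, as in the question
print(srf_10plus([1, 2], [3, 4, 5]))  # three rows, the last padded with None
```

With equal-length inputs both functions return the same rows, which is why the first query in the question shows only two rows.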

I tested with version 12.12 of Postgres and as mentioned by OP in a comment, having two unnest() works as expected when you move those to the FROM clause like so:
SELECT a, b
FROM unnest('{1,2}'::int[]) AS a,
unnest('{3,4}'::int[]) AS b;
Then you get the expected table:
a | b
--+--
1 | 3
1 | 4
2 | 3
2 | 4
As we can see, in this case the arrays have the same size.
In my case, the arrays come from a table. You can first name the table and then name the columns inside the unnest() calls like so:
SELECT a, b
FROM my_table,
unnest(col1) AS a,
unnest(col2) AS b;
You can, of course, select other columns as required. This is a normal cartesian product.
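
For comparison, the cross join produced by listing both unnest() calls in the FROM clause pairs every element of one array with every element of the other. A tiny Python sketch of that pairing (the function name cross_join is hypothetical):

```python
from itertools import product


def cross_join(a, b):
    """Pair every element of a with every element of b, a varying slowest."""
    return list(product(a, b))


print(cross_join([1, 2], [3, 4]))  # [(1, 3), (1, 4), (2, 3), (2, 4)]
```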

Excel - VLOOKUP to return each result in an array, not just the first

I am currently working between two workbooks.
In Workbook A I have the following data.
A ... D E F ... N
1.| ID | Name | Desc | Prod | Country|
2.| 12345 | Apple| Fruit| 10| US|
3.| 12346 | Celery| Veg| 150| US|
4.| 12347 | Mint| Herb| 25| FR|
I have been using the following formula from AHC in Workbook B. The aim is to perform a VLOOKUP that grabs all the IDs, but only where Country = "US".
=VLOOKUP("US", CHOOSE({2,1},Workbook A.xlsx!Table1[ID], Workbook A.xlsx!Table1[Country]), 2, FALSE)
This formula works well, however, my problem comes because the formula will only ever return the first instance in the array. For example, if I include this formula in Workbook B, Col A it will look like this:
A
1.|ID of US|
2.| 12345 |
3.| 12345 |
4.| 12345 |
5.| 12345 |
6.| 12345 |
7.| 12345 |
How would I make this formula work so that it returns each ID which matches "US", not just the first occurrence of a match?

In B2 put this formula (you might need to adjust the rows of the ranges; I went up to row 100):
{=IFERROR(INDEX(D$2:D$100,SMALL(IF(N$2:N$100=$A2,ROW(D$2:D$100)-ROW(D$2)+1),ROW(N1))),"")}
NOTE:
Step 1) Insert the formula only in Cell B2 without the {}
Step 2) Once the formula is inserted mark the entire formula and press Ctrl + Shift + Enter so the formula will get the {}
Step 3) Drag it down the rows as far as you need it to get the list.
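
The INDEX/SMALL/IF pattern is meant to return the k-th matching row, where k grows by one for each row the formula is dragged down. A minimal Python sketch of that logic, with the hypothetical function name kth_match and the sample IDs from the question:

```python
def kth_match(keys, values, wanted, k):
    """Return the value from the k-th row (1-based) whose key equals wanted."""
    # IF(keys = wanted, relative row number): positions of the matching rows.
    positions = [i for i, key in enumerate(keys) if key == wanted]
    # SMALL(..., k) picks the k-th smallest position; when there are fewer
    # than k matches, the error wrapper in the formula yields an empty string.
    if k < 1 or k > len(positions):
        return ""
    return values[positions[k - 1]]


ids = [12345, 12346, 12347]
countries = ["US", "US", "FR"]
print([kth_match(countries, ids, "US", k) for k in (1, 2, 3)])
# [12345, 12346, '']
```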

Regression loop and store coefficients

I am going (1) to loop a regression over a certain criterion many times; and (2) to store a certain coefficient from each regression. Here is an example:
clear
sysuse auto.dta
local x = 2000
while `x' < 5000 {
    xi: regress price mpg length gear_ratio i.foreign if weight < `x'
    est sto model_`x'
    local x = `x' + 100
}
est dir
I just care about one predictor, say mpg here. I want to extract coefficients of mpg from each result into one independent file (any file is OK, .dta would be great) to see if there is a trend as the threshold for weight increases. What I am doing now is to use estout to export the results, something like:
esttab * using test.rtf, replace se stats(r2_a N, labels(R-squared)) starl(* 0.10 ** 0.05 *** 0.01) nogap onecell title(regression tables)
estout will export everything and I need to edit the results. This works well for regressions with few predictors, but my real dataset has more than 30 variables and the regression will loop at least 100 times (I have a variable Distance with range from 0 to 30,000: it has the role of weight in the example). Therefore, it is really difficult for me to edit the results without making mistakes.
Is there any other efficient way to solve my problem? Since I am looping not over a group variable but over a certain criterion, the statsby function does not seem to work well here.

As @Todd has already suggested, you can just choose the particular results you care about and use postfile to store them as new variables in a new dataset. Note that a forval loop is more direct than your while code, while using xi: is superseded by factor variable notation in recent versions of Stata. (I have not changed that just in case you are using some older version.) Note evaluation of saved results such as _b[_cons] on the fly and the use of parentheses () to stop negative signs being evaluated. Some code examples elsewhere store results temporarily in local macros or scalars, which is quite unnecessary.
sysuse auto.dta, clear
tempname myresults
postfile `myresults' threshold intercept gradient se using myresults.dta
quietly forval x = 2000(200)4800 {
    xi: regress price mpg length gear_ratio i.foreign if weight < `x'
    post `myresults' (`x') (`=_b[_cons]') (`=_b[mpg]') (`=_se[mpg]')
}
postclose `myresults'
use myresults
list
+---------------------------------------------+
| thresh~d intercept gradient se |
|---------------------------------------------|
1. | 2000 -3699.55 -296.8218 215.0348 |
2. | 2200 -4175.722 -53.19774 54.51251 |
3. | 2400 -3918.388 -58.83933 42.19707 |
4. | 2600 -6143.622 -58.20153 38.28178 |
5. | 2800 -11159.67 -49.21381 44.82019 |
|---------------------------------------------|
6. | 3000 -6636.524 -51.28141 52.96473 |
7. | 3200 -7410.392 -58.14692 60.55182 |
8. | 3400 -2193.125 -57.89508 52.78178 |
9. | 3600 -1824.281 -103.4387 56.49762 |
10. | 3800 -1192.767 -110.9302 51.6335 |
|---------------------------------------------|
11. | 4000 5649.41 -173.9975 74.51212 |
12. | 4200 5784.363 -147.4454 71.89362 |
13. | 4400 6494.47 -93.81158 80.81586 |
14. | 4600 6494.47 -93.81158 80.81586 |
15. | 4800 5373.041 -95.25342 82.60246 |
+---------------------------------------------+
statsby (a command, not a function) is just not designed for this problem at all, so it is not a question of whether it works well.

I would suggest you look at help postfile for an example of how to aggregate the results. I agree that statsby may not be the best approach. Evaluating the interaction between mpg and weight on price may help address what would seem to be a classic question of interaction.