Accumulated value sum with Arrayformula - arrays

I have a row of values in B2:F2 that I want to SUM cumulatively, as I did in B3:F3, but using ArrayFormula.
Formulas in row 3, with the $B column locked:
Month | Jan | Feb | Mar | Apr | May
Value | 15,106 | 15,559 | 10,875 | 21,679 | 18,118
Simple Cell formula | =SUM($B2:B2) | =SUM($B2:C2) | =SUM($B2:D2) | =SUM($B2:E2) | =SUM($B2:F2)
Progress: I tried this formula but it outputs the SUM of the entire range B2:F2 at once in the entire range B4:F4.
=ArrayFormula(IF(B2:F2="",,SUM(B2:$F2)))
Month | Jan | Feb | Mar | Apr | May
Value | 15,106 | 15,559 | 10,875 | 21,679 | 18,118
Progress | =ArrayFormula(IF(B2:F2="",,SUM(B2:$F2))) | 81,336 | 81,336 | 81,336 | 81,336
What is the best formula to get the same result in B3:F3 but using Arrayformula?
Make a copy of the example sheet.
Update
When trying to roll forward, I discovered the case where cells in the value row are empty, as in column J. If possible, please address this case in the answer.

The standard transposed running-total formula will do:
=INDEX(TRANSPOSE(MMULT(TRANSPOSE((SEQUENCE(5)<=SEQUENCE(1, 5))*
FLATTEN(B2:F2)), SEQUENCE(5, 1, 1, 0))))
fully dynamic and maximally lightweight:
=INDEX(IF(C2:2="",,TRANSPOSE(MMULT(TRANSPOSE((
SEQUENCE( MAX(COLUMN(C2:2)*(C2:2<>""))-COLUMN(C2)+1)<=
SEQUENCE(1, MAX(COLUMN(C2:2)*(C2:2<>""))-COLUMN(C2)+1))*
FLATTEN(INDIRECT("C2:"&ADDRESS(2, MAX(COLUMN(C2:2)*(C2:2<>"")))))),
SEQUENCE( MAX(COLUMN(C2:2)*(C2:2<>""))-COLUMN(C2)+1, 1, 1, 0)))))
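To make the MMULT idea in the formulas above easier to follow, here is a minimal Python sketch of the same technique: build a 0/1 mask where position i contributes to output j only when i <= j, then take the dot product of each mask row with the values. The function name and the sample values are hypothetical and only illustrate the logic; this is not a translation of the exact Sheets formula.

```python
def running_total_via_mask(values):
    """Running total using an explicit n x n 0/1 mask (the MMULT idea)."""
    n = len(values)
    # mask[j][i] is 1 when input position i contributes to output position j
    mask = [[1 if i <= j else 0 for i in range(n)] for j in range(n)]
    # Each output cell is the dot product of one mask row with the values
    return [sum(m * v for m, v in zip(row, values)) for row in mask]


# Hypothetical sample row, not the data from the question
print(running_total_via_mask([1, 2, 3, 4, 5]))
```

In the spreadsheet version, SEQUENCE(5)<=SEQUENCE(1, 5) plays the role of the mask and the column of ones produced by SEQUENCE(5, 1, 1, 0) performs the summation.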

A simple way to calculate cumulative sum:
=ArrayFormula(IF(B1:1="",,SUMIF(COLUMN(B2:2),"<="&COLUMN(B2:2),B2:2)))
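As a rough illustration of what the SUMIF formula above computes, the following Python sketch sums, for each column number, all values whose column number is less than or equal to it, and leaves the output blank where the header is blank. The function name, the use of None for blank cells, and the sample data are assumptions made for this example.

```python
def running_total_sumif(headers, columns, values):
    """For each column c, sum values in columns <= c; blank where header is blank."""
    out = []
    for header, c in zip(headers, columns):
        if header is None:
            out.append(None)  # mirrors IF(B1:1="",,...)
            continue
        # Blank value cells (None) contribute nothing, as they do in SUMIF
        out.append(sum(v for col, v in zip(columns, values)
                       if col <= c and v is not None))
    return out


# Hypothetical data: columns B..F are numbers 2..6, one value cell is blank
print(running_total_sumif(["Jan", "Feb", "Mar", "Apr", None],
                          [2, 3, 4, 5, 6],
                          [10, None, 5, 20, None]))
```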

=arrayformula(mmult(if(isblank(B2:F2),0,B2:F2),if(column(B2:F2)>=transpose(column(B2:F2)),1,0)))
can produce a running sum in a row vector and can accommodate empty entries in the input range.
If you want to auto detect the number of columns in the input range, you can
replace B2:F2 with array_constrain(B2:2,1,max(arrayformula(if(isblank(B2:2),,column(B2:2))))-1) and
replace column(B2:F2) with array_constrain(column(B2:2),1,max(arrayformula(if(isblank(B2:2),,column(B2:2))))-1)
which is to say, cut the range down to a number of columns equal to the max column index of the occupied cells in our range, minus 1 because we started with column 2.
(Also, as long as there is one arrayformula wrapping the whole formula, you can omit them in the nested inputs, as long as you preserve the () brackets.)
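The two ideas in this answer (treat blanks as 0, and trim the range at the last occupied cell) can be sketched in Python as follows. This is only an illustration under the assumption that blank cells are represented by None; the function names are hypothetical.

```python
def trim_to_last_filled(cells):
    """Drop trailing blanks (None) but keep blanks between filled cells."""
    last = max((i for i, c in enumerate(cells) if c is not None), default=-1)
    return cells[: last + 1]


def running_total_blank_as_zero(cells):
    """Running sum over the trimmed range, counting blank cells as 0."""
    total, out = 0, []
    for c in trim_to_last_filled(cells):
        total += 0 if c is None else c
        out.append(total)
    return out


print(running_total_blank_as_zero([3, None, 4, None, None]))
```

The trimming step corresponds to the array_constrain replacement, and the `0 if c is None` step corresponds to if(isblank(...),0,...).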
Nonetheless, there would be a (computational) efficiency concern.
In order to centralize the formula, in the above solution, we first created a filter for each desired entry in our running sum vector: 1,0,0,... for the 1st entry, 1,1,0,... for the 2nd entry, 1,1,1,0,... for the 3rd, etc. Then, effectively, we apply a sum(filter(...)) by multiplying by 1 or 0 using mmult. The array creation costs extra. The multiplication costs extra. And compared to iterated formulas that build the result cell by cell, we are not skipping the multiply-by-0 parts.
It may not end up being more than double or triple the runtime compared to iterated formulas, and you can experiment case by case. Small-scale application is always fine. But for larger datasets, computational efficiency is something to keep in mind whenever we introduce extra computational steps, since mmult solutions can square the original amount of work.
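To put a rough number on the efficiency concern, the sketch below contrasts the cell-by-cell approach with the size of the mask used by the mmult approach. The operation counts are a simplified model chosen for illustration (they say nothing about how the spreadsheet engine actually evaluates formulas), and the function names are hypothetical.

```python
from itertools import accumulate


def running_total_iterative(values):
    """One addition per cell, like copying =previous+current across the row."""
    return list(accumulate(values))


def mask_multiply_adds(n):
    """Multiply-adds for an n x n mask times an n-vector."""
    return n * n


def iterative_additions(n):
    """Additions needed when each cell reuses the previous running total."""
    return max(n - 1, 0)


print(running_total_iterative([1, 2, 3, 4, 5]))
print(mask_multiply_adds(1000), iterative_additions(1000))
```

For 1,000 cells the mask model performs a million multiply-adds against 999 additions, which is the squaring effect mentioned above.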

Related

Google Sheets: Average of every other column

I’ve looked at similar questions and I think I’m close to a working solution, but it’s giving me the wrong answer. I have a spreadsheet in Google Sheets with data in all columns, but every other cell contains a dollar value and I need only the average of those cells. They start (in this version) on cell G3 and continue through most of row 3, then I intend to copy the formula to other rows with the same cells in those rows needing to be averaged as well, so if it’ll adjust as I copy that’ll be best. Here’s what I’ve worked up so far:
=AVERAGEIF(ArrayFormula(mod(column(G3:3),2)),">0")
It’s returning 1 as the result, when it should be about 1500. If I change the 2 to another number, the result increases with it, so I think something in mod or column is being done wrong, but I don’t have enough practice to know where I messed up.
The average of every 2nd column is done like this:
=AVERAGE(FILTER(G3:3, MOD(COLUMN(G3:3)-1, 2)=0))
TIL about the FILTER function. Thanks guys.
There is a way with ArrayFormula. I think you almost got it. I would add one more argument to AVERAGEIF to specify the range to average. When that argument is omitted, AVERAGEIF averages the criteria range itself, which would explain the result of 1.
And I would do the modulo on the difference between a column and the first column. I guess for your question it isn't needed as all column numbers are either odd or even. But using the difference is a general purpose way to apply the concept to say every nth column.
The modulo 2 of any column number will be 0 or 1. So instead of using an inequality, just use 0 or 1. From your formula it looks like your dollar values must be in odd columns, so the result of the modulo should be 1. But I think that if you are starting at G3, taking the column difference before applying mod 2 changes the desired result to 0. Of course, to switch to the other set of columns, change the 0 to a 1.
=AVERAGEIF(ArrayFormula(mod((COLUMN(G3:3)-COLUMN(G3)),2)),0,G3:3)
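The "difference from the first column, then modulo" idea generalizes to every nth column. Here is a small Python sketch of that selection logic; the function name, the parameters `n` and `offset`, and the sample row are assumptions for illustration only.

```python
def average_every_nth(values, n=2, offset=0):
    """Average numeric cells whose distance from the first cell is offset mod n."""
    picked = [v for i, v in enumerate(values)
              if i % n == offset and isinstance(v, (int, float))]
    # Return None rather than dividing by zero when nothing matches
    return sum(picked) / len(picked) if picked else None


# Hypothetical row starting at G3: dollar values alternate with other data
row = [1000, "x", 2000, "y", 1500]
print(average_every_nth(row))            # cells 0, 2, 4 from the start
print(average_every_nth(row, offset=1))  # the other cells hold no numbers
```

Here the index `i` plays the role of COLUMN(G3:3)-COLUMN(G3), so `offset=0` matches the 0 criterion in the formula above.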

Google Sheets - new starting point for an array

In Google Sheets, I am running an array formula to transpose raw data into a readable format. The array formula runs off a row of data, and I am using it to transpose the data into multiple rows. The issue is that for every additional row of raw data I create five new rows of transposed data, so the array formula breaks. I am trying to make the array formula flex to added rows of data. Any thoughts would be appreciated!
https://docs.google.com/spreadsheets/d/16GOsH-EUDm2IeRgUgEQ8LBEltQASfJ8hR6GXh-pqNcM/edit?usp=sharing
Rather than use an array formula, you can use an indirect formula and modulo arithmetic and integer division to do this. In k2 put the formula =indirect("A"&(3+FLOOR((row()-2)/5))) which will get the date from column A row 3 until you get down to row 7, when it starts taking from row 4. This formula can be dragged down and will jump rows every 5 as desired.
Similarly for L2 place in =indirect("B"&(3+FLOOR((row()-2)/5))) which can also be dragged down (copied) the column.
For column M we need to cycle through C2, D2, E2, F2, and G2, so we need modulo (clock) arithmetic. So in M2 place the formula =indirect(char(CODE("C")+(mod(ROW()-2,5)))&"2"). Finally, for quantity we need to do both the column cycling and the incrementing of rows every 5, so the formula for N2 is =indirect(char(CODE("C")+(mod(ROW()-2,5)))&(3+FLOOR((row()-2)/5))). That too can be copied on down. That should do it. There could definitely be more elegant ways.
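The row and column arithmetic behind these INDIRECT formulas can be checked with a short Python sketch. It maps an output row number to the source cell address that the N2 formula would build; the function name and its default parameters are assumptions that mirror the layout described in this answer (output starts at row 2, source data starts at C3, five columns per source row).

```python
def source_cell(output_row, first_output_row=2, first_source_row=3,
                first_source_col="C", block=5):
    """Address of the source cell feeding a given output row."""
    k = output_row - first_output_row
    row = first_source_row + k // block            # integer division: jumps every 5 rows
    col = chr(ord(first_source_col) + k % block)   # modulo: cycles C, D, E, F, G
    return f"{col}{row}"


for r in range(2, 9):
    print(r, source_cell(r))
```

Output rows 2 to 6 read C3 through G3, and row 7 starts again at C4.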
I think not more elegant, but preserving the array formula would be =3+floor((row()-2)/5) in j2 and drag on down, and then adapt the array formula to: ={INDIRECT("A" & J2),INDIRECT("B"&J2),$C$2,INDIRECT("C"&J2);INDIRECT("A" & J2),INDIRECT("B"&J2),$D$2,INDIRECT("D"&J2);INDIRECT("A" & J2),INDIRECT("B"&J2),$E$2,INDIRECT("E"&J2);INDIRECT("A" & J2),INDIRECT("B"&J2),$F$2,INDIRECT("F"&J2);INDIRECT("A" & J2),INDIRECT("B"&J2),$G$2,INDIRECT("G"&J2)} which can be copied to 5 rows below it where the J2's become J7's and you are processing the next desired row.

Excel - rolling average across columns without zeros, no array

I am currently using this formula to calculate a rolling average score across 12 columns for either the last 3 or 6 months.
=SUM(SUMIFS($E$54:P54,$E$54:P54,LARGE(IF($E$54:P54>0,$E$54:P54),{1,2,3})))
This is an array formula and is entered via CTRL + SHIFT + ENTER.
The problem now is that I need to deploy my workbook on older machines, and since those are ancient office computers (we are talking Windows XP and Office 2003...), I find that the array formula is killing the entire workbook. I have already taken steps to speed up the workbook via VBA (disabling events, manual formula calc, etc.), but I need a way to convert the above array formula into a non-array formula which does NOT count zeros or empty cells as part of the average.
I tried the formula below but couldn't get it to work with the zeros / empty cells.
=SUM(OFFSET($E68,0,COUNT($E68:$P68)-IF(COUNT($E68:$P68)>3,3,COUNT($E68:$P68)),1,IF(COUNT($E68:$P68)>3,3,COUNT($E68:$P68))))
Picture of the sample data attached below.
Since an average is a "sum" divided by a "count", this can be accomplished with simple non-array formulas. This formula avoids the inclusion of both zeros and blank cells:
=IF(SUM(A2:C2)>0,SUM(A2:C2)/(COUNT(A2:C2)-COUNTIF(A2:C2,0)),0)
If the row is filled with only zero values, then it shows the actual numerical average as zero, which is correct. If you want to avoid the display of zero averages in cells, then use this slightly different formula:
=IF(SUM(A3:C3)>0,SUM(A3:C3)/(COUNT(A3:C3)-COUNTIF(A3:C3,0)),"")
Of course, you will need to adjust the cell ranges for the 6-month and YTD averages; these formulas deal with 3-month ranges.
For last 3-Month range average: =SUM(OFFSET($A3,0,COUNT($A3:$L3)-IF(COUNT($A3:$L3)>3,3,COUNT($A3:$L3)),1,IF(COUNT($A3:$L3)>3,3,COUNT($A3:$L3))))/(COUNT(OFFSET($A3,0,COUNT($A3:$L3)-IF(COUNT($A3:$L3)>3,3,COUNT($A3:$L3)),1,IF(COUNT($A3:$L3)>3,3,COUNT($A3:$L3))))-COUNTIF(OFFSET($A3,0,COUNT($A3:$L3)-IF(COUNT($A3:$L3)>3,3,COUNT($A3:$L3)),1,IF(COUNT($A3:$L3)>3,3,COUNT($A3:$L3))),0))
For last 6-month average: =SUM(OFFSET($A2,0,COUNT($A2:$L2)-IF(COUNT($A2:$L2)>6,6,COUNT($A2:$L2)),1,IF(COUNT($A2:$L2)>6,6,COUNT($A2:$L2))))/(COUNT(OFFSET($A2,0,COUNT($A2:$L2)-IF(COUNT($A2:$L2)>6,6,COUNT($A2:$L2)),1,IF(COUNT($A2:$L2)>6,6,COUNT($A2:$L2))))-COUNTIF(OFFSET($A2,0,COUNT($A2:$L2)-IF(COUNT($A2:$L2)>6,6,COUNT($A2:$L2)),1,IF(COUNT($A2:$L2)>6,6,COUNT($A2:$L2))),0))
These are quite lengthy, but they extend your original non-array formula so that zero-value cells are excluded. The data is assumed to begin in A2:L2.
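Because the OFFSET formulas are hard to read, here is a hedged Python sketch of the logic they appear to implement: use the count of numeric cells to locate the last filled cell, take a window of up to N cells ending there, and divide the window's sum by the number of non-zero numbers in it. The function name and the use of None for blank cells are assumptions, and the sketch assumes the row is filled contiguously from the first column (blanks only at the end), which is what the COUNT-based offset relies on.

```python
def trailing_average_excluding_zeros(row, window=3):
    """Average of the last `window` filled cells, ignoring zeros and blanks."""
    def is_number(c):
        return isinstance(c, (int, float))

    n = sum(1 for c in row if is_number(c))   # COUNT(range)
    k = min(n, window)                        # IF(COUNT>window, window, COUNT)
    tail = row[n - k: n]                      # OFFSET(start, 0, n-k, 1, k)
    numbers = [c for c in tail if is_number(c)]
    nonzero = [c for c in numbers if c != 0]  # COUNT(tail) - COUNTIF(tail, 0)
    if not nonzero:
        return None  # the spreadsheet formula would produce an error here
    return sum(numbers) / len(nonzero)


print(trailing_average_excluding_zeros([10, 0, 20, 30, None, None]))
print(trailing_average_excluding_zeros([10, 0, 20, 30, None, None], window=6))
```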

Get Row and Column Totals from 2d Array in Excel

I wanted to know how to get row and column totals from a 2D array in Excel. This is a fairly common thing to do but I couldn't find an answer to it by searching on row and column totals so I thought it would be worth posting it as a question.
Supposing I wanted to find the lowest column total and highest row total in the following array which is in cells A1:D3:-
1 2 3 4
5 6 7 8
9 10 11 12
my initial thoughts were along the lines of
=min(A1:D3*(column(A1:D3)={1,2,3,4}))
but this kind of simple approach doesn't work. I remembered reading that you had to use mmult in some of these situations and have seen advanced formulae using them but couldn't quite remember how. I shall try and answer my own question but other suggestions are more than welcome.
You can do it with MMULT as you mentioned. The following should work with your setup:
Smallest column
=MIN(MMULT({1,1,1},A1:D3))
Largest row:
=MAX(MMULT(A1:D3,{1;1;1;1}))
Note how many 1s are in each array: for the row totals you need a 1 for each column (i.e. 4), and vice versa for the column totals (a 1 for each row, i.e. 3). Also note the order of the arrays: it won't work the other way around.
Yes, you have to use mmult to deliver either a column array or a row array containing the required totals; then you can use MAX, MIN or any other aggregate function to get the value you require.
Column totals
=MIN(MMULT(TRANSPOSE(ROW(A1:D3))^0,A1:D3))
Row Totals
=MAX(MMULT(A1:D3,TRANSPOSE(COLUMN(A1:D3))^0))
So the idea is that you create a single-row array {1,1,1} and multiply it by the 2D array to end up with an array {15,18,21,24} and take the minimum value from it.
Or create a single-column array {1;1;1;1} and multiply the original array by it to end up with an array {10;26;42} from which you get the maximum value.
Remember that mmult works like the matrix multiplication you might have learned at college where for each cell it works across the cells in the corresponding row of the first array and down the cells of the corresponding column in the second array multiplying each pair and adding them to the total. So the number of columns in the first array must always equal the number of rows in the second array.
These are, as @Scott Craner reminds me, array formulae that have to be entered with Ctrl+Shift+Enter.
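The arithmetic in this answer can be verified with a small, dependency-free Python sketch of matrix multiplication applied to the example grid from the question. The helper name `mmult` is chosen to echo the spreadsheet function, but this is an illustrative reimplementation, not the spreadsheet function itself.

```python
def mmult(a, b):
    """Plain matrix product; columns of `a` must equal rows of `b`."""
    if len(a[0]) != len(b):
        raise ValueError("columns of first array must equal rows of second")
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


grid = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12]]  # A1:D3 from the question

# One 1 per row of the grid gives the column totals
column_totals = mmult([[1, 1, 1]], grid)[0]
# One 1 per column of the grid gives the row totals
row_totals = [r[0] for r in mmult(grid, [[1], [1], [1], [1]])]

print(column_totals, min(column_totals))
print(row_totals, max(row_totals))
```

Swapping the order of the arguments raises the ValueError, which corresponds to the remark that the formulas do not work the other way around.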

Optimization of array function that calculates products

I have the following array formula that calculates the returns on a particular stock in a particular year:
=IF(AND(NOT(E2=E3),H2=H3),PRODUCT(IF($E$2:E2=E1,$O$2:O2,""))-1,"")
But since I have 500,000 row entries as soon as I hit row 50,000 I get an error from Excel stating that my machine does not have enough resources to compute the values.
How shall I optimize the function so that it actually works?
E column refers to a counter to check the years and ticker values of stocks. If year is different from the previous value the function will output 1. It will also output 1 when the name of stock has changed. So for example you may have values for year 1993 and the next value is 1993 too but the name of stock is different, so clearly the return should be calculated anew, and I use 1 as an indication for that.
Then I have another column that runs a cumulative sum of those 1s. When a new 1 in that previous column is encountered I add 1 to the running total and keep printing same number until I observe a new one. This makes possible use of the array function, if the column that contains running total values (E column) has a next value that is different from previous I use my twist on SUMIF but with PRODUCT IF. This will return the product of all the corresponding running total E column values.
The source of the inefficiency, I believe, is in the steady increase with row number of the number of cells that must be examined in order to evaluate each successive array formula. In row 50,000, for example, your formula must examine cells in all the rows above it.
I'm a big fan of array formulas, so it pains me to say this, but I wouldn't do it this way. Instead, use additional columns to compute, in each row, the pieces of your formula that are needed to return the desired result. By taking that approach, you're exploiting Excel's very efficient recalculation engine to compute only what's needed.
As for the final product, compute that from a cumulative running product in an auxiliary column, and that resets to the value now in column O when column P in the row above contains a number. This approach is much more "local" and avoids formulas that depend on large numbers of cells.
I realize that text is not the best language for describing this, and my poor writing skills might be adding to the challenge, so please let me know if more detail is needed.
Interesting problem, thanks.
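The auxiliary-column suggestion above can be sketched in Python as a single pass that keeps a running product and resets it whenever the group counter changes. This is a simplified model under stated assumptions: `group_ids` stands in for the counter in column E, `factors` for the values in column O, the extra H2=H3 check from the original formula is left out, and the function name is hypothetical.

```python
def group_returns(group_ids, factors):
    """Return product-1 on the last row of each group, None elsewhere."""
    results = [None] * len(factors)
    running = 1.0
    for i, (g, f) in enumerate(zip(group_ids, factors)):
        if i > 0 and g != group_ids[i - 1]:
            running = 1.0  # new group: restart the running product
        running *= f       # each row only looks at the row above it
        is_last = i == len(factors) - 1 or group_ids[i + 1] != g
        if is_last:
            results[i] = running - 1
    return results


print(group_returns([1, 1, 2, 2], [2.0, 3.0, 0.5, 4.0]))
```

Each row does a constant amount of work, so the total cost grows linearly with the number of rows instead of re-scanning all rows above.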
Could I suggest a really quick and [very] dirty vba? Something like the below. Obviously, have a backup of your file before running this. This assumes you want to start calculating from row 13.
Sub calculateP()
    'start on row 13, column P:
    Cells(13, 16).Select
    'loop through every row as long as column A is populated:
    Do
        If ActiveCell(1, -14).Value = "" Then Exit Do 'column A not populated so exit loop
        'enter formula:
        Selection.FormulaR1C1 = _
            "=IF(AND(NOT(RC[-11]=R[1]C[-11]),RC[-8]=R[1]C[-8]),PRODUCT(IF(R[-11]C5:RC[-11]=R[-1]C[-11],R2C15:RC[-1],""""))-1,"""")"
        'convert cell value to value only (remove formula):
        ActiveCell.Value = ActiveCell.Value
        'select next row:
        ActiveCell(2, 1).Select
    Loop
End Sub
Sorry, this is definitely not a great answer for you... in fact, even this method could be achieved more elegantly using Range... but the quick and dirty approach may help you in the interim.
